Sqoop: commonly used scripts

Sqoop is an open-source tool used mainly to move data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into Hadoop's HDFS or NoSQL stores, and it can also export data from HDFS back into a relational database.
The Sqoop project started in 2009, initially as a third-party module for Hadoop. Later, to make deployment easier for users and to let developers iterate faster, Sqoop became a standalone Apache project.

1、sqoop help

  $ sqoop help
  15/07/10 16:04:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
  usage: sqoop COMMAND [ARGS]

  Available commands:
    codegen            Generate code to interact with database records
    create-hive-table  Import a table definition into Hive
    eval               Evaluate a SQL statement and display the results
    export             Export an HDFS directory to a database table
    help               List available commands
    import             Import a table from a database to HDFS
    import-all-tables  Import tables from a database to HDFS
    import-mainframe   Import datasets from a mainframe server to HDFS
    job                Work with saved jobs
    list-databases     List available databases on a server
    list-tables        List available tables in a database
    merge              Merge results of incremental imports
    metastore          Run a standalone Sqoop metastore
    version            Display version information

  See 'sqoop help COMMAND' for information on a specific command.

2、sqoop command --help

  $ sqoop export --help
    15/07/10 16:10:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
    usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

    Common arguments:
       --connect <jdbc-uri>                         Specify JDBC connect
                                                    string
       --connection-manager <class-name>            Specify connection manager
                                                    class name
       --connection-param-file <properties-file>    Specify connection
                                                    parameters file
       --driver <class-name>                        Manually specify JDBC
                                                    driver class to use
       --hadoop-home <hdir>                         Override
                                                    $HADOOP_MAPRED_HOME_ARG
       --hadoop-mapred-home <dir>                   Override

  ..... remaining output omitted .....

3、rdbms data to hdfs

  sqoop export --connect jdbc:oracle:thin:@hostname:orcl --username tmp --password test --table test --fields-terminated-by "\001" --input-lines-terminated-by '\n' --input-null-string '\\N' --input-null-non-string '\\N' --verbose --export-dir /user/hive/warehouse/bigdata.db/test

* Error caused by JDK 1.8: the generated class was compiled for class file version 52.0 (Java 8), but the runtime JVM is older:
Error: TMP_GNJSP_4_S : Unsupported major.minor version 52.0
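A quick way to confirm the mismatch is to compare the JDK used by the Sqoop client (which compiles the generated class) with the JDK on the cluster nodes. A minimal sketch; the worker hostname and the location of the generated .class file are placeholders:

  # JDK used by the Sqoop client to compile the generated code
  java -version

  # class file version of the generated class (52 = Java 8, 51 = Java 7);
  # run this from the directory where Sqoop wrote the generated .class file
  javap -verbose TMP_GNJSP_4_S.class | grep 'major version'

  # JDK on a worker node (nodemanager1 is a placeholder hostname)
  ssh nodemanager1 'java -version'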

Import example

List all tables in Oracle

  $ sqoop list-tables --connect jdbc:oracle:thin:@hostname:1521:ORCL --username pu_test --password test|grep f_1

Create the target HDFS directory

  $ hadoop fs -mkdir /user/hadoop/Interface/DATA_MONTH/f_1

Import data through OraOop, the Oracle optimization plugin, for better performance

  $ sqoop import -D oraoop.jdbc.url.verbatim=true --connect jdbc:oracle:thin:@hostname:ORCL --username PU_TEST --password test --table f_1 --delete-target-dir --target-dir /user/hadoop/Interface/DATA_MONTH/f_1 --fetch-size 5000 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' -m 9
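After the import finishes, a quick sanity check on the target directory confirms that each of the 9 mappers wrote a split; the file names below assume the default part-m-NNNNN naming of a map-only Sqoop job:

  $ hadoop fs -ls /user/hadoop/Interface/DATA_MONTH/f_1

  # peek at a few rows of the first split (fields are delimited by \001)
  $ hadoop fs -cat /user/hadoop/Interface/DATA_MONTH/f_1/part-m-00000 | head -n 5

  # total row count across all splits
  $ hadoop fs -cat /user/hadoop/Interface/DATA_MONTH/f_1/part-m-* | wc -l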

4、hdfs data to rdbms

Export example

List all Oracle tables from the console:

  sqoop list-tables --connect jdbc:oracle:thin:@135.33.6.59:1521:fxgk --username pu_wt --password tydic

   $ cat sq_test.sh
   #!/bin/bash
   export LANG="zh_CN.UTF-8"
   # Oracle connection and target table settings
   CONNECTURL=jdbc:oracle:thin:@hostname:ORCL
   ORACLENAME=test
   ORACLEPASSWORD=test
   oracleTableName=testaction
   columns=a_id,a_name,t_type_id,t_type_name
   comTerminated='\t'
   hdfsPath=/apps/hive/test.db/tb_dim_action/

   # export the HDFS directory into the Oracle table with a single mapper
   sqoop export --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --export-dir $hdfsPath --num-mappers 1 --table $oracleTableName --columns $columns --input-fields-terminated-by ${comTerminated}
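To confirm the rows actually landed in Oracle, `sqoop eval` (listed in the command summary above) can run a quick count against the target table; a small sketch reusing the variables from sq_test.sh:

   # count the rows in the Oracle target table after the export
   sqoop eval --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --query "SELECT COUNT(*) FROM testaction"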

hdfs to oracle

  sqoop export --connect jdbc:oracle:thin:@hostname:ORCL --username test --password test --table ta_tmp --fields-terminated-by "\001" --input-lines-terminated-by '\n' --input-null-string '\\N' --input-null-non-string '\\N' --export-dir /apps/hive/test.db/ta_tmp

sqoop MR queue

 -Dmapreduce.job.queuename=default
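Generic -D options must appear immediately after the tool name, before any tool-specific arguments; a sketch based on the import example above, submitting to the default queue:

  sqoop import -Dmapreduce.job.queuename=default --connect jdbc:oracle:thin:@hostname:1521:ORCL --username PU_TEST --password test --table f_1 --target-dir /user/hadoop/Interface/DATA_MONTH/f_1 -m 9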

sqoop to parquetfile

create table test(meid string,phone_type string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS parquet LOCATION 'hdfs://mycluster/user/etlplat/test';

sqoop import --connect jdbc:oracle:thin:@ip:1521:orcl --username web --password abc --table xxx --delete-target-dir --target-dir /user/xxx/test --fetch-size 5000 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m 1
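After the import, the Parquet files sit under the table LOCATION defined above and the Hive table should be queryable; a quick check, assuming the table was created in the current database and the hive CLI is on the path:

  # confirm Parquet files were written under the table location
  hadoop fs -ls /user/etlplat/test

  # query the table defined in the CREATE TABLE statement above
  hive -e "SELECT COUNT(*) FROM test;"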

QA

1、SQOOP java.sql.SQLRecoverableException: IO Error: Connection reset
manager.SqlManager: Error executing statement:

# This setting did not take effect; the reason is unclear
-D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" 

# Fixed by editing the java.security file under $JAVA_HOME/jre/lib/security and setting securerandom.source=file:/dev/../dev/urandom

vi /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/java.security
securerandom.source=file:/dev/../dev/urandom
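The root cause is the Oracle JDBC driver blocking on /dev/random when seeding SecureRandom. Another workaround that is sometimes used, not verified here, is to pass the same property to the client-side JVM through HADOOP_OPTS before launching Sqoop:

  # point the client JVM at the non-blocking entropy source, then run the job as usual
  export HADOOP_OPTS="-Djava.security.egd=file:/dev/../dev/urandom"
  sqoop import --connect jdbc:oracle:thin:@hostname:1521:ORCL --username PU_TEST --password test --table f_1 --target-dir /user/hadoop/Interface/DATA_MONTH/f_1 -m 9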

2、Sqoop import job is failing with java.lang.OutOfMemoryError: unable to create new native thread

echo '* - nofile 32768' >> /etc/security/limits.conf 
echo '* - nproc 65536' >> /etc/security/limits.conf
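The new limits only apply to new login sessions; after re-logging in as the user that runs the Sqoop job, they can be verified with ulimit:

  ulimit -n   # max open files, should report 32768
  ulimit -u   # max user processes, should report 65536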

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-729d1fb0-6d1b-459f-a18a-b5eba4540ab5.1.html

http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

Original article. When reposting, please credit: Itweet's blog.
Archive of all posts on this blog: http://www.itweet.cn/blog/archive/