Sqoop is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into Hadoop's HDFS or a NoSQL store, and can also export data from HDFS back into a relational database.
The Sqoop project began in 2009, originally as a third-party module for Hadoop. To let users deploy it quickly and developers iterate faster, Sqoop later became an independent Apache project.

1、sqoop help

$ sqoop help
15/07/10 16:04:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

2、sqoop command --help

$ sqoop export --help
15/07/10 16:10:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect string
   --connection-manager <class-name>            Specify connection manager class name
   --connection-param-file <properties-file>    Specify connection parameters file
   --driver <class-name>                        Manually specify JDBC driver class to use
   --hadoop-home <hdir>                         Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override

..... (remaining output omitted) .........

3、rdbms data to hdfs

First, a failed run (note that this particular command is an export, HDFS to Oracle):

sqoop export --connect jdbc:oracle:thin:@hostname:1521:orcl --username tmp --password test --table test --fields-terminated-by "\001" --input-lines-terminated-by '\n' --input-null-string '\\N' --input-null-non-string '\\N' --verbose --export-dir /user/hive/warehouse/bigdata.db/test

* This fails because the Sqoop-generated class was compiled with JDK 1.8 (class file version 52.0) while the task JVM is older:
Error: TMP_GNJSP_4_S : Unsupported major.minor version 52.0
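
One way out, assuming the cluster tasks run on JDK 1.7, is to compile and launch with a matching JDK. The path below is illustrative (it reuses the Cloudera JDK 1.7 path that appears in the QA section), not part of the original run:

# assumption: cluster tasks run JDK 1.7; point the Sqoop client at the same JDK
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$JAVA_HOME/bin:$PATH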

Import example

List all tables under the Oracle user:

$ sqoop list-tables --connect jdbc:oracle:thin:@hostname:1521:ORCL --username pu_test --password test|grep f_1

Create the target directory:

$ hadoop fs -mkdir /user/hadoop/Interface/DATA_MONTH/f_1

Import through the OraOop Oracle connector plugin for better throughput:

$ sqoop import -D oraoop.jdbc.url.verbatim=true --connect jdbc:oracle:thin:@hostname:1521:ORCL --username PU_TEST --password test --table f_1 --delete-target-dir --target-dir /user/hadoop/Interface/DATA_MONTH/f_1 --fetch-size 5000 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' -m 9
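
A quick sanity check on the imported files (these verification commands are an addition to the original steps; the line count equals the row count only because the import above writes plain text terminated by '\n'):

$ hadoop fs -ls /user/hadoop/Interface/DATA_MONTH/f_1
$ hadoop fs -cat /user/hadoop/Interface/DATA_MONTH/f_1/part-* | wc -l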

4、hdfs data to rdbms

Export example

List all Oracle tables from the console:

sqoop list-tables --connect jdbc:oracle:thin:@135.33.6.59:1521:fxgk --username pu_wt --password tydic

$ cat sq_test.sh 
export LANG="zh_CN.UTF-8"
# connection settings
CONNECTURL=jdbc:oracle:thin:@hostname:1521:ORCL
ORACLENAME=test
ORACLEPASSWORD=test
# target Oracle table, column list, field delimiter, and source HDFS path
oracleTableName=testaction
columns=a_id,a_name,t_type_id,t_type_name
comTerminated='\t'
hdfsPath=/apps/hive/test.db/tb_dim_action/

sqoop export --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --export-dir $hdfsPath --num-mappers 1 --table $oracleTableName --columns $columns --input-fields-terminated-by ${comTerminated}
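
Run it with sh sq_test.sh. To keep the password out of the script and shell history, Sqoop also supports --password-file; a sketch of that variant (the /home/hadoop path is an assumption):

# create a mode-400 password file with no trailing newline (assumed path)
echo -n 'test' > /home/hadoop/.sqoop_pwd && chmod 400 /home/hadoop/.sqoop_pwd
# then replace "--password $ORACLEPASSWORD" in the script with:
--password-file file:///home/hadoop/.sqoop_pwd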

hdfs to oracle

sqoop export --connect jdbc:oracle:thin:@hostname:1521:ORCL --username test --password test --table ta_tmp --fields-terminated-by "\001" --input-lines-terminated-by '\n' --input-null-string '\\N' --input-null-non-string '\\N' --export-dir /apps/hive/test.db/ta_tmp
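
To confirm the rows landed in Oracle, sqoop eval (listed in the help output above) can run an ad-hoc query; this check is an addition to the original steps:

$ sqoop eval --connect jdbc:oracle:thin:@hostname:1521:ORCL --username test --password test --query "select count(*) from ta_tmp"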

sqoop mr queue (pool)

-Dmapreduce.job.queuename=default
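
Note that generic Hadoop -D options must come immediately after the tool name, before any tool-specific arguments, e.g.:

$ sqoop import -Dmapreduce.job.queuename=default --connect jdbc:oracle:thin:@hostname:1521:ORCL --username test --password test --table f_1 --target-dir /user/hadoop/Interface/DATA_MONTH/f_1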

sqoop to parquetfile

create table test(meid string,phone_type string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS parquet LOCATION 'hdfs://mycluster/user/etlplat/test';

sqoop import --connect jdbc:oracle:thin:@ip:1521:orcl --username web --password abc --table xxx --delete-target-dir --target-dir /user/xxx/test --fetch-size 5000 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m 1
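
A quick sanity check on the Parquet import, assuming the Hive CLI is available (this step is an addition to the original):

$ hive -e "select count(*) from test;"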

QA

1、SQOOP java.sql.SQLRecoverableException: IO Error: Connection reset
manager.SqlManager: Error executing statement:

# This did not take effect; the reason was unclear at the time
-D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom"

# Fixed instead by editing the java.security file under $JAVA_HOME/jre/lib/security and setting securerandom.source=file:/dev/../dev/urandom

vi /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/java.security
securerandom.source=file:/dev/../dev/urandom
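
A plausible reason the -D option above had no effect (my reading, not confirmed in the original): mapred.child.java.opts only reaches the map-task JVMs, while the initial metadata queries that hit "Connection reset" run in the local Sqoop client JVM. The client-side entropy source can be set via HADOOP_OPTS instead:

export HADOOP_OPTS="-Djava.security.egd=file:/dev/../dev/urandom $HADOOP_OPTS"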

2、Sqoop import job is failing with java.lang.OutOfMemoryError: unable to create new native thread

echo '* - nofile 32768' >> /etc/security/limits.conf 
echo '* - nproc 65536' >> /etc/security/limits.conf
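
The new limits only apply to sessions started after the change; log in again and verify:

$ ulimit -n   # open files
$ ulimit -u   # max user processes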

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-729d1fb0-6d1b-459f-a18a-b5eba4540ab5.1.html

http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html