hive install
1. Install MySQL 5.5
** On server2
- (1) Install MySQL from the rpm packages
# rpm -qa | grep mysql [check whether a MySQL came bundled with the OS]
# rpm -e mysql-libs-5.1.71-1.el6.x86_64 --nodeps [remove it without dependency checks]
# rpm -qa | grep mysql [confirm the removal succeeded]
# rpm -i MySQL-server-5.5.40-1.linux2.6.x86_64.rpm [install the server]
# mysqld_safe & [start in the background; check with jobs]
# rpm -i MySQL-client-5.5.40-1.linux2.6.x86_64.rpm [install the client]
or
$ yum install mysql-server
- (2) Adjust the database configuration (set the root password, etc.)
# mysql_secure_installation
- (3) Log in to MySQL to verify
# mysql -uroot -padmin
- (4) Create the database and grant privileges
# mysql -uroot -padmin
mysql> create database hive;
mysql> grant all on hive.* to 'hive'@'%' identified by 'hive'; [let the hive user log in remotely from any host (%)]
mysql> grant all on hive.* to 'hive'@'server2' identified by 'hive';
mysql> flush privileges; [reload the privilege tables]
- (5) Verify the connection with the SQLyog database GUI client
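Without a GUI, the remote grant can also be checked from another node; a minimal sketch (the hive/hive credentials and the hive database follow the grants above):
# run from server1 or any other host that can reach server2
$ mysql -h server2 -uhive -phive -e "show databases like 'hive';"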
2. Install hive-0.13.1
- (1) Unpack
# tar -zxvf apache-hive-0.13.1-bin.tar.gz -C /usr/local/
- (2) scp hive-0.13.1 to server2 [Hive is installed on server2]
# scp -rq hive-0.13.1 server2:/usr/local/
- (3) Copy the configuration templates (in hive-0.13.1/conf)
# cp hive-exec-log4j.properties.template hive-exec-log4j.properties
# cp hive-log4j.properties.template hive-log4j.properties
# cp hive-env.sh.template hive-env.sh
# cp hive-default.xml.template hive-site.xml
- (4) Configure hive-config.sh
# vi /usr/local/hive-0.13.1/bin/hive-config.sh
export JAVA_HOME=/usr/local/jdk1.7.0_45
export HIVE_HOME=/usr/local/hive-0.13.1
export HADOOP_HOME=/usr/local/hadoop-2.4.0
- (5) Configure hive-site.xml [MySQL as the metastore]
<!-- JDBC connection URL (basic setting) -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://server2:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<!-- MySQL JDBC driver class -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- metastore database username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<!-- metastore database password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<!-- default warehouse location on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<!-- Thrift URI of the remote Hive metastore -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://server2:9083</value>
</property>
<!-- HiveServer2 settings -->
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>server1,server2,server3</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
<description>The port of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>server2</value>
<description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
<!-- optimizations: disable speculative execution for reduce tasks -->
<property>
<name>hive.mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>false</value>
</property>
<!-- map join for small tables -->
<property>
<name>hive.ignore.mapjoin.hint</name>
<value>false</value>
</property>
<property>
<name>hive.mapjoin.smalltable.filesize</name>
<value>500000000</value>
</property>
<!-- parallel stage execution -->
<property>
<name>hive.exec.parallel</name>
<value>true</value>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>16</value>
</property>
<!-- CLI display options -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- disable automatic stats gathering -->
<property>
<name>hive.stats.autogather</name>
<value>false</value>
</property>
- (6) Copy mysql-connector-java-5.1.26-bin.jar into hive-0.13.1/lib
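A sketch of that copy, assuming the connector jar was downloaded to the current directory:
# cp mysql-connector-java-5.1.26-bin.jar /usr/local/hive-0.13.1/lib/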
- (7) Start the metastore
# hive-0.13.1/bin/hive --service metastore & [check with jobs]
- (8) Start the hive CLI
# hive-0.13.1/bin/hive
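A quick smoke test once the CLI is up (a minimal sketch; a fresh install should list only the default database and no tables):
hive (default)> show databases;
hive (default)> show tables;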
- (9) Start hiveserver2 [the JDBC service]
# hive-0.13.1/bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=14000 start &
[hsu@server2 ~]$ lsof -i :14000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 3460 hsu 306u IPv4 29485 0t0 TCP server2:scotty-ft->server1:18243 (ESTABLISHED)
java 3460 hsu 313u IPv4 29484 0t0 TCP server2:scotty-ft (LISTEN)
- (10) Test hiveserver2
Connect to hiveserver2 from the beeline console:
$ /usr/local/hive-0.13.1/beeline> !connect jdbc:hive2://server2:14000 -uhsu -phsu org.apache.hive.jdbc.HiveDriver
$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
-u <database url> the JDBC URL to connect to
-n <username> the username to connect as
-p <password> the password to connect as
-d <driver class> the driver class to use
# Testing with hive 1.2.1
!connect jdbc:hive2://server2:14000 -udefault -nhsu -phsu -dorg.apache.hive.jdbc.HiveDriver
[hsu@server1 ~]$ beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://server2:14000
Connecting to jdbc:hive2://server2:14000
Enter username for jdbc:hive2://server2:14000:
Enter password for jdbc:hive2://server2:14000:
Error: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hsu:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:208)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4140)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) (state=,code=0)
Fix:
[hsu@server2 ~]$ hadoop fs -ls /tmp
Found 2 items
drwx-wx-wx - hsu supergroup 0 2015-08-13 14:57 /tmp/hive
drwx------ - hsu supergroup 0 2015-08-06 13:03 /tmp/hsu
[hsu@server2 ~]$ hadoop fs -chmod 777 /tmp
[hsu@server2 ~]$ hadoop fs -ls /
drwxrwxrwx - hsu supergroup 0 2015-08-13 14:57 /tmp
0: jdbc:hive2://server2:14000> !connect jdbc:hive2://server2:14000
Connecting to jdbc:hive2://server2:14000
Enter username for jdbc:hive2://server2:14000: hsu
Enter password for jdbc:hive2://server2:14000: ***
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://server2:14000> create table test(id int);
No rows affected (20.356 seconds)
1: jdbc:hive2://server2:14000>
1: jdbc:hive2://server2:14000> insert into table test values (1), (2), (3);
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:1
INFO : Submitting tokens for job: job_1439448136822_0001
INFO : The url to track the job: http://server1:23188/proxy/application_1439448136822_0001/
INFO : Starting Job = job_1439448136822_0001, Tracking URL = http://server1:23188/proxy/application_1439448136822_0001/
INFO : Kill Command = /home/hsu/hadoop/bin/hadoop job -kill job_1439448136822_0001
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2015-08-13 16:00:31,907 Stage-1 map = 0%, reduce = 0%
INFO : 2015-08-13 16:01:22,538 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.56 sec
INFO : MapReduce Total cumulative CPU time: 1 seconds 560 msec
INFO : Ended Job = job_1439448136822_0001
INFO : Stage-3 is selected by condition resolver.
INFO : Stage-2 is filtered out by condition resolver.
INFO : Stage-4 is filtered out by condition resolver.
INFO : Moving data to: hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10000 from hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10002
INFO : Loading data to table default.test from hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10000
1: jdbc:hive2://server2:14000> select * from test;
+----------+--+
| test.id |
+----------+--+
| 1 |
| 2 |
| 3 |
+----------+--+
3 rows selected (0.324 seconds)
- (11) Add the Hive install directory to the environment variables
# vi /etc/profile
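The appended lines would look roughly like this (a sketch; the paths follow the install locations used above):
export HIVE_HOME=/usr/local/hive-0.13.1
export PATH=$PATH:$HIVE_HOME/bin
# then reload: source /etc/profile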
3. Hive syntax overview
- (1) Create
hive (default)> create table t1(id int); [create a table]
- (2) Load data
hive (default)> load data local inpath '/usr/local/testdata/id' into table t1; [load from the local Linux filesystem into Hive; the data is copied]
hive (default)> load data local inpath '/usr/local/name' into table t2; [same, copying local data into t2]
- (3) Drop a table
hive (default)> drop table t1; [for a managed table this removes the data from HDFS and the metadata from the metastore]
- (4) Managed tables (MANAGED_TABLE) include internal tables, partitioned tables, and bucketed tables.
- (5) Partitioned tables [data sits in separate directories, narrowing the scan range]
hive (default)> create table t3(id int,name string) partitioned by(grade int) row format delimited fields terminated by '\t'; [create a partitioned table]
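Loading into a specific partition then looks like this (a sketch; the local path and the grade value are illustrative):
hive (default)> load data local inpath '/usr/local/testdata/grade1' into table t3 partition(grade=1);
hive (default)> select id,name from t3 where grade=1; [only the grade=1 directory is scanned]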
- (6) Bucketed tables (bucket table)
Used in table joins to improve efficiency.
hive (default)> create table bucket_test(id int,name string) clustered by(id) into 3 buckets; [split into 3 buckets by id]
hive (default)> set hive.enforce.bucketing=true; [enforce bucketing on insert]
insert overwrite table buckets select ... from ... [general form for inserting data]
hive (default)> insert overwrite table bucket_test select id,name from t2;
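One payoff of bucketing is cheap sampling; a hedged example using standard HiveQL tablesample syntax:
hive (default)> select * from bucket_test tablesample(bucket 1 out of 3 on id); [reads only the first of the 3 buckets]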
- (7) External tables (EXTERNAL_TABLE) [dropping the table removes only the definition; the data on HDFS is kept] {external partitioned tables also exist}
# hadoop fs -put name /testdata/external_table
hive (default)> create external table external_test(id int,name string) row format delimited fields terminated by '\t' location '/testdata/external_table';
hive (default)> select id,name from external_test order by id desc;
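To see the external-table semantics in action (a minimal sketch):
hive (default)> drop table external_test; [only the metadata disappears]
# hadoop fs -ls /testdata/external_table [the data files are still there]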
4. Command-line tools
(1) # hive
(2) # hive -e "select * from t2";
(3) # hive -e "select * from t2" >> a [append the result to file a]
(4) # hive -S -e "select * from t2" >> a [-S: silent mode, suppress progress output]
(5) # hive --hiveconf hive.querylog.location=/usr/local/hive-0.13.1/logs
(6) hive (default)> set hive.querylog.location;
hive.querylog.location=/tmp/root
(7) hive -f file [run a script file]
(8) source file [run a script from inside the CLI]
(9) /root/.hiverc and /root/.hivehistory [per-user startup script and command history]
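As an example, a ~/.hiverc that bakes in the display settings from hive-site.xml above might contain (illustrative):
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;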
Sogou search-log data:
use sougoulibs;
create external table sogou(dt string, websession string, word string, s_seq int, c_seq int, website string)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile location '/labs/sogou/';
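A typical query against this table (illustrative; column meanings are assumed from the schema above):
select word, count(*) as cnt from sogou group by word order by cnt desc limit 10; [top 10 search terms]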
5. HCatalog
Reference: https://cwiki.apache.org/confluence/display/Hive/HCatalog
6. hive-1.2.1 CLI startup error
mysql> select * from VERSION;
[hsu@server2 ~]$ hive
Fix:
[hsu@server2 ~]$ ls hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
[hsu@server2 ~]$ ls hive/lib/jline-2.12.jar
hive/lib/jline-2.12.jar
[hsu@server2 ~]$ mv hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
[hsu@server2 ~]$ cp hive/lib/jline-2.12.jar hadoop/share/hadoop/yarn/lib/
[hsu@server2 ~]$ ls hadoop/share/hadoop/yarn/lib/jline-*
hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar.bak hadoop/share/hadoop/yarn/lib/jline-2.12.jar
[hsu@server2 ~]$ hive
Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties
hive (default)> show tables;
OK
tab_name
Time taken: 2.386 seconds
The problem was caused by mismatched jline versions.
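A commonly cited alternative that avoids touching the Hadoop tree is to make Hadoop prefer the classpath entries supplied by Hive (hedged; verify against your Hive release notes):
$ export HADOOP_USER_CLASSPATH_FIRST=true
$ hive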
7. Hive 2.0.1 FAQ
Startup error:
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
Fix: add the following properties to hive-site.xml:
<property>
<name>datanucleus.readOnlyDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>
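On Hive 2.x a cleaner route is usually to initialize the metastore schema explicitly with the bundled schematool rather than letting DataNucleus auto-create tables; a sketch, assuming the MySQL metastore configured above:
$ $HIVE_HOME/bin/schematool -dbType mysql -initSchema [creates VERSION and the other metastore tables]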
hive 2.0.1 beeline impersonation (permission) problem
- https://community.hortonworks.com/questions/4905/error-while-running-hive-queries-from-zeppelin.html
- http://stackoverflow.com/questions/25073792/error-e0902-exception-occured-user-root-is-not-allowed-to-impersonate-root
# Add to core-site.xml, then restart the cluster's HDFS & YARN
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
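As an aside, a full restart may not be necessary: Hadoop can usually reload proxyuser settings in place with its standard refresh commands (worth verifying on your version):
$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration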
Compatibility problems between hive 2.0.1 and tez 0.7.1
1. When integrating hive 2.0.1 with tez 0.7.1, some classes were not loaded and class-not-found errors were raised.
2. Downgrading Hive to 1.2.1 removed the problem entirely; tez jobs ran normally.
3. The community lists tez 0.8.3 as the version hive 2.0.1 depends on; the plan is to rebuild against it manually and retest (still to be verified).
4. This suggests that interface compatibility across tez versions is poor, which makes upgrades costly and hard to maintain.