Hive的各种调优设置
1、reduce个数
1 2 3 4 5
| .hive.exec.reduces.bytes.per.reducer .mapred.reduce.tasks=-1 CDH5: hive (default)> set mapred.reduce.tasks; mapred.reduce.tasks=-1
|
2、权限问题
1 2 3 4
| .hive.warehouse.subdir.inherit.perms CDH5:默认是true hive (default)> set hive.warehouse.subdir.inherit.perms; hive.warehouse.subdir.inherit.perms=true
|
3、HiveServer2的内存问题
1 2
| .设置-Xmx越大越好 -Xmx=2048m 甚至-Xmx=4g
|
4、关闭推测式执行
1 2 3 4 5 6 7 8
| CDH5 default: .mapred.map.tasks.speculative.execution=true .mapred.reduce.tasks.speculative.execution=true update back: hive (default)> set mapred.map.tasks.speculative.execution; mapred.map.tasks.speculative.execution=false hive (default)> set mapred.reduce.tasks.speculative.execution; mapred.reduce.tasks.speculative.execution=false
|
5、客户端
1 2
| . hive.cli.print.current.db . hive.cli.print.header
|
6、并行执行
1 2 3 4 5 6 7
| .hive.exec.parallel .hive.exec.parallel.thread.number hive (default)> set hive.exec.parallel; cdh5: hive.exec.parallel=true hive (default)> set hive.exec.parallel.thread.number; hive.exec.parallel.thread.number=16
|
7、MapJoin
1 2 3 4 5 6 7 8 9 10 11 12 13 14
| .hive.auto.convert.join .hive.mapjoin.smalltable.filesize .hive.ignore.mapjoin.hint #忽略MAPJOIN写法,而是自动检查是否转换 CDH5: hive (default)> set hive.auto.convert.join; hive.auto.convert.join=true hive (default)> set hive.mapjoin.smalltable.filesize; hive.mapjoin.smalltable.filesize=500000000 hive (default)> set hive.ignore.mapjoin.hint; hive.ignore.mapjoin.hint=false
例子: hive (hsu_tmp)> select /*+MAPJOIN(tableX)*/ tableA.id,tableX.id from tableA join tableX on tableA.id=tableX.id; #tableX自动转换为小表,如果这个表过大,会出现oom问题
|
8、LocalModel [全局关闭local执行模式]
1 2 3 4 5 6 7 8 9 10
| .hive.exec.mode.local.auto .hive.exec.mode.local.auto.input.files.max .hive.exec.mode.local.auto.inputbytes.max CDH5: hive (hsu_tmp)> set hive.exec.mode.local.auto; hive.exec.mode.local.auto=false hive (hsu_tmp)> set hive.exec.mode.local.auto.input.files.max; hive.exec.mode.local.auto.input.files.max=4 hive (hsu_tmp)> set hive.exec.mode.local.auto.inputbytes.max; hive.exec.mode.local.auto.inputbytes.max=134217728
|
9、conf setting
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
| 推测式执行 <property> <name>hive.mapred.reduce.tasks.speculative.execution</name> <value>false</value> </property> <property> <name>mapreduce.reduce.speculative</name> <value>false</value> </property> 小表mapjoin <property> <name>hive.ignore.mapjoin.hint</name> <value>false</value> </property> <property> <name>hive.mapjoin.smalltable.filesize</name> <value>500000000</value> </property> 并行执行 <property> <name>hive.exec.parallel</name> <value>true</value> </property> <property> <name>hive.exec.parallel.thread.number</name> <value>16</value> </property> 客户端显示 <property> <name>hive.cli.print.current.db</name> <value>true</value> </property> <property> <name>hive.cli.print.header</name> <value>true</value> </property> 关闭自动统计 <property> <name>hive.stats.autogather</name> <value>false</value> </property>
|
10、查询表可以带引号来过滤
1 2 3 4 5 6 7 8 9 10 11
| hive (default)> show tables '*sample*'; OK tab_name sample_07 sample_08 Time taken: 0.057 seconds, Fetched: 2 row(s) hive (default)> show tables 'e*'; OK tab_name etl_data_source Time taken: 0.056 seconds, Fetched: 1 row(s)
|
- thriftserver并发支持多少个会话session,默认100个
1 2 3 4 5 6 7 8 9 10 11 12 13 14
| <property> <name>hive.server2.thrift.min.worker.threads</name> <value>0</value> </property> <property> <name>hive.server2.thrift.max.worker.threads</name> <value>1024</value> </property> <property> <name>hive.server2.async.exec.threads</name> <value>100</value> </property>
|