hive各种调优设置

Hive的各种调优设置

1、reduce个数

  .hive.exec.reduces.bytes.per.reducer
  .mapred.reduce.tasks=-1
  CDH5:
    hive (default)> set mapred.reduce.tasks;
    mapred.reduce.tasks=-1

2、权限问题

  .hive.warehouse.subdir.inherit.perms
  CDH5:默认是true
    hive (default)>  set hive.warehouse.subdir.inherit.perms;
    hive.warehouse.subdir.inherit.perms=true

3、HiveServer2的内存问题

  .设置-Xmx越大越好
    -Xmx=2048m 甚至-Xmx=4g

4、关闭推测式执行

  CDH5 default:
  .mapred.map.tasks.speculative.execution=true
  .mapred.reduce.tasks.speculative.execution=true
  update back:
    hive (default)> set mapred.map.tasks.speculative.execution;
    mapred.map.tasks.speculative.execution=false
    hive (default)> set mapred.reduce.tasks.speculative.execution;
    mapred.reduce.tasks.speculative.execution=false

5、客户端

  . hive.cli.print.current.db
  . hive.cli.print.header

6、并行执行

  .hive.exec.parallel
  .hive.exec.parallel.thread.number
  hive (default)>  set hive.exec.parallel;
  cdh5:
    hive.exec.parallel=true
    hive (default)> set hive.exec.parallel.thread.number;
    hive.exec.parallel.thread.number=16

7、MapJoin

  .hive.auto.convert.join
  .hive.mapjoin.smalltable.filesize
  .hive.ignore.mapjoin.hint                   #忽略MAPJOIN写法,而是自动检查是否转换
  CDH5:
    hive (default)> set hive.auto.convert.join;
    hive.auto.convert.join=true
    hive (default)> set hive.mapjoin.smalltable.filesize;
    hive.mapjoin.smalltable.filesize=500000000
    hive (default)> set hive.ignore.mapjoin.hint;
    hive.ignore.mapjoin.hint=false                  

    例子:
      hive (hsu_tmp)> select /*+MAPJOIN(tableX)*/ tableA.id,tableX.id from tableA join tableX on tableA.id=tableX.id;     
      #tableX自动转换为小表,如果这个表过大,会出现oom问题

8、LocalModel [全局关闭local执行模式]

  .hive.exec.mode.local.auto
  .hive.exec.mode.local.auto.input.files.max
  .hive.exec.mode.local.auto.inputbytes.max
  CDH5:
    hive (hsu_tmp)> set hive.exec.mode.local.auto;
    hive.exec.mode.local.auto=false
    hive (hsu_tmp)> set hive.exec.mode.local.auto.input.files.max;
    hive.exec.mode.local.auto.input.files.max=4
    hive (hsu_tmp)> set hive.exec.mode.local.auto.inputbytes.max;
    hive.exec.mode.local.auto.inputbytes.max=134217728

9、conf setting

  推测式执行
  <property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
  </property>
  <property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
  </property>
  小表mapjoin
  <property>
  <name>hive.ignore.mapjoin.hint</name>
  <value>false</value>
  </property>
  <property>
  <name>hive.mapjoin.smalltable.filesize</name>
  <value>500000000</value>
  </property>
  并行执行
  <property>
  <name>hive.exec.parallel</name>
  <value>true</value>
  </property>
  <property>
  <name>hive.exec.parallel.thread.number</name>
  <value>16</value>
  </property>
  客户端显示
  <property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  </property>
  <property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  </property>
  关闭自动统计
  <property>
  <name>hive.stats.autogather</name>
  <value>false</value>
  </property>

10、查询表可以带引号来过滤

  hive (default)> show tables '*sample*';
  OK
  tab_name
  sample_07
  sample_08
  Time taken: 0.057 seconds, Fetched: 2 row(s) 
  hive (default)> show tables 'e*';
  OK
  tab_name
  etl_data_source
  Time taken: 0.056 seconds, Fetched: 1 row(s)
  1. thriftserver并发支持多少个会话session,默认100个

    <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>0</value>
    </property>
    
    <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>1024</value>
    </property>
    
    <property>
    <name>hive.server2.async.exec.threads</name>
    <value>100</value>
    </property>
    

原创文章,转载请注明: 转载自Itweet的博客
本博客的文章集合: http://www.itweet.cn/blog/archive/