tez-use

To run jobs with dependencies between them more efficiently (for example the chains of MapReduce jobs produced by Pig and Hive) and to reduce disk and network IO, Hortonworks developed the DAG computing framework Tez. Tez is a general-purpose DAG framework that evolved from MapReduce and can serve as the underlying data-processing engine for MapReduce, Pig, Hive and similar systems. It is built natively on YARN, the resource-management platform introduced in Hadoop 2.0, and is developed by core Hadoop 2.0 contributors, so it is well placed to become a rising star among computing frameworks. This post walks through building, installing and using Tez.

Tez has the following distinguishing features:

  • (1) A rich dataflow (dataflow, NOT streaming!) programming API;
  • (2) An extensible "Input-Processor-Output" runtime model;
  • (3) Simplified deployment (Tez takes full advantage of YARN and is just a client-side library, so no dedicated services need to be installed in advance);
  • (4) Better performance than MapReduce;
  • (5) Optimized resource management (it runs directly on the YARN resource-management system);
  • (6) Dynamically generated physical dataflows.

(Figure: a DAG of Tez tasks.)

Get the source with git

    $ git clone git@github.com:apache/tez.git

Part 1. Install the prerequisites

1. Install Java

    # wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz
    # tar -zxvf jdk-7u79-linux-x64.tar.gz 
    # ln -s jdk1.7.0_79 jdk

    Add JAVA_HOME and the JDK bin directory to ~/.bash_profile (see the sketch below), then:
    # source ~/.bash_profile 
    # java -version 
    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
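
The profile entries themselves are not shown above. A minimal sketch, assuming the JDK was unpacked under /usr/local and symlinked as /usr/local/jdk (the path /usr/local/jdk1.7.0_79 also appears in the Maven output below):

    # echo 'export JAVA_HOME=/usr/local/jdk' >> ~/.bash_profile
    # echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile
    # source ~/.bash_profile
    # which java     # should resolve to /usr/local/jdk/bin/java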

2. Install Apache Maven

    # wget http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
    # tar -zxf apache-maven-3.0.5-bin.tar.gz
    # ln -s apache-maven-3.0.5 maven
    # echo "export MAVEN_HOME=/usr/local/maven" >> ~/.bash_profile
    # echo 'export PATH=$MAVEN_HOME/bin:$PATH:$HOME/bin' >> ~/.bash_profile
    # source ~/.bash_profile 
    # mvn -version
    Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 21:51:28+0800)
    Maven home: /usr/local/maven
    Java version: 1.7.0_79, vendor: Oracle Corporation
    Java home: /usr/local/jdk1.7.0_79/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"

3. Protocol Buffers 2.5.0

    # tar -zxvf protobuf-2.5.0.tar.gz 
    # cd protobuf-2.5.0
    # ./configure --prefix=/usr/local/protobuf-2.5/
    # make && make install
    # ln -s /usr/local/protobuf-2.5 /usr/local/protobuf
    # add to ~/.bash_profile: export PATH=/usr/local/protobuf/bin:$PATH
    # source ~/.bash_profile 
    # protoc --version
    libprotoc 2.5.0

Note: on Ubuntu, if the build fails, run sudo apt-get install build-essential to pull in the basic toolchain, then rebuild; it should complete without problems.

4. Build Tez

Get a release version from GitHub

  # git clone https://github.com/apache/tez.git
  # cd tez/

  Change the Hadoop version in the top-level pom.xml to <hadoop.version>2.4.0</hadoop.version> (see the sed sketch under step 5 below).

  # git tag
    release-0.2.0-rc0
    release-0.3.0-incubating
    release-0.3.0-incubating-rc0
    release-0.3.0-incubating-rc1
    release-0.4.0-incubating
    release-0.4.1-incubating
    release-0.5.0
    release-0.5.1
    release-0.5.2
    release-0.5.3
    release-0.5.4
    release-0.6.0
    release-0.6.1
    release-0.7.0

    # git show-branch
    [master] TEZ-2610: yarn-swimlanes for DAGs that use containers of a previous DAG (gopalv)

5. Change hadoop.version to match your cluster's Hadoop version
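
One way to make that change from the shell. A sketch, assuming the top-level pom.xml declares <hadoop.version>2.6.0</hadoop.version> by default (the tez-0.7 default, as noted under Error 2 below); as described later, this particular change had to be reverted because of the compile error it triggers:

    # grep '<hadoop.version>' pom.xml
    # sed -i 's|<hadoop.version>2.6.0</hadoop.version>|<hadoop.version>2.4.0</hadoop.version>|' pom.xml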

Build tez

    # mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

    Error 1:
    [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.22:install-node-and-npm (install node and npm) on project tez-ui: Could not download npm: Could not download http://registry.npmjs.org/npm/-/npm-1.3.8.tgz: Unknown host registry.npmjs.org: Name or service not known -> [Help 1]
    [ERROR] 
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR] 
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
    [ERROR] 
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR]   mvn <goals> -rf :tez-ui

    Fix: https://issues.apache.org/jira/browse/TEZ-2229
    # cd tez-ui/
    Then edit tez-ui/pom.xml and add the '--allow-root' argument:
    <configuration>
      <workingDirectory>${webappDir}</workingDirectory>
      <executable>${node.executable}</executable>
      <arguments>
        <argument>node_modules/bower/bin/bower</argument>
        <argument>install</argument>
        <argument>--remove-unnecessary-resolutions=false</argument>
        <argument>--allow-root</argument>
      </arguments>
    </configuration>

    -- Worked around by switching to the release source tarball and installing node manually:
    # wget http://mirror.bit.edu.cn/apache/tez/0.7.0/apache-tez-0.7.0-src.tar.gz
    # tar -zxvf apache-tez-0.7.0-src.tar.gz
    # cd apache-tez-0.7.0-src
    # mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

    # yum -y install gcc make gcc-c++ openssl
    # wget http://nodejs.org/dist/v0.10.26/node-v0.10.26.tar.gz
    # tar -zxvf node-v0.10.26.tar.gz
    # cd node-v0.10.26
    # ./configure
    # make && make install

    # node -v
      v0.10.26

    -- Continue the Tez build
    # mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true

    Error 2:
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project tez-yarn-timeline-history-with-acls: Compilation failure: Compilation failure:
    [ERROR] symbol:   class TimelineDomain

    Cause: hadoop.version in the pom was changed to 2.4.0, while tez-0.7 builds against 2.6.0 by default; the TimelineDomain class it compiles against is not available in the older Hadoop release.

    -- Revert hadoop.version to the default and build again:
    # mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true    
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO] 
    [INFO] tez ............................................... SUCCESS [3.563s]
    [INFO] tez-api ........................................... SUCCESS [24.276s]
    [INFO] tez-common ........................................ SUCCESS [1.977s]
    [INFO] tez-runtime-internals ............................. SUCCESS [3.696s]
    [INFO] tez-runtime-library ............................... SUCCESS [11.011s]
    [INFO] tez-mapreduce ..................................... SUCCESS [4.218s]
    [INFO] tez-examples ...................................... SUCCESS [1.187s]
    [INFO] tez-dag ........................................... SUCCESS [10.370s]
    [INFO] tez-tests ......................................... SUCCESS [3.100s]
    [INFO] tez-ui ............................................ SUCCESS [14.876s]
    [INFO] tez-plugins ....................................... SUCCESS [0.097s]
    [INFO] tez-yarn-timeline-history ......................... SUCCESS [2.782s]
    [INFO] tez-yarn-timeline-history-with-acls ............... SUCCESS [2.044s]
    [INFO] tez-mbeans-resource-calculator .................... SUCCESS [2.047s]
    [INFO] tez-tools ......................................... SUCCESS [0.263s]
    [INFO] tez-dist .......................................... SUCCESS [1:17:56.763s]
    [INFO] Tez ............................................... SUCCESS [0.032s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 1:19:24.309s
    [INFO] Finished at: Sat Jul 18 07:33:17 HKT 2015
    [INFO] Final Memory: 43M/202M
    [INFO] ------------------------------------------------------------------------

    # tree -L 1
    .
    |-- BUILDING.txt
    |-- CHANGES.txt
    |-- INSTALL.md -> docs/src/site/markdown/install.md
    |-- KEYS
    |-- LICENSE.txt
    |-- NOTICE.txt
    |-- README.md
    |-- Tez_DOAP.rdf
    |-- build-tools
    |-- docs
    |-- pom.xml
    |-- target
    |-- tez-api
    |-- tez-common
    |-- tez-dag
    |-- tez-dist
    |-- tez-examples
    |-- tez-mapreduce
    |-- tez-plugins
    |-- tez-runtime-internals
    |-- tez-runtime-library
    |-- tez-tests
    |-- tez-tools
    `-- tez-ui

    # tar zcf apache-tez-0.7.0.tar.gz apache-tez-0.7.0-src
    # sz apache-tez-0.7.0.tar.gz

Summary: during the build phase, just keep retrying. With the usual network conditions in China, various dependencies fail to download; a few retries normally gets everything through.

Tez version compatibility:
https://cwiki.apache.org/confluence/display/TEZ/Version+Compatibility

Part 2. Install Tez

1. Upload the built Tez tarball to HDFS

    # rz
    rz waiting to receive.
    Starting zmodem transfer.  Press Ctrl+C to cancel.
    Transferring apache-tez-0.7.0.tar.gz...
      100%  128772 KB 11706 KB/s 00:00:11       0 Errors

    # tar -zxvf apache-tez-0.7.0.tar.gz 
    # cd tez-dist/target/
    # tree -L 2
    .
    |-- archive-tmp
    |-- maven-archiver
    |   `-- pom.properties
    |-- tez-0.7.0
    |   |-- lib
    |   |-- tez-api-0.7.0.jar
    |   |-- tez-common-0.7.0.jar
    |   |-- tez-dag-0.7.0.jar
    |   |-- tez-examples-0.7.0.jar
    |   |-- tez-mapreduce-0.7.0.jar
    |   |-- tez-mbeans-resource-calculator-0.7.0.jar
    |   |-- tez-runtime-internals-0.7.0.jar
    |   |-- tez-runtime-library-0.7.0.jar
    |   |-- tez-tests-0.7.0.jar
    |   |-- tez-ui-0.7.0.war
    |   |-- tez-yarn-timeline-history-0.7.0.jar
    |   `-- tez-yarn-timeline-history-with-acls-0.7.0.jar
    |-- tez-0.7.0-minimal.tar.gz
    |-- tez-0.7.0.tar.gz
    `-- tez-dist-0.7.0-tests.jar

    # ls apache-tez-0.7.0-src/tez-dist/target/
     archive-tmp  maven-archiver  tez-0.7.0  tez-0.7.0-minimal.tar.gz  tez-0.7.0.tar.gz  tez-dist-0.7.0-tests.jar

    # hadoop version
    Hadoop 2.4.0
    Subversion Unknown -r Unknown
    Compiled by root on 2014-07-27T04:58Z
    Compiled with protoc 2.5.0
    From source with checksum 375b2832a6641759c6eaf6e3e998147
    This command was run using /usr/local/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0.jar

    # hadoop fs -mkdir -p /apps/tez-0.7.0
    # hadoop fs -copyFromLocal apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz /apps/tez-0.7.0
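
A quick sanity check that the tarball landed where tez.lib.uris will later point:

    # hadoop fs -ls /apps/tez-0.7.0
    # hadoop fs -ls /apps/tez-0.7.0/tez-0.7.0.tar.gz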

2. Unpack the Tez distribution tarball

    # cp apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz .
    # mkdir tez-0.7.0
    # tar zxf tez-0.7.0.tar.gz -C tez-0.7.0
    # ln -s tez-0.7.0 tez

3. Configure environment variables

    export TEZ_HOME=/usr/local/tez
    export TEZ_CONF_DIR=$TEZ_HOME/conf
    export TEZ_JARS=$TEZ_HOME

    export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
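
To confirm that the Hadoop client actually picks these up, `hadoop classpath` should list the Tez conf directory and jars. A minimal check, assuming the exports above were appended to ~/.bash_profile (or hadoop-env.sh) and sourced:

    # hadoop classpath | tr ':' '\n' | grep -i tez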

4. Configure tez-site.xml

    # mkdir tez/conf
    # cat tez/conf/tez-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez-0.7.0/tez-0.7.0.tar.gz</value>
      </property>
    </configuration>
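
${fs.defaultFS} is substituted from core-site.xml at run time. A quick check that the resolved URI really exists (hdfs getconf prints the value of a single configuration key):

    # hdfs getconf -confKey fs.defaultFS      # e.g. hdfs://mycluster, as seen in the logs below
    # hadoop fs -ls hdfs://mycluster/apps/tez-0.7.0/tez-0.7.0.tar.gz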

5. Modify mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
      </property>
    </configuration>
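
With mapreduce.framework.name set to yarn-tez, unmodified MapReduce jobs are submitted through Tez's MR compatibility layer. A quick smoke test with the stock examples jar (the jar path is an assumption based on a default Hadoop 2.4.0 layout); in the ResourceManager UI the application type should now show as TEZ rather than MAPREDUCE:

    # hadoop jar /usr/local/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 10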

6. Example: running an MRR job from tez-examples.jar

    # hadoop fs -cat /testdata/external_table/external_teble  
    1       zhangsan
    2       lisi
    3       wanwu
    4       zhaoliu
    5       da
    6       abc

    # hadoop fs -put tez-0.7.0 /apps

    # hadoop jar tez-examples-0.7.0.jar orderedwordcount  /apps/tez-0.7.0/conf/tez-site.xml /output
    INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/lib/
    INFO client.DAGClientImpl: DAG initialized: CurrentState=Running

    Error:
    15/07/18 02:17:38 INFO examples.OrderedWordCount: DAG diagnostics: [Application application_1437156593890_0002 failed 2 times due to AM Container for appattempt_1437156593890_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
    org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)


    Container exited with a non-zero exit code 1
    .Failing this attempt.. Failing the application.]

7. Hive on Tez

Hive version note: hive-0.14 or later is required.

A test with hive-0.13.1 on hadoop 2.4 failed.

Set Hive's execution engine to Tez in hive-site.xml:
 <property>
   <name>hive.execution.engine</name>
   <value>tez</value>
 </property>

# cp tez-0.7.0/tez-*.jar hive/lib/
# cp tez-0.7.0/lib/* hive/lib/

hive (default)> set hive.execution.engine;      
hive.execution.engine=mr
hive (default)> set hive.execution.engine=tez;  
hive (default)> select count(*) from order_test;
Total jobs = 1
Launching Job 1 out of 1
Exception in thread "Thread-6" java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvironmentForMRAM(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V

Retest after upgrading Hadoop to 2.6 and Hive to 1.2.1:

$ mv hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar.bak  
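
The rename above is needed because Hive 1.2.1's CLI requires a newer jline than the jline-0.9.94 copy shipped under YARN's lib. A sketch of the companion step, assuming Hive 1.2.1 ships jline-2.12.jar under its lib/ directory (exporting HADOOP_USER_CLASSPATH_FIRST=true before starting hive is a commonly used alternative):

    $ cp apache-hive-1.2.1-bin/lib/jline-2.12.jar hadoop-2.6.0/share/hadoop/yarn/lib/
    $ ls hadoop-2.6.0/share/hadoop/yarn/lib/ | grep jline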

[hsu@server2 ~]$ hive

Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties
hive (default)> show tables;
OK
tab_name
test
Time taken: 2.33 seconds, Fetched: 1 row(s)
hive (default)> select * from test;
OK
test.id
1
2
3
Time taken: 0.648 seconds, Fetched: 3 row(s)

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select count(*) from test;
Query ID = hsu_20150813161742_a64d46a7-7349-41b3-be27-9a91bb490a8b
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1439448136822_0002)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 51.17 s    
--------------------------------------------------------------------------------
OK
_c0
3
Time taken: 54.354 seconds, Fetched: 1 row(s)

# Queue selection
set tez.queue.name=eda;

hive (default)>  set tez.queue.name=test;
hive (default)> select count(*) from test;
Query ID = hsu_20150813162458_b0ce5d5f-09d2-40ae-8b6a-90f1f4cebe91
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1439448136822_0003)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 32.71 s    
--------------------------------------------------------------------------------
OK
_c0
3
Time taken: 84.191 seconds, Fetched: 1 row(s)
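
The queue can also be chosen per invocation from the shell instead of with set inside the CLI. A minimal sketch, reusing the table and queue names from the session above:

    $ hive --hiveconf tez.queue.name=test -e "select count(*) from test;"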

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> set hive.execution.engine=mr;
hive (default)> select count(*) from test;
Query ID = hsu_20150813162723_31559eda-c8ab-4ca6-a445-b75afc908d28
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1439448136822_0004, Tracking URL = http://server1:23188/proxy/application_1439448136822_0004/
Kill Command = /home/hsu/hadoop/bin/hadoop job  -kill job_1439448136822_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-08-13 16:28:33,107 Stage-1 map = 0%,  reduce = 0%
2015-08-13 16:29:20,688 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2015-08-13 16:29:49,215 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.27 sec
MapReduce Total cumulative CPU time: 2 seconds 270 msec
Ended Job = job_1439448136822_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.27 sec   HDFS Read: 210 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 270 msec
OK
_c0
3
Time taken: 148.332 seconds, Fetched: 1 row(s)
    1. Hive on Tez: once Hive is configured to use Tez, it actually starts an application on YARN that keeps running and accepts successive SQL statements; the app only ends when you exit the Hive CLI.
    2. Job execution can be inspected through the Tez UI, which has to be set up separately (see section 9 below).
    3. A long-running Tez app on YARN looks very much like Spark on YARN, and Tez and Spark are also quite similar in their underlying design.
    4. Switching queues restarts the application.
    5. Switching the engine back to MR also starts a new (MR) application, but the Tez app does not end; it waits until the MR app finishes and then both finish together.

8. Troubleshooting

  • 1. Remove all the Tez configuration and run a plain YARN job on its own.

    • Conclusion: the YARN job runs fine, so this is an incompatibility problem: my Tez build compiled successfully against
      Hadoop 2.6.0 but failed to compile against 2.4.0, while the cluster was running Hadoop 2.4.0.
  • 2. Upgrade Hadoop to 2.6.0 and test again.

After the upgrade to 2.6.0 the test succeeds, which shows that Tez's cross-version compatibility is still fairly poor.

    $ hadoop jar /usr/local/tez/tez-examples-0.7.0.jar orderedwordcount /apps/tez-0.7.0/conf/tez-site.xml /output
    15/07/20 13:53:52 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.7.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150718-0613 ]
    15/07/20 13:53:52 INFO client.RMProxy: Connecting to ResourceManager at itr-mastertest01/192.168.2.200:8032
    15/07/20 13:54:13 INFO examples.OrderedWordCount: Running OrderedWordCount
    15/07/20 13:54:13 INFO client.TezClient: Submitting DAG application with id: application_1437371604990_0001
    15/07/20 13:54:13 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/lib/
    15/07/20 13:54:13 INFO client.TezClient: Stage directory /tmp/hsu/tez/staging doesn't exist and is created
    15/07/20 13:54:14 INFO client.TezClient: Tez system stage directory hdfs://mycluster/tmp/hsu/tez/staging/.tez/application_1437371604990_0001 doesn't exist and is created
    15/07/20 13:54:14 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1437371604990_0001, dagName=OrderedWordCount
    15/07/20 13:54:14 INFO impl.YarnClientImpl: Submitted application application_1437371604990_0001
    15/07/20 13:54:14 INFO client.TezClient: The url to track the Tez AM: http://server1:8088/proxy/application_1437371604990_0001/
    15/07/20 13:54:14 INFO client.RMProxy: Connecting to ResourceManager at itr-mastertest01/192.168.2.200:8032
    15/07/20 13:54:15 INFO client.DAGClientImpl: Waiting for DAG to start running
    15/07/20 13:55:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
    15/07/20 13:55:08 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:13 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:19 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:24 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:29 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:34 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:39 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:44 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:49 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:54 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:55:59 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
    15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:56:01 INFO client.DAGClientImpl: DAG: State: RUNNING Progress:  33.33% TotalTasks: 3 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
    15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
    15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:56:03 INFO client.DAGClientImpl: DAG: State: RUNNING Progress:  66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
    15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
    15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
    15/07/20 13:56:04 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress:    100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
    15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
    15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
    15/07/20 13:56:04 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED

$ hadoop jar tez-examples-0.7.0.jar orderedwordcount -DUSE_TEZ_SESSION=true /test /output
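
For a clean re-run, the input path only needs to contain some text files and any previous output directory has to be removed first. A minimal sketch (the choice of input file is arbitrary):

    $ hadoop fs -mkdir -p /test
    $ hadoop fs -put /etc/hosts /test/
    $ hadoop fs -rm -r -f /output
    $ hadoop jar /usr/local/tez/tez-examples-0.7.0.jar orderedwordcount /test /output
    $ hadoop fs -ls /output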

  • 3. With YARN's MR framework switched to Tez and no Hive-side changes, run Hive directly; the result is as follows:
      [hsu@server1 ~]$ hive
      Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin /conf/hive-log4j.properties
      hive (default)> show tables;
      OK
      tab_name
      test
      Time taken: 1.106 seconds, Fetched: 1 row(s)
      hive (default)> select count(1) from test;
      Query ID = hsu_20150813170521_77b18935-4f38-4f84-8da7-6774b9aff194
      Total jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks determined at compile time: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapreduce.job.reduces=<number>
      Starting Job = job_1439456630574_0001, Tracking URL = http://itr-mastertest01:23188/proxy/application_1439456630574_0001/
      Kill Command = /home/hsu/hadoop/bin/hadoop job  -kill job_1439456630574_0001
      Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
      2015-08-13 17:06:36,372 Stage-1 map = 0%,  reduce = 0%
      2015-08-13 17:07:26,923 Stage-1 map = 100%,  reduce = 100%
      Ended Job = job_1439456630574_0001
      MapReduce Jobs Launched: 
      Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
      Total MapReduce CPU Time Spent: 0 msec
      OK
      _c0
      3
      Time taken: 126.726 seconds, Fetched: 1 row(s)
    

Note: on the YARN web UI the application type now shows as TEZ, yet the client output above still looks like ordinary MapReduce job logs. From the client alone it is hard to tell whether the query actually ran on MR or on Tez; you have to trace the underlying execution to confirm.

(1) With YARN's MR framework replaced by Tez and no extra Hive configuration, jobs submitted to YARN behave as shown above (screenshot omitted).

(2) Each Tez release only supports particular Hadoop versions; the two have to match.
(3) At build time, manually setting hadoop.version made the YARN-related modules fail to compile due to incompatibilities, while the default hadoop.version built successfully.
(4) Seen from this angle, Tez is not yet fully mature.
(5) ResourceManager HA is supported.

  • 4. Current state of the various "engine on YARN" modes

(1) Tez on YARN is pushed mainly by HDP (Hortonworks); the various YARN-integration and compatibility issues should get sorted out over time.

(2) The Spark engine on YARN (Hive on Spark) is pushed mainly by CDH (Cloudera) and has already been merged into the Hive trunk.

PS: the official install docs say you can change hadoop.version and rebuild, but in practice the modified build fails, which is puzzling and again points to Tez's weak cross-version compatibility.

9. Tez UI

(1) Configure tez-site.xml

$ cat tez-site.xml         
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez-0.7.0,${fs.defaultFS}/apps/tez-0.7.0/lib/</value>
  </property>

  <property>
    <description>Enable Tez to use the Timeline Server for History Logging</description>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>

  <property>
    <description>URL for where the Tez UI is hosted</description>
    <!-- Tomcat listening on port 9999 -->
    <name>tez.tez-ui.history-url.base</name>
    <value>http://server2:9999/tez-ui/</value>
  </property>

  <property>
    <description>Publish configuration information to Timeline server.</description>
    <name>tez.runtime.convert.user-payload.to.history-text</name>
    <value>true</value>
  </property>

  <property>
    <name>tez.task.generate.counters.per.io</name>
    <value>true</value>
  </property>
</configuration>

(2) Configure yarn-site.xml

<property>
  <description>Indicate to clients whether Timeline service is enabled or not. If enabled, the TimelineClient library used by end-users will post entities and events to the Timeline server.</description>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <description>The hostname of the Timeline service web application.</description>
  <name>yarn.timeline-service.hostname</name>
  <value>server2</value>
</property>
<property>
  <description>Enables cross-origin support (CORS) for web services where cross-origin web response headers are needed. For example, javascript making a web services request to the timeline server.</description>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Publish YARN information to Timeline Server</description>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
<property>
  <description>The http address of the Timeline service web application.</description>
  <name>yarn.timeline-service.webapp.address</name>
  <value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
  <description>The https address of the Timeline service web application.</description>
  <name>yarn.timeline-service.webapp.https.address</name>
  <value>${yarn.timeline-service.hostname}:2191</value>
</property>

(3) Start the Hadoop Timeline Server

[hsu@server2 hadoop]$ yarn-daemon.sh start timelineserver

[hsu@server2 ~]$ lsof -i :8188
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7189  hsu  281u  IPv6  71981      0t0  TCP server2:8188 (LISTEN)
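
Besides lsof, the Timeline Server exposes a REST API that can be probed directly. A quick check, assuming the default ATS v1 endpoint path:

    [hsu@server2 ~]$ curl http://server2:8188/ws/v1/timeline/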

(4) Unpack the Tez UI war into Tomcat's webapps directory

## Deploy Tomcat on server2

$ tar -zxvf apache-tomcat-8.0.24.tar.gz 
[hsu@server2 ~]$ ln -s apache-tomcat-8.0.24 tomcat

[hsu@server2 ~]$ cp /usr/local/tez/tez-ui-0.7.0.war ./tomcat/webapps/

[hsu@server2 webapps]$ mkdir tez-ui           
[hsu@server2 webapps]$ mv tez-ui-0.7.0.war tez-ui/

[hsu@server2 webapps]$ cd tez-ui/
[hsu@server2 tez-ui]$ jar xvf tez-ui-0.7.0.war 

## Configure the scripts/config.js file

--Configuring the Timeline Server URL and Resource Manager UI URL 

 timelineBaseUrl: 'http://server1:8188',
 RMWebUrl: 'http://server1:23188',
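
The same edit can be scripted. A sketch, assuming the file path and property names shown above and the URL values used in this walkthrough:

    [hsu@server2 tez-ui]$ sed -i "s|timelineBaseUrl:.*|timelineBaseUrl: 'http://server1:8188',|" scripts/config.js
    [hsu@server2 tez-ui]$ sed -i "s|RMWebUrl:.*|RMWebUrl: 'http://server1:23188',|" scripts/config.js
    [hsu@server2 tez-ui]$ grep -E "timelineBaseUrl|RMWebUrl" scripts/config.js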

(5) Start Tomcat

[hsu@server2 ~]$ tomcat/bin/startup.sh 
Using CATALINA_BASE:   /home/hsu/tomcat
Using CATALINA_HOME:   /home/hsu/tomcat
Using CATALINA_TMPDIR: /home/hsu/tomcat/temp
Using JRE_HOME:        /usr/local/jdk1.7.0_45
Using CLASSPATH:       /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar
Tomcat started.

# Change the Tomcat HTTP port to 9999 (see the sketch after this block), then restart:
[hsu@server2 ~]$ tomcat/bin/shutdown.sh 
Using CATALINA_BASE:   /home/hsu/tomcat
Using CATALINA_HOME:   /home/hsu/tomcat
Using CATALINA_TMPDIR: /home/hsu/tomcat/temp
Using JRE_HOME:        /usr/local/jdk1.7.0_45
Using CLASSPATH:       /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar

[hsu@server2 ~]$ tomcat/bin/startup.sh 
[hsu@server2 ~]$ lsof -i :9999          
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7824  hsu   43u  IPv6  75557      0t0  TCP *:distinct (LISTEN)
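
The port change itself is not shown above. A minimal sketch, assuming the stock conf/server.xml with the default HTTP connector on 8080:

    [hsu@server2 ~]$ sed -i 's/Connector port="8080"/Connector port="9999"/' tomcat/conf/server.xml
    [hsu@server2 ~]$ grep 'Connector port="9999"' tomcat/conf/server.xml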

(6) Verification

1. Web UI check: http://server2:9999/tez-ui/

(Screenshots: the Tez UI showing the running application and its DAG view.)

2. Run SQL from the Hive CLI and confirm the jobs appear in the UI.

Note: the ResourceManager node must be the node configured for the Timeline Server; pay special attention to this if ResourceManager HA is enabled.

Compatibility issues between Hive 2.0.1 and Tez 0.7.1

1. Integrating Hive 2.0.1 with Tez 0.7.1, some classes were not loaded and the errors reported classes not found.
2. Downgrading Hive to 1.2.1 removed the problem; Tez jobs ran normally.
3. Upstream, Hive 2.0.1 depends on Tez 0.8.3; I plan to rebuild against that version and retest, still to be verified.
4. This again shows that the interfaces between Tez versions are poorly compatible, which makes upgrades costly and maintenance hard.

References: http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/

http://tez.apache.org/tez-ui.html

http://zh.hortonworks.com/blog/evaluating-hive-with-tez-as-a-fast-query-engine

http://tez.apache.org/install.html

Original article; when reposting, please credit Itweet's blog.
Archive of posts on this blog: http://www.itweet.cn/blog/archive/