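Zeppelin is a web-notebook-style tool for interactive data querying and analysis: you can query and analyze data online with Scala and SQL and turn the results into reports. Its data engines are plugged in as interpreters. Spark is the main backend engine, and developers can add more engines by implementing additional interpreters; the 0.5.6 build below already includes interpreters for Hive, Flink, PostgreSQL, Cassandra, Elasticsearch and others.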

This post walks through building and installing Zeppelin from source, with a brief introduction to getting started.

Requirements

  • Git
  • Java 1.7
  • Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1
  • Maven (if you want to build from the source code)
  • Node.js Package Manager (npm, downloaded by Maven during build phase)
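Before starting, it can save a wasted Maven run to confirm that the tools listed above are actually on PATH. A minimal POSIX sh sketch; check_tools is a hypothetical helper written for this post, not part of Zeppelin or Maven:

```shell
# check_tools CMD... (hypothetical helper): print every named command that
# is NOT found on PATH, one per line. Returns 0 if all are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (npm is optional here, since Maven downloads its own during the build):
# check_tools git java mvn || echo "install the missing tools first"
```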

Setting up the build environment

git install

[root@gitlab-machine ~]# git version
git version 1.7.1

install jdk

[root@gitlab-machine ~]# wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz

[root@gitlab-machine ~]# tar -zxf jdk-7u79-linux-x64.tar.gz -C /opt/

[root@gitlab-machine ~]# cd /opt/

[root@gitlab-machine opt]# ln -s jdk1.7.0_79 jdk

[root@gitlab-machine opt]# tail -5 ~/.bash_profile
export JAVA_HOME=/opt/jdk

export PATH=.:$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

[root@gitlab-machine opt]# source ~/.bash_profile
[root@gitlab-machine opt]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
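Since the requirements call for Java 1.7, a quick scripted check of which JDK the shell now picks up is to pull the "1.x" series out of the first line of java -version. A sketch assuming the old 1.x version-string format shown above (Java 9 and later print "9", "11", ... and are deliberately not matched); java_series is a hypothetical helper name:

```shell
# java_series LINE (hypothetical helper): extract the "1.x" release series
# from the first line of `java -version` output, e.g.
#   java version "1.7.0_79"  ->  1.7
# Prints nothing if the line does not use the 1.x format.
java_series() {
  printf '%s\n' "$1" | sed -n 's/.*version "\(1\.[0-9]*\).*/\1/p'
}

# Usage: `java -version` writes to stderr, hence 2>&1
# line=$(java -version 2>&1 | head -n 1)
# [ "$(java_series "$line")" = "1.7" ] && echo "JDK 1.7 is active"
```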

install maven

[root@gitlab-machine opt]# wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

[root@gitlab-machine opt]# tar -zxf apache-maven-3.3.3-bin.tar.gz

[root@gitlab-machine opt]# ln -s apache-maven-3.3.3 maven

[root@gitlab-machine opt]# echo "export MAVEN_HOME=/opt/maven" >> ~/.bash_profile

[root@gitlab-machine opt]# echo 'export PATH=$MAVEN_HOME/bin:$PATH:$HOME/bin' >> ~/.bash_profile

[root@gitlab-machine opt]# source ~/.bash_profile
[root@gitlab-machine opt]# mvn -version
Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T19:57:37+08:00)
Maven home: /opt/maven
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /opt/jdk1.7.0_79/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"
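When appending export lines to ~/.bash_profile, two things are worth guarding against: inside double quotes the shell expands $MAVEN_HOME and $PATH at the moment echo runs (so a variable not yet set in the current shell is written as an empty string), and re-running the command appends duplicate lines. A sketch of an idempotent append; append_once is a hypothetical helper:

```shell
# append_once FILE LINE (hypothetical helper): append LINE to FILE only if
# FILE does not already contain it as an exact, whole line.
# Safe to re-run, unlike a bare `echo ... >> FILE`.
append_once() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $MAVEN_HOME and $PATH literal, so they are expanded
# when the profile is sourced, not when this command runs:
# append_once ~/.bash_profile 'export MAVEN_HOME=/opt/maven'
# append_once ~/.bash_profile 'export PATH=$MAVEN_HOME/bin:$PATH:$HOME/bin'
```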

install node.js

yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

yum repolist

[root@gitlab-machine opt]# yum search nodejs npm|wc -l
21

[root@gitlab-machine opt]# sudo yum install nodejs npm --enablerepo=epel

[root@gitlab-machine opt]# node -v
v0.10.42

[root@gitlab-machine opt]# npm -v
1.3.6


build zeppelin

[root@gitlab-machine opt]# cd /data/

[root@gitlab-machine data]# wget https://github.com/apache/zeppelin/archive/v0.5.6.zip

[root@gitlab-machine data]# unzip v0.5.6.zip

[root@gitlab-machine data]# cd zeppelin-0.5.6/

[root@gitlab-machine zeppelin-0.5.6]# nohup mvn clean package -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -DskipTests > nohup.out &

[root@gitlab-machine zeppelin-0.5.6]# jobs
[1]+ Running nohup mvn clean package -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -DskipTests > nohup.out &

[root@gitlab-machine ~]# tail -f /data/zeppelin-0.5.6/nohup.out
[INFO] Reactor Build Order:
[INFO]
[INFO] Zeppelin
[INFO] Zeppelin: Interpreter
[INFO] Zeppelin: Zengine
[INFO] Zeppelin: Spark dependencies
[INFO] Zeppelin: Spark
[INFO] Zeppelin: Markdown interpreter
[INFO] Zeppelin: Angular interpreter
[INFO] Zeppelin: Shell interpreter
[INFO] Zeppelin: Hive interpreter
[INFO] Zeppelin: Apache Phoenix Interpreter
[INFO] Zeppelin: PostgreSQL interpreter
[INFO] Zeppelin: Tajo interpreter
[INFO] Zeppelin: Flink
[INFO] Zeppelin: Apache Ignite interpreter
[INFO] Zeppelin: Kylin interpreter
[INFO] Zeppelin: Lens interpreter
[INFO] Zeppelin: Cassandra
[INFO] Zeppelin: Elasticsearch interpreter
[INFO] Zeppelin: web Application
[INFO] Zeppelin: Server
[INFO] Zeppelin: Packaging distribution
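Because the build runs in the background under nohup and can take anywhere from minutes to hours (7:42 min for the successful run below, over 2 hours for the failed one in the FAQ), it can be handier to poll the log for Maven's final verdict than to watch tail -f. A sketch; build_status is a hypothetical helper that only looks for Maven's "BUILD SUCCESS" / "BUILD FAILURE" lines:

```shell
# build_status LOG (hypothetical helper): report the state of a Maven build
# whose output is being written to LOG (e.g. the nohup.out used above).
build_status() {
  if grep -q 'BUILD SUCCESS' "$1"; then
    echo SUCCESS
  elif grep -q 'BUILD FAILURE' "$1"; then
    echo FAILURE
  else
    echo PENDING   # no verdict yet: still running, or killed before finishing
  fi
}

# Usage: wait for the background build, checking once a minute
# while [ "$(build_status /data/zeppelin-0.5.6/nohup.out)" = PENDING ]; do sleep 60; done
```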

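Package

To build a deployable binary tarball, add the -Pbuild-distr profile. MAVEN_OPTS gives Maven more heap and PermGen space, which helps avoid out-of-memory errors in this long build (-XX:MaxPermSize only applies to Java 7 and earlier):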

$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"

$ nohup mvn clean package -Pbuild-distr -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -DskipTests > nohup.out &

[INFO] Reading assembly descriptor: src/assemble/distribution.xml
[INFO] Copying files to /data/zeppelin-0.5.6/zeppelin-distribution/target/zeppelin-0.5.6-incubating
[INFO] Building tar: /data/zeppelin-0.5.6/zeppelin-distribution/target/zeppelin-0.5.6-incubating.tar.gz
[INFO]
[INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ zeppelin-distribution ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Zeppelin ........................................... SUCCESS [ 6.620 s]
[INFO] Zeppelin: Interpreter .............................. SUCCESS [ 13.075 s]
[INFO] Zeppelin: Zengine .................................. SUCCESS [ 9.248 s]
[INFO] Zeppelin: Spark dependencies ....................... SUCCESS [ 44.619 s]
[INFO] Zeppelin: Spark .................................... SUCCESS [ 46.444 s]
[INFO] Zeppelin: Markdown interpreter ..................... SUCCESS [ 0.620 s]
[INFO] Zeppelin: Angular interpreter ...................... SUCCESS [ 0.423 s]
[INFO] Zeppelin: Shell interpreter ........................ SUCCESS [ 0.470 s]
[INFO] Zeppelin: Hive interpreter ......................... SUCCESS [ 1.962 s]
[INFO] Zeppelin: Apache Phoenix Interpreter ............... SUCCESS [ 4.581 s]
[INFO] Zeppelin: PostgreSQL interpreter ................... SUCCESS [ 0.453 s]
[INFO] Zeppelin: Tajo interpreter ......................... SUCCESS [ 1.529 s]
[INFO] Zeppelin: Flink .................................... SUCCESS [ 7.911 s]
[INFO] Zeppelin: Apache Ignite interpreter ................ SUCCESS [ 0.604 s]
[INFO] Zeppelin: Kylin interpreter ........................ SUCCESS [ 0.323 s]
[INFO] Zeppelin: Lens interpreter ......................... SUCCESS [ 2.798 s]
[INFO] Zeppelin: Cassandra ................................ SUCCESS [01:14 min]
[INFO] Zeppelin: Elasticsearch interpreter ................ SUCCESS [ 2.570 s]
[INFO] Zeppelin: web Application .......................... SUCCESS [01:02 min]
[INFO] Zeppelin: Server ................................... SUCCESS [01:58 min]
[INFO] Zeppelin: Packaging distribution ................... SUCCESS [01:02 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 07:42 min
[INFO] Finished at: 2016-06-14T10:58:05+08:00
[INFO] Final Memory: 171M/466M
[INFO] ------------------------------------------------------------------------
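The Reactor Summary lists every module with SUCCESS, FAILURE or SKIPPED. When a build fails (as in the FAQ below), pulling the failed module out of a long nohup.out is easy to script. A sketch using awk; reactor_modules is a hypothetical helper and assumes the summary layout shown above (module name, dot leader, status):

```shell
# reactor_modules LOG STATUS (hypothetical helper): print the name of every
# module whose Reactor Summary status in LOG is STATUS
# (SUCCESS, FAILURE or SKIPPED), one per line.
reactor_modules() {
  awk -v want="$2" '
    # summary lines look like: [INFO] <name> ....... <STATUS> [time]
    /^\[INFO\] .* \.+ [A-Z]+/ {
      line = $0
      sub(/^\[INFO\] /, "", line)
      name = line
      sub(/ \.+ .*/, "", name)      # text before the dot leader
      status = line
      sub(/.* \.+ /, "", status)    # text after the dot leader
      sub(/ .*/, "", status)        # keep only the first word
      if (status == want) print name
    }' "$1"
}

# Usage: reactor_modules /data/zeppelin-0.5.6/nohup.out FAILURE
```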

zeppelin deploy

Now you can start Zeppelin and use its web UI to explore and analyze data with big-data tools.

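[root@gitlab-machine ~]# mkdir -p /data/deploy
[root@gitlab-machine ~]# tar -zxf /data/zeppelin-0.5.6/zeppelin-distribution/target/zeppelin-0.5.6-incubating.tar.gz -C /data/deploy/
[root@gitlab-machine ~]# cd /data/deploy/zeppelin-0.5.6-incubating/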

[root@gitlab-machine zeppelin-0.5.6-incubating]# bin/zeppelin-daemon.sh start
Log dir doesn't exist, create /data/deploy/zeppelin-0.5.6-incubating/logs
Pid dir doesn't exist, create /data/deploy/zeppelin-0.5.6-incubating/run
Zeppelin start [ OK ]

[root@gitlab-machine zeppelin-0.5.6-incubating]# bin/zeppelin-daemon.sh start
Zeppelin start [ OK ]
[root@gitlab-machine zeppelin-0.5.6-incubating]# ls logs/
zeppelin-root-gitlab-machine.log zeppelin-root-gitlab-machine.out

[root@gitlab-machine zeppelin-0.5.6-incubating]# lsof -i :8080
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 38243 root 597u IPv6 388928 0t0 TCP *:webcache (LISTEN)

zeppelin configuration

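Two configuration files are involved, both in the conf/ directory:
zeppelin-env.sh.template
zeppelin-site.xml.template
Copy each one to the same name without the .template suffix and edit the relevant parameters. For details see: http://zeppelin.apache.org/docs/0.5.6-incubating/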
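For example, moving Zeppelin off port 8080 (see "Address already in use" in the FAQ) only needs a change in zeppelin-env.sh. A sketch assuming that Zeppelin reads a ZEPPELIN_PORT environment variable as the counterpart of zeppelin.server.port in zeppelin-site.xml; please double-check the variable name against the 0.5.6 configuration docs linked above. set_zeppelin_port is a hypothetical helper:

```shell
# set_zeppelin_port CONF_DIR PORT (hypothetical helper):
# create zeppelin-env.sh from its template on first use, then append a
# ZEPPELIN_PORT export (assumed variable name, see the lead-in).
# If run twice, the last export in the file wins when it is sourced.
set_zeppelin_port() {
  conf=$1
  port=$2
  if [ ! -f "$conf/zeppelin-env.sh" ]; then
    cp "$conf/zeppelin-env.sh.template" "$conf/zeppelin-env.sh" || return 1
  fi
  printf 'export ZEPPELIN_PORT=%s\n' "$port" >> "$conf/zeppelin-env.sh"
}

# Usage (8090 is only an example of a free port):
# set_zeppelin_port /data/deploy/zeppelin-0.5.6-incubating/conf 8090
# cd /data/deploy/zeppelin-0.5.6-incubating
# bin/zeppelin-daemon.sh stop
# bin/zeppelin-daemon.sh start
```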

FAQ

'npm install --color=false' failed

[INFO] Zeppelin: web Application .......................... FAILURE [12:32 min]
[INFO] Zeppelin: Server ................................... SKIPPED
[INFO] Zeppelin: Packaging distribution ................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:04 h
[INFO] Finished at: 2016-06-13T18:12:54+08:00
[INFO] Final Memory: 129M/410M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:npm (npm install) on project zeppelin-web: Failed to run task: 'npm install --color=false' failed. (error code 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command

Fix:
$ cd zeppelin-web

$ npm install
> phantomjs@1.9.19 install /data/zeppelin-0.5.6/zeppelin-web/node_modules/karma-phantomjs-launcher/node_modules/phantomjs
> node install.js

PhantomJS not found on PATH
Downloading https://github.com/Medium/phantomjs/releases/download/v1.9.19/phantomjs-1.9.8-linux-x86_64.tar.bz2
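Saving to /data/zeppelin-0.5.6/zeppelin-web/node_modules/karma-phantomjs-launcher/node_modules/phantomjs/phantomjs/phantomjs-1.9.8-linux-x86_64.tar.bz2
## This file would not download: it is fetched from "https://github-cloud.s3.amazonaws.com", and the connection failed with "Connection timed out." Re-running the command several times eventually got it through; chalk it up to the delicate relationship between the world's two big internets.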
Receiving...
[==--------------------------------------] 4%
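Since simply re-running the download eventually succeeded, the retries can be automated. A minimal sketch; retry and RETRY_DELAY are hypothetical names, and the loop only repeats the command, it does nothing about the underlying network problem:

```shell
# retry MAX CMD... (hypothetical helper): run CMD up to MAX times, stopping
# at the first success. Returns the exit status of the last attempt.
# RETRY_DELAY (hypothetical, default 5) is the pause in seconds between tries.
retry() {
  max=$1
  shift
  n=1
  while :; do
    "$@" && return 0
    status=$?
    if [ "$n" -ge "$max" ]; then
      return "$status"
    fi
    echo "attempt $n/$max failed (exit $status), retrying..." >&2
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage, from /data/zeppelin-0.5.6/zeppelin-web:
# retry 5 npm install
```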

'grunt --no-color' failed.

[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:grunt (grunt build) on project zeppelin-web: Failed to run task: 'grunt --no-color' failed. (error code 3) -> [Help 1]

Fix:

[root@gitlab-machine zeppelin-0.5.6]# cd zeppelin-web/
[root@gitlab-machine zeppelin-web]# grunt build
ERROR [launcher]: No binary for PhantomJS browser on your platform.
Please, set "PHANTOMJS_BIN" env variable.
Warning: Task "karma:unit" failed. Use --force to continue.
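The error asks for PHANTOMJS_BIN. If the phantomjs npm package did finish downloading, one option is to point that variable at the binary it unpacked. A sketch under an assumption: that the package places the binary at .../lib/phantom/bin/phantomjs inside its module directory (check your own node_modules tree); find_phantomjs is a hypothetical helper:

```shell
# find_phantomjs DIR (hypothetical helper): print the first file under DIR
# whose path ends in phantom/bin/phantomjs (assumed layout of the phantomjs
# npm package), or nothing if there is none.
find_phantomjs() {
  find "$1" -type f -path '*/phantom/bin/phantomjs' 2>/dev/null | head -n 1
}

# Usage, from /data/zeppelin-0.5.6/zeppelin-web:
# PHANTOMJS_BIN=$(find_phantomjs node_modules)
# export PHANTOMJS_BIN
# grunt build
```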

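1. A first look at the log shows that this error is caused by phantomjs, the same library that failed in the previous problem.
2. Reading further into the log (below), npm install now passes, but "bower --allow-root install" starts reporting errors.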
[INFO] --- frontend-maven-plugin:0.0.23:npm (npm install) @ zeppelin-web ---
[INFO] Running 'npm install --color=false' in /data/zeppelin-0.5.6/zeppelin-web
[INFO]
[INFO] --- frontend-maven-plugin:0.0.23:bower (bower install) @ zeppelin-web ---
[INFO] Running 'bower --allow-root install' in /data/zeppelin-0.5.6/zeppelin-web
[ERROR] [{
[ERROR] "level": "info",
[ERROR] "id": "cached",
[ERROR] "message": "https://github.com/angular/bower-angular.git#1.3.8",

[root@gitlab-machine zeppelin-0.5.6]# cd zeppelin-web/
[root@gitlab-machine zeppelin-web]# bower --allow-root install

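In the end I got past this by editing the frontend-maven-plugin section of zeppelin-web/pom.xml, following advice found online. Note the --force added to the grunt arguments: as the warning above suggests, it lets grunt continue past the failing karma:unit task. After building this way, the home page of the web UI still showed a few display glitches, and my front-end skills were not up to fixing them. This module relies on front-end tooling (grunt, bower, node, npm) and is the hardest part of the build to get through; tez-ui is the same, and neither compiles very smoothly.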
<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>0.0.23</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeVersion>v0.10.18</nodeVersion>
        <npmVersion>1.3.8</npmVersion>
      </configuration>
    </execution>

    <execution>
      <id>npm install</id>
      <goals>
        <goal>npm</goal>
      </goals>
    </execution>

    <execution>
      <id>bower install</id>
      <goals>
        <goal>bower</goal>
      </goals>
      <configuration>
        <arguments>--allow-root install</arguments>
      </configuration>
    </execution>

    <execution>
      <id>grunt build</id>
      <goals>
        <goal>grunt</goal>
      </goals>
      <configuration>
        <arguments>--no-color --force</arguments>
      </configuration>
    </execution>
  </executions>
</plugin>

Address already in use

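Note that Zeppelin starts its web service on port 8080 by default. If that port is already in use on your server, Zeppelin fails to start and logs the exception below. In my case the port was taken because GitLab was deployed on the same host: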

ERROR [2016-06-14 11:05:35,848] ({main} ZeppelinServer.java[main]:112) - Error while running jettyServer
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
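To see which process holds the port, lsof -i :8080 as used in the deploy section is enough; lsof shows 8080 as "webcache" because that is its name in /etc/services, and lsof -P prints the number instead. The small filter below, a hypothetical helper, reduces that output to command and PID:

```shell
# port_owner (hypothetical helper): read `lsof -i :PORT` output on stdin and
# print "COMMAND PID" for each process holding the port. The header line is
# skipped, and duplicates (e.g. one IPv4 and one IPv6 socket) are merged.
port_owner() {
  awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Usage: find out who holds 8080 before starting Zeppelin
# lsof -i :8080 | port_owner
```

Once you know the owner, either stop that service or move Zeppelin to another port in its configuration.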

Installing Zeppelin on an Ambari-Managed Cluster

Reference: http://hortonworks.com/hadoop-tutorial/apache-zeppelin-hdp-2-4/

References:
https://github.com/apache/zeppelin/
http://zeppelin.apache.org/docs/0.5.6-incubating/install/install.html
http://zeppelin.apache.org/docs/0.5.6-incubating/manual/interpreters.html
https://www.tutorialspoint.com/zookeeper/zookeeper_cli.htm