The Apache Software Foundation hosts an open-source project named "Drill"; Apache Drill is an open-source implementation of Google's Dremel.
Apache Drill brings a JSON document model to SQL-based data analysis and business intelligence (BI), letting users query fixed-schema, evolving-schema, and schema-free data across a variety of formats and data stores. Traditional relational query engines and databases, by contrast, are built on the assumption that all data conforms to a simple, static schema.
Apache Drill's architecture is unique: it is the only columnar execution engine that supports complex and schema-free data, and the only execution engine that performs data-driven query planning and recompilation (also known as schema discovery) during query execution. These properties allow Apache Drill to deliver record-breaking performance on the JSON document model.

Highlights:

  • Flexible data model
  • Compatibility with existing SQL environments and Apache Hive
  • Support for multiple frameworks

Embedded (single-node) mode

Interactive shell: drill-embedded

# wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz
# tar zxf apache-drill-1.0.0.tar.gz
# cd apache-drill-1.0.0

# bin/drill-embedded
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Jul 18, 2015 3:41:25 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.0.0
"drill baby drill"
0: jdbc:drill:zk=local> use cp;
+-------+---------------------------------+
| ok | summary |
+-------+---------------------------------+
| true | Default schema changed to [cp] |
+-------+---------------------------------+
1 row selected (0.186 seconds)

0: jdbc:drill:zk=local> select employee_id,full_name,first_name,last_name from `employee.json` limit 2;
+--------------+------------------+-------------+------------+
| employee_id | full_name | first_name | last_name |
+--------------+------------------+-------------+------------+
| 1 | Sheri Nowmer | Sheri | Nowmer |
| 2 | Derrick Whelply | Derrick | Whelply |
+--------------+------------------+-------------+------------+
2 rows selected (0.426 seconds)

0: jdbc:drill:zk=local> !quit
Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
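
Because Drill discovers the schema at run time, you can also point it at any ad-hoc JSON file without declaring a table first. A minimal sketch (the /tmp/donuts.json file and its contents are made up for illustration; in embedded mode the built-in dfs plugin reads the local filesystem):

# cat > /tmp/donuts.json <<'EOF'
{"id": "0001", "type": "donut", "name": "Cake", "topping": [{"id": "5001", "type": "None"}, {"id": "5002", "type": "Glazed"}]}
{"id": "0002", "type": "donut", "name": "Raised", "topping": [{"id": "5002", "type": "Glazed"}]}
EOF

# bin/drill-embedded
0: jdbc:drill:zk=local> select t.id, t.name, t.topping[0].type as first_topping from dfs.`/tmp/donuts.json` t;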

Interactive shell: sqlline

#  bin/sqlline -u jdbc:drill:zk=local
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Jul 18, 2015 3:48:12 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.0.0
"a drill in the hand is better than two in the bush"
0: jdbc:drill:zk=local>

0: jdbc:drill:zk=local> !quit
Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection

Running Drill as a background daemon

# bin/drillbit.sh start
starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out

# cat drillbit.out
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/apache-drill-1.0.0/hs_err_pid5344.log
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/apache-drill-1.0.0/hs_err_pid5524.log

Fix: either give the host more memory, or lower DRILL_MAX_DIRECT_MEMORY (default: 8G) and DRILL_HEAP (default: 4G); see the drill-env.sh sketch below.

# bin/drillbit.sh start
starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out

# ps aux | grep drill    # verify that the daemon is running

# jps -lm
2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
6105 sun.tools.jps.Jps -lm
1775 org.apache.hadoop.hdfs.server.namenode.NameNode
5936 org.apache.drill.exec.server.Drillbit
1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode

The org.apache.drill.exec.server.Drillbit entry is the background daemon; a client started with sqlline or drill-embedded shows up as its own separate process.
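
The memory settings mentioned above live in conf/drill-env.sh. A minimal sketch of the edit used in this post (2G of direct memory, 1G of heap):

# vim conf/drill-env.sh
DRILL_MAX_DIRECT_MEMORY="2G"    # default: 8G
DRILL_HEAP="1G"                 # default: 4G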

Storage Plugin

http://drill.apache.org/docs/plugin-configuration-basics/

By default only cp and dfs are enabled. In the Web UI, click Enable next to a plugin under Disabled Storage Plugins to start using it.

1. Add an HDFS plugin

Click Create and paste the HDFS storage plugin configuration into the Configuration box:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://itr-mastertest01:9000/",
  "workspaces": {
    "root": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    }
  }
}
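
The same configuration can also be registered without the Web UI, through Drill's REST interface on port 8047. A sketch, assuming the storage endpoint accepts the config wrapped together with its name (verify against the REST API docs for your Drill version); the file below repeats only a trimmed version of the configuration above:

# cat > /tmp/hdfs-plugin.json <<'EOF'
{"name": "hdfs", "config": {"type": "file", "enabled": true, "connection": "hdfs://itr-mastertest01:9000/", "workspaces": {"root": {"location": "/tmp", "writable": true, "defaultInputFormat": null}}, "formats": {"parquet": {"type": "parquet"}}}}
EOF
# curl -X POST -H "Content-Type: application/json" -d @/tmp/hdfs-plugin.json http://itr-mastertest01:8047/storage/hdfs.json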

2. Configure dfs/hdfs workspaces

  • dfs

    {
      "type": "file",
      "enabled": true,
      "connection": "file:///",
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": null
        },
        "tmp": {
          "location": "/tmp",
          "writable": true,
          "defaultInputFormat": null
        },
        "work": {
          "location": "/usr/local/apache-drill-1.0.0/",
          "writable": true,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": ["tsv"],
          "delimiter": "\t"
        },
        "parquet": {
          "type": "parquet"
        }
      }
    }
  • hdfs

    {
      "type": "file",
      "enabled": true,
      "connection": "hdfs://mycluster/",
      "workspaces": {
        "root": {
          "location": "/",
          "writable": true,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": ["tsv"],
          "delimiter": "\t"
        },
        "parquet": {
          "type": "parquet"
        }
      }
    }
    For hdfs you can also define your own workspace, e.g. work=/user/hive/; it then points at /user/hive/, and you can query it directly as hdfs.work.
  • hive

    {
      "type": "hive",
      "enabled": true,
      "configProps": {
        "hive.metastore.uris": "thrift://itr-mastertest02:9083",
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://itr-mastertest02:3306/hive?characterEncoding=UTF-8",
        "hive.metastore.warehouse.dir": "/user/hive/warehouse",
        "fs.default.name": "hdfs://mycluster",
        "hive.metastore.sasl.enabled": "false"
      }
    }
  • hbase

    {
      "type": "hbase",
      "config": {
        "hbase.zookeeper.quorum": "itr-mastertest01,itr-mastertest02,itr-nodetest01",
        "hbase.zookeeper.property.clientPort": "2181"
      },
      "size.calculator.enabled": false,
      "enabled": true
    }
  • rdbms

    {
      "type": "jdbc",
      "driver": "com.mysql.jdbc.Driver",
      "url": "jdbc:mysql://itr-mastertest01:3306",
      "username": "root",
      "password": "admin",
      "enabled": true
    }
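
    For the jdbc (rdbms) plugin above, the MySQL driver jar has to be on Drill's classpath before the plugin can be enabled. A sketch, assuming a locally downloaded connector jar (the file name is a placeholder) and the install path used in this post:

    # cp mysql-connector-java-5.1.36-bin.jar /usr/local/apache-drill-1.0.0/jars/3rdparty/
    # bin/drillbit.sh stop && bin/drillbit.sh start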

3. Test dfs/hdfs

Testing is covered in the distributed section below.

Distributed mode

1. Edit the configuration file

# tail -4 drill-override.conf
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181"
}

Every node must use the same cluster-id so that all nodes join the same cluster. Copy the installation directory to the same path on every node.
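
A minimal sketch of pushing the configured install to the other nodes (hostnames follow the ones used in this post; adjust to your environment):

# for h in itr-mastertest02 itr-nodetest01 itr-nodetest02; do rsync -a /usr/local/apache-drill-1.0.0/ $h:/usr/local/apache-drill-1.0.0/; done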

2. Start the cluster

  • Run bin/drillbit.sh start on every node (see the sketch below).
  • Open port 8047 on any node: the Web UI lists every Drill service in the cluster and shows the nodes.
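
A sketch of starting the daemons remotely and checking that the Web UI answers, assuming passwordless ssh between the nodes:

# for h in itr-mastertest01 itr-mastertest02 itr-nodetest01 itr-nodetest02; do ssh $h /usr/local/apache-drill-1.0.0/bin/drillbit.sh start; done
# curl -s -o /dev/null -w '%{http_code}\n' http://itr-mastertest01:8047/    # expect 200 once the Drillbit is up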

3. Client connections

Drill ships with several client tools, and third-party tools can also access Drill data.
Most importantly, like other SQL databases, Drill provides an ODBC driver (and JDBC).
See: https://drill.apache.org/docs/interfaces-introduction/
Example: jdbc:drill:zk=10.10.100.30:5181,10.10.100.31:5181,10.10.100.32:5181/drill/drillbits1;schema=hive

  • Connect to local ZooKeeper

    bin/sqlline -u jdbc:drill:zk=local
  • Connect to a ZooKeeper cluster (sqlline)

    # drill/bin/sqlline -u jdbc:drill:zk=itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181

    `In cluster mode you can also omit the ZK addresses: bin/sqlline -u jdbc:drill:zk automatically reads the configuration from drill-override.conf`

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select employee_id,full_name from cp.`employee.json` limit 2;
    +--------------+------------------+
    | employee_id | full_name |
    +--------------+------------------+
    | 1 | Sheri Nowmer |
    | 2 | Derrick Whelply |
    +--------------+------------------+
    2 rows selected (0.457 seconds)
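
Besides sqlline and the ODBC/JDBC drivers, Drill also answers queries over HTTP on port 8047. A sketch, assuming the REST query endpoint (POST /query.json) described in the interfaces documentation linked above:

# curl -s -X POST -H "Content-Type: application/json" -d '{"queryType": "SQL", "query": "select employee_id, full_name from cp.`employee.json` limit 2"}' http://itr-mastertest01:8047/query.json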

dfs source

Note that the path must be qualified with a workspace here: dfs.work points at the Drill install directory (/usr/local/apache-drill-1.0.0/), which contains the bundled sample-data, so the second query below succeeds while the unqualified first one fails.

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.`sample-data/region.parquet` limit 2;
Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.sample-data/region.parquet' not found
[Error Id: 9d20015d-023a-4d06-80d2-78760a037aed on itr-mastertest02:31010] (state=,code=0)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.work.`sample-data/region.parquet` limit 2;
+--------------+----------+-----------------------+
| R_REGIONKEY | R_NAME | R_COMMENT |
+--------------+----------+-----------------------+
| 0 | AFRICA | lar deposits. blithe |
| 1 | AMERICA | hs use ironic, even |
+--------------+----------+-----------------------+
2 rows selected (0.517 seconds)
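
If you prefer not to rely on a workspace, the same file can also be addressed by its full path under the dfs root workspace; a sketch using the install path from this post:

select * from dfs.`/usr/local/apache-drill-1.0.0/sample-data/region.parquet` limit 2;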

hdfs source

# hadoop fs -ls /data         
Found 1 items
-rw-r--r-- 2 root supergroup 455 2015-07-18 22:35 /data/regionsSF.parquet

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data`;
+---------+
| EXPR$0 |
+---------+
| 5 |
+---------+
1 row selected (0.773 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data/regionsSF.parquet`;
+---------+
| EXPR$0 |
+---------+
| 5 |
+---------+
1 row selected (0.146 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`hdfs://mycluster/data/regionsSF.parquet`;
+---------+
| EXPR$0 |
+---------+
| 5 |
+---------+
1 row selected (0.173 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs.work;
+-------+----------------------------------------+
| ok | summary |
+-------+----------------------------------------+
| true | Default schema changed to [hdfs.work] |
+-------+----------------------------------------+
1 row selected (0.095 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use sys;
+-------+----------------------------------+
| ok | summary |
+-------+----------------------------------+
| true | Default schema changed to [sys] |
+-------+----------------------------------+
1 row selected (0.136 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
+---------------+-------------+
| TABLE_SCHEMA | TABLE_NAME |
+---------------+-------------+
| sys | boot |
| sys | drillbits |
| sys | memory |
| sys | options |
| sys | threads |
| sys | version |
+---------------+-------------+
6 rows selected (0.138 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs;
+-------+-----------------------------------+
| ok | summary |
+-------+-----------------------------------+
| true | Default schema changed to [hdfs] |
+-------+-----------------------------------+
1 row selected (0.094 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use tmp;
+-------+---------------------------------------+
| ok | summary |
+-------+---------------------------------------+
| true | Default schema changed to [hdfs.tmp] |
+-------+---------------------------------------+
1 row selected (0.1 seconds)

hive data source

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hive;
+-------+-----------------------------------+
| ok | summary |
+-------+-----------------------------------+
| true | Default schema changed to [hive] |
+-------+-----------------------------------+
1 row selected (0.804 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
+---------------+---------------------+
| TABLE_SCHEMA | TABLE_NAME |
+---------------+---------------------+
| hive.default | order_test |
| hive.default | order_test_parquet |
+---------------+---------------------+
2 rows selected (0.462 seconds)

Querying fails with errors:
Error: SYSTEM ERROR: java.net.UnknownHostException: mycluster
[Error Id: 10ba19e9-35fd-4457-8fa5-8c6f41313b3a on itr-mastertest02:31010] (state=,code=0)
select count(*) from hdfs.`testdata/db_case1/order_created`;
Error: PARSE ERROR: From line 1, column 22 to line 1, column 25: Table 'hdfs.testdata/db_case1/order_created' not found

Fix:
# vim drill-env.sh
export HADOOP_HOME="/usr/local/hadoop"

#text table
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test`;
+-----------------+-----------------------------+
| ordernumber | event_time |
+-----------------+-----------------------------+
| 10703007267488 | 2014-05-01 06:01:12.334+01 |
| 10101043505096 | 2014-05-01 07:28:12.342+01 |
| 10103043509747 | 2014-05-01 07:50:12.33+01 |
| 10103043501575 | 2014-05-01 09:27:12.33+01 |
| 10104043514061 | 2014-05-01 09:03:12.324+01 |
+-----------------+-----------------------------+
5 rows selected (0.819 seconds)

#parquet table
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test_parquet`;
+-----------------+-----------------------------+
| ordernumber | event_time |
+-----------------+-----------------------------+
| 10703007267488 | 2014-05-01 06:01:12.334+01 |
| 10101043505096 | 2014-05-01 07:28:12.342+01 |
| 10103043509747 | 2014-05-01 07:50:12.33+01 |
| 10103043501575 | 2014-05-01 09:27:12.33+01 |
| 10104043514061 | 2014-05-01 09:03:12.324+01 |
+-----------------+-----------------------------+
5 rows selected (0.522 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases;
+---------------------+
| SCHEMA_NAME |
+---------------------+
| INFORMATION_SCHEMA |
| cp.default |
| dfs.default |
| dfs.root |
| dfs.tmp |
| dfs.work |
| hdfs.default |
| hdfs.root |
| hdfs.tmp |
| hdfs.work |
| hive.default |
| hive.hsu |
| sys |
+---------------------+
13 rows selected (0.14 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hive.`default`.order_test;
+---------+
| EXPR$0 |
+---------+
| 5 |
+---------+
1 row selected (0.27 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> create table hive.`default`.test as select * from hive.`default`.order_test;
Error: PARSE ERROR: Unable to create or drop tables/views. Schema [hive. default] is immutable.
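
Since the hive schemas are read-only from Drill, point CTAS at a writable workspace instead, for example the dfs.tmp workspace configured earlier (the target table name here is arbitrary):

create table dfs.tmp.order_test_copy as select * from hive.`default`.order_test;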

Reference: http://drill.apache.org/docs/querying-hive/

hbase data source

hbase(main):001:0> list
TABLE
ABC
SYSTEM.CATALOG
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST
TEST01
6 row(s) in 2.9470 seconds

=> ["ABC", "SYSTEM.CATALOG", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST", "TEST01"]

hbase(main):008:0> scan 'TEST01'
ROW COLUMN+CELL
user1|ts1 column=sf:c1, timestamp=1426423837228, value=sku1
1 row(s) in 0.0840 seconds

hbase(main):009:0> scan 'TEST'
ROW COLUMN+CELL
\x80\x00\x00\x01 column=0:MYCOLUMN, timestamp=1426421982186, value=Hello
\x80\x00\x00\x01 column=0:_0, timestamp=1426421982186, value=
\x80\x00\x00\x02 column=0:MYCOLUMN, timestamp=1426421982186, value=World!
\x80\x00\x00\x02 column=0:_0, timestamp=1426421982186, value=
2 row(s) in 0.0290 seconds
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hbase;
+-------+------------------------------------+
| ok | summary |
+-------+------------------------------------+
| true | Default schema changed to [hbase] |
+-------+------------------------------------+
1 row selected (2.393 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
+---------------+------------------+
| TABLE_SCHEMA | TABLE_NAME |
+---------------+------------------+
| hbase | ABC |
| hbase | SYSTEM.CATALOG |
| hbase | SYSTEM.SEQUENCE |
| hbase | SYSTEM.STATS |
| hbase | TEST |
| hbase | TEST01 |
+---------------+------------------+
6 rows selected (5.415 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST01;
+-------------+-------+--------------------+
| row_key | lf | sf |
+-------------+-------+--------------------+
| [B@a32b2dc | null | {"c1":"c2t1MQ=="} |
+-------------+-------+--------------------+
1 row selected (1.825 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.TEST01;
+---------+
| EXPR$0 |
+---------+
| 1 |
+---------+
1 row selected (2.244 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select cast(row_key as VarChar(20)),lf,sf from hbase.TEST01;
+------------+-----+--------------------+
| EXPR$0 | lf | sf |
+------------+-----+--------------------+
| user1|ts1 | {} | {"c1":"c2t1MQ=="} |
+------------+-----+--------------------+
1 row selected (2.305 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.`SYSTEM.CATALOG`;
+---------+
| EXPR$0 |
+---------+
| 79 |
+---------+
1 row selected (0.815 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select CONVERT_FROM(row_key,'UTF8') from hbase.TEST;
+---------+
| EXPR$0 |
+---------+
| ? |
| ? |
+---------+
2 rows selected (0.699 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST;
+--------------+----------------------------------+
| row_key | 0 |
+--------------+----------------------------------+
| [B@5b6c4eba | {"MYCOLUMN":"SGVsbG8=","_0":""} |
| [B@23a4c43a | {"MYCOLUMN":"V29ybGQh","_0":""} |
+--------------+----------------------------------+
2 rows selected (0.554 seconds)
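
HBase hands everything back as bytes, which is why row keys and cell values show up as byte arrays or base64 above. The CONVERT_FROM trick used on row_key also works on individual columns (table alias, then column family, then qualifier); a sketch against TEST01, whose sf:c1 cell was shown in the hbase shell scan:

select convert_from(t.row_key, 'UTF8') as rk, convert_from(t.sf.c1, 'UTF8') as c1 from hbase.TEST01 t;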

MYSQL

0: jdbc:drill:zk=bigdata-test-server-02:2181> use mysql.test;
+-------+-----------------------------------------+
| ok | summary |
+-------+-----------------------------------------+
| true | Default schema changed to [mysql.test] |
+-------+-----------------------------------------+
1 row selected (0.35 seconds)
0: jdbc:drill:zk=bigdata-test-server-02:2181> show tables;
+---------------+-------------+
| TABLE_SCHEMA | TABLE_NAME |
+---------------+-------------+
| mysql.test | books |
| mysql.test | t1 |
+---------------+-------------+
2 rows selected (0.418 seconds)

0: jdbc:drill:zk=bigdata-test-server-02:2181> select * from t1;
+----+
| a |
+----+
| 1 |
| 2 |
+----+
2 rows selected (1.36 seconds)

0: jdbc:drill:zk=bigdata-test-server-02:2181> describe books;
+---------------+--------------------+--------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+---------------+--------------------+--------------+
| id | INTEGER | YES |
| isbn | CHARACTER VARYING | YES |
| category | CHARACTER VARYING | YES |
| publish_date | TIMESTAMP | NO |
| publisher | CHARACTER VARYING | YES |
| price | REAL | YES |
+---------------+--------------------+--------------+
6 rows selected (0.239 seconds)
# jps -lm
2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
14871 sun.tools.jps.Jps -lm
1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
14613 org.apache.hadoop.hbase.master.HMaster start
11327 org.apache.drill.exec.server.Drillbit
1775 org.apache.hadoop.hdfs.server.namenode.NameNode
1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode

Drill Default Input Format

Reference: http://drill.apache.org/docs/drill-default-input-format/

You can define one default input format per workspace. If you do not define a default input format, and Drill cannot detect the file format, the query fails. You can define a default input format for any of the file types that Drill supports. Currently, Drill supports the following types:

  • Avro
  • CSV, TSV, or PSV
  • Parquet
  • JSON
  • MapR-DB*
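
For example, to have Drill read extension-less files in a workspace as CSV, set defaultInputFormat on that workspace; a sketch based on the dfs configuration used earlier in this post:

"workspaces": {
  "tmp": {
    "location": "/tmp",
    "writable": true,
    "defaultInputFormat": "csv"
  }
}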

Querying Compressed Files

Reference: http://drill.apache.org/docs/querying-plain-text-files/#query-the-gz-file-directly

SELECT COLUMNS[0],
       COLUMNS[1],
       COLUMNS[2]
FROM dfs.`/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv.gz`
WHERE ((columns[0] = 'Zoological Journal of the Linnean')
AND (columns[2] > 250))
LIMIT 10;

Querying HBase

http://drill.apache.org/docs/querying-hbase/

Querying the INFORMATION SCHEMA

You can query the following INFORMATION_SCHEMA tables:

  • SCHEMATA
  • CATALOGS
  • TABLES
  • COLUMNS
  • VIEWS
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT CATALOG_NAME, SCHEMA_NAME as all_my_data_sources FROM INFORMATION_SCHEMA.SCHEMATA ORDER BY SCHEMA_NAME;
+---------------+----------------------+
| CATALOG_NAME | all_my_data_sources |
+---------------+----------------------+
| DRILL | INFORMATION_SCHEMA |
| DRILL | cp.default |
| DRILL | dfs.default |
| DRILL | dfs.root |
| DRILL | dfs.tmp |
| DRILL | dfs.work |
| DRILL | hbase |
| DRILL | hdfs.default |
| DRILL | hdfs.root |
| DRILL | hive.default |
| DRILL | hive.hsu |
| DRILL | sys |
+---------------+----------------------+
12 rows selected (0.256 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SHOW DATABASES;
+---------------------+
| SCHEMA_NAME |
+---------------------+
| INFORMATION_SCHEMA |
| cp.default |
| dfs.default |
| dfs.root |
| dfs.tmp |
| dfs.work |
| hbase |
| hdfs.default |
| hdfs.root |
| hive.default |
| hive.hsu |
| sys |
+---------------------+
12 rows selected (0.383 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.`TABLES` ORDER BY TABLE_NAME DESC;
+---------------------+---------------------+-------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE |
+---------------------+---------------------+-------------+
| sys | version | TABLE |
| sys | threads | TABLE |
| hive.default | order_test_parquet | TABLE |
| hive.default | order_test | TABLE |
| sys | options | TABLE |
| sys | memory | TABLE |
| sys | drillbits | TABLE |
| sys | boot | TABLE |
| INFORMATION_SCHEMA | VIEWS | TABLE |
| hbase | TEST01 | TABLE |
| hbase | TEST | TABLE |
| INFORMATION_SCHEMA | TABLES | TABLE |
| hbase | SYSTEM.STATS | TABLE |
| hbase | SYSTEM.SEQUENCE | TABLE |
| hbase | SYSTEM.CATALOG | TABLE |
| INFORMATION_SCHEMA | SCHEMATA | TABLE |
| INFORMATION_SCHEMA | COLUMNS | TABLE |
| INFORMATION_SCHEMA | CATALOGS | TABLE |
| hbase | ABC | TABLE |
+---------------------+---------------------+-------------+
19 rows selected (1.125 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`default`.order_test;
+-----------------+-----------------------------+
| ordernumber | event_time |
+-----------------+-----------------------------+
| 10703007267488 | 2014-05-01 06:01:12.334+01 |
1 rows selected (0.253 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'order_test' and TABLE_SCHEMA = 'hive.default' AND COLUMN_NAME LIKE '%ordernumber';
+--------------+------------+
| COLUMN_NAME | DATA_TYPE |
+--------------+------------+
| ordernumber | VARCHAR |
+--------------+------------+
1 row selected (1.481 seconds)

Querying System Tables

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases;
+---------------------+
| SCHEMA_NAME |
+---------------------+
| INFORMATION_SCHEMA |
| cp.default |
| dfs.default |
| dfs.root |
| dfs.tmp |
| dfs.work |
| hbase |
| hdfs.default |
| hdfs.root |
| hive.default |
| hive.hsu |
| sys |
+---------------------+
12 rows selected (0.137 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use sys;
+-------+----------------------------------+
| ok | summary |
+-------+----------------------------------+
| true | Default schema changed to [sys] |
+-------+----------------------------------+
1 row selected (0.089 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
+---------------+-------------+
| TABLE_SCHEMA | TABLE_NAME |
+---------------+-------------+
| sys | boot |
| sys | drillbits |
| sys | memory |
| sys | options |
| sys | threads |
| sys | version |
+---------------+-------------+
6 rows selected (0.127 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from drillbits;
+-------------------+------------+---------------+------------+----------+
| hostname | user_port | control_port | data_port | current |
+-------------------+------------+---------------+------------+----------+
| itr-mastertest02 | 31010 | 31011 | 31012 | true |
| itr-mastertest01 | 31010 | 31011 | 31012 | false |
| itr-nodetest02 | 31010 | 31011 | 31012 | false |
| itr-nodetest01 | 31010 | 31011 | 31012 | false |
+-------------------+------------+---------------+------------+----------+
4 rows selected (2.484 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from options where type='SYSTEM' limit 10;
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| name | kind | type | status | num_val | string_val | bool_val | float_val |
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| drill.exec.functions.cast_empty_string_to_null | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null |
| drill.exec.storage.file.partition.column.label | STRING | SYSTEM | DEFAULT | null | dir | null | null |
| exec.errors.verbose | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null |
| exec.java_compiler | STRING | SYSTEM | DEFAULT | null | DEFAULT | null | null |
| exec.java_compiler_debug | BOOLEAN | SYSTEM | DEFAULT | null | null | true | null |
| exec.java_compiler_janino_maxsize | LONG | SYSTEM | DEFAULT | 262144 | null | null | null |
| exec.max_hash_table_size | LONG | SYSTEM | DEFAULT | 1073741824 | null | null | null |
| exec.min_hash_table_size | LONG | SYSTEM | DEFAULT | 65536 | null | null | null |
| exec.queue.enable | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null |
| exec.queue.large | LONG | SYSTEM | DEFAULT | 10 | null | null | null |
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
10 rows selected (2.842 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from boot limit 10;
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
| name | kind | type | status | num_val | string_val | bool_val | float_val |
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
| awt.toolkit | STRING | BOOT | BOOT | null | "sun.awt.X11.XToolkit" | null | null |
| drill.client.supports-complex-types | BOOLEAN | BOOT | BOOT | null | null | true | null |
| drill.exec.buffer.size | STRING | BOOT | BOOT | null | "6" | null | null |
| drill.exec.buffer.spooling.delete | BOOLEAN | BOOT | BOOT | null | null | true | null |
| drill.exec.buffer.spooling.size | LONG | BOOT | BOOT | 100000000 | null | null | null |
| drill.exec.cluster-id | STRING | BOOT | BOOT | null | "drillbits1" | null | null |
| drill.exec.compile.cache_max_size | LONG | BOOT | BOOT | 1000 | null | null | null |
| drill.exec.compile.compiler | STRING | BOOT | BOOT | null | "DEFAULT" | null | null |
| drill.exec.compile.debug | BOOLEAN | BOOT | BOOT | null | null | true | null |
| drill.exec.compile.janino_maxsize | LONG | BOOT | BOOT | 262144 | null | null | null |
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
10 rows selected (0.138 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from threads;
+-------------------+------------+----------------+---------------+
| hostname | user_port | total_threads | busy_threads |
+-------------------+------------+----------------+---------------+
| itr-mastertest02 | 31010 | 36 | 34 |
| itr-nodetest02 | 31010 | 34 | 31 |
| itr-nodetest01 | 31010 | 27 | 27 |
| itr-mastertest01 | 31010 | 37 | 32 |
+-------------------+------------+----------------+---------------+
4 rows selected (9.853 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from memory;
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| hostname | user_port | heap_current | heap_max | direct_current | jvm_direct_current | direct_max |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| itr-mastertest02 | 31010 | 347630584 | 1073741824 | 11798941 | 33931076 | 2147483648 |
| itr-nodetest02 | 31010 | 182035960 | 1073741824 | 7750365 | 16813963 | 2147483648 |
| itr-nodetest01 | 31010 | 274726896 | 1073741824 | 7750365 | 16778020 | 2147483648 |
| itr-mastertest01 | 31010 | 455562488 | 1073741824 | 7751085 | 17151349 | 2147483648 |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
4 rows selected (0.904 seconds)

Monitoring via the Web UI

http://drill.apache.org/docs/monitoring-and-canceling-queries-in-the-drill-web-ui/

Log and Debug

http://drill.apache.org/docs/log-and-debug-introduction/

Troubleshooting

http://drill.apache.org/docs/troubleshooting/

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> ALTER SESSION SET `exec.errors.verbose`=true;
+-------+-------------------------------+
| ok | summary |
+-------+-------------------------------+
| true | exec.errors.verbose updated. |
+-------+-------------------------------+
1 row selected (4.051 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT hostname FROM sys.drillbits WHERE `current` = true;
+-----------------+
| hostname |
+-----------------+
| itr-nodetest02 |
+-----------------+
1 row selected (1.246 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT commit_id FROM sys.version;
+-------------------------------------------+
| commit_id |
+-------------------------------------------+
| 13a0c3c70991f29ca240adcac32b11b0d4fef21e |
+-------------------------------------------+
1 row selected (1.624 seconds)

Note:

If you point Drill at a brand-new ZooKeeper ensemble, the storage plugin configurations created earlier with zk=local are not there in the new session:
the new ZooKeeper cluster is empty, so Drill has not written anything into it yet.
Storage plugin configurations created or updated in the Web UI are persisted in the corresponding ZooKeeper znodes.
Once you update and enable the hive plugin in the UI again, show databases lists the Hive tables as before.

  • This post was written in one day, deploying and using Drill from scratch. If you repost it, please credit the source: www.itweet.cn

Reference: http://www.yankay.com/google-dremel-rationale/

Related resources: http://geek.csdn.net/news/detail/32952
http://www.quora.com/Apache-Drill