Apache 软件基金会发起了一项名为“Drill”的开源项目。Apache Drill 实现了 Google’s Dremel. Apache Drill 在基于 SQL 的数据分析和商业智能(BI)上引入了 JSON 文件模型,这使得用户能查询固定架构,演化架构,以及各种格式和数据存储中的模式无关(schema-free)数据。该体系架构中关系查询引擎和数据库的构建是有先决条件的,即假设所有数据都有一个简单的静态架构。 Apache Drill的架构是独一无二的。它是唯一一个支持复杂和无模式数据的柱状执行 引擎(columnar execution engine),也是唯一一个能在查询执行期间进行数据驱动 查询(和重新编译,也称之为 schema discovery)的执行引擎(execution engine)。这些独一无二的性能使得 Apache Drill 在 JSON 文件模式下能实现记录断点性能(record-breaking performance)。
数据结构:
兼容已有的 SQL 环境和 Apache Hive:
支持多框架:
单机模式 交互窗口-drill-embedded 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 # wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz # tar zxf apache-drill-1.0.0.tar.gz # cd apache-drill-1.0.0 # bin/drill-embedded Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Jul 18, 2015 3:41:25 PM org.glassfish.jersey.server.ApplicationHandler initialize INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26... apache drill 1.0.0 "drill baby drill" 0: jdbc:drill:zk=local> use cp; +-------+---------------------------------+ | ok | summary | +-------+---------------------------------+ | true | Default schema changed to [cp] | +-------+---------------------------------+ 1 row selected (0.186 seconds) 0: jdbc:drill:zk=local> select employee_id,full_name,first_name,last_name from `employee.json` limit 2; +--------------+------------------+-------------+------------+ | employee_id | full_name | first_name | last_name | +--------------+------------------+-------------+------------+ | 1 | Sheri Nowmer | Sheri | Nowmer | | 2 | Derrick Whelply | Derrick | Whelply | +--------------+------------------+-------------+------------+ 2 rows selected (0.426 seconds) 0: jdbc:drill:zk=local> !quit Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
交互窗口-sqlline 1 2 3 4 5 6 7 8 9 10 # bin/sqlline -u jdbc:drill:zk=local Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Jul 18, 2015 3:48:12 PM org.glassfish.jersey.server.ApplicationHandler initialize INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26... apache drill 1.0.0 "a drill in the hand is better than two in the bush" 0: jdbc:drill:zk=local> 0: jdbc:drill:zk=local> !quit Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
后台进程方式运行drill 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 # bin/drillbit.sh start starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out # cat drillbit.out Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory. # An error report file with more information is saved as: # /usr/local/apache-drill-1.0.0/hs_err_pid5344.log Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory. # An error report file with more information is saved as: # /usr/local/apache-drill-1.0.0/hs_err_pid5524.log 增加内存或者修改 DRILL_MAX_DIRECT_MEMORY="2G" default: 8g DRILL_HEAP="1G" default: 4g # bin/drillbit.sh start starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out # ps auwx |grep dirll #验证后台进程 # jps -lm 2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController 1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/ zookeeper-3.4.6/bin/../conf/zoo.cfg 6105 sun.tools.jps.Jps -lm 1775 org.apache.hadoop.hdfs.server.namenode.NameNode 5936 org.apache.drill.exec.server.Drillbit 1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode
第一个是drillbit的后台进程, 第二个是使用sqlline或者dril-embbed启动的客户端进程
Storage Plugin
http://drill.apache.org/docs/plugin-configuration-basics/
默认只有cp和dfs是enable的. 在Diabled Storage Plugins中点击某个插件的Enable, 就可以使用这个存储插件了!
1、添加HDFS插件 点击Create,在Configuration中输入hdfs的存储插件配置信息:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 { "type": "file", "enabled": true, "connection": "hdfs://itr-mastertest01:9000/", "workspaces": { "root": { "location": "/tmp", "writable": true, "defaultInputFormat": null } }, "formats": { "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," }, "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" }, "parquet": { "type": "parquet" } } }
2、设置dfs/hdfs工作目录
dfs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 { "type": "file", "enabled": true, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultInputFormat": null }, "tmp": { "location": "/tmp", "writable": true, "defaultInputFormat": null }, "work": { "location": "/usr/local/apache-drill-1.0.0/", "writable": true, "defaultInputFormat": null } }, "formats": { "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," }, "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" }, "parquet": { "type": "parquet" } } }
hdfs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 { "type": "file", "enabled": true, "connection": "hdfs://mycluster/", "workspaces": { "root": { "location": "/", "writable": true, "defaultInputFormat": null } }, "formats": { "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," }, "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" }, "parquet": { "type": "parquet" } } } 对于hdfs也可以自定义一个自己的工作空间比如work=/user/hive/. 则定位到/user/hive/下, 直接使用hdfs.work进行查询
hive
1 2 3 4 5 6 7 8 9 10 11 { "type": "hive", "enabled": true, "configProps": { "hive.metastore.uris": "thrift://itr-mastertest02:9083", "javax.jdo.option.ConnectionURL": "jdbc:mysql://itr-mastertest02:3306/hive?characterEncoding=UTF-8", "hive.metastore.warehouse.dir": "/user/hive/warehouse", "fs.default.name": "hdfs://mycluster", "hive.metastore.sasl.enabled": "false" } }
hbase
1 2 3 4 5 6 7 8 9 { "type": "hbase", "config": { "hbase.zookeeper.quorum": "itr-mastertest01,itr-mastertest02,itr-nodetest01", "hbase.zookeeper.property.clientPort": "2181" }, "size.calculator.enabled": false, "enabled": true }
rdbms
1 2 3 4 5 6 7 8 { "type": "jdbc", "driver": "com.mysql.jdbc.Driver", "url": "jdbc:mysql://itr-mastertest01:3306", "username": "root", "password": "admin", "enabled": true }
3、测试dfs/hdfs 测试部分放到分布式中
分布式 1、修改配置文件 1 2 3 4 5 # tail -4 drill-override.conf drill.exec: { cluster-id: "drillbits1", zk.connect: "itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181" }
每台节点的cluster-id都是一样的. 保证了所有的节点组成一个集群.把安装目录复制到所有节点相同目录下!
2、启动集群
然后在每台机器上都启动bin/drillbit.sh start
随便访问任意一台机器的8047端口, 都可以列出集群中的所有drill服务,并且看到节点
3、客户端连接 Drill提供了一些工具, 包括第三方工具也提供了访问Drill数据的方法. 主要是Drill和其他SQL DB一样提供了一个ODBC Driver. 参考: https://drill.apache.org/docs/interfaces-introduction/ 例:jdbc:drill:zk=10.10.100.30:5181,10.10.100.31:5181,10.10.100.32:5181/drill/drillbits1;schema=hive
连接本地ZK
1 bin/sqlline -u jdbc:drill:zk=local
连接ZK集群(sqlline)
1 2 3 4 5 6 7 8 9 10 11 12 # drill/bin/sqlline -u jdbc:drill:zk=itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181 `如果是集群模式, 也可以不跟上zk地址: bin/sqlline -u jdbc:drill:zk 会自动读取drill-override.conf的配置` 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select employee_id,full_name from cp.`employee.json` limit 2; +--------------+------------------+ | employee_id | full_name | +--------------+------------------+ | 1 | Sheri Nowmer | | 2 | Derrick Whelply | +--------------+------------------+ 2 rows selected (0.457 seconds)
dfs源 1 2 3 4 5 6 7 8 9 10 11 12 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.`sample-data/region.parquet` limit 2; Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.sample-data/region.parquet' not found [Error Id: 9d20015d-023a-4d06-80d2-78760a037aed on itr-mastertest02:31010] (state=,code=0) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.work.`sample-data/region.parquet` limit 2; +--------------+----------+-----------------------+ | R_REGIONKEY | R_NAME | R_COMMENT | +--------------+----------+-----------------------+ | 0 | AFRICA | lar deposits. blithe | | 1 | AMERICA | hs use ironic, even | +--------------+----------+-----------------------+ 2 rows selected (0.517 seconds)
hdfs源 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 # hadoop fs -ls /data Found 1 items -rw-r--r-- 2 root supergroup 455 2015-07-18 22:35 /data/regionsSF.parquet 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data`; +---------+ | EXPR$0 | +---------+ | 5 | +---------+ 1 row selected (0.773 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data/regionsSF.parquet`; +---------+ | EXPR$0 | +---------+ | 5 | +---------+ 1 row selected (0.146 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`hdfs://mycluster/data/regionsSF.parquet`; +---------+ | EXPR$0 | +---------+ | 5 | +---------+ 1 row selected (0.173 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs.work; +-------+----------------------------------------+ | ok | summary | +-------+----------------------------------------+ | true | Default schema changed to [hdfs.work] | +-------+----------------------------------------+ 1 row selected (0.095 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use sys; +-------+----------------------------------+ | ok | summary | +-------+----------------------------------+ | true | Default schema changed to [sys] | +-------+----------------------------------+ 1 row selected (0.136 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables; +---------------+-------------+ | TABLE_SCHEMA | TABLE_NAME | +---------------+-------------+ | sys | boot | | sys | drillbits | | sys | memory | | sys | options | | sys | threads | | sys | version | +---------------+-------------+ 6 rows selected (0.138 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs; +-------+-----------------------------------+ | ok | summary | +-------+-----------------------------------+ | true | Default schema changed to [hdfs] | +-------+-----------------------------------+ 1 row selected (0.094 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use tmp; +-------+---------------------------------------+ | ok | summary | +-------+---------------------------------------+ | true | Default schema changed to [hdfs.tmp] | +-------+---------------------------------------+ 1 row selected (0.1 seconds)
hive 数据源 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hive; +-------+-----------------------------------+ | ok | summary | +-------+-----------------------------------+ | true | Default schema changed to [hive] | +-------+-----------------------------------+ 1 row selected (0.804 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables; +---------------+---------------------+ | TABLE_SCHEMA | TABLE_NAME | +---------------+---------------------+ | hive.default | order_test | | hive.default | order_test_parquet | +---------------+---------------------+ 2 rows selected (0.462 seconds) 查询报错: Error: SYSTEM ERROR: java.net.UnknownHostException: mycluster [Error Id: 10ba19e9-35fd-4457-8fa5-8c6f41313b3a on itr-mastertest02:31010] (state=,code=0) select count(*) from hdfs.`testdata/db_case1/order_created`; Error: PARSE ERROR: From line 1, column 22 to line 1, column 25: Table 'hdfs.testdata/db_case1/order_created' not found 解决: # vim drill-env.sh export HADOOP_HOME="/usr/local/hadoop" #text table 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test`; +-----------------+-----------------------------+ | ordernumber | event_time | +-----------------+-----------------------------+ | 10703007267488 | 2014-05-01 06:01:12.334+01 | | 10101043505096 | 2014-05-01 07:28:12.342+01 | | 10103043509747 | 2014-05-01 07:50:12.33+01 | | 10103043501575 | 2014-05-01 09:27:12.33+01 | | 10104043514061 | 2014-05-01 09:03:12.324+01 | +-----------------+-----------------------------+ 5 rows selected (0.819 seconds) #parquet table 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test_parquet`; +-----------------+-----------------------------+ | ordernumber | event_time | +-----------------+-----------------------------+ | 10703007267488 | 2014-05-01 06:01:12.334+01 | | 10101043505096 | 2014-05-01 07:28:12.342+01 | | 10103043509747 | 2014-05-01 07:50:12.33+01 | | 10103043501575 | 2014-05-01 09:27:12.33+01 | | 10104043514061 | 2014-05-01 09:03:12.324+01 | +-----------------+-----------------------------+ 5 rows selected (0.522 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases; +---------------------+ | SCHEMA_NAME | +---------------------+ | INFORMATION_SCHEMA | | cp.default | | dfs.default | | dfs.root | | dfs.tmp | | dfs.work | | hdfs.default | | hdfs.root | | hdfs.tmp | | hdfs.work | | hive.default | | hive.hsu | | sys | +---------------------+ 13 rows selected (0.14 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hive.`default`.order_test; +---------+ | EXPR$0 | +---------+ | 5 | +---------+ 1 row selected (0.27 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> create table hive.`default`.test as select * from hive.`default`.order_test; Error: PARSE ERROR: Unable to create or drop tables/views. Schema [hive. default] is immutable.
参考:http://drill.apache.org/docs/querying-hive/
hbase数据源 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 hbase(main):001:0> list TABLE ABC SYSTEM.CATALOG SYSTEM.SEQUENCE SYSTEM.STATS TEST TEST01 6 row(s) in 2.9470 seconds => ["ABC", "SYSTEM.CATALOG", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST", "TEST01"] hbase(main):008:0> scan 'TEST01' ROW COLUMN+CELL user1|ts1 column=sf:c1, timestamp=1426423837228, value=sku1 1 row(s) in 0.0840 seconds hbase(main):009:0> scan 'TEST' ROW COLUMN+CELL \x80\x00\x00\x01 column=0:MYCOLUMN, timestamp=1426421982186, value=Hello \x80\x00\x00\x01 column=0:_0, timestamp=1426421982186, value= \x80\x00\x00\x02 column=0:MYCOLUMN, timestamp=1426421982186, value=World! \x80\x00\x00\x02 column=0:_0, timestamp=1426421982186, value= 2 row(s) in 0.0290 seconds
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hbase; +-------+------------------------------------+ | ok | summary | +-------+------------------------------------+ | true | Default schema changed to [hbase] | +-------+------------------------------------+ 1 row selected (2.393 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables; +---------------+------------------+ | TABLE_SCHEMA | TABLE_NAME | +---------------+------------------+ | hbase | ABC | | hbase | SYSTEM.CATALOG | | hbase | SYSTEM.SEQUENCE | | hbase | SYSTEM.STATS | | hbase | TEST | | hbase | TEST01 | +---------------+------------------+ 6 rows selected (5.415 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST01; +-------------+-------+--------------------+ | row_key | lf | sf | +-------------+-------+--------------------+ | [B@a32b2dc | null | {"c1":"c2t1MQ=="} | +-------------+-------+--------------------+ 1 row selected (1.825 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.TEST01; +---------+ | EXPR$0 | +---------+ | 1 | +---------+ 1 row selected (2.244 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select cast(row_key as VarChar(20)),lf,sf from hbase.TEST01; +------------+-----+--------------------+ | EXPR$0 | lf | sf | +------------+-----+--------------------+ | user1|ts1 | {} | {"c1":"c2t1MQ=="} | +------------+-----+--------------------+ 1 row selected (2.305 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.`SYSTEM.CATALOG`; +---------+ | EXPR$0 | +---------+ | 79 | +---------+ 1 row selected (0.815 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select CONVERT_FROM(row_key,'UTF8') from hbase.TEST; +---------+ | EXPR$0 | +---------+ | ? | | ? | +---------+ 2 rows selected (0.699 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST; +--------------+----------------------------------+ | row_key | 0 | +--------------+----------------------------------+ | [B@5b6c4eba | {"MYCOLUMN":"SGVsbG8=","_0":""} | | [B@23a4c43a | {"MYCOLUMN":"V29ybGQh","_0":""} | +--------------+----------------------------------+ 2 rows selected (0.554 seconds)
MYSQL 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 0: jdbc:drill:zk=bigdata-test-server-02:2181> use mysql.test; +-------+-----------------------------------------+ | ok | summary | +-------+-----------------------------------------+ | true | Default schema changed to [mysql.test] | +-------+-----------------------------------------+ 1 row selected (0.35 seconds) 0: jdbc:drill:zk=bigdata-test-server-02:2181> show tables; +---------------+-------------+ | TABLE_SCHEMA | TABLE_NAME | +---------------+-------------+ | mysql.test | books | | mysql.test | t1 | +---------------+-------------+ 2 rows selected (0.418 seconds) 0: jdbc:drill:zk=bigdata-test-server-02:2181> select * from t1; +----+ | a | +----+ | 1 | | 2 | +----+ 2 rows selected (1.36 seconds) 0: jdbc:drill:zk=bigdata-test-server-02:2181> describe books; +---------------+--------------------+--------------+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +---------------+--------------------+--------------+ | id | INTEGER | YES | | isbn | CHARACTER VARYING | YES | | category | CHARACTER VARYING | YES | | publish_date | TIMESTAMP | NO | | publisher | CHARACTER VARYING | YES | | price | REAL | YES | +---------------+--------------------+--------------+ 6 rows selected (0.239 seconds)
1 2 3 4 5 6 7 8 # jps -lm 2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController 14871 sun.tools.jps.Jps -lm 1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg 14613 org.apache.hadoop.hbase.master.HMaster start 11327 org.apache.drill.exec.server.Drillbit 1775 org.apache.hadoop.hdfs.server.namenode.NameNode 1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode
参考:http://drill.apache.org/docs/drill-default-input-format/
You can define one default input format per workspace. If you do not define a default input format, and Drill cannot detect the file format, the query fails. You can define a default input format for any of the file types that Drill supports. Currently, Drill supports the following types:
Avro
CSV, TSV, or PSV
Parquet
JSON
MapR-DB*
Querying Compressed Files
参考:http://drill.apache.org/docs/querying-plain-text-files/#query-the-gz-file-directly
SELECT COLUMNS[0], COLUMNS[1], COLUMNS[2] FROM dfs./Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.
ts.gz
WHERE ((columns[0] = ‘Zoological Journal of the Linnean’) AND (columns[2] > 250)) LIMIT 10;
Querying HBase
http://drill.apache.org/docs/querying-hbase/
You can query the following INFORMATION_SCHEMA tables:
SCHEMATA
CATALOGS
TABLES
COLUMNS
VIEWS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT CATALOG_NAME, SCHEMA_NAME as all_my_data_sources FROM INFORMATION_SCHEMA.SCHEMATA ORDER BY SCHEMA_NAME; +---------------+----------------------+ | CATALOG_NAME | all_my_data_sources | +---------------+----------------------+ | DRILL | INFORMATION_SCHEMA | | DRILL | cp.default | | DRILL | dfs.default | | DRILL | dfs.root | | DRILL | dfs.tmp | | DRILL | dfs.work | | DRILL | hbase | | DRILL | hdfs.default | | DRILL | hdfs.root | | DRILL | hive.default | | DRILL | hive.hsu | | DRILL | sys | +---------------+----------------------+ 12 rows selected (0.256 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SHOW DATABASES; +---------------------+ | SCHEMA_NAME | +---------------------+ | INFORMATION_SCHEMA | | cp.default | | dfs.default | | dfs.root | | dfs.tmp | | dfs.work | | hbase | | hdfs.default | | hdfs.root | | hive.default | | hive.hsu | | sys | +---------------------+ 12 rows selected (0.383 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.`TABLES` ORDER BY TABLE_NAME DESC; +---------------------+---------------------+-------------+ | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | +---------------------+---------------------+-------------+ | sys | version | TABLE | | sys | threads | TABLE | | hive.default | order_test_parquet | TABLE | | hive.default | order_test | TABLE | | sys | options | TABLE | | sys | memory | TABLE | | sys | drillbits | TABLE | | sys | boot | TABLE | | INFORMATION_SCHEMA | VIEWS | TABLE | | hbase | TEST01 | TABLE | | hbase | TEST | TABLE | | INFORMATION_SCHEMA | TABLES | TABLE | | hbase | SYSTEM.STATS | TABLE | | hbase | SYSTEM.SEQUENCE | TABLE | | hbase | SYSTEM.CATALOG | TABLE | | INFORMATION_SCHEMA | SCHEMATA | TABLE | | INFORMATION_SCHEMA | COLUMNS | TABLE | | INFORMATION_SCHEMA | CATALOGS | TABLE | | hbase | ABC | TABLE | +---------------------+---------------------+-------------+ 19 rows selected (1.125 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`default`.order_test; +-----------------+-----------------------------+ | ordernumber | event_time | +-----------------+-----------------------------+ | 10703007267488 | 2014-05-01 06:01:12.334+01 | 1 rows selected (0.253 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'order_test' and TABLE_SCHEMA = 'hive.default' AND COLUMN_NAME LIKE '%ordernumber'; +--------------+------------+ | COLUMN_NAME | DATA_TYPE | +--------------+------------+ | ordernumber | VARCHAR | +--------------+------------+ 1 row selected (1.481 seconds)
Querying System Tables 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases; +---------------------+ | SCHEMA_NAME | +---------------------+ | INFORMATION_SCHEMA | | cp.default | | dfs.default | | dfs.root | | dfs.tmp | | dfs.work | | hbase | | hdfs.default | | hdfs.root | | hive.default | | hive.hsu | | sys | +---------------------+ 12 rows selected (0.137 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use sys; +-------+----------------------------------+ | ok | summary | +-------+----------------------------------+ | true | Default schema changed to [sys] | +-------+----------------------------------+ 1 row selected (0.089 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables; +---------------+-------------+ | TABLE_SCHEMA | TABLE_NAME | +---------------+-------------+ | sys | boot | | sys | drillbits | | sys | memory | | sys | options | | sys | threads | | sys | version | +---------------+-------------+ 6 rows selected (0.127 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from drillbits; +-------------------+------------+---------------+------------+----------+ | hostname | user_port | control_port | data_port | current | +-------------------+------------+---------------+------------+----------+ | itr-mastertest02 | 31010 | 31011 | 31012 | true | | itr-mastertest01 | 31010 | 31011 | 31012 | false | | itr-nodetest02 | 31010 | 31011 | 31012 | false | | itr-nodetest01 | 31010 | 31011 | 31012 | false | +-------------------+------------+---------------+------------+----------+ 4 rows selected (2.484 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from options where type='SYSTEM' limit 10; +-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+ | name | kind | type | status | num_val | string_val | bool_val | float_val | +-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+ | drill.exec.functions.cast_empty_string_to_null | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null | | drill.exec.storage.file.partition.column.label | STRING | SYSTEM | DEFAULT | null | dir | null | null | | exec.errors.verbose | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null | | exec.java_compiler | STRING | SYSTEM | DEFAULT | null | DEFAULT | null | null | | exec.java_compiler_debug | BOOLEAN | SYSTEM | DEFAULT | null | null | true | null | | exec.java_compiler_janino_maxsize | LONG | SYSTEM | DEFAULT | 262144 | null | null | null | | exec.max_hash_table_size | LONG | SYSTEM | DEFAULT | 1073741824 | null | null | null | | exec.min_hash_table_size | LONG | SYSTEM | DEFAULT | 65536 | null | null | null | | exec.queue.enable | BOOLEAN | SYSTEM | DEFAULT | null | null | false | null | | exec.queue.large | LONG | SYSTEM | DEFAULT | 10 | null | null | null | +-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+ 10 rows selected (2.842 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from boot limit 10; +--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+ | name | kind | type | status | num_val | string_val | bool_val | float_val | +--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+ | awt.toolkit | STRING | BOOT | BOOT | null | "sun.awt.X11.XToolkit" | null | null | | drill.client.supports-complex-types | BOOLEAN | BOOT | BOOT | null | null | true | null | | drill.exec.buffer.size | STRING | BOOT | BOOT | null | "6" | null | null | | drill.exec.buffer.spooling.delete | BOOLEAN | BOOT | BOOT | null | null | true | null | | drill.exec.buffer.spooling.size | LONG | BOOT | BOOT | 100000000 | null | null | null | | drill.exec.cluster-id | STRING | BOOT | BOOT | null | "drillbits1" | null | null | | drill.exec.compile.cache_max_size | LONG | BOOT | BOOT | 1000 | null | null | null | | drill.exec.compile.compiler | STRING | BOOT | BOOT | null | "DEFAULT" | null | null | | drill.exec.compile.debug | BOOLEAN | BOOT | BOOT | null | null | true | null | | drill.exec.compile.janino_maxsize | LONG | BOOT | BOOT | 262144 | null | null | null | +--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+ 10 rows selected (0.138 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from threads; +-------------------+------------+----------------+---------------+ | hostname | user_port | total_threads | busy_threads | +-------------------+------------+----------------+---------------+ | itr-mastertest02 | 31010 | 36 | 34 | | itr-nodetest02 | 31010 | 34 | 31 | | itr-nodetest01 | 31010 | 27 | 27 | | itr-mastertest01 | 31010 | 37 | 32 | +-------------------+------------+----------------+---------------+ 4 rows selected (9.853 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from memory; +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+ | hostname | user_port | heap_current | heap_max | direct_current | jvm_direct_current | direct_max | +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+ | itr-mastertest02 | 31010 | 347630584 | 1073741824 | 11798941 | 33931076 | 2147483648 | | itr-nodetest02 | 31010 | 182035960 | 1073741824 | 7750365 | 16813963 | 2147483648 | | itr-nodetest01 | 31010 | 274726896 | 1073741824 | 7750365 | 16778020 | 2147483648 | | itr-mastertest01 | 31010 | 455562488 | 1073741824 | 7751085 | 17151349 | 2147483648 | +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+ 4 rows selected (0.904 seconds)
监控WebUI http://drill.apache.org/docs/monitoring-and-canceling-queries-in-the-drill-web-ui/
log-and-debug http://drill.apache.org/docs/log-and-debug-introduction/
troubleshooting http://drill.apache.org/docs/troubleshooting/ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> ALTER SESSION SET `exec.errors.verbose`=true; +-------+-------------------------------+ | ok | summary | +-------+-------------------------------+ | true | exec.errors.verbose updated. | +-------+-------------------------------+ 1 row selected (4.051 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT hostname FROM sys.drillbits WHERE `current` = true; +-----------------+ | hostname | +-----------------+ | itr-nodetest02 | +-----------------+ 1 row selected (1.246 seconds) 0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT commit_id FROM sys.version; +-------------------------------------------+ | commit_id | +-------------------------------------------+ | 13a0c3c70991f29ca240adcac32b11b0d4fef21e | +-------------------------------------------+ 1 row selected (1.624 seconds)
注意:
当指定的zk是一个全新的ZK, 之前如果使用zk=local在本次新的zk会话中 Storage-Plugin的信息都丢失. 因为我们指定的zookeeper集群是全新的. 所以drill还没有往里面写入任何数据. 这是因为在web ui上对Storage Plugin进行update或者create的数据都会写入到对应的zookeeper节点上! 当我们在界面上update hive, 并且enable后, 通过show databases就可以看到hive里的表.
参考:http://www.yankay.com/google-dremel-rationale/
相关资源:http://geek.csdn.net/news/detail/32952 http://www.quora.com/Apache-Drill