drill-use

Apache 软件基金会发起了一项名为“Drill”的开源项目。Apache Drill 实现了 Google’s Dremel.
Apache Drill 在基于 SQL 的数据分析和商业智能(BI)上引入了 JSON 文件模型,这使得用户能查询固定架构,演化架构,以及各种格式和数据存储中的模式无关(schema-free)数据。该体系架构中关系查询引擎和数据库的构建是有先决条件的,即假设所有数据都有一个简单的静态架构。
Apache Drill的架构是独一无二的。它是唯一一个支持复杂和无模式数据的柱状执行
引擎(columnar execution engine),也是唯一一个能在查询执行期间进行数据驱动 查询(和重新编译,也称之为 schema discovery)的执行引擎(execution engine)。这些独一无二的性能使得 Apache Drill 在 JSON 文件模式下能实现记录断点性能(record-breaking performance)。

数据结构:

兼容已有的 SQL 环境和 Apache Hive:

支持多框架:

单机模式

交互窗口-drill-embedded

    # wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz
    # tar zxf apache-drill-1.0.0.tar.gz 
    # cd apache-drill-1.0.0

    # bin/drill-embedded 
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M;    support was removed in 8.0
    Jul 18, 2015 3:41:25 PM org.glassfish.jersey.server.ApplicationHandler  initialize
    INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
    apache drill 1.0.0 
    "drill baby drill"
    0: jdbc:drill:zk=local> use cp;
    +-------+---------------------------------+
    |  ok   |             summary             |
    +-------+---------------------------------+
    | true  | Default schema changed to [cp]  |
    +-------+---------------------------------+
    1 row selected (0.186 seconds)

    0: jdbc:drill:zk=local> select employee_id,full_name,first_name,last_name from `employee.json` limit 2;
    +--------------+------------------+-------------+------------+
    | employee_id  |    full_name     | first_name  | last_name  |
    +--------------+------------------+-------------+------------+
    | 1            | Sheri Nowmer     | Sheri       | Nowmer     |
    | 2            | Derrick Whelply  | Derrick     | Whelply    |
    +--------------+------------------+-------------+------------+
    2 rows selected (0.426 seconds)

    0: jdbc:drill:zk=local> !quit
    Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection

交互窗口-sqlline

    #  bin/sqlline -u jdbc:drill:zk=local
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
    Jul 18, 2015 3:48:12 PM org.glassfish.jersey.server.ApplicationHandler  initialize
    INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
    apache drill 1.0.0 
    "a drill in the hand is better than two in the bush"
    0: jdbc:drill:zk=local>

    0: jdbc:drill:zk=local> !quit
    Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection

后台进程方式运行drill

    # bin/drillbit.sh start
    starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out

    # cat drillbit.out 
    Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12)
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory.
    # An error report file with more information is saved as:
    # /usr/local/apache-drill-1.0.0/hs_err_pid5344.log
    Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006e0000000, 4294967296, 0) failed; error='Cannot allocate memory' (errno=12)
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 4294967296 bytes for committing reserved memory.
    # An error report file with more information is saved as:
    # /usr/local/apache-drill-1.0.0/hs_err_pid5524.log

    增加内存或者修改
    DRILL_MAX_DIRECT_MEMORY="2G" default: 8g
    DRILL_HEAP="1G" default: 4g

    #  bin/drillbit.sh start
    starting drillbit, logging to /usr/local/apache-drill-1.0.0/log/drillbit.out

    # ps auwx |grep dirll   #验证后台进程

    # jps -lm
    2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
    1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/  zookeeper-3.4.6/bin/../conf/zoo.cfg
    6105 sun.tools.jps.Jps -lm
    1775 org.apache.hadoop.hdfs.server.namenode.NameNode
    5936 org.apache.drill.exec.server.Drillbit
    1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode

第一个是drillbit的后台进程, 第二个是使用sqlline或者dril-embbed启动的客户端进程

Storage Plugin

http://drill.apache.org/docs/plugin-configuration-basics/

默认只有cp和dfs是enable的. 在Diabled Storage Plugins中点击某个插件的Enable, 就可以使用这个存储插件了!

1、添加HDFS插件

点击Create,在Configuration中输入hdfs的存储插件配置信息:

    {
      "type": "file",
      "enabled": true,
      "connection": "hdfs://itr-mastertest01:9000/",
      "workspaces": {
        "root": {
          "location": "/tmp",
          "writable": true,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "csv": {
          "type": "text",
          "extensions": [
            "csv"
          ],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": [
            "tsv"
          ],
          "delimiter": "\t"
        },
        "parquet": {
          "type": "parquet"
        }
      }
    }

2、设置dfs/hdfs工作目录

  • dfs

      {
        "type": "file",
        "enabled": true,
        "connection": "file:///",
        "workspaces": {
          "root": {
            "location": "/",
            "writable": false,
            "defaultInputFormat": null
          },
          "tmp": {
            "location": "/tmp",
            "writable": true,
            "defaultInputFormat": null
          },
          "work": {
            "location": "/usr/local/apache-drill-1.0.0/",
            "writable": true,
            "defaultInputFormat": null
          }
        },
        "formats": {
          "csv": {
            "type": "text",
            "extensions": [
              "csv"
            ],
            "delimiter": ","
          },
          "tsv": {
            "type": "text",
            "extensions": [
              "tsv"
            ],
            "delimiter": "\t"
          },
          "parquet": {
            "type": "parquet"
          }
        }
      }
    
  • hdfs

      {
        "type": "file",
        "enabled": true,
        "connection": "hdfs://mycluster/",
        "workspaces": {
          "root": {
            "location": "/",
            "writable": true,
            "defaultInputFormat": null
          }
        },
        "formats": {
          "csv": {
            "type": "text",
            "extensions": [
              "csv"
            ],
            "delimiter": ","
          },
          "tsv": {
            "type": "text",
            "extensions": [
              "tsv"
            ],
            "delimiter": "\t"
          },
          "parquet": {
            "type": "parquet"
          }
        }
      }
        对于hdfs也可以自定义一个自己的工作空间比如work=/user/hive/. 则定位到/user/hive/下,  直接使用hdfs.work进行查询
    
  • hive

    {
    "type": "hive",
    "enabled": true,
    "configProps": {
      "hive.metastore.uris": "thrift://itr-mastertest02:9083",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://itr-mastertest02:3306/hive?characterEncoding=UTF-8",
      "hive.metastore.warehouse.dir": "/user/hive/warehouse",
      "fs.default.name": "hdfs://mycluster",
      "hive.metastore.sasl.enabled": "false"
    }
    }
    
  • hbase

      {
        "type": "hbase",
        "config": {
          "hbase.zookeeper.quorum": "itr-mastertest01,itr-mastertest02,itr-nodetest01",
          "hbase.zookeeper.property.clientPort": "2181"
        },
        "size.calculator.enabled": false,
        "enabled": true
      }
    
  • rdbms

    {
    "type": "jdbc",
    "driver": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://itr-mastertest01:3306",
    "username": "root",
    "password": "admin",
    "enabled": true
    }
    

3、测试dfs/hdfs

测试部分放到分布式中

分布式

1、修改配置文件

    # tail -4 drill-override.conf  
    drill.exec: {
      cluster-id: "drillbits1",
      zk.connect: "itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181"
    }

每台节点的cluster-id都是一样的. 保证了所有的节点组成一个集群.把安装目录复制到所有节点相同目录下!

2、启动集群

  • 然后在每台机器上都启动bin/drillbit.sh start
  • 随便访问任意一台机器的8047端口, 都可以列出集群中的所有drill服务,并且看到节点

3、客户端连接

Drill提供了一些工具, 包括第三方工具也提供了访问Drill数据的方法.
主要是Drill和其他SQL DB一样提供了一个ODBC Driver.
参考: https://drill.apache.org/docs/interfaces-introduction/
例:jdbc:drill:zk=10.10.100.30:5181,10.10.100.31:5181,10.10.100.32:5181/drill/drillbits1;schema=hive

  • 连接本地ZK

      bin/sqlline -u jdbc:drill:zk=local
    
  • 连接ZK集群(sqlline)

      # drill/bin/sqlline -u jdbc:drill:zk=itr-mastertest01:2181,itr-mastertest02:2181,itr-nodetest01:2181
    
      `如果是集群模式, 也可以不跟上zk地址: bin/sqlline -u jdbc:drill:zk 会自动读取drill-override.conf的配置`
    
      0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma>  select employee_id,full_name from cp.`employee.json` limit 2;
      +--------------+------------------+
      | employee_id  |    full_name     |
      +--------------+------------------+
      | 1            | Sheri Nowmer     |
      | 2            | Derrick Whelply  |
      +--------------+------------------+
      2 rows selected (0.457 seconds)
    

dfs源

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.`sample-data/region.parquet` limit 2;
    Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.sample-data/region.parquet' not found
    [Error Id: 9d20015d-023a-4d06-80d2-78760a037aed on itr-mastertest02:31010] (state=,code=0)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from dfs.work.`sample-data/region.parquet` limit 2;
    +--------------+----------+-----------------------+
    | R_REGIONKEY  |  R_NAME  |       R_COMMENT       |
    +--------------+----------+-----------------------+
    | 0            | AFRICA   | lar deposits. blithe  |
    | 1            | AMERICA  | hs use ironic, even   |
    +--------------+----------+-----------------------+
    2 rows selected (0.517 seconds)

hdfs源

    # hadoop fs -ls /data         
    Found 1 items
    -rw-r--r--   2 root supergroup        455 2015-07-18 22:35 /data/regionsSF.parquet

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data`;
    +---------+
    | EXPR$0  |
    +---------+
    | 5       |
    +---------+
    1 row selected (0.773 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`/data/regionsSF.parquet`;
    +---------+
    | EXPR$0  |
    +---------+
    | 5       |
    +---------+
    1 row selected (0.146 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hdfs.`hdfs://mycluster/data/regionsSF.parquet`;
    +---------+
    | EXPR$0  |
    +---------+
    | 5       |
    +---------+
    1 row selected (0.173 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs.work;
    +-------+----------------------------------------+
    |  ok   |                summary                 |
    +-------+----------------------------------------+
    | true  | Default schema changed to [hdfs.work]  |
    +-------+----------------------------------------+
    1 row selected (0.095 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use sys;
    +-------+----------------------------------+
    |  ok   |             summary              |
    +-------+----------------------------------+
    | true  | Default schema changed to [sys]  |
    +-------+----------------------------------+
    1 row selected (0.136 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
    +---------------+-------------+
    | TABLE_SCHEMA  | TABLE_NAME  |
    +---------------+-------------+
    | sys           | boot        |
    | sys           | drillbits   |
    | sys           | memory      |
    | sys           | options     |
    | sys           | threads     |
    | sys           | version     |
    +---------------+-------------+
    6 rows selected (0.138 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hdfs;
    +-------+-----------------------------------+
    |  ok   |              summary              |
    +-------+-----------------------------------+
    | true  | Default schema changed to [hdfs]  |
    +-------+-----------------------------------+
    1 row selected (0.094 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use tmp;
    +-------+---------------------------------------+
    |  ok   |                summary                |
    +-------+---------------------------------------+
    | true  | Default schema changed to [hdfs.tmp]  |
    +-------+---------------------------------------+
    1 row selected (0.1 seconds)

hive 数据源

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hive;
    +-------+-----------------------------------+
    |  ok   |              summary              |
    +-------+-----------------------------------+
    | true  | Default schema changed to [hive]  |
    +-------+-----------------------------------+
    1 row selected (0.804 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
    +---------------+---------------------+
    | TABLE_SCHEMA  |     TABLE_NAME      |
    +---------------+---------------------+
    | hive.default  | order_test          |
    | hive.default  | order_test_parquet  |
    +---------------+---------------------+
    2 rows selected (0.462 seconds)    

    查询报错:
    Error: SYSTEM ERROR: java.net.UnknownHostException: mycluster
    [Error Id: 10ba19e9-35fd-4457-8fa5-8c6f41313b3a on itr-mastertest02:31010] (state=,code=0)
     select count(*) from hdfs.`testdata/db_case1/order_created`;
    Error: PARSE ERROR: From line 1, column 22 to line 1, column 25: Table 'hdfs.testdata/db_case1/order_created' not found

    解决:
    # vim drill-env.sh 
    export HADOOP_HOME="/usr/local/hadoop"

    #text table
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test`;
    +-----------------+-----------------------------+
    |   ordernumber   |         event_time          |
    +-----------------+-----------------------------+
    | 10703007267488  | 2014-05-01 06:01:12.334+01  |
    | 10101043505096  | 2014-05-01 07:28:12.342+01  |
    | 10103043509747  | 2014-05-01 07:50:12.33+01   |
    | 10103043501575  | 2014-05-01 09:27:12.33+01   |
    | 10104043514061  | 2014-05-01 09:03:12.324+01  |
    +-----------------+-----------------------------+
    5 rows selected (0.819 seconds)

    #parquet table
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`order_test_parquet`;
    +-----------------+-----------------------------+
    |   ordernumber   |         event_time          |
    +-----------------+-----------------------------+
    | 10703007267488  | 2014-05-01 06:01:12.334+01  |
    | 10101043505096  | 2014-05-01 07:28:12.342+01  |
    | 10103043509747  | 2014-05-01 07:50:12.33+01   |
    | 10103043501575  | 2014-05-01 09:27:12.33+01   |
    | 10104043514061  | 2014-05-01 09:03:12.324+01  |
    +-----------------+-----------------------------+
    5 rows selected (0.522 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases;
    +---------------------+
    |     SCHEMA_NAME     |
    +---------------------+
    | INFORMATION_SCHEMA  |
    | cp.default          |
    | dfs.default         |
    | dfs.root            |
    | dfs.tmp             |
    | dfs.work            |
    | hdfs.default        |
    | hdfs.root           |
    | hdfs.tmp            |
    | hdfs.work           |
    | hive.default        |
    | hive.hsu            |
    | sys                 |
    +---------------------+
    13 rows selected (0.14 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hive.`default`.order_test;
    +---------+
    | EXPR$0  |
    +---------+
    | 5       |
    +---------+
    1 row selected (0.27 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> create table hive.`default`.test as select * from hive.`default`.order_test;
    Error: PARSE ERROR: Unable to create or drop tables/views. Schema [hive.    default] is immutable.

参考:http://drill.apache.org/docs/querying-hive/

hbase数据源

hbase(main):001:0> list
TABLE                                                                                                                                    
ABC                                                                                                                                      
SYSTEM.CATALOG                                                                                                                           
SYSTEM.SEQUENCE                                                                                                                          
SYSTEM.STATS                                                                                                                             
TEST                                                                                                                                     
TEST01                                                                                                                                   
6 row(s) in 2.9470 seconds

=> ["ABC", "SYSTEM.CATALOG", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST", "TEST01"]

hbase(main):008:0> scan 'TEST01'
ROW                                 COLUMN+CELL                                                                                          
 user1|ts1                          column=sf:c1, timestamp=1426423837228, value=sku1                                                    
1 row(s) in 0.0840 seconds

hbase(main):009:0> scan 'TEST'
ROW                                 COLUMN+CELL                                                                                          
 \x80\x00\x00\x01                   column=0:MYCOLUMN, timestamp=1426421982186, value=Hello                                              
 \x80\x00\x00\x01                   column=0:_0, timestamp=1426421982186, value=                                                         
 \x80\x00\x00\x02                   column=0:MYCOLUMN, timestamp=1426421982186, value=World!                                             
 \x80\x00\x00\x02                   column=0:_0, timestamp=1426421982186, value=                                                         
2 row(s) in 0.0290 seconds
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> use hbase;
    +-------+------------------------------------+
    |  ok   |              summary               |
    +-------+------------------------------------+
    | true  | Default schema changed to [hbase]  |
    +-------+------------------------------------+
    1 row selected (2.393 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
    +---------------+------------------+
    | TABLE_SCHEMA  |    TABLE_NAME    |
    +---------------+------------------+
    | hbase         | ABC              |
    | hbase         | SYSTEM.CATALOG   |
    | hbase         | SYSTEM.SEQUENCE  |
    | hbase         | SYSTEM.STATS     |
    | hbase         | TEST             |
    | hbase         | TEST01           |
    +---------------+------------------+
    6 rows selected (5.415 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST01;
    +-------------+-------+--------------------+
    |   row_key   |  lf   |         sf         |
    +-------------+-------+--------------------+
    | [B@a32b2dc  | null  | {"c1":"c2t1MQ=="}  |
    +-------------+-------+--------------------+
    1 row selected (1.825 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.TEST01;
    +---------+
    | EXPR$0  |
    +---------+
    | 1       |
    +---------+
    1 row selected (2.244 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select cast(row_key as VarChar(20)),lf,sf from hbase.TEST01;
    +------------+-----+--------------------+
    |   EXPR$0   | lf  |         sf         |
    +------------+-----+--------------------+
    | user1|ts1  | {}  | {"c1":"c2t1MQ=="}  |
    +------------+-----+--------------------+
    1 row selected (2.305 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select count(*) from hbase.`SYSTEM.CATALOG`;
    +---------+
    | EXPR$0  |
    +---------+
    | 79      |
    +---------+
    1 row selected (0.815 seconds)

    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select CONVERT_FROM(row_key,'UTF8') from hbase.TEST;
    +---------+
    | EXPR$0  |
    +---------+
    | ?    |
    | ?    |
    +---------+
    2 rows selected (0.699 seconds)
    0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hbase.TEST;
    +--------------+----------------------------------+
    |   row_key    |                0                 |
    +--------------+----------------------------------+
    | [B@5b6c4eba  | {"MYCOLUMN":"SGVsbG8=","_0":""}  |
    | [B@23a4c43a  | {"MYCOLUMN":"V29ybGQh","_0":""}  |
    +--------------+----------------------------------+
    2 rows selected (0.554 seconds)

MYSQL

0: jdbc:drill:zk=bigdata-test-server-02:2181> use mysql.test;
+-------+-----------------------------------------+
|  ok   |                 summary                 |
+-------+-----------------------------------------+
| true  | Default schema changed to [mysql.test]  |
+-------+-----------------------------------------+
1 row selected (0.35 seconds)
0: jdbc:drill:zk=bigdata-test-server-02:2181> show tables;
+---------------+-------------+
| TABLE_SCHEMA  | TABLE_NAME  |
+---------------+-------------+
| mysql.test    | books       |
| mysql.test    | t1          |
+---------------+-------------+
2 rows selected (0.418 seconds)

0: jdbc:drill:zk=bigdata-test-server-02:2181> select * from t1;
+----+
| a  |
+----+
| 1  |
| 2  |
+----+
2 rows selected (1.36 seconds)

0: jdbc:drill:zk=bigdata-test-server-02:2181> describe books;
+---------------+--------------------+--------------+
|  COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
+---------------+--------------------+--------------+
| id            | INTEGER            | YES          |
| isbn          | CHARACTER VARYING  | YES          |
| category      | CHARACTER VARYING  | YES          |
| publish_date  | TIMESTAMP          | NO           |
| publisher     | CHARACTER VARYING  | YES          |
| price         | REAL               | YES          |
+---------------+--------------------+--------------+
6 rows selected (0.239 seconds)
    # jps -lm
    2099 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
    14871 sun.tools.jps.Jps -lm
    1610 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
    14613 org.apache.hadoop.hbase.master.HMaster start
    11327 org.apache.drill.exec.server.Drillbit
    1775 org.apache.hadoop.hdfs.server.namenode.NameNode
    1962 org.apache.hadoop.hdfs.qjournal.server.JournalNode

Drill Default Input Format

参考:http://drill.apache.org/docs/drill-default-input-format/

You can define one default input format per workspace. If you do not define a default input format, and Drill cannot detect the file format, the query fails. You can define a default input format for any of the file types that Drill supports. Currently, Drill supports the following types:

  • Avro
  • CSV, TSV, or PSV
  • Parquet
  • JSON
  • MapR-DB*

Querying Compressed Files

参考:http://drill.apache.org/docs/querying-plain-text-files/#query-the-gz-file-directly

SELECT COLUMNS[0],
COLUMNS[1],
COLUMNS[2]
FROM dfs./Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo. ts.gz
WHERE ((columns[0] = ‘Zoological Journal of the Linnean’)
AND (columns[2] > 250))
LIMIT 10;

Querying HBase

http://drill.apache.org/docs/querying-hbase/

Querying the INFORMATION SCHEMA

You can query the following INFORMATION_SCHEMA tables:

  • SCHEMATA
  • CATALOGS
  • TABLES
  • COLUMNS
  • VIEWS
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT CATALOG_NAME, SCHEMA_NAME as all_my_data_sources FROM INFORMATION_SCHEMA.SCHEMATA ORDER BY SCHEMA_NAME;
+---------------+----------------------+
| CATALOG_NAME  | all_my_data_sources  |
+---------------+----------------------+
| DRILL         | INFORMATION_SCHEMA   |
| DRILL         | cp.default           |
| DRILL         | dfs.default          |
| DRILL         | dfs.root             |
| DRILL         | dfs.tmp              |
| DRILL         | dfs.work             |
| DRILL         | hbase                |
| DRILL         | hdfs.default         |
| DRILL         | hdfs.root            |
| DRILL         | hive.default         |
| DRILL         | hive.hsu             |
| DRILL         | sys                  |
+---------------+----------------------+
12 rows selected (0.256 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma>  SHOW DATABASES;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| dfs.work            |
| hbase               |
| hdfs.default        |
| hdfs.root           |
| hive.default        |
| hive.hsu            |
| sys                 |
+---------------------+
12 rows selected (0.383 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.`TABLES` ORDER BY TABLE_NAME DESC;
+---------------------+---------------------+-------------+
|    TABLE_SCHEMA     |     TABLE_NAME      | TABLE_TYPE  |
+---------------------+---------------------+-------------+
| sys                 | version             | TABLE       |
| sys                 | threads             | TABLE       |
| hive.default        | order_test_parquet  | TABLE       |
| hive.default        | order_test          | TABLE       |
| sys                 | options             | TABLE       |
| sys                 | memory              | TABLE       |
| sys                 | drillbits           | TABLE       |
| sys                 | boot                | TABLE       |
| INFORMATION_SCHEMA  | VIEWS               | TABLE       |
| hbase               | TEST01              | TABLE       |
| hbase               | TEST                | TABLE       |
| INFORMATION_SCHEMA  | TABLES              | TABLE       |
| hbase               | SYSTEM.STATS        | TABLE       |
| hbase               | SYSTEM.SEQUENCE     | TABLE       |
| hbase               | SYSTEM.CATALOG      | TABLE       |
| INFORMATION_SCHEMA  | SCHEMATA            | TABLE       |
| INFORMATION_SCHEMA  | COLUMNS             | TABLE       |
| INFORMATION_SCHEMA  | CATALOGS            | TABLE       |
| hbase               | ABC                 | TABLE       |
+---------------------+---------------------+-------------+
19 rows selected (1.125 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from hive.`default`.order_test;
+-----------------+-----------------------------+
|   ordernumber   |         event_time          |
+-----------------+-----------------------------+
| 10703007267488  | 2014-05-01 06:01:12.334+01  |
1 rows selected (0.253 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT COLUMN_NAME, DATA_TYPE  FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'order_test' and TABLE_SCHEMA = 'hive.default' AND COLUMN_NAME LIKE '%ordernumber';
+--------------+------------+
| COLUMN_NAME  | DATA_TYPE  |
+--------------+------------+
| ordernumber  | VARCHAR    |
+--------------+------------+
1 row selected (1.481 seconds)

Querying System Tables

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show databases;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| dfs.work            |
| hbase               |
| hdfs.default        |
| hdfs.root           |
| hive.default        |
| hive.hsu            |
| sys                 |
+---------------------+
12 rows selected (0.137 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma>  use sys;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | Default schema changed to [sys]  |
+-------+----------------------------------+
1 row selected (0.089 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> show tables;
+---------------+-------------+
| TABLE_SCHEMA  | TABLE_NAME  |
+---------------+-------------+
| sys           | boot        |
| sys           | drillbits   |
| sys           | memory      |
| sys           | options     |
| sys           | threads     |
| sys           | version     |
+---------------+-------------+
6 rows selected (0.127 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from drillbits;
+-------------------+------------+---------------+------------+----------+
|     hostname      | user_port  | control_port  | data_port  | current  |
+-------------------+------------+---------------+------------+----------+
| itr-mastertest02  | 31010      | 31011         | 31012      | true     |
| itr-mastertest01  | 31010      | 31011         | 31012      | false    |
| itr-nodetest02    | 31010      | 31011         | 31012      | false    |
| itr-nodetest01    | 31010      | 31011         | 31012      | false    |
+-------------------+------------+---------------+------------+----------+
4 rows selected (2.484 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from options where type='SYSTEM' limit 10;
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
|                      name                       |   kind   |  type   |  status  |   num_val   | string_val  | bool_val  | float_val  |
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| drill.exec.functions.cast_empty_string_to_null  | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | false     | null       |
| drill.exec.storage.file.partition.column.label  | STRING   | SYSTEM  | DEFAULT  | null        | dir         | null      | null       |
| exec.errors.verbose                             | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | false     | null       |
| exec.java_compiler                              | STRING   | SYSTEM  | DEFAULT  | null        | DEFAULT     | null      | null       |
| exec.java_compiler_debug                        | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | true      | null       |
| exec.java_compiler_janino_maxsize               | LONG     | SYSTEM  | DEFAULT  | 262144      | null        | null      | null       |
| exec.max_hash_table_size                        | LONG     | SYSTEM  | DEFAULT  | 1073741824  | null        | null      | null       |
| exec.min_hash_table_size                        | LONG     | SYSTEM  | DEFAULT  | 65536       | null        | null      | null       |
| exec.queue.enable                               | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | false     | null       |
| exec.queue.large                                | LONG     | SYSTEM  | DEFAULT  | 10          | null        | null      | null       |
+-------------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
10 rows selected (2.842 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from boot limit 10;
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
|                 name                 |   kind   | type  | status  |  num_val   |       string_val        | bool_val  | float_val  |
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
| awt.toolkit                          | STRING   | BOOT  | BOOT    | null       | "sun.awt.X11.XToolkit"  | null      | null       |
| drill.client.supports-complex-types  | BOOLEAN  | BOOT  | BOOT    | null       | null                    | true      | null       |
| drill.exec.buffer.size               | STRING   | BOOT  | BOOT    | null       | "6"                     | null      | null       |
| drill.exec.buffer.spooling.delete    | BOOLEAN  | BOOT  | BOOT    | null       | null                    | true      | null       |
| drill.exec.buffer.spooling.size      | LONG     | BOOT  | BOOT    | 100000000  | null                    | null      | null       |
| drill.exec.cluster-id                | STRING   | BOOT  | BOOT    | null       | "drillbits1"            | null      | null       |
| drill.exec.compile.cache_max_size    | LONG     | BOOT  | BOOT    | 1000       | null                    | null      | null       |
| drill.exec.compile.compiler          | STRING   | BOOT  | BOOT    | null       | "DEFAULT"               | null      | null       |
| drill.exec.compile.debug             | BOOLEAN  | BOOT  | BOOT    | null       | null                    | true      | null       |
| drill.exec.compile.janino_maxsize    | LONG     | BOOT  | BOOT    | 262144     | null                    | null      | null       |
+--------------------------------------+----------+-------+---------+------------+-------------------------+-----------+------------+
10 rows selected (0.138 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from threads;
+-------------------+------------+----------------+---------------+
|     hostname      | user_port  | total_threads  | busy_threads  |
+-------------------+------------+----------------+---------------+
| itr-mastertest02  | 31010      | 36             | 34            |
| itr-nodetest02    | 31010      | 34             | 31            |
| itr-nodetest01    | 31010      | 27             | 27            |
| itr-mastertest01  | 31010      | 37             | 32            |
+-------------------+------------+----------------+---------------+
4 rows selected (9.853 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> select * from memory;
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
|     hostname      | user_port  | heap_current  |  heap_max   | direct_current  | jvm_direct_current  | direct_max  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| itr-mastertest02  | 31010      | 347630584     | 1073741824  | 11798941        | 33931076            | 2147483648  |
| itr-nodetest02    | 31010      | 182035960     | 1073741824  | 7750365         | 16813963            | 2147483648  |
| itr-nodetest01    | 31010      | 274726896     | 1073741824  | 7750365         | 16778020            | 2147483648  |
| itr-mastertest01  | 31010      | 455562488     | 1073741824  | 7751085         | 17151349            | 2147483648  |
+-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
4 rows selected (0.904 seconds)

监控WebUI

http://drill.apache.org/docs/monitoring-and-canceling-queries-in-the-drill-web-ui/

log-and-debug

http://drill.apache.org/docs/log-and-debug-introduction/

troubleshooting

http://drill.apache.org/docs/troubleshooting/

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> ALTER SESSION SET `exec.errors.verbose`=true;
+-------+-------------------------------+
|  ok   |            summary            |
+-------+-------------------------------+
| true  | exec.errors.verbose updated.  |
+-------+-------------------------------+
1 row selected (4.051 seconds)
0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT hostname FROM sys.drillbits WHERE `current` = true;
+-----------------+
|    hostname     |
+-----------------+
| itr-nodetest02  |
+-----------------+
1 row selected (1.246 seconds)

0: jdbc:drill:zk=itr-mastertest01:2181,itr-ma> SELECT commit_id FROM sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| 13a0c3c70991f29ca240adcac32b11b0d4fef21e  |
+-------------------------------------------+
1 row selected (1.624 seconds)

注意:

当指定的zk是一个全新的ZK, 之前如果使用zk=local在本次新的zk会话中
Storage-Plugin的信息都丢失.
因为我们指定的zookeeper集群是全新的. 所以drill还没有往里面写入任何数据.
这是因为在web ui上对Storage Plugin进行update或者create的数据都会写入到对应的zookeeper节点上!
当我们在界面上update hive, 并且enable后, 通过show databases就可以看到hive里的表.

  • 本文利用了一天时间完成,从零开始部署使用,转载请注明出处www.itweet.cn

参考:http://www.yankay.com/google-dremel-rationale/

相关资源:http://geek.csdn.net/news/detail/32952
http://www.quora.com/Apache-Drill

原创文章,转载请注明: 转载自Itweet的博客
本博客的文章集合: http://www.itweet.cn/blog/archive/