SHOW DATABASES; SHOW TABLES;執行結果如下:
[master:21000] > show databases; Query: show databases +------------------+ | name | +------------------+ | _impala_builtins | | default | | user00 | +------------------+ Fetched 3 row(s) in 0.00s [master:21000] > show tables; Query: show tables Fetched 0 row(s) in 0.01s看起來跟 Hive 的結果不太一致,那是因為 Impala 還沒跟 Hive 進行同步,讓我們使用以下指令,強迫 Impala 與 Hive 的 metastore 進行同步。
invalidate metadata; SHOW DATABASES; use userXX; SHOW TABLES;執行結果如下:
[master:21000] > invalidate metadata; Query: invalidate metadata Fetched 0 row(s) in 3.88s [master:21000] > SHOW DATABASES; Query: show DATABASES +------------------+ | name | +------------------+ | _impala_builtins | | default | | user00 | +------------------+ Fetched 3 row(s) in 0.00s [master:21000] > use user00; Query: use user00 [master:21000] > SHOW TABLES; Query: show TABLES +-------------------+ | name | +-------------------+ | batting | | fact_player_stats | | master | | mysql_data | | pitching | +-------------------+ Fetched 5 row(s) in 0.00s
由於 Impala 支援 INSERT INTO … VALUES 語法(Hive 必須要到 0.14 以後才有支援)。因此,讓我們先使用 INSERT 語法來建立一個示範用的資料表。請執行以下 Impala SQL 來建立範例資料表:
use userXX; CREATE TABLE Users ( id INT, name STRING ); CREATE TABLE Logins ( id INT, login TIMESTAMP ); INSERT INTO Users VALUES ( 1, "Jason" ), ( 2, "Jazz" ), ( 3, "Jim" ), ( 4, "Steven" ), ( 5, "Tim" ); INSERT INTO Logins VALUES ( 1, "2014-12-21 17:18:53" ), ( 2, "2014-12-20 21:38:42" ), ( 3, "2014-12-22 18:17:38" ), ( 4, "2014-12-23 09:25:35" ), ( 5, "2014-12-24 13:35:17" ); SELECT users.id, users.name, logins.login FROM users JOIN logins ON ( users.id = logins.id );
執行結果如下所示。由於這些動作都只發生在 impalad 跟 catalogd 之間,查詢的回應速度大約是秒至毫秒等級。
[master:21000] > use user00; Query: use user00 [master:21000] > CREATE TABLE Users ( id INT, name STRING ); Query: create TABLE Users ( id INT, name STRING ) Fetched 0 row(s) in 1.04s [master:21000] > CREATE TABLE Logins ( id INT, login TIMESTAMP ); Query: create TABLE Logins ( id INT, login TIMESTAMP ) Fetched 0 row(s) in 1.04s [master:21000] > INSERT INTO Users VALUES ( 1, "Jason" ), ( 2, "Jazz" ), ( 3, "Jim" ), ( 4, "Steven" ), ( 5, "Tim" ); Query: insert INTO Users VALUES ( 1, "Jason" ), ( 2, "Jazz" ), ( 3, "Jim" ), ( 4, "Steven" ), ( 5, "Tim" ) Inserted 5 row(s) in 6.21s [master:21000] > INSERT INTO Logins VALUES ( 1, "2014-12-21 17:18:53" ), ( 2, "2014-12-20 21:38:42" ), ( 3, "2014-12-22 18:17:38" ), ( 4, "2014-12-23 09:25:35" ), ( 5, "2014-12-24 13:35:17" ); Query: insert INTO Logins VALUES ( 1, "2014-12-21 17:18:53" ), ( 2, "2014-12-20 21:38:42" ), ( 3, "2014-12-22 18:17:38" ), ( 4, "2014-12-23 09:25:35" ), ( 5, "2014-12-24 13:35:17" ) Inserted 5 row(s) in 6.00s [master:21000] > SELECT users.id, users.name, logins.login FROM users JOIN logins ON ( users.id = logins.id ); Query: select users.id, users.name, logins.login FROM users JOIN logins ON ( users.id = logins.id ) +----+--------+---------------------+ | id | name | login | +----+--------+---------------------+ | 1 | Jason | 2014-12-21 17:18:53 | | 2 | Jazz | 2014-12-20 21:38:42 | | 3 | Jim | 2014-12-22 18:17:38 | | 4 | Steven | 2014-12-23 09:25:35 | | 5 | Tim | 2014-12-24 13:35:17 | +----+--------+---------------------+ Fetched 5 row(s) in 0.40s離開 Impala Shell 的方式與 Hive Shell 相同,請使用
quit;
或 exit;
[master:21000] > quit; Goodbye user00
當使用 Hive 時,必須先使用 LOAD DATA 的語法將資料來源匯入指定資料表。資料會存放於 Hive 的 HDFS 存放目錄(如:hdfs://lab.3du.me:9000/user/hive/warehouse 底下),那在 Impala 如果使用 INSERT INTO 語法,資料會存放在哪裡呢?
由於 Impala 與 Hive 共用 metastore,因此實際上的資料也會存放於 /user/hive/warehouse 底下!可以使用以下指令檢查:er00@master ~]$ hadoop fs -ls -R /user/hive/warehouse/$DBID.db drwxr-xr-x - user00 supergroup 0 2015-08-05 23:26 /user/hive/warehouse/user00.db/batting -rw-r--r-- 1 user00 supergroup 6488747 2015-08-05 23:26 /user/hive/warehouse/user00.db/batting/Batting.csv drwxr-xr-x - user00 supergroup 0 2015-08-05 23:32 /user/hive/warehouse/user00.db/fact_player_stats -rw-r--r-- 1 user00 supergroup 13057255 2015-08-05 23:32 /user/hive/warehouse/user00.db/fact_player_stats/000000_0 drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins -rw-r--r-- 1 impala hive 110 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins/464fad09d2508e04-ac0cdeb713d2dfa3_1008088474_data.0. drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins/_impala_insert_staging drwxr-xr-x - user00 supergroup 0 2015-08-05 23:20 /user/hive/warehouse/user00.db/master -rw-r--r-- 1 user00 supergroup 3049250 2015-08-05 23:20 /user/hive/warehouse/user00.db/master/Master.csv drwxr-xr-x - user00 hive 0 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data -rw-r--r-- 1 user00 user00 0 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/_SUCCESS -rw-r--r-- 1 user00 user00 8 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/part-m-00000 -rw-r--r-- 1 user00 user00 8 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/part-m-00001 drwxr-xr-x - user00 supergroup 0 2015-08-05 23:30 /user/hive/warehouse/user00.db/pitching -rw-r--r-- 1 user00 supergroup 3602473 2015-08-05 23:30 /user/hive/warehouse/user00.db/pitching/Pitching.csv drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/users -rw-r--r-- 1 impala hive 36 2015-08-05 23:49 /user/hive/warehouse/user00.db/users/2d4f2425a22921e6-dee9b22ea2933490_1342285975_data.0. drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/users/_impala_insert_staging
實驗建立一個新的資料表,名為 test。
[user00@master ~]$ impala-shell -q "use $DBID; create table test;" Starting Impala Shell without Kerberos authentication Connected to master:21000 Query: use user00 Query: create table test ERROR: AnalysisException: Table requires at least 1 column Could not execute command: create table test [user00@master ~]$ impala-shell -q "use $DBID; create table test (id int);" Starting Impala Shell without Kerberos authentication Connected to master:21000 Query: use user00 Query: create table test (id int) Fetched 0 row(s) in 1.04s
查詢 Hive 的 HDFS 存放目錄,會發現多了一個名為 test 的目錄。
[user00@master ~]$ hadoop fs -ls -R /user/hive/warehouse/$DBID.db drwxr-xr-x - user00 supergroup 0 2015-08-05 23:26 /user/hive/warehouse/user00.db/batting -rw-r--r-- 1 user00 supergroup 6488747 2015-08-05 23:26 /user/hive/warehouse/user00.db/batting/Batting.csv drwxr-xr-x - user00 supergroup 0 2015-08-05 23:32 /user/hive/warehouse/user00.db/fact_player_stats -rw-r--r-- 1 user00 supergroup 13057255 2015-08-05 23:32 /user/hive/warehouse/user00.db/fact_player_stats/000000_0 drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins -rw-r--r-- 1 impala hive 110 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins/464fad09d2508e04-ac0cdeb713d2dfa3_1008088474_data.0. drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/logins/_impala_insert_staging drwxr-xr-x - user00 supergroup 0 2015-08-05 23:20 /user/hive/warehouse/user00.db/master -rw-r--r-- 1 user00 supergroup 3049250 2015-08-05 23:20 /user/hive/warehouse/user00.db/master/Master.csv drwxr-xr-x - user00 hive 0 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data -rw-r--r-- 1 user00 user00 0 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/_SUCCESS -rw-r--r-- 1 user00 user00 8 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/part-m-00000 -rw-r--r-- 1 user00 user00 8 2015-08-05 23:15 /user/hive/warehouse/user00.db/mysql_data/part-m-00001 drwxr-xr-x - user00 supergroup 0 2015-08-05 23:30 /user/hive/warehouse/user00.db/pitching -rw-r--r-- 1 user00 supergroup 3602473 2015-08-05 23:30 /user/hive/warehouse/user00.db/pitching/Pitching.csv drwxrwxr-x - impala hive 0 2015-08-05 23:53 /user/hive/warehouse/user00.db/test drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/users -rw-r--r-- 1 impala hive 36 2015-08-05 23:49 /user/hive/warehouse/user00.db/users/2d4f2425a22921e6-dee9b22ea2933490_1342285975_data.0. drwxrwxr-x - impala hive 0 2015-08-05 23:49 /user/hive/warehouse/user00.db/users/_impala_insert_staging
use userXX; COMPUTE STATS users; COMPUTE STATS logins; SHOW TABLE STATS users; SHOW TABLE STATS logins; SHOW COLUMN STATS users; SHOW COLUMN STATS logins;
執行結果如下所示:
[user00@master ~]$ impala-shell Starting Impala Shell without Kerberos authentication Connected to master:21000 Welcome to the Impala shell. Press TAB twice to see a list of available commands. Copyright (c) 2012 Cloudera, Inc. All rights reserved. [master:21000] > use user00; Query: use user00 [master:21000] > COMPUTE STATS users; Query: compute STATS users +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 2 column(s). | +-----------------------------------------+ Fetched 1 row(s) in 5.23s [master:21000] > COMPUTE STATS logins; Query: compute STATS logins +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 2 column(s). | +-----------------------------------------+ Fetched 1 row(s) in 4.92s [master:21000] > SHOW TABLE STATS users; Query: show TABLE STATS users +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | 5 | 1 | 36B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 0.01s [master:21000] > SHOW TABLE STATS logins; Query: show TABLE STATS logins +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | 5 | 1 | 110B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 0.00s [master:21000] > SHOW COLUMN STATS users; Query: show COLUMN STATS users +--------+--------+------------------+--------+----------+-------------------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+--------+------------------+--------+----------+-------------------+ | id | INT | 5 | -1 | 4 | 4 | | name | STRING | 5 | -1 | 6 | 4.199999809265137 | +--------+--------+------------------+--------+----------+-------------------+ Fetched 2 row(s) in 0.00s [master:21000] > SHOW COLUMN STATS logins; Query: show COLUMN STATS logins +--------+-----------+------------------+--------+----------+----------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+-----------+------------------+--------+----------+----------+ | id | INT | 5 | -1 | 4 | 4 | | login | TIMESTAMP | 5 | -1 | 16 | 16 | +--------+-----------+------------------+--------+----------+----------+ Fetched 2 row(s) in 0.00s