實作七

Hive (4) : EXPLAIN 與 DESCRIBE

技巧一: EXPLAIN

  1. 有時候邏輯上相同的 HiveQL,在執行時間上會有所不同。或者,有時候我們只是需要瞭解進行某一個複雜的 HiveQL 查詢,需要做幾次 MapReduce。這時候就可以使用 EXPLAIN 這個關鍵字來進行查詢。語法為 EXPLAIN <HiveQL> ;
    hive> EXPLAIN SELECT * FROM userXX.fact_player_stats LIMIT 10; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME user00 fact_player_stats))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_LIMIT 10)))

STAGE DEPENDENCIES: Stage-0 is a root stage

STAGE PLANS: Stage: Stage-0 Fetch Operator limit: 10 Processor Tree: TableScan alias: fact_player_stats Select Operator expressions: expr: playerid type: string expr: teamid type: string expr: ab type: int expr: r type: int expr: h type: int expr: 2b type: int expr: 3b type: int expr: hr type: int expr: rbi type: int expr: g type: int outputColumnNames: col0, col1, col2, col3, col4, col5, col6, col7, col8, col9 Limit ListSink

Time taken: 1.166 seconds
hive> use userXX; 
hive> EXPLAIN SELECT A.PlayerID, B.teamID, B.AB, B.R, B.H, B.2B, B.3B, B.HR, B.RBI FROM Master A JOIN BATTING B ON A.playerID = B.playerID;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME Master) A) (TOK_TABREF (TOK_TABNAME batting) B) (= (. (TOK_TABLE_OR_COL A) playerID) (. (TOK_TABLE_OR_COL B) playerID)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL A) PlayerID)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) teamID)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) AB)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) R)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) H)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) 2B)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) 3B)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) HR)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) RBI)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a 
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: playerid
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: playerid
                    type: string
              tag: 0
              value expressions:
                    expr: playerid
                    type: string
        b 
          TableScan
            alias: b
            Reduce Output Operator
              key expressions:
                    expr: playerid
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: playerid
                    type: string
              tag: 1
              value expressions:
                    expr: teamid
                    type: string
                    expr: ab
                    type: int
                    expr: r
                    type: int
                    expr: h
                    type: int
                    expr: 2b
                    type: int
                    expr: 3b
                    type: int
                    expr: hr
                    type: int
                    expr: rbi
                    type: int
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col1}
            1 {VALUE._col3} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11} {VALUE._col12} {VALUE._col13}
          handleSkewJoin: false
          outputColumnNames: _col1, _col38, _col42, _col43, _col44, _col45, _col46, _col47, _col48
          Select Operator
            expressions:
                  expr: _col1
                  type: string
                  expr: _col38
                  type: string
                  expr: _col42
                  type: int
                  expr: _col43
                  type: int
                  expr: _col44
                  type: int
                  expr: _col45
                  type: int
                  expr: _col46
                  type: int
                  expr: _col47
                  type: int
                  expr: _col48
                  type: int
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1


Time taken: 0.679 seconds

技巧二: DESCRIBE

  1. 有時候如果您想要查詢資料表的 Schema 或者某個 Hive 內建函數的定義時,您可以使用 DESCRIBE 這個關鍵字。
  2. 語法一:DESCRIBE <資料表名稱>;
    hive> use userXX
    hive> DESCRIBE mysql_data;
    OK
    id  int 
    name    string  
    Time taken: 0.126 seconds
    
    hive> DESCRIBE master;
    OK
    lahmanid    int 
    playerid    string  
    managerid   int 
    hofid   string  
    birthyear   int 
    birthmonth  int 
    birthday    int 
    birthcountry    string  
    birthstate  string  
    birthcity   string  
    deathyear   int 
    deathmonth  int 
    deathday    int 
    deathcountry    string  
    deathstate  string  
    deathcity   string  
    namefirst   string  
    namelast    string  
    namenote    string  
    namegiven   string  
    namenick    string  
    weight  int 
    height  int 
    bats    string  
    throws  string  
    debut   string  
    finalgame   string  
    college string  
    lahman40id  string  
    lahman45id  string  
    retroid string  
    holtzid string  
    bbrefid string  
    Time taken: 0.122 seconds
    
  3. 語法二:DESCRIBE FUNCTION <函數名稱>;
    hive> DESCRIBE FUNCTION max;
    OK
    max(expr) - Returns the maximum value of expr
    Time taken: 0.082 seconds
    hive> DESCRIBE FUNCTION count;
    OK
    count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
    count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
    count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
    Time taken: 0.111 seconds
    

本文件最後更新於: