Lab 1

Sqoop (1): Importing a MySQL Table into HDFS

Verifying the Database Connection

  1. Run the command sqoop version to check the Sqoop version:
    [user00@master ~]$ sqoop version
    16/07/08 06:34:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.1
    Sqoop 1.4.5-cdh5.3.1
    git commit id 
    Compiled by jenkins on Tue Jan 27 16:34:48 PST 2015
    
  2. Run the command sqoop help to list the commands supported by this Sqoop version:
    [user00@master ~]$ sqoop
    Try 'sqoop help' for usage.
    [user00@master ~]$ sqoop help
    16/07/08 06:37:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.1
    usage: sqoop COMMAND [ARGS]
    . 
    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
    . 
    See 'sqoop help COMMAND' for information on a specific command.
    
  3. Run the following command, replacing ACCOUNT with your own account name:
    user00@master ~ $ export DBID=ACCOUNT
    ## Alternatively, use the environment variable $USER, which already holds your login account
    user00@master ~ $ export DBID=$USER
    
  4. Run sqoop list-databases to check that the database connection works:
    [user00@master ~]$ sqoop list-databases --connect "jdbc:mysql://localhost/"  --username $DBID -P
    Enter password: (type your password)
    
  5. If the database connection is working, you should see output similar to the following:
    information_schema
    user00
    
  6. Run sqoop list-tables to check that the data tables can be browsed (a sketch that previews the records with sqoop eval follows this list):
    [user00@master ~]$ sqoop list-tables --connect "jdbc:mysql://localhost/$DBID"  --username $DBID -P
    Enter password: (type your password)
    
  7. If the tables can be browsed, you should see output similar to the following:
    16/07/08 06:57:36 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    mysql_data
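
  Note: besides list-tables, the eval command shown in the sqoop help output above can run an ad-hoc SQL statement and print the result, which is handy for previewing records before importing them. A minimal sketch, assuming the same mysql_data table that is imported later in this lab:
    ## preview the first two records without starting a MapReduce job
    [user00@master ~]$ sqoop eval --connect "jdbc:mysql://localhost/$DBID" \
        --username $DBID -P \
        --query "SELECT * FROM mysql_data LIMIT 2"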
    

Importing the Table Records

  1. Run sqoop import to transfer the table records into HDFS (a sketch with a few additional import options follows this list):
    [user00@master ~]$ sqoop import --connect "jdbc:mysql://localhost/$DBID" --table mysql_data --username $DBID -P
    Enter password: (type your password)
    
  2. If the import runs correctly, you should see output similar to the following:
    16/07/08 07:02:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    16/07/08 07:02:30 INFO tool.CodeGenTool: Beginning code generation
    16/07/08 07:02:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1
    16/07/08 07:02:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1
    16/07/08 07:02:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
    Note: /tmp/sqoop-user00/compile/223418ffeba6b499e5a2444342aca53b/mysql_data.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    16/07/08 07:02:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-user00/compile/223418ffeba6b499e5a2444342aca53b/mysql_data.jar
    16/07/08 07:02:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
    16/07/08 07:02:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    16/07/08 07:02:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    16/07/08 07:02:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    16/07/08 07:02:33 INFO mapreduce.ImportJobBase: Beginning import of mysql_data
    .
    ... (output omitted) ...
    .
      File Input Format Counters 
          Bytes Read=0
      File Output Format Counters 
          Bytes Written=16
    16/07/08 07:03:01 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 26.9385 seconds (0.5939 bytes/sec)
    16/07/08 07:03:01 INFO mapreduce.ImportJobBase: Retrieved 2 records.
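
  The warnings in the log above suggest the MySQL-specific --direct fast path; two other commonly used options control the HDFS output location and the number of map tasks. A minimal sketch (the target directory name is only an example, and --direct assumes mysqldump is available on the cluster nodes):
    ## --direct uses mysqldump for a faster MySQL-specific transfer,
    ## -m 1 runs a single map task (one part-m-00000 output file), and
    ## --target-dir picks the HDFS output directory instead of the default mysql_data
    [user00@master ~]$ sqoop import --connect "jdbc:mysql://localhost/$DBID" \
        --table mysql_data --username $DBID -P \
        --direct -m 1 --target-dir mysql_data_direct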
    

Checking the Result Files

  1. The import results are stored on HDFS. Check whether a mysql_data directory exists under your HDFS home directory; if the import succeeded, it will contain result files named part-m-*****:
    [user00@master ~]$ hadoop fs -ls mysql_data
    Found 4 items
    -rw-r--r--   1 user00 user00          0 2015-08-05 10:06 mysql_data/_SUCCESS
    drwxr-xr-x   - user00 user00          0 2015-08-05 10:05 mysql_data/_logs
    -rw-r--r--   1 user00 user00          8 2015-08-05 10:06 mysql_data/part-m-00000
    -rw-r--r--   1 user00 user00          8 2015-08-05 10:06 mysql_data/part-m-00001
    
  2. View the part-m-***** result files (a short HDFS-shell sketch for copying and cleaning up the results follows this list):
    [user00@master ~]$ hadoop fs -cat mysql_data/part-*
    1,Hello
    2,World
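
  If you want to re-run the import (for example to try the options sketched above), note that Sqoop will not overwrite an existing output directory, so the previous results have to be removed first. A minimal sketch using the HDFS shell (the local file name is only an example):
    ## copy one result file to the local filesystem for closer inspection
    [user00@master ~]$ hadoop fs -get mysql_data/part-m-00000 ./mysql_data_part0.txt
    ## remove the result directory so the same import can be run again
    [user00@master ~]$ hadoop fs -rm -r mysql_data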
    

Common Issues

References

  1. "Sqoop: Big data conduit between NoSQL and RDBMS", Advisory Consultant, IBM, 23 Jul 2013

This document was last updated on: