Introduction to Sqoop

Sqoop 簡介

Jazz Yao-Tsung Wang

Sqoop 使用語法

user@master ~ $ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

Sqoop Import (匯入)

Figure 1: Sqoop Import Overview

Source : “Apache Sqoop – Overview”, by Arvind Prabhakar, October 06, 2011

Figure 1: Sqoop Import Overview

Sqoop Import 語法 (1)

user@master ~ $ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:

--connect [jdbc-uri]              Specify JDBC connect string
-P                                Read password from console password
--username [username]             Set authentication username

Import control arguments:

--append                          Imports data in append mode
--as-avrodatafile                 Imports data to Avro data files
--as-sequencefile                 Imports data to SequenceFiles
--as-textfile                     Imports data as plain text (default)
--columns [col,col,col...]        Columns to import from table
-e,--query [statement]            Import results of SQL 'statement'
-m,--num-mappers [n]              Use 'n' map tasks to import in parallel
--table [table-name]              Table to read
--target-dir [dir]                HDFS plain table destination
--where [where clause]            WHERE clause to use during import

Sqoop Import 語法 (2)

Hive arguments:

--create-hive-table               Fail if the target hive table exists
--hive-import                     Import tables into Hive
--hive-table [table-name]         Sets the table name to use when importing to hive

HBase arguments:

--column-family [family]          Sets the target column family for the import
--hbase-create-table              If specified, create missing HBase tables
--hbase-row-key [col]             Specifies which input column to use as the row key
--hbase-table [table]             Import to [table] in HBase

At minimum, you must specify --connect and --table

Sqoop Export (匯出)

Figure 2: Sqoop Export Overview

Source : “Apache Sqoop – Overview”, by Arvind Prabhakar, October 06, 2011

Sqoop Export 語法

user@master ~ $ sqoop help export
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:

   --connect [jdbc-uri]            Specify JDBC connect string
-P                                 Read password from console
   --username [username]           Set authentication username

Export control arguments:

   --columns [col,col,col...]      Columns to export to table
   --export-dir [dir]              HDFS source path for the export
-m,--num-mappers [n]               Use 'n' map tasks to export in parallel
   --table [table-name]            Table to populate

At minimum, you must specify --connect, --export-dir, and --table