Introduction to Impala

Jazz Yao-Tsung Wang

Agenda

Introduction to Impala

Components of the Impala Server

Architecture of the Impala Server

Figure: High-level architectural view of Impala

File Format Support in Impala (1)

Table 1. File Format Support in Impala

| File Type | Format | Compression Codecs | Impala Can CREATE? | Impala Can INSERT? |
|---|---|---|---|---|
| Parquet | Structured | Snappy, gzip; currently Snappy by default | Yes. | Yes: CREATE TABLE, INSERT, LOAD DATA, and query. |
| Text | Unstructured | LZO, gzip, bzip2, Snappy | Yes. For CREATE TABLE with no STORED AS clause, the default file format is uncompressed text, with values separated by ASCII 0x01 characters (typically represented as Ctrl-A). | Yes: CREATE TABLE, INSERT, LOAD DATA, and query. If LZO compression is used, you must create the table and load data in Hive. If other kinds of compression are used, you must load data through LOAD DATA, Hive, or manually in HDFS. |
| Avro | Structured | Snappy, gzip, deflate, bzip2 | Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive. | No. Load data through LOAD DATA on data files already in the right format, or use INSERT in Hive. |
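
The default text layout mentioned in Table 1 can be illustrated with a short Python sketch. It shows how one line of an uncompressed text data file, as produced by a CREATE TABLE with no STORED AS clause, splits into column values on the ASCII 0x01 (Ctrl-A) separator. The helper name `split_text_row` and the three-column sample row are assumptions made only for illustration; they do not come from Impala itself.

```python
# Field separator for text tables created without a STORED AS clause
# (ASCII 0x01, usually shown as Ctrl-A).
CTRL_A = "\x01"


def split_text_row(line: str) -> list:
    """Split one line of an uncompressed text data file into column values."""
    return line.rstrip("\n").split(CTRL_A)


# Hypothetical row with three columns: id, name, score.
sample_line = CTRL_A.join(["1", "alice", "3.5"]) + "\n"
print(split_text_row(sample_line))  # ['1', 'alice', '3.5']
```

Because Ctrl-A rarely appears in ordinary data, it makes a reasonable default delimiter. All values come back as strings here; interpreting them according to the table's column types is left out of this sketch.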

File Format Support in Impala (2)

Table 1 (continued). File Format Support in Impala

| File Type | Format | Compression Codecs | Impala Can CREATE? | Impala Can INSERT? |
|---|---|---|---|---|
| RCFile | Structured | Snappy, gzip, deflate, bzip2 | Yes. | No. Load data through LOAD DATA on data files already in the right format, or use INSERT in Hive. |
| SequenceFile | Structured | Snappy, gzip, deflate, bzip2 | No. Load data through LOAD DATA on data files already in the right format, or use INSERT in Hive. | Yes, in Impala 2.0 and higher. For earlier Impala releases, load data through LOAD DATA on data files already in the right format, or use INSERT in Hive. |
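
Where Table 1 says Impala cannot INSERT, data files that are already in the right format are attached to the table with LOAD DATA. The Python sketch below only assembles the text of such a statement. The helper `load_data_statement`, the table name `sales_rc`, and the HDFS path are hypothetical, and the helper is not part of any Impala client library.

```python
def load_data_statement(hdfs_path: str, table: str, overwrite: bool = False) -> str:
    """Build an Impala LOAD DATA statement for files already in the table's format."""
    # OVERWRITE is optional; when present it replaces the table's existing data.
    overwrite_clause = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{hdfs_path}' {overwrite_clause}INTO TABLE {table}"


# Hypothetical staging directory and RCFile table.
stmt = load_data_statement("/user/demo/rcfile_staging", "sales_rc")
print(stmt)
# LOAD DATA INPATH '/user/demo/rcfile_staging' INTO TABLE sales_rc
```

The resulting string would be submitted through impala-shell or a client connection. LOAD DATA moves the files from the source path into the table's data directory rather than copying them. If the data is written with INSERT in Hive instead, a REFRESH of the table in Impala is typically needed before the new files become visible to queries.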

Impala SQL Language

SQL Differences Between Impala and Hive