Skip to main content

CREATE EXTERNAL TABLE

Creates an external table.


Syntax

CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name>      
( <column_name> <data_type> [, ...] | LIKE <other_table >)
LOCATION ('s3://<s3_endpoint>/<bucket_name>/[<s3_prefix>]
accessid=<access_key_id>
secret=<access_secret>
[threadnum=<thread_number>]
[chunksize=<chunk_size>]')
FORMAT 'TEXT'
[( [HEADER]
[DELIMITER [AS] '<delimiter>' | 'OFF']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF']
[FILL MISSING FIELDS] )]
| 'CSV'
[( [HEADER]
[QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE NOT NULL <column> [, ...]]
[ESCAPE [AS] '<escape>']
[FILL MISSING FIELDS] )]
| 'PARQUET'
| 'ORC'

Description

CREATE EXTERNAL TABLE creates a new readable external table definition in Relyt. Readable external tables are typically used for fast, parallel data loading. Once an external table is defined, you can query its data directly (and in parallel) using SQL commands. For example, you can select, join, or sort external table data. You can also create views for external tables. DML operations (UPDATE, INSERT, DELETE, or TRUNCATE) are not allowed on readable external tables, and you cannot create indexes on readable external tables.


Parameters

  • [READABLE | WRITABLE]

    The type of the external table. If not specified, the default READABLE is used.

    • READABLE: Readable external tables are used to load data to Relyt.

    • WRITABLE: Writable external tables are used to unload data.

  • <table_name>

    The name of the external table.

  • <column_name>

    The name of the column.

    Do not specify column constraints and default values for columns in external tables, because external tables do not support column constraints and default values.

  • <data_type>

    The data type of the column.

  • LOCATION

    The location of the external data source. Currently, only data sources from AWS S3 are supported.

  • <s3_endpoint>

    The endpoint to connect to the external data source. Supported endpoints include:

  • <bucket_name>

    The name of the S3 bucket.

  • [S3_prefix]

    An optional path prefix within the S3 bucket. If specified, only objects under this path prefix will be read.

  • accessid

    The AWS Access ID.

  • secret

    The AWS Access Secret.

  • threadnum

    The number of parallel threads to use when transferring data. The value ranges from 1 to 8. The default value for TEXT and CSV is 4 and that for PARQUET is 1.

  • chunksize

    The size of chunks to transfer at a time, in bytes. The value ranges from 1 MB to 128 MB, and the default value is 8 MB.

  • FORMAT

    The data format. Suppported options include TEXT, CSV, PARQUET, and ORC.

  • DELIMITER

    The single ASCII character that separates columns within each row of data. The default for TEXT is a tab character and that for CSV is a comma (,). For TEXT,you can set DELIMITER to OFF for special use cases in which unstructured data is loaded in to a single-column table.

    For the s3 protocol, the delimiter cannot be a newline character (\n) or a carriage return character (\r).

  • NULL

    The string that represents a NULL value. The default is \N (backslash-N) in TEXT mode, and an empty value with no quotations in CSV mode. You might prefer an empty string even in TEXT mode for cases where you do not want to distinguish NULL values from empty strings. When using external and web tables, any data item that matches this string will be considered a NULL value.

    As an example for the text format, this FORMAT clause can be used to specify that the string of two single quotes ('') is a NULL value.

    FORMAT 'text' (delimiter ',' null '\'\'\'\'' )
  • ESCAPE

    The single character that is used for C espace sequences, such as \n, \t, or \100 and for escaping data characters that might otherwise be taken as row or column delimiters. Make sure to choose an escape character that is not used anywhere in your actual column data. The default escape character is a backslash (\) for text-formatted files and a double quote(") for csv-formatted files, however it is possible to specify another character to represent an escape. It is also possible to deactivate escaping in text-formatted files by specifying the value 'OFF' as the escape value. This is very useful for data such as text-formatted web log data that has many embedded backslashes that are not intended to be escapes.

  • HEADER

    For readable external tables, specifies that the first line in the data files is a header row (contains the names of the table columns) and should not be included as data for the table. If using multiple data source files, all files must have a header row.

    For the s3 protocol, the column names in the header row cannot contain a newline character (\n) or a carriage return (\r).

  • QUOTE

    The quotation character for CSV mode. The default is double-quote (").

  • FORCE NOT NULL

    In CSV mode, processes each specified column as though it were quoted and hence not a NULL value. For the default null string in CSV mode (nothing between two delimiters), this causes missing values to be evaluated as zero-length strings.

  • FILL MISSING FIELDS

    In both TEXT and CSV mode for readable external tables, specifying FILL MISSING FIELDS will set missing trailing field values to NULL (instead of reporting an error) when a row of data has missing data fields at the end of a line or row. Blank rows, fields with a NOT NULL constraint, and trailing delimiters on a line will still report an error.


Examples

Create an external table named nation_ext using the s3 protocol:


CREATE WRITABLE EXTERNAL TABLE ks3_writable_ext (
id integer
)
LOCATION ('s3://ks3-cn-beijing.ksyuncs.com/******/ext-write/
accessid=************
secret=************')
FORMAT 'CSV';

SQL standard compatibility

CREATE EXTERNAL TABLE is a Relyt extension.