Vector Tables
A vector table is a database table designed to store and manage vector data. This section provides examples of creating a vector table using SQL:1999 syntax, with partial compatibility for SQL:2003 features.
Prerequisites
-
At least one DW service unit for vectors is available in the environment.
-
At least one database exists in the target DW service unit for vectors. For details about how to create a vector database, see the relevant topic in the Import Data topic.
-
You have obtained the username and password to sign in to console of the DW service unit for vectors.
Syntax
CREATE TABLE <table_name>
(
C1 <data_type>,
C2 <data_type>,
......,
CN <vector_type>(128),
PRIMARY KEY (<col_name1>[, <col_name2>, ... , <col_nameN>])
) DISTRIBUTED BY (C1);
For details about the table creation statement, see CREATE TABLE.
The numeric value within the parentheses of the vector column (<vector_type>(N)
) specifies the number of dimensions in the vector. For example, <vector_type>(128)
indicates a vector with 128 dimensions. The table below lists the supported vector column data types in Relyt Vector DPS:
Data Type | Description |
---|---|
vecf16 | float16[] , 16-bit floating-point array. |
vector | float32[] , 32-bit floating-point array. |
vecf16
offers half the storage and twice the performance of vector
with equal accuracy, making it the better choice.
Example
Run the following command to create a heap table named face_table
, where C1
is the primary key, and C4
is the vector column.
CREATE TABLE face_table (
C1 INT,
C2 TIMESTAMP NOT NULL,
C3 VARCHAR(20) NOT NULL,
C4 VECF16(512) NOT NULL,
PRIMARY KEY (C1)
) DISTRIBUTED BY (C1);
Use partitioned tables to manage incremental data
For time-based data (e.g., imported based on time and retained for N
days before expiring), the PARTITION BY
statement can divide the table into partitions, enabling independent data imports and TRUNCATE
operations for efficient incremental data management.
Here is an example statement:
CREATE TABLE test_tbl
(
C1 VARCHAR,
C2 VECF16(512),
post_publish_time TIMESTAMP
)
DISTRIBUTED BY (C1)
PARTITION BY RANGE (post_publish_time) (START (DATE '2024-08-01') INCLUSIVE END (DATE '2024-08-31') EXCLUSIVE EVERY(INTERVAL '1 day'), DEFAULT PARTITION extra);