Skip to main content

Vector Tables

A vector table is a database table designed to store and manage vector data. This section provides examples of creating a vector table using SQL:1999 syntax, with partial compatibility for SQL:2003 features.


Prerequisites

  • At least one DW service unit for vectors is available in the environment.

  • At least one database exists in the target DW service unit for vectors. For details about how to create a vector database, see the relevant topic in the Import Data topic.

  • You have obtained the username and password to sign in to console of the DW service unit for vectors.


Syntax

CREATE TABLE <table_name>
(
C1 <data_type>,
C2 <data_type>,
......,
CN <vector_type>(128),
PRIMARY KEY (<col_name1>[, <col_name2>, ... , <col_nameN>])
) DISTRIBUTED BY (C1);

For details about the table creation statement, see CREATE TABLE.

The numeric value within the parentheses of the vector column (<vector_type>(N)) specifies the number of dimensions in the vector. For example, <vector_type>(128) indicates a vector with 128 dimensions. The table below lists the supported vector column data types in Relyt Vector DPS:

Data TypeDescription
vecf16float16[], 16-bit floating-point array.
vectorfloat32[], 32-bit floating-point array.

info

vecf16 offers half the storage and twice the performance of vector with equal accuracy, making it the better choice.


Example

Run the following command to create a heap table named face_table, where C1 is the primary key, and C4 is the vector column.

CREATE TABLE face_table (  
C1 INT,
C2 TIMESTAMP NOT NULL,
C3 VARCHAR(20) NOT NULL,
C4 VECF16(512) NOT NULL,
PRIMARY KEY (C1)
) DISTRIBUTED BY (C1);


Use partitioned tables to manage incremental data

For time-based data (e.g., imported based on time and retained for N days before expiring), the PARTITION BY statement can divide the table into partitions, enabling independent data imports and TRUNCATE operations for efficient incremental data management.

Here is an example statement:

CREATE TABLE test_tbl
(
C1 VARCHAR,
C2 VECF16(512),
post_publish_time TIMESTAMP
)
DISTRIBUTED BY (C1)
PARTITION BY RANGE (post_publish_time) (START (DATE '2024-08-01') INCLUSIVE END (DATE '2024-08-31') EXCLUSIVE EVERY(INTERVAL '1 day'), DEFAULT PARTITION extra);