Run Vector Analysis Queries
This topic describes how to run vector search queries against a DW service unit for vectors.
Prerequisites
- At least one DW service unit for vectors is available in the environment.
- The target table and data already exist in the target database. For instructions on how to import data into the database, see Import Data.
- You have obtained the username and password for logging in to the DW service unit for vectors.
Vector analysis examples
The following example queries use the <-> operator to calculate the squared Euclidean distance between each stored vector and a target vector.
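For reference, the squared Euclidean distance between a stored vector x and a target vector q is the sum of the squared component differences, sum((x_i - q_i)^2). The Python sketch below computes it on plain lists, which can help when sanity-checking a dist value returned by a query. It assumes both vectors have the same dimension (the target vector in the examples below has 25 components). The function names are illustrative only and are not part of the product. If your engine's <-> returns the plain Euclidean (L2) distance instead, that value is the square root of the same sum.

```python
import math
from typing import Sequence


def squared_euclidean(x: Sequence[float], q: Sequence[float]) -> float:
    """Return sum((x_i - q_i)^2). Illustrative helper, not a product API."""
    if len(x) != len(q):
        raise ValueError(f"dimension mismatch: {len(x)} != {len(q)}")
    return sum((xi - qi) ** 2 for xi, qi in zip(x, q))


def euclidean(x: Sequence[float], q: Sequence[float]) -> float:
    """Plain (non-squared) Euclidean distance: the square root of the same sum."""
    return math.sqrt(squared_euclidean(x, q))


if __name__ == "__main__":
    print(squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0
    print(euclidean([0.0, 0.0], [3.0, 4.0]))          # 5.0
```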
Q1: Find the 10 vectors closest to the target vector
SELECT id,
       vector <-> '[0.6377800107002258,0.9509999752044678,0.9408400058746338,-0.5509499907493591,0.06180400028824806,-1.6734999418258667,-0.5704600214958191,-1.5750000476837158,0.5274199843406677,-0.3642300069332123,0.5622000098228455,0.009283199906349182,0.391759991645813,0.46647000312805176,-0.7589899897575378,0.3084399998188019,0.4611699879169464,0.30028998851776123,1.5491000413894653,1.2386000156402588,-0.7254599928855896,1.7488000392913818,0.4075799882411957,-1.96589994430542,0.05322200059890747]' AS dist
FROM test_tbl
ORDER BY dist
LIMIT 10;
When you run this query in the console workbook, it returns the 10 rows closest to the target vector, sorted by ascending dist, as shown below:
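Conceptually, Q1 is an exact k-nearest-neighbor search: compute the distance from every row to the target vector, sort in ascending order, and keep the first 10. The Python sketch below reproduces that logic over a hypothetical in-memory list of (id, vector) rows, which can be handy for checking Q1's output on a small sample. It is a brute-force illustration under the assumption that <-> yields the squared Euclidean distance, and it does not model any index the service may use.

```python
import heapq
from typing import List, Sequence, Tuple

# Hypothetical in-memory row shape mirroring test_tbl(id, vector).
Row = Tuple[int, Sequence[float]]


def top_k_nearest(
    rows: Sequence[Row], target: Sequence[float], k: int = 10
) -> List[Tuple[int, float]]:
    """Return (id, dist) for the k rows nearest to target, ascending by dist.

    Brute-force analogue of:
    SELECT id, vector <-> target AS dist FROM test_tbl ORDER BY dist LIMIT k;
    Assumes every vector has the same dimension as target.
    """

    def dist(vec: Sequence[float]) -> float:
        # Squared Euclidean distance, which this topic attributes to <->.
        return sum((a - b) ** 2 for a, b in zip(vec, target))

    scored = ((row_id, dist(vec)) for row_id, vec in rows)
    # heapq.nsmallest returns its results already sorted ascending by key.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    sample = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [3.0, 4.0]), (4, [0.5, 0.0])]
    print(top_k_nearest(sample, [0.0, 0.0], k=2))  # [(1, 0.0), (4, 0.25)]
```

Using heapq.nsmallest keeps only k candidates in memory instead of sorting every row, which mirrors the intent of ORDER BY ... LIMIT.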
Q2: Among entries published between 2024-08-08 00:00:00 and 2024-08-15 10:52:00, take the 5,000 entries nearest to the target vector, calculate similarity as dist * 10, and return up to 20 entries with a similarity score above 67
SELECT b.* FROM (
    SELECT a.id, a.dist * 10 AS similarity FROM (
        SELECT id,
               vector <-> '[0.6377800107002258,0.9509999752044678,0.9408400058746338,-0.5509499907493591,0.06180400028824806,-1.6734999418258667,-0.5704600214958191,-1.5750000476837158,0.5274199843406677,-0.3642300069332123,0.5622000098228455,0.009283199906349182,0.391759991645813,0.46647000312805176,-0.7589899897575378,0.3084399998188019,0.4611699879169464,0.30028998851776123,1.5491000413894653,1.2386000156402588,-0.7254599928855896,1.7488000392913818,0.4075799882411957,-1.96589994430542,0.05322200059890747]' AS dist
        FROM test_tbl
        WHERE post_publish_time >= '2024-08-08 00:00:00'
          AND post_publish_time <= '2024-08-15 10:52:00'
        ORDER BY dist ASC LIMIT 5000
    ) AS a
) AS b
-- Explicit outer ORDER BY: the subquery's ordering is not guaranteed to carry over.
WHERE b.similarity > 67 ORDER BY b.similarity ASC
OFFSET 0 LIMIT 20;
When you run this query in the console workbook, the results appear as shown below:
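The nesting in Q2 can be read as a pipeline: filter rows to the publish-time window, keep the 5,000 nearest candidates by distance, derive similarity = dist * 10, drop rows whose similarity is not above 67, and return the first 20. The Python sketch below mirrors those steps on in-memory rows. The Post record and its field names are hypothetical stand-ins for the columns of test_tbl, the constants are copied from the query, and <-> is assumed to yield the squared Euclidean distance.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence, Tuple


@dataclass
class Post:
    """Hypothetical stand-in for one row of test_tbl."""
    id: int
    vector: List[float]
    post_publish_time: datetime


def q2_pipeline(
    rows: Sequence[Post],
    target: Sequence[float],
    start: datetime,
    end: datetime,
    candidates: int = 5000,
    threshold: float = 67.0,
    limit: int = 20,
) -> List[Tuple[int, float]]:
    """Mirror Q2's nested query on in-memory rows; returns (id, similarity) pairs."""

    def dist(vec: Sequence[float]) -> float:
        # Squared Euclidean distance, which this topic attributes to <->.
        return sum((a - b) ** 2 for a, b in zip(vec, target))

    # Innermost query: inclusive time-window filter, ORDER BY dist ASC LIMIT candidates.
    in_window = [r for r in rows if start <= r.post_publish_time <= end]
    scored = sorted(((r.id, dist(r.vector)) for r in in_window), key=lambda p: p[1])
    nearest = scored[:candidates]

    # Middle query: similarity = dist * 10.
    with_similarity = [(row_id, d * 10) for row_id, d in nearest]

    # Outer query: WHERE similarity > threshold, keep ascending-distance order, LIMIT.
    return [(row_id, s) for row_id, s in with_similarity if s > threshold][:limit]


if __name__ == "__main__":
    in_range = datetime(2024, 8, 10)
    sample = [
        Post(1, [0.0, 0.0], in_range),              # dist 0.0 -> similarity 0.0
        Post(2, [3.0, 0.0], in_range),              # dist 9.0 -> similarity 90.0
        Post(3, [1.0, 0.0], in_range),              # dist 1.0 -> similarity 10.0
        Post(4, [3.0, 0.0], datetime(2024, 9, 1)),  # outside the time window
    ]
    window = (datetime(2024, 8, 8), datetime(2024, 8, 15, 10, 52))
    print(q2_pipeline(sample, [0.0, 0.0], *window))  # [(2, 90.0)]
```

Because the 5,000-row cutoff is applied before the similarity filter, Q2 can return fewer than 20 rows even when more qualifying rows exist in the time window. In this sample, setting candidates=2 returns an empty list: row 2 has a similarity above 67 but is not among the two nearest candidates.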