Run Vector Analysis Queries

This topic describes how to get started with vector searches.


Prerequisites

  • At least one DW service unit for vectors is available in the environment.

  • The target table and data already exist in the target database. For instructions on how to import data into the database, refer to Import Data.

  • You have obtained the username and password to log in to the DW service unit for vectors.


Vector analysis examples

The following example queries use the <-> operator to calculate the squared Euclidean distance between each stored vector and a target vector.
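For reference, the squared Euclidean distance is the sum of the squared per-dimension differences between two vectors, without the final square root. Because the square root is monotonic, ordering by the squared value gives the same ranking as ordering by the plain Euclidean distance. The following is a minimal Python sketch of the formula only; the function name and sample vectors are illustrative and are not part of the product.

```python
def squared_euclidean(a, b):
    """Return the sum of squared per-dimension differences (no square root)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Illustrative 3-dimensional vectors (not taken from test_tbl).
target = [1.0, 2.0, 3.0]
candidate = [4.0, 6.0, 3.0]

# (1-4)^2 + (2-6)^2 + (3-3)^2 = 9 + 16 + 0
print(squared_euclidean(target, candidate))  # 25.0
```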

Q1: Find the 10 vectors closest to the target vector

SELECT id,
vector <-> '[0.6377800107002258,0.9509999752044678,0.9408400058746338,-0.5509499907493591,0.06180400028824806,-1.6734999418258667,-0.5704600214958191,-1.5750000476837158,0.5274199843406677,-0.3642300069332123,0.5622000098228455,0.009283199906349182,0.391759991645813,0.46647000312805176,-0.7589899897575378,0.3084399998188019,0.4611699879169464,0.30028998851776123,1.5491000413894653,1.2386000156402588,-0.7254599928855896,1.7488000392913818,0.4075799882411957,-1.96589994430542,0.05322200059890747]'
AS dist FROM test_tbl ORDER BY dist LIMIT 10;
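When a query like Q1 is issued from application code, the target vector usually starts out as a list of numbers and has to be turned into the bracketed text literal shown above. The following Python sketch shows one way to do that. The helper names are hypothetical, and the %s placeholder style assumes a PostgreSQL-style client driver, which this topic does not specify.

```python
def to_vector_literal(values):
    """Format numbers as the bracketed, comma-separated text used in the queries."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Same shape as Q1. The %s placeholders are an assumption about the client
# driver; adjust them to whatever parameter style your driver expects.
KNN_SQL = "SELECT id, vector <-> %s AS dist FROM test_tbl ORDER BY dist LIMIT %s;"


def knn_params(target, k=10):
    """Return the parameter tuple to pass alongside KNN_SQL."""
    return (to_vector_literal(target), k)


print(to_vector_literal([0.5, -1.25, 2.0]))  # [0.5,-1.25,2.0]
```

Passing the vector as a query parameter rather than concatenating it into the SQL string leaves quoting to the driver.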

If you run this query in the console workbook, the result lists the id and dist of the 10 rows nearest to the target vector, in ascending order of dist.

Q2: Among the entries published between 2024-08-08 00:00:00 and 2024-08-15 10:52:00, take the 5,000 nearest to the target vector, calculate similarity as dist * 10, and return up to 20 entries with a similarity score above 67

SELECT b.* FROM (SELECT a.id, a.dist * 10 AS similarity FROM
(SELECT id,
vector <-> '[0.6377800107002258,0.9509999752044678,0.9408400058746338,-0.5509499907493591,0.06180400028824806,-1.6734999418258667,-0.5704600214958191,-1.5750000476837158,0.5274199843406677,-0.3642300069332123,0.5622000098228455,0.009283199906349182,0.391759991645813,0.46647000312805176,-0.7589899897575378,0.3084399998188019,0.4611699879169464,0.30028998851776123,1.5491000413894653,1.2386000156402588,-0.7254599928855896,1.7488000392913818,0.4075799882411957,-1.96589994430542,0.05322200059890747]'
AS dist FROM test_tbl
WHERE post_publish_time >= '2024-08-08 00:00:00'
AND post_publish_time <= '2024-08-15 10:52:00'
ORDER BY dist ASC LIMIT 5000) AS a) AS b
WHERE b.similarity > 67
ORDER BY b.similarity ASC
OFFSET 0 LIMIT 20;
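The three nesting levels of Q2 can be easier to follow when written out as sequential steps. The Python sketch below mirrors that logic on in-memory tuples. It assumes dist has already been computed for each row, and the function name, parameter names, and sample rows are illustrative only.

```python
from datetime import datetime


def q2_pipeline(rows, start, end, candidate_limit=5000, scale=10,
                threshold=67, page_size=20):
    """rows: iterable of (id, post_publish_time, dist) tuples."""
    # Innermost subquery: time-window filter, nearest first, capped at 5000.
    in_window = [r for r in rows if start <= r[1] <= end]
    nearest = sorted(in_window, key=lambda r: r[2])[:candidate_limit]
    # Middle subquery: derive similarity = dist * 10.
    scored = [(row_id, dist * scale) for row_id, _, dist in nearest]
    # Outer query: keep scores above the threshold, return the first page.
    return [s for s in scored if s[1] > threshold][:page_size]


rows = [
    (1, datetime(2024, 8, 9), 5.0),    # in window, similarity 50.0, filtered out
    (2, datetime(2024, 8, 10), 7.0),   # in window, similarity 70.0, kept
    (3, datetime(2024, 8, 12), 9.0),   # in window, similarity 90.0, kept
    (4, datetime(2024, 8, 20), 8.0),   # outside the time window
]
result = q2_pipeline(rows, datetime(2024, 8, 8), datetime(2024, 8, 15, 10, 52))
print(result)  # [(2, 70.0), (3, 90.0)]
```

Because similarity is defined as dist * 10, the condition similarity > 67 keeps only rows whose dist is greater than 6.7.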

If you run this query in the console workbook, the result lists the id and similarity of up to 20 rows that meet these conditions.