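Query a Cortex Search Service¶

When you create a Cortex Search Service, the system provisions an API endpoint that serves queries with low latency. You can query a Cortex Search Service through any of three APIs: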

Python API
REST API
SQL SEARCH_PREVIEW function

Parameters¶

All APIs support the same set of query parameters:

	Parameter	Description
Required	`query`	The search query, which is matched against the text columns of the service.
Optional	`columns`	A comma-separated list of columns to return for each relevant result in the response. These columns must be included in the source query for the service. If this parameter is not provided, only the search column is returned in the response.
	`filter`	A filter object that restricts results based on data in the `ATTRIBUTES` columns. See Filter syntax for the syntax.
	`scoring_config`	Configuration object for customizing search ranking behavior. See Customizing Cortex Search scoring for syntax.
	`scoring_profile`	A named scoring profile to use for the query, previously defined with ALTER CORTEX SEARCH SERVICE ... ADD SCORING PROFILE. If `scoring_profile` is provided, any `scoring_config` that is also provided is ignored.
	`limit`	Maximum number of results to return in the response, up to 1000. The default limit is 10.

In addition, the SQL and Python APIs support multi-index queries. Using multi-index parameters allows for refining results from Cortex Search and reducing query cost by limiting the number of columns searched.

Parameter	Description
`multi_index_query`	The map used to determine which indexes to query. Each key in the map is the name of an indexed column, and each value is an array containing maps that define the query:

If the index is a text index or a managed vector index, the query array can contain:
- Text queries: `{"text": "search_text"}`
- Vector queries, as an embedding vector: `{"vector": [vector_values]}`
If the index is a user-provided vector embedding column, the query array can contain:
- Text queries, if a `query_model` was specified at creation time for automatic embeddings: `{"text": "search_text"}`
- Vector queries, as an embedding vector: `{"vector": [vector_values]}`

Note

Multi-index Cortex Search services can still be searched through the REST API or without the multi_index_query parameter. This causes an unrestricted search over all indexed columns, which affects query cost. For details on estimating cost for multi-index query compute, see Understanding cost for Cortex Search Services - Multi-index search.
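The shape of the `multi_index_query` map described above can be sketched in plain Python. The helper names (`text_query`, `vector_query`, `validate_multi_index_query`) are hypothetical and not part of any Snowflake library, and the index names `name` and `description` are placeholders. The validation only checks the documented structure on the client side; it does not know which indexes a real service defines.

```python
import json

def text_query(text):
    """Return a text query entry for one index."""
    return {"text": text}

def vector_query(values):
    """Return a vector query entry; values are stored as floats."""
    return {"vector": [float(v) for v in values]}

def validate_multi_index_query(miq):
    """Raise ValueError if the map does not follow the documented shape."""
    if not isinstance(miq, dict) or not miq:
        raise ValueError("multi_index_query must be a non-empty map")
    for index_name, entries in miq.items():
        if not isinstance(entries, list) or not entries:
            raise ValueError(f"{index_name}: expected a non-empty array")
        for entry in entries:
            # Each entry is a map with exactly one key: "text" or "vector".
            if not isinstance(entry, dict) or set(entry) not in ({"text"}, {"vector"}):
                raise ValueError(f"{index_name}: entry must have only 'text' or 'vector'")
    return miq

# Placeholder index names; a real service defines its own.
multi_index_query = validate_multi_index_query({
    "name": [text_query("sparkle")],
    "description": [text_query("clothing washing")],
})
print(json.dumps(multi_index_query, indent=2))
```

The resulting dictionary can be passed as the `multi_index_query` argument in the Python API or serialized into the JSON argument of SEARCH_PREVIEW.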

Syntax¶

Simple queries to a Cortex Search Service use the following syntax:

import os
from snowflake.core import Root
from snowflake.snowpark import Session

# connect to Snowflake
CONNECTION_PARAMETERS = { ... }
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
root = Root(session)

# fetch service
my_service = (root
    .databases["<service_database>"]
    .schemas["<service_schema>"]
    .cortex_search_services["<service_name>"]
)

# query service
resp = my_service.search(
    query="<query>",
    columns=["<col1>", "<col2>"],
    filter={"@eq": {"<column>": "<value>"} },
    limit=5
)
print(resp.to_json())

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "<search_query>",
  "columns": ["col1", "col2"],
  "filter": <filter>,
  "limit": <limit>
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'my_search_service',
      '{
         "query": "preview query",
         "columns":[
            "col1",
            "col2"
         ],
         "filter": {"@eq": {"col1": "filter value"} },
         "limit":10
      }'
  )
)['results'] as results;

Multi-index query syntax¶

To query only specific indexes of a multi-index Cortex Search Service, or to query a service that uses vector embeddings, use the following syntax:

from snowflake.core import Root
from snowflake.snowpark import Session

session = Session.builder.configs( {...} ).create()
root = Root(session)

my_service = (root
  .databases["<service_database>"]
  .schemas["<service_schema>"]
  .cortex_search_services["<service_name>"]
)

resp = my_service.search(
    multi_index_query={
        "<index_name>": [
            {"text": "<search_text>"},
            {"vector": [<vector_values>]},
            ...
        ],
        ...
    },
    scoring_config={
        "weights": {
            "texts": <text_weight>,
            "vectors": <vector_weight>,
            "reranker": <reranker_weight>
        },
        "functions": {
            "vector_boosts": [
                {"weight": <weight>, "column": "<vector_column_name>"},
                ...
            ],
            "text_boosts": [
                {"weight": <weight>, "column": "<text_column_name>"},
                ...
            ]
        }
    },
    columns=["<column_name>", "<column_name>", ...],
    limit=<limit>
)

SELECT SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      '<service_name>',
      '{
        "multi_index_query": {
          "<index_name>": [
            {"text": "<search_text>"},
            {"vector": [<vector_values>]},
            ...
          ],
          ...
        },
        "columns": ["<column_name>", "<column_name>", ...],
        "limit": <limit>,
        "scoring_config": {
          "weights": {
            "texts": <text_weight>,
            "vectors": <vector_weight>,
            "reranker": <reranker_weight>
          },
          "functions": {
            "vector_boosts": [
              {"weight": <weight>, "column": "<vector_column_name>"},
              ...
            ],
            "text_boosts": [
              {"weight": <weight>, "column": "<text_column_name>"}
              , ...
            ]
          }
        }
      }'
  );

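Setup and authentication¶

Python API¶

You can query a Cortex Search Service with version 0.8.0 or later of the Snowflake Python APIs. For more information about the Snowflake Python APIs, see Snowflake Python APIs: Managing Snowflake objects with Python.

Install the Snowflake Python APIs library¶

First, install the latest version of the Snowflake Python APIs package from PyPI. For instructions on installing this package from PyPI, see Install the Snowflake Python APIs library.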

pip install snowflake -U

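Connect to Snowflake¶

Connect to Snowflake using either a Snowpark Session or a Python Connector Connection, and create a Root object. For more instructions on connecting to Snowflake, see Connect to Snowflake with the Snowflake Python APIs. The following example uses a Snowpark Session object and a Python dictionary for configuration.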

import os
from snowflake.core import Root
from snowflake.snowpark import Session

CONNECTION_PARAMETERS = {
    "account": os.environ["snowflake_account_demo"],
    "user": os.environ["snowflake_user_demo"],
    "password": os.environ["snowflake_password_demo"],
    "role": "test_role",
    "database": "test_database",
    "warehouse": "test_warehouse",
    "schema": "test_schema",
}

session = Session.builder.configs(CONNECTION_PARAMETERS).create()
root = Root(session)

Note

Querying a Cortex Search Service requires version 0.8.0 or later of the Snowflake Python APIs library.

REST API¶

Cortex Search exposes a REST API endpoint in the suite of Snowflake REST APIs. The REST endpoint generated for a Cortex Search Service has the following structure:

https://<account_url>/api/v2/databases/<db_name>/schemas/<schema_name>/cortex-search-services/<service_name>:query

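Where:

<account_url>: Your Snowflake account URL. See Finding the organization and account name for an account for instructions on finding your account URL.
<db_name>: The database in which the service resides.
<schema_name>: The schema in which the service resides.
<service_name>: The name of the service.
:query: The method to invoke on the service; in this case, the query method.

For more details, see the REST API reference for Cortex Search Service.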

Authentication¶

Snowflake REST APIs support authentication through programmatic access tokens (PATs), key pair authentication using JSON Web Tokens (JWTs), and OAuth. For details, see Authenticating Snowflake REST APIs with Snowflake.
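As a rough illustration of the endpoint structure and PAT authentication described above, the following sketch builds the same request as the curl example using only the Python standard library. The account URL and object names are hypothetical placeholders, and the token is read from a `PAT` environment variable by analogy with the curl example. The request is constructed but not sent.

```python
import json
import os
import urllib.request
from urllib.parse import quote

def build_query_request(account_url, db_name, schema_name, service_name, token, body):
    """Build (but do not send) a POST request for the :query endpoint."""
    url = (
        f"https://{account_url}/api/v2/databases/{quote(db_name)}"
        f"/schemas/{quote(schema_name)}"
        f"/cortex-search-services/{quote(service_name)}:query"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

request = build_query_request(
    account_url="myorg-myaccount.snowflakecomputing.com",  # hypothetical account URL
    db_name="test_database",                               # placeholder names
    schema_name="test_schema",
    service_name="my_search_service",
    token=os.environ.get("PAT", "<token>"),
    body={"query": "technology", "columns": ["DOCUMENT_CONTENTS"], "limit": 5},
)
print(request.full_url)

# Sending the request needs network access and a valid token:
# with urllib.request.urlopen(request) as resp:
#     payload = json.load(resp)
```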

SQL SEARCH_PREVIEW function¶

The SNOWFLAKE.CORTEX.SEARCH_PREVIEW function allows you to preview the results of individual queries to a Cortex Search Service from within a SQL environment such as a worksheet or Snowflake notebook cell. This function makes it easy to interactively validate that a service has populated correctly and is serving reasonable results.

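Important

The SEARCH_PREVIEW function is intended for testing and validating Cortex Search Services. It is not intended for serving search queries in an end-user application.

The function operates only on string literals. It does not accept batch text data.
The function has higher latency than the REST and Python APIs.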
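Because the function takes its arguments as string literals, a client that generates the statement has to serialize the query to JSON and escape it for SQL. The following is a minimal sketch of that step. The helper name `search_preview_sql` is hypothetical, and the escaping assumes the usual rules for single-quoted SQL string constants (doubled single quotes, doubled backslashes).

```python
import json

def search_preview_sql(service_name, query_params):
    """Return a SELECT statement that calls SEARCH_PREVIEW with literal arguments."""
    def sql_literal(value):
        # Double backslashes first, then single quotes, so the JSON text
        # survives parsing as a single-quoted SQL string constant.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    payload = json.dumps(query_params)
    return (
        "SELECT PARSE_JSON(SNOWFLAKE.CORTEX.SEARCH_PREVIEW("
        f"{sql_literal(service_name)}, {sql_literal(payload)}"
        "))['results'] AS results"
    )

sql = search_preview_sql(
    "business_documents_css",
    {"query": "employee's handbook", "columns": ["DOCUMENT_CONTENTS"], "limit": 5},
)
print(sql)

# With a Snowpark session (an assumption here), the statement could be run as:
# rows = session.sql(sql).collect()
```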

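Filter syntax¶

Cortex Search supports filtering on the ATTRIBUTES columns specified in the CREATE CORTEX SEARCH SERVICE command.

Cortex Search supports five matching operators:

TEXT or NUMERIC equality: @eq
ARRAY contains: @contains
NUMERIC or DATE/TIMESTAMP greater than or equal to: @gte
NUMERIC or DATE/TIMESTAMP less than or equal to: @lte
Primary key equality: @primarykey

These matching operators can be composed with the following logical operators: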

@and
@or
@not

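Usage notes¶

Matching against NaN ('not a number') values in the source query is handled as described in Special values.
Fixed-point numeric values with more than 19 digits (not including leading zeroes) do not work with @eq, @gte, or @lte and are not returned by these operators (although they can still be returned by the overall query through the use of @not).
TIMESTAMP and DATE filters accept values of the form: YYYY-MM-DD and, for timezone aware dates: YYYY-MM-DD+HH:MM. If the timezone offset is not specified, the date is interpreted in UTC.
TIMESTAMP and DATE filters do not support dates past 9999-12-30.
@primarykey is supported only for services configured with a primary key. The value of the filter must be a JSON object that maps every primary key column to its corresponding value (or NULL).

These operators can be combined into a single filter object.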
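One way to keep filter objects readable in client code is to build them from small helper functions. The sketch below is only an illustration: the helper names are hypothetical, and the column names (`array_col`, `numeric_col`, `string_col`) are placeholders. Each helper returns a plain dictionary that matches the operator syntax described above.

```python
import json

def eq(column, value):
    return {"@eq": {column: value}}

def contains(column, value):
    return {"@contains": {column: value}}

def gte(column, value):
    return {"@gte": {column: value}}

def lte(column, value):
    return {"@lte": {column: value}}

def all_of(*filters):
    """Combine filters with @and."""
    return {"@and": list(filters)}

def any_of(*filters):
    """Combine filters with @or."""
    return {"@or": list(filters)}

def negate(flt):
    """Wrap a filter in @not."""
    return {"@not": flt}

# Rows whose array contains "arr_value", whose number lies in [10.5, 12.5],
# and whose string column is not "value".
flt = all_of(
    contains("array_col", "arr_value"),
    gte("numeric_col", 10.5),
    lte("numeric_col", 12.5),
    negate(eq("string_col", "value")),
)
print(json.dumps(flt, indent=2))
```

The dictionary can be passed directly as the `filter` argument in the Python API, or serialized with `json.dumps` for the REST API and SEARCH_PREVIEW.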

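Examples¶

Filter on rows where the string-like column string_col is equal to value:
```
{ "@eq": { "string_col": "value" } }
```
Filter on rows with the specified primary key values, where the region column is us-west-1 and the agent_id column is abc123:
```
{ "@primarykey": { "region": "us-west-1", "agent_id": "abc123" } }
```

Filter on rows where the ARRAY column array_col contains the value arr_value: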

{ "@contains": { "array_col": "arr_value" } }

Filter on rows where the NUMERIC column numeric_col is between 10.5 and 12.5 (inclusive):

{
  "@and": [
    { "@gte": { "numeric_col": 10.5 } },
    { "@lte": { "numeric_col": 12.5 } }
  ]
}

Filter on rows where the TIMESTAMP column timestamp_col is between 2024-11-19 and 2024-12-19 (inclusive):

{
  "@and": [
    { "@gte": { "timestamp_col": "2024-11-19" } },
    { "@lte": { "timestamp_col": "2024-12-19" } }
  ]
}

Compose filters with logical operators:

// Rows where the "array_col" column contains "arr_value" and the "string_col" column equals "value"
{
  "@and": [
    { "@contains": { "array_col": "arr_value" } },
    { "@eq": { "string_col": "value" } }
  ]
}

// Rows where the "string_col" column does not equal "value"
{
  "@not": { "@eq": { "string_col": "value" } }
}

// Rows where the "array_col" column contains at least one of "val1", "val2", or "val3"
{
  "@or": [
    { "@contains": { "array_col": "val1" } },
    { "@contains": { "array_col": "val2" } },
    { "@contains": { "array_col": "val3" } }
  ]
}

Multi-index queries¶

For a multi-index Cortex Search Service, created with the CREATE CORTEX SEARCH SERVICE ... TEXT INDEXES ... VECTOR INDEXES syntax, you can use the optional multi_index_query parameter to select which indexes to query. When this parameter is omitted, all indexes are used in the search.

Usage notes¶

Each index to query is represented as a key-value pair in the multi_index_query map.
At least one vector index must be supplied in each query. Querying only text indexes is an error.
When querying a multi-index Cortex Search Service, the following behaviors apply:
- AND across fields: A match in all of the queried text or vector fields is required for a document to be returned.
- OR across terms within a text index field: When a query contains multiple terms such as "wash fold", a document is returned if any of the query terms are found within the document.
- Text queries are automatically normalized using stemming, lemmatization, and domain-specific rewrites via Snowflake's custom analyzer. This improves recall by matching related terms, such as linking "washing" to "wash" and "laundromat" to "laundry".
The scoring_config.weights field modifies the relative weight of each of the 3 high-level scoring techniques (vector, keyword, reranking) in a given query.

Within this field, weights are applied relative to each other. For example, { "texts": 3, "vectors": 2, "reranker": 1 } and { "texts": 30, "vectors": 20, "reranker": 10 } are equivalent.
Using the scoring_config.functions.vector_boosts and scoring_config.functions.text_boosts fields:
- These fields allow users to modify the relative weight of each vector index and text index query, respectively, in a given query.
- Within each field, weights are applied relative to each other, as in scoring_config.weights.
Multi-index queries can be combined with numeric boosts, time decays, and queries that disable reranking. For information on using those features, see Numeric boosts and time decays and Reranking.
When querying a multi-index service, the query parameter can be used to specify a query to be applied to all fields, unless the service contains a vector index with user-provided vector embeddings.
To optimize search performance and latency, columns containing vector embeddings are not returned in results when issuing a query to a user-provided vector index.
Snowflake recommends refining your queries to use the multi_index_query on multi-index Cortex Search services to reduce the amount of resources consumed, which affects cost.

For information on estimating pricing for multi-index queries, see Estimating costs for multi-index Cortex Search.
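The note above that weights in scoring_config.weights are relative to each other can be checked with a little arithmetic. The sketch below normalizes two weight maps so that each sums to 1 and compares the results. The function name `normalize_weights` is hypothetical, and the normalization only illustrates proportionality; it is not a description of how the service combines scores internally.

```python
def normalize_weights(weights):
    """Scale weights so they sum to 1, keeping their proportions."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}

a = normalize_weights({"texts": 3, "vectors": 2, "reranker": 1})
b = normalize_weights({"texts": 30, "vectors": 20, "reranker": 10})

# Both maps describe the same proportions: 1/2, 1/3 and 1/6.
same = all(abs(a[name] - b[name]) < 1e-12 for name in a)
print(a)
print("equivalent:", same)
```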

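Access control requirements¶

The role that queries the Cortex Search Service must have the following privileges to retrieve results:

Privilege	Object
USAGE	The Cortex Search Service
USAGE	The database in which the Cortex Search Service resides
USAGE	The schema in which the Cortex Search Service resides

Querying with owner's rights¶

Cortex Search Services perform searches with owner's rights and follow the same security model as other Snowflake objects that run with owner's rights.

In particular, this means that any role with sufficient privileges to query a Cortex Search Service can query any of the data the service has indexed, regardless of that role's privileges on the underlying objects (such as tables and views) referenced in the service's source query.

For example, for a Cortex Search Service that references a table with row-level masking policies, querying users of that service can see search results from rows on which the owner's role has read permission, even if the querying user's role cannot read those rows in the source table.

Use caution, for example, when granting a role with USAGE privileges on a Cortex Search Service to another Snowflake user.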

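Known limitations¶

Queries to Cortex Search Services are subject to the following limitations:

Response size: The total size of the response payload returned from a search query to a Cortex Search Service must not exceed the following limits:
- REST API and Python API: 10 megabytes (MB)
- SQL SEARCH_PREVIEW function: 300 kilobytes (KB)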

Multi-index Cortex Search is subject to additional limitations, which may change during preview:

The Cortex Search Playground in the Snowsight UI does not support queries to multi-index services. Queries to multi-index services in the Playground display the message "Unable to query search service. Invalid request parameters or filter syntax."
The multi-index serving query syntax with the multi_index_query parameter is supported only in versions 1.6.0 or later of the Python API.

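Examples¶

This section provides comprehensive examples for querying Cortex Search Services across all three API methods.

Setup for examples¶

The following examples use a table named business_documents, with timestamp and numeric columns, to demonstrate various features: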

CREATE OR REPLACE TABLE business_documents (
    document_contents VARCHAR,
    last_modified_timestamp TIMESTAMP,
    created_timestamp TIMESTAMP,
    likes INT,
    comments INT
);

INSERT INTO business_documents (document_contents, last_modified_timestamp, created_timestamp, likes, comments)
VALUES
    ('Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable.',
     '2024-01-12 10:00:00', '2024-01-10 09:00:00', 10, 20),

    ('IT manual for employees: Instructions for usage of internal technologies, including hardware.',
     '2024-02-10 15:00:00', '2024-02-05 14:30:00', 85, 10),

    ('Employee handbook 2024: Updated policies on remote work, health benefits, and company culture.',
     '2024-02-10 15:00:00', '2024-02-05 14:30:00', 85, 10),

    ('Marketing strategy document: Target audience segmentation for upcoming product launch.',
     '2024-03-15 12:00:00', '2024-03-12 11:15:00', 150, 32),

    ('Product roadmap 2024: Key milestones for tech product development, including the launch.',
     '2024-04-22 17:30:00', '2024-04-20 16:00:00', 200, 45),

    ('Annual performance review process guidelines: Procedures for managers to conduct employee.',
     '2024-05-02 09:30:00', '2024-05-01 08:45:00', 60, 5);

CREATE OR REPLACE CORTEX SEARCH SERVICE business_documents_css
    ON document_contents
    WAREHOUSE = <warehouse_name>
    TARGET_LAG = '1 minute'
AS SELECT * FROM business_documents;

Filter examples¶

Simple query with an equality filter¶

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LIKES"],
    filter={"@eq": {"REGION": "US"}},
    limit=5
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LIKES"],
  "filter": {"@eq": {"REGION": "US"}},
  "limit": 5
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LIKES"],
         "filter": {"@eq": {"REGION": "US"}},
         "limit": 5
      }'
  )
)['results'] as results;

Range filter¶

resp = business_documents_css.search(
    query="business",
    columns=["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
    filter={"@and": [
        {"@gte": {"LIKES": 50}},
        {"@lte": {"COMMENTS": 50}}
    ]},
    limit=10
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "business",
  "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
  "filter": {"@and": [
    {"@gte": {"LIKES": 50}},
    {"@lte": {"COMMENTS": 50}}
  ]},
  "limit": 10
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "business",
         "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
         "filter": {"@and": [
           {"@gte": {"LIKES": 50}},
           {"@lte": {"COMMENTS": 50}}
         ]},
         "limit": 10
      }'
  )
)['results'] as results;

Scoring examples¶

Numeric boosts¶

Apply numeric boosts on both the likes and comments columns, with twice the boost weight on comments values relative to likes values.

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
    scoring_config={
        "functions": {
            "numeric_boosts": [
                {"column": "comments", "weight": 2},
                {"column": "likes", "weight": 1}
            ]
        }
    }
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
  "scoring_config": {
    "functions": {
      "numeric_boosts": [
        {"column": "comments", "weight": 2},
        {"column": "likes", "weight": 1}
      ]
    }
  }
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
         "scoring_config": {
           "functions": {
             "numeric_boosts": [
               {"column": "comments", "weight": 2},
               {"column": "likes", "weight": 1}
             ]
           }
         }
      }'
  )
)['results'] as results;

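In the results, note:

With the boosts, the "Product roadmap 2024:..." document is the top result because of its large number of likes and comments, even though it has slightly lower relevance to the query "technology".

Without any boosts, the top result for the query is "IT manual for employees:...".

Time decays¶

Apply time decays based on the LAST_MODIFIED_TIMESTAMP column, where:

Documents with more recent LAST_MODIFIED_TIMESTAMP values, relative to the now timestamp, are boosted.

Documents with a LAST_MODIFIED_TIMESTAMP value more than 240 hours from the now timestamp receive little boost.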

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
    scoring_config={
        "functions": {
            "time_decays": [
                {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
            ]
        }
    }
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
  "scoring_config": {
    "functions": {
      "time_decays": [
        {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
      ]
    }
  }
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
         "scoring_config": {
           "functions": {
             "time_decays": [
               {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
             ]
           }
         }
      }'
  )
)['results'] as results;

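In the results, note:

With the time decays, the "Product roadmap 2024:..." document is the top result because of its recency relative to the now timestamp, even though it has slightly lower relevance to the query "technology".

Without any time decays, the top result for the query is "IT manual for employees:...".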
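To see why the time decay favors the product roadmap in this example, it can help to compute how old each document is relative to the now value and compare that age with limit_hours. The sketch below does only that arithmetic; it does not reproduce the decay function itself, which is not specified here. It assumes the stored timestamps carry no time zone and are read as UTC.

```python
from datetime import datetime, timezone

LIMIT_HOURS = 240
NOW = datetime.fromisoformat("2024-04-23T00:00:00.000-08:00")

# LAST_MODIFIED_TIMESTAMP values copied from the example table.
LAST_MODIFIED = {
    "Product roadmap 2024": "2024-04-22 17:30:00",
    "IT manual for employees": "2024-02-10 15:00:00",
}

def age_in_hours(timestamp_text, now=NOW):
    # Assumption: timestamps without a time zone are interpreted as UTC.
    modified = datetime.fromisoformat(timestamp_text).replace(tzinfo=timezone.utc)
    return (now - modified).total_seconds() / 3600

for title, ts in LAST_MODIFIED.items():
    hours = age_in_hours(ts)
    status = "inside" if hours <= LIMIT_HOURS else "outside"
    print(f"{title}: {hours:.1f} hours old, {status} the {LIMIT_HOURS}-hour window")
```

Under these assumptions the roadmap is well inside the 240-hour window, while the IT manual is far outside it, which is consistent with the ranking change described above.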

Disabling reranking¶

To disable reranking:

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
    limit=5,
    scoring_config={
        "reranker": "none"
    }
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
  "scoring_config": {
    "reranker": "none"
  }
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
         "scoring_config": {
           "reranker": "none"
         }
      }'
  )
)['results'] as results;

Tip

To query a service with the reranker, omit the "reranker": "none" parameter from the scoring_config object, because reranking is the default behavior.

Multi-index query examples¶

This section provides examples for querying multi-index Cortex Search Services with a restriction on which indices to search, for the Python and SQL APIs.

Query a service with managed vector embeddings¶

Examples in this section use the following business_directory and example_search_service definitions:

-- Search data
CREATE OR REPLACE TABLE business_directory (name TEXT, address TEXT, description TEXT);
INSERT INTO business_directory VALUES
    ('Joe''s Coffee', '123 Bean St, Brewtown','A cozy café known for artisan espresso and baked goods.'),
    ('Sparkle Wash', '456 Clean Ave, Sudsville', 'Eco-friendly car wash with free vacuum service.'),
    ('Tech Haven', '789 Circuit Blvd, Siliconia', 'Computer store offering the latest gadgets and tech repair services.'),
    ('Joe''s Wash n'' Fold', '456 Apple Ct, Sudsville', 'Laundromat offering coin laundry and premium wash and fold services.'),
    ('Circuit Town', '459 Electron Dr, Sudsville', 'Technology store selling used computer parts at discounted prices.')
;

-- Cortex Search Service
CREATE OR REPLACE CORTEX SEARCH SERVICE example_search_service
    TEXT INDEXES name, address
    VECTOR INDEXES description (model='snowflake-arctic-embed-m-v1.5')
    WAREHOUSE = example_wh
    TARGET_LAG = '1 hour'
    AS ( SELECT * FROM business_directory );

Query specific indexes¶

To query example_search_service over the name text field and description vector field:

resp = example_search_service.search(
    query="tech repair shop",
    columns=["name", "address", "description"],
    limit=2
)

SELECT
  value['name']::text as name, value['address']::text as address, value['description']::text as description
FROM TABLE(FLATTEN(PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'example_search_service',
      '{
        "query": "tech repair shop",
        "columns": ["name", "address", "description"],
        "limit": 2
      }'
  ))['results']));

+---------------------+-----------------------------+--------------------------------------------------------------------------+
|        NAME         |           ADDRESS           |                            DESCRIPTION                                   |
|---------------------+-----------------------------+--------------------------------------------------------------------------|
| Tech Haven          | 789 Circuit Blvd, Siliconia | Computer store offering the latest gadgets and tech repair services.     |
| Circuit Town        | 459 Electron Dr, Sudsville  | Technology store selling used computer parts at discounted prices.       |
+---------------------+-----------------------------+--------------------------------------------------------------------------+

Query a managed vector column only¶

To query example_search_service for "refurbished components for PCs" over the vector index description, using managed embeddings:

resp = example_search_service.search(
    multi_index_query={
        "description": [
            {"text": "refurbished components for PCs"}
        ]
    },
    columns=["name", "address", "description"],
    limit=5
)

SELECT
    value['name']::text as name, value['address']::text as address, value['description']::text as description
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'example_search_service',
        '{
          "multi_index_query": {
            "description": [
              {"text": "refurbished components for PCs"}
            ]
          },
          "columns": ["name", "address", "description"],
          "limit": 5
        }'
    )
)['results']));

+---------------------+-----------------------------+--------------------------------------------------------------------------+
|        NAME         |           ADDRESS           |                            DESCRIPTION                                   |
|---------------------+-----------------------------+--------------------------------------------------------------------------|
| Circuit Town        | 459 Electron Dr, Sudsville  | Technology store selling used computer parts at discounted prices.       |
| Tech Haven          | 789 Circuit Blvd, Siliconia | Computer store offering the latest gadgets and tech repair services.     |
| Joe's Coffee        | 123 Bean St, Brewtown       | A cozy café known for artisan espresso and baked goods.                  |
| Joe's Wash n' Fold  | 456 Apple Ct, Sudsville     | Laundromat offering coin laundry and premium wash and fold services.     |
| Sparkle Wash        | 456 Clean Ave, Sudsville    | Eco-friendly car wash with free vacuum service.                          |
+---------------------+-----------------------------+--------------------------------------------------------------------------+

Query with index weights¶

To query the example_search_service for "sparkle" over the text index name and "clothing washing" over the vector index description, weighting vector scoring as four times more relevant than text or reranking:

resp = example_search_service.search(
    multi_index_query={
        "name": [
            {"text": "sparkle"}
        ],
        "description": [
            {"text": "clothing washing"}
        ]
    },
    scoring_config={
        "weights": {
            "texts": 1,
            "vectors": 4,
            "reranker": 1
        }
    },
    columns=["name", "address", "description"],
    limit=2
)

SELECT
    value['name']::text as name, value['address']::text as address, value['description']::text as description
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'example_search_service',
        '{
          "multi_index_query": {
            "name": [
              {"text": "sparkle"}
            ],
            "description": [
              {"text": "clothing washing"}
            ]
          },
          "scoring_config": {
            "weights": {
              "texts": 1,
              "vectors": 4,
              "reranker": 1
            }
          },
          "columns": ["name", "address", "description"],
          "limit": 2
        }'
    )
)['results']));

+---------------------+-----------------------------+--------------------------------------------------------------------------+
|        NAME         |           ADDRESS           |                            DESCRIPTION                                   |
|---------------------+-----------------------------+--------------------------------------------------------------------------|
| Joe's Wash n' Fold  | 456 Apple Ct, Sudsville     | Laundromat offering coin laundry and premium wash and fold services.     |
| Sparkle Wash        | 456 Clean Ave, Sudsville    | Eco-friendly car wash with free vacuum service.                          |
+---------------------+-----------------------------+--------------------------------------------------------------------------+

Note that because the weight of the description vector index column is higher than the weight of any text column, the business most associated with "clothing washing" appears above the business containing "sparkle" in its name.

Query with individually weighted indexes¶

To query example_search_service with "circuit" over all fields, applying a relative weight to boost matches in the name column over the address column:

resp = example_search_service.search(
    multi_index_query={
        "name": [{"text": "circuit"}],
        "address": [{"text": "circuit"}],
        "description": [{"text": "circuit"}]
    },
    scoring_config={
        "functions": {
            "text_boosts": [
                {"column": "name", "weight": 2},
                {"column": "address", "weight": 1}
            ]
        }
    },
    columns=["name", "address", "description"],
    limit=3
)

SELECT
    value['name']::text as name, value['address']::text as address, value['description']::text as description
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'example_search_service',
        '{
          "multi_index_query": {
            "name": [ {"text": "circuit"} ],
            "address": [ {"text": "circuit"} ],
            "description": [ {"text": "circuit"} ]
          },
          "scoring_config": {
              "functions": {
                "text_boosts": [{"column":"name", "weight": 2}, {"column":"address", "weight": 1}]
                }
          },
          "columns": ["name", "address", "description"],
          "limit": 3
        }'
    )
)['results']));

+---------------------+-----------------------------+--------------------------------------------------------------------------+
|        NAME         |           ADDRESS           |                            DESCRIPTION                                   |
|---------------------+-----------------------------+--------------------------------------------------------------------------|
| Circuit Town        | 459 Electron Dr, Sudsville  | Technology store selling used computer parts at discounted prices.       |
| Tech Haven          | 789 Circuit Blvd, Siliconia | Computer store offering the latest gadgets and tech repair services.     |
| Joe's Coffee        | 123 Bean St, Brewtown       | A cozy café known for artisan espresso and baked goods.                  |
+---------------------+-----------------------------+--------------------------------------------------------------------------+

Note that boosting the name over address ranks the business named "Circuit Town" above the business located at an address on "Circuit Blvd".

Query a service with custom vector embeddings¶

Examples in this section use the following business_documents and example_search_service definitions:

-- Search data with only custom embeddings
CREATE OR REPLACE TABLE business_documents (
  document_contents VARCHAR,
  document_embedding VECTOR(FLOAT, 3)
);
INSERT INTO business_documents VALUES
  ('Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable. Highlights include strategic investments in marketing and technology.', [1, 1, 1]::VECTOR(float, 3)),
  ('IT manual for employees: Instructions for usage of internal technologies, including hardware and software guides and commonly asked tech questions.', [2, 2, 2]::VECTOR(float, 3)),
  ('Employee handbook 2024: Updated policies on remote work, health benefits, and company culture initiatives.', [2, 3, 2]::VECTOR(float, 3)),
  ('Marketing strategy document: Target audience segmentation for upcoming product launch.', [1, -1, -1]::VECTOR(float, 3))
;

-- Cortex Search Service
CREATE OR REPLACE CORTEX SEARCH SERVICE example_search_service
  TEXT INDEXES (document_contents)
  VECTOR INDEXES (document_embedding)
  WAREHOUSE = example_wh
  TARGET_LAG = '1 minute'
  AS SELECT * FROM business_documents;

Note

These examples use mock embeddings for simplicity. In a production use-case, vectors should be generated through a Snowflake vector embedding model or an externally-hosted embedding model.

Query an index with custom embeddings¶

To query example_search_service with "IT" and a corresponding embedding over the document_contents and document_embedding columns:

resp = example_search_service.search(
    multi_index_query={
        "document_embedding": [ {"vector": [1, 1, 1]} ],
        "document_contents": [ {"text": "IT"} ]
    },
    columns=["document_contents"],
    limit=2
)

SELECT
    value['document_contents']::text as document_contents
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'example_search_service',
        '{
          "multi_index_query": {
                "document_embedding": [ {"vector": [1, 1, 1] } ],
                "document_contents": [ {"text": "IT"} ]
          },
          "columns": ["document_contents"],
          "limit": 2
        }'
    )
)['results']));

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                   DOCUMENT_CONTENTS                                                                                      |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IT manual for employees: Instructions for usage of internal technologies, including hardware and software guides and commonly asked tech questions.                      |
| Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable. Highlights include strategic investments in marketing and technology.            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+