CREATE CORTEX SEARCH SERVICE
Creates a new Cortex Search service or replaces an existing one.
Syntax
CREATE [ OR REPLACE ] CORTEX SEARCH SERVICE [ IF NOT EXISTS ] <name>
ON <search_column>
ATTRIBUTES <col_name> [ , ... ]
WAREHOUSE = <warehouse_name>
TARGET_LAG = '<num> { seconds | minutes | hours | days }'
[ EMBEDDING_MODEL = <embedding_model_name> ]
[ INITIALIZE = { ON_CREATE | ON_SCHEDULE } ]
[ COMMENT = '<comment>' ]
AS <query>;
CREATE [ OR REPLACE ] CORTEX SEARCH SERVICE <name>
TEXT INDEXES <text_column_name> [ , ... ]
VECTOR INDEXES <column_specification> [ , ... ]
ATTRIBUTES <col_name> [ , ... ]
WAREHOUSE = <warehouse_name>
TARGET_LAG = '<num> { seconds | minutes | hours | days }'
[ INITIALIZE = { ON_CREATE | ON_SCHEDULE } ]
[ COMMENT = '<comment>' ]
AS <query>;
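Required parameters
name
String that specifies the identifier (that is, the name) for the Cortex Search service; it must be unique for the schema in which the service is created.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive. For more information, see Identifier requirements.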
ON search_column
Specifies the text column in the base table to search on, for single-index Cortex Search. This column must contain text values.
TEXT INDEXES text_column_name [ , ... ]
Specifies a comma-separated list of text columns in the base table to search on, for multi-index Cortex Search. The columns must contain text values.
VECTOR INDEXES column_specification [ , ... ]
Specifies the columns used for vector similarity search. Each column specification takes one of the following forms:
Managed vector embeddings:
text_column_name (model='embedding_model'): Specifies a text column and the embedding model used to generate its vectors. The model must be one of the supported embedding models. If no model is specified, the default model snowflake-arctic-embed-m-v1.5 is used.
User-provided vector embeddings:
vector_column_name: Specifies a user-provided vector embedding column.
User-provided vector embeddings with managed query embeddings:
vector_column_name (query_model='embedding_model'): Specifies a user-provided vector embedding column and the embedding model used to embed text at query time. The query_model must be one of the Snowflake-managed embedding models supported in Cortex Search. If no query_model is specified, the user-provided vector column can only be used with a vector embedding query.
For information on the behavior of vector embeddings, see Usage notes.
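ATTRIBUTES col_name [ , ... ]
Specifies a comma-separated list of columns in the base table that you want to filter on when issuing queries to the service. Attribute columns must be included in the source query, either by explicit enumeration or through a wildcard (*).
WAREHOUSE = warehouse_name
Specifies the warehouse used to run the source query, build the search index, and keep it refreshed according to the TARGET_LAG target.
TARGET_LAG = 'num { seconds | minutes | hours | days }'
Specifies the maximum amount of time that the Cortex Search service content may lag behind updates to the base tables specified in the source query.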
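The ATTRIBUTES requirement above can be illustrated with a minimal sketch. It assumes a service named mysvc in support_db.public over the transcripts_etl table used later in the Examples section; the search text and the region value 'North America' are hypothetical. The query uses the SNOWFLAKE.CORTEX.SEARCH_PREVIEW function to filter on an attribute column.

```sql
-- Attribute columns (region, agent_id) are enumerated explicitly in the source query.
CREATE OR REPLACE CORTEX SEARCH SERVICE mysvc
  ON transcript_text
  ATTRIBUTES region, agent_id
  WAREHOUSE = mywh
  TARGET_LAG = '1 hour'
  AS (
    SELECT transcript_text, region, agent_id
    FROM support_db.public.transcripts_etl
  );

-- Filter on the region attribute at query time.
-- The query text and the filter value are hypothetical.
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'support_db.public.mysvc',
    '{
      "query": "internet connection issues",
      "columns": ["transcript_text", "region"],
      "filter": {"@eq": {"region": "North America"}},
      "limit": 5
    }'
  )
)['results'] AS results;
```

A column that is not listed in ATTRIBUTES can still be returned in results if the source query selects it, but it is not expected to be usable in a filter.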
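Optional parameters
EMBEDDING_MODEL = <embedding_model_name>
Specifies the embedding model to use in the Cortex Search service. This property cannot be changed after the service is created. To modify it, replace the service with a CREATE OR REPLACE CORTEX SEARCH SERVICE command.
Some embedding models are available for Cortex Search only in certain cloud regions. For a list of availability by model and region, see Cortex Search regional availability.
Each model may incur a different cost per million input tokens processed. See the Snowflake Service Consumption Table for the cost of each function in credits per million tokens.
If EMBEDDING_MODEL is not specified, the default model, snowflake-arctic-embed-m-v1.5, is used.
INITIALIZE = { ON_CREATE | ON_SCHEDULE }
Specifies the behavior of the initial refresh of the Cortex Search service. This property cannot be changed after the service is created. To modify it, replace the service with a CREATE OR REPLACE CORTEX SEARCH SERVICE command.
ON_CREATE
Refreshes the Cortex Search service synchronously at creation. If this refresh fails, service creation fails and an error message is displayed.
ON_SCHEDULE
Refreshes the Cortex Search service at the next scheduled refresh.
The service is populated when the scheduled refresh process runs; no data is populated when the service is created. If you query the service before the first scheduled refresh has occurred, you might see the following error:
Your service has not yet been loaded into our serving system. Please retry your request in a few minutes.
Default: ON_CREATE
COMMENT = 'comment'
Specifies a comment for the service.
AS query
Specifies a query that defines the base table from which the service is created.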
Access control requirements
| Privilege | Object |
|---|---|
| CREATE CORTEX SEARCH SERVICE | Schema in which you are creating the search service. |
| SELECT | Tables and views that the service queries. |
| USAGE | Warehouse that refreshes the service. |
Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.
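For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
Note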
To create a Cortex Search Service, your role must have the required privileges to use the Cortex embedding functions. This requires granting the SNOWFLAKE.CORTEX_USER database role or the SNOWFLAKE.CORTEX_EMBED_USER database role to the service creator role.
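The privileges listed above could be granted along the following lines. This is a sketch only: the role name search_creator is hypothetical, and the database, schema, table, and warehouse names are borrowed from the Examples section rather than prescribed by this command.

```sql
-- Hypothetical role that will create the service.
CREATE ROLE IF NOT EXISTS search_creator;

-- At least one privilege on the parent database and schema.
GRANT USAGE ON DATABASE support_db TO ROLE search_creator;
GRANT USAGE ON SCHEMA support_db.public TO ROLE search_creator;

-- Privileges from the table above.
GRANT CREATE CORTEX SEARCH SERVICE ON SCHEMA support_db.public TO ROLE search_creator;
GRANT SELECT ON TABLE support_db.public.transcripts_etl TO ROLE search_creator;
GRANT USAGE ON WAREHOUSE mywh TO ROLE search_creator;

-- Access to the Cortex embedding functions.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE search_creator;
```

SNOWFLAKE.CORTEX_EMBED_USER can be granted in place of SNOWFLAKE.CORTEX_USER if the role should only have access to the embedding functions.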
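Usage notes
Attention
Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata fields in Snowflake.
The size of the warehouse used to run the source query of a Cortex Search service affects the speed and cost of each refresh. A larger warehouse shortens build and refresh times. However, during this preview, Snowflake recommends using a warehouse no larger than MEDIUM for each Cortex Search service.
Snowflake recommends using a dedicated warehouse for each Cortex Search service so that it does not interfere with other workloads.
The search index is built as part of the create statement, so the CREATE CORTEX SEARCH SERVICE statement may take longer to complete for larger datasets.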
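As a sketch of the warehouse recommendations above, a dedicated warehouse capped at MEDIUM could be created as follows. The warehouse name cortex_search_wh and the auto-suspend setting are assumptions chosen for illustration, not values required by Cortex Search.

```sql
-- Dedicated warehouse for one Cortex Search service (hypothetical name).
CREATE WAREHOUSE IF NOT EXISTS cortex_search_wh
  WAREHOUSE_SIZE = 'MEDIUM'     -- no larger than MEDIUM, per the recommendation
  AUTO_SUSPEND = 60             -- seconds of inactivity before suspending (assumed value)
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```

The service would then reference it with WAREHOUSE = cortex_search_wh in the CREATE CORTEX SEARCH SERVICE statement.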
When creating a multi-index search service, at least one column must be specified in the VECTOR INDEXES clause in order to ensure the highest quality of search results. Attempting to create a service with no vector indexes returns an error.
A column can be specified in the TEXT INDEXES clause, the VECTOR INDEXES clause, or both:
Columns specified as text indexes can be used for keyword (lexical) search. When querying a text index, results are scored based on the degree of lexical similarity.
Columns specified as vector indexes can be used for vector (semantic) search. When querying a vector index, results are scored based on the degree of semantic similarity.
Columns specified as both text and vector indexes are used for both types of search.
Each vector index column employs one of three methods for managing embeddings:
Managed vector embeddings: Snowflake calculates the vector embeddings when a text column is specified in either the ON clause or the VECTOR INDEXES clause. The model must be one of the supported embedding models.
User-provided vector embeddings: You are responsible for computing the vector embeddings, using either a Snowflake-provided vector embedding model or an externally hosted embedding model, before ingestion by the Cortex Search Service, as well as for embedding text inputs at query time.
User-provided vector embeddings with managed query embeddings: You are responsible for computing the vector embeddings with one of the Snowflake-managed embedding models supported in Cortex Search before ingestion by the Cortex Search Service. At query time, Cortex Search embeds text queries using the specified query_model.
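To illustrate a column that appears in both clauses, the following sketch indexes description as a text index and as a managed vector index. It assumes the business_directory table from the Examples section; the service name business_hybrid_service is hypothetical.

```sql
-- name:        keyword (lexical) search only
-- description: keyword search and vector (semantic) search
CREATE OR REPLACE CORTEX SEARCH SERVICE business_hybrid_service
  TEXT INDEXES name, description
  VECTOR INDEXES description (model='snowflake-arctic-embed-m-v1.5')
  WAREHOUSE = mywh
  TARGET_LAG = '1 hour'
  AS ( SELECT name, address, description FROM business_directory );
```

Because description uses managed vector embeddings, Snowflake computes its vectors; no embedding column needs to exist in the base table.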
Change Tracking Requirements
When creating a Cortex Search Service, if change tracking is not already enabled on the tables that it queries, Snowflake automatically attempts to enable change tracking on them. In order to support incremental refreshes, change tracking must be enabled with non-zero time travel retention on all underlying objects used by a Cortex Search Service.
As base objects change, so does the Cortex Search Service. If you recreate a base object, you must re-enable change tracking.
For more information about enabling change tracking, see Enabling change tracking.
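If you prefer to enable change tracking yourself rather than rely on the automatic attempt, a minimal sketch looks like this. It assumes the base table support_db.public.transcripts_etl from the Examples section and a one-day retention period chosen only for illustration.

```sql
-- Enable change tracking on the base table.
ALTER TABLE support_db.public.transcripts_etl SET CHANGE_TRACKING = TRUE;

-- Ensure a non-zero Time Travel retention period (1 day is an assumed value).
ALTER TABLE support_db.public.transcripts_etl SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Inspect the change_tracking and retention_time columns in the output.
SHOW TABLES LIKE 'transcripts_etl' IN SCHEMA support_db.public;
```

The same ALTER TABLE statements would need to be run again after the base table is recreated.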
The OR REPLACE and IF NOT EXISTS clauses are mutually exclusive. They can't both be used in the same statement.
CREATE OR REPLACE <object> statements are atomic. That is, when an object is replaced, the old object is deleted and the new object is created in a single transaction.
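As a sketch of the two mutually exclusive clauses, the statement below uses IF NOT EXISTS with the single-index syntax, so it is expected to leave an existing service named mysvc untouched rather than replace it. The table and column names are taken from the Examples section.

```sql
-- Creates mysvc only if no service with that name exists in the current schema.
CREATE CORTEX SEARCH SERVICE IF NOT EXISTS mysvc
  ON transcript_text
  ATTRIBUTES region
  WAREHOUSE = mywh
  TARGET_LAG = '1 hour'
  AS SELECT transcript_text, region FROM support_db.public.transcripts_etl;

-- Not valid: OR REPLACE and IF NOT EXISTS cannot be combined.
-- CREATE OR REPLACE CORTEX SEARCH SERVICE IF NOT EXISTS mysvc ...
```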
Examples
Create a Cortex Search service named mysvc that uses the snowflake-arctic-embed-l-v2.0 embedding model:
CREATE OR REPLACE CORTEX SEARCH SERVICE mysvc
ON transcript_text
ATTRIBUTES region,agent_id
WAREHOUSE = mywh
TARGET_LAG = '1 hour'
EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0'
AS (
SELECT
transcript_text,
date,
region,
agent_id
FROM support_db.public.transcripts_etl
);
Create a Cortex Search service named mysvc whose first refresh is scheduled to run after one TARGET_LAG period (1 hour):
CREATE OR REPLACE CORTEX SEARCH SERVICE mysvc
ON transcript_text
ATTRIBUTES region
WAREHOUSE = mywh
TARGET_LAG = '1 hour'
INITIALIZE = ON_SCHEDULE
AS SELECT * FROM support_db.public.transcripts_etl;
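Because a service created with INITIALIZE = ON_SCHEDULE holds no data until its first scheduled refresh, it can help to check the service before querying it. The following sketch assumes the mysvc service created above and uses the SHOW and DESCRIBE commands for Cortex Search services; which output columns indicate readiness is not specified here.

```sql
-- List the service and its properties.
SHOW CORTEX SEARCH SERVICES LIKE 'mysvc';

-- Show details for the service, including its current state.
DESCRIBE CORTEX SEARCH SERVICE mysvc;
```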
Create a multi-index search service named business_search_service that searches the table business_directory, where:
name and address are specified as text indexes, so they are searchable with keyword search only.
description is specified as a vector index, so it is eligible for vector (semantic) search using managed vector embeddings and the snowflake-arctic-embed-m-v1.5 model.
-- Generate sample data
CREATE OR REPLACE TABLE business_directory (name TEXT, address TEXT, description TEXT);
INSERT INTO business_directory VALUES
('Joe''s Coffee', '123 Bean St, Brewtown','A cozy café known for artisan espresso and baked goods.'),
('Sparkle Wash', '456 Clean Ave, Sudsville', 'Eco-friendly car wash with free vacuum service.'),
('Tech Haven', '789 Circuit Blvd, Siliconia', 'Computer store offering the latest gadgets and tech repair services.'),
('Joe''s Wash n'' Fold', '456 Apple Ct, Sudsville', 'Laundromat offering coin laundry and premium wash and fold services.'),
('Circuit Town', '459 Electron Dr, Sudsville', 'Technology store selling used computer parts at discounted prices.')
;
-- Create the Cortex Search Service
CREATE OR REPLACE CORTEX SEARCH SERVICE business_search_service
TEXT INDEXES name, address
VECTOR INDEXES description (model='snowflake-arctic-embed-m-v1.5')
WAREHOUSE = mywh
TARGET_LAG = '1 hour'
AS ( SELECT * FROM business_directory );
Create a multi-index Cortex Search service with custom vector embeddings called custom_vector_search_service. This service searches a table with a text column (document_contents) and a separate user-provided vector embedding column (document_embedding) that contains embeddings corresponding to the text column.
Note
This example uses mock embeddings for simplicity. In a production use-case, vectors should be generated through a Snowflake vector embedding model or an externally-hosted embedding model.
-- Generate sample data
CREATE OR REPLACE TABLE business_documents (
document_contents VARCHAR,
document_embedding VECTOR(FLOAT, 3)
);
INSERT INTO business_documents VALUES
('Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable. Highlights include strategic investments in marketing and technology.', [1, 1, 1]::VECTOR(float, 3)),
('IT manual for employees: Instructions for usage of internal technologies, including hardware and software guides and commonly asked tech questions.', [2, 2, 2]::VECTOR(float, 3)),
('Employee handbook 2024: Updated policies on remote work, health benefits, and company culture initiatives.', [2, 3, 2]::VECTOR(float, 3)),
('Marketing strategy document: Target audience segmentation for upcoming product launch.', [1, -1, -1]::VECTOR(float, 3))
;
-- Create the Cortex Search Service
CREATE OR REPLACE CORTEX SEARCH SERVICE custom_vector_search_service
TEXT INDEXES document_contents
VECTOR INDEXES document_embedding
WAREHOUSE = mywh
TARGET_LAG = '1 minute'
AS SELECT * FROM business_documents;
Create a service named managed_vector_search_service with user-provided vector embeddings and managed query embeddings:
-- Generate sample data
CREATE OR REPLACE TABLE business_documents (
document_contents VARCHAR
);
INSERT INTO business_documents VALUES
('Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable. Highlights include strategic investments in marketing and technology.'),
('IT manual for employees: Instructions for usage of internal technologies, including hardware and software guides and commonly asked tech questions.'),
('Employee handbook 2024: Updated policies on remote work, health benefits, and company culture initiatives.'),
('Marketing strategy document: Target audience segmentation for upcoming product launch.');
-- Compute user-provided embeddings with a Snowflake-managed model.
-- The column name must match the one referenced in VECTOR INDEXES below.
ALTER TABLE business_documents ADD COLUMN document_embedding VECTOR(FLOAT, 768);
UPDATE business_documents SET document_embedding = AI_EMBED('snowflake-arctic-embed-m-v1.5', document_contents);
-- Create the Cortex Search Service
CREATE OR REPLACE CORTEX SEARCH SERVICE managed_vector_search_service
TEXT INDEXES document_contents
VECTOR INDEXES document_embedding(query_model='snowflake-arctic-embed-m-v1.5')
WAREHOUSE = mywh
TARGET_LAG = '1 minute'
AS SELECT * FROM business_documents;