- Categories:
String & binary functions (AI Functions)
AI_CLASSIFY¶
Note
AI_CLASSIFY is the updated version of CLASSIFY_TEXT (SNOWFLAKE.CORTEX). For the latest functionality, use AI_CLASSIFY.
Classifies text or images into categories that you specify.
Regional availability¶
The following table shows the regions where you can use the AI_CLASSIFY function to process text and images:
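| Data type | AWS US West 2 (Oregon) | AWS US East 1 (N. Virginia) | AWS Europe Central 1 (Frankfurt) | AWS Europe West 1 (Ireland) | AWS AP Southeast 2 (Sydney) | AWS AP Northeast 1 (Tokyo) | Azure East US 2 (Virginia) | Azure West Europe (Netherlands) | AWS (Cross-Region) |
|---|---|---|---|---|---|---|---|---|---|
| TEXT | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |

IMAGE input is supported in four of these regions (✔ ✔ ✔ ✔). The column positions of those four marks are not preserved in this copy of the table.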
Syntax¶
AI_CLASSIFY( <input> , <list_of_categories> [, <config_object>, <return_error_details> ] )
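Arguments¶
Required:
input: The string, image, or prompt object that you are classifying.
For text classification, the input string is case sensitive. Results might differ depending on capitalization.
list_of_categories: An array of categories with at least two unique values. The number of categories is restricted only by the token window, but in practice, exceeding twenty categories might reduce classification accuracy. Categories are case sensitive.
Categories can be either simple strings or SQL objects, and all categories must be of the same type. If you use objects, you can provide a description for one or more categories to improve classification accuracy.
For each category object, specify the following keys:
- label: The name of the category.
- description: (Optional) A description of the category.
Note
Descriptions count as input tokens and increase the cost of the classification operation. For more information, see Cost considerations.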
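As a minimal sketch of the object form of list_of_categories, the query below mixes categories that carry a description with one that has only a label. The input text and the label names are hypothetical and chosen only for illustration; the 'label' and 'description' keys are the ones used by the examples later in this topic.

```sql
-- Hypothetical support-ticket text and categories, for illustration only.
SELECT AI_CLASSIFY(
  'My card was charged twice for the same order',
  [
    -- Descriptions are optional and count as input tokens.
    {'label': 'billing',  'description': 'charges, refunds, invoices, and payment methods'},
    {'label': 'shipping', 'description': 'delivery status, delays, and lost packages'},
    {'label': 'account'}
  ]
):labels AS classification;
```

Because every description adds input tokens, it may be worth describing only the categories that are likely to be confused with each other.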
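Optional:
config_object: Configuration settings specified as key/value pairs. Supported keys:
- task_description: An explanation of the classification task, in 50 words or fewer. This can help the model understand the context of the classification task and improve accuracy.
- output_mode: Set to 'multi' for multi-label classification. Defaults to 'single' for single-label classification.
- examples: A list of example objects for few-shot learning. Each example must include the following:
  - input: The example text to classify.
  - labels: The list of correct categories for the input.
  - explanation: An explanation of why the input maps to those categories.
return_error_details: A boolean flag that indicates whether to return error details for rows with non-fatal errors. When set to TRUE, the function returns a JSON object with the response value and error message. To return error details, you must also set the AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR session parameter to FALSE.
Default: FALSE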
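Returns¶
A serialized object. The object's labels field is an array that lists the categories to which the input belongs.
For single-label classification, the labels array has exactly one element. For multi-label classification, the labels array can have multiple elements.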
When return_error_details is set to TRUE and AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR is set to FALSE, returns a JSON object containing the following keys:
- "value": A JSON object containing the classification results. NULL if there's an error.
- "error": A string containing the error details. NULL if no error occurred.
When return_error_details is set to FALSE and AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR is set to FALSE, returns NULL for rows with errors.
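The returned object can be traversed with the usual path syntax for semi-structured data. The sketch below assumes the default behavior (return_error_details not set) and pulls the labels array and its first element out of the result; the column aliases are illustrative names only.

```sql
WITH classified AS (
  SELECT AI_CLASSIFY('One day I will see the world', ['travel', 'cooking']) AS result
)
SELECT
  result:labels            AS labels,       -- the full labels array
  result:labels[0]::STRING AS first_label   -- the only element in single-label mode
FROM classified;
```

Casting the first element to STRING is convenient for single-label classification, where the array holds exactly one value. For multi-label output, keep the array or flatten it instead.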
Access control requirements¶
Users must use a role that has the SNOWFLAKE.CORTEX_USER database role. For more information about this privilege, see Cortex LLM privileges.
Error handling¶
By default, AI_CLASSIFY terminates the query when it encounters a non-fatal error on a specific row. This behavior can disrupt large batch operations where an error can stop all processing.
For rows with non-fatal errors, you can enable error handling during your session. When you have error handling enabled, AI_CLASSIFY returns NULL for rows that failed, allowing the rest of the query to complete successfully.
Use the following code to enable error handling for your session:
ALTER SESSION SET AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR=false;
When you have error handling enabled, you can choose to either:
- Have AI_CLASSIFY return NULL for any row with a non-fatal error.
- Receive a detailed error object for each error instead of NULL.
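Usage notes¶
For best results, follow these guidelines:
- Use plain English text for input and list_of_categories. Avoid trying to classify non-prose such as code snippets, logs, or non-English text.
- Avoid using code or formats that are not open source (for example, proprietary languages or formats) in your text. The underlying language models are not trained on proprietary formats.
- Do not use abbreviations, special characters, or jargon in category labels.
- Use descriptive categories. Avoid category names such as "Xa4s3" or "category 1".
- Use categories that are mutually exclusive.
- When the relationship between the input and the categories is ambiguous or complex, a clear task description can improve accuracy.
- Adding label descriptions can improve accuracy, especially when labels are ambiguous or require specific selection criteria. Write descriptions that clearly highlight how each label differs from the others.
- Every label, description, and example adds to the number of input tokens in each AI_CLASSIFY call, which affects cost.
- Examples can help improve accuracy.
Note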
AI_CLASSIFY adds a prompt to your input to generate its response. This increases the token count beyond the text that you've provided.
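Examples¶
The following examples use the AI_CLASSIFY function, beginning with calls that pass only the required arguments.
AI_CLASSIFY: Text¶
The following example classifies the input text into one of two categories, travel or cooking: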
SELECT AI_CLASSIFY('One day I will see the world', ['travel', 'cooking']);
The following is the output of the preceding command.
{
  "labels": ["travel"]
}
The following example uses multi-label classification:
SELECT AI_CLASSIFY(
'One day I will see the world and learn to cook my favorite dishes',
['travel', 'cooking', 'reading', 'driving'],
{'output_mode': 'multi'}
);
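The following is the output of the preceding command.
{
  "labels": ["travel", "cooking"]
}
The following example passes a task description, label descriptions, and few-shot examples: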
SELECT AI_CLASSIFY(
'One day I will see the world and learn to cook my favorite dishes',
[
{'label': 'travel', 'description': 'content related to traveling'},
{'label': 'cooking'},
{'label': 'reading'},
{'label': 'driving'}
],
{
'task_description': 'Determine topics related to the given text',
'output_mode': 'multi',
'examples': [
{
'input': 'i love traveling with a good book',
'labels': ['travel', 'reading'],
'explanation': 'the text mentions traveling and a good book which relates to reading'
}
]
});
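The following is the output of the preceding command.
{
  "labels": ["travel", "cooking"]
}
The following example creates a table, text_classification_table, that contains a text column and a column of possible categories for that text. Calling the AI_CLASSIFY function on each row of the table classifies the string in the text column.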
CREATE OR REPLACE TEMPORARY TABLE text_classification_table AS
SELECT 'France' AS input, ['North America', 'Europe', 'Asia'] AS classes
UNION ALL
SELECT 'Singapore', ['North America', 'Europe', 'Asia']
UNION ALL
SELECT 'one day I will see the world', ['travel', 'cooking', 'dancing']
UNION ALL
SELECT 'my lobster bisque is second to none', ['travel', 'cooking', 'dancing'];
SELECT input,
classes,
AI_CLASSIFY(input, classes):labels AS classification
FROM text_classification_table;
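When output_mode is 'multi', each row can carry several labels. As a sketch that reuses text_classification_table from the preceding example, the query below expands the labels array into one row per (input, label) pair with LATERAL FLATTEN. The CTE name and the aliases c and f are illustrative.

```sql
WITH classified AS (
  SELECT
    input,
    AI_CLASSIFY(input, classes, {'output_mode': 'multi'}):labels AS labels
  FROM text_classification_table
)
SELECT
  c.input,
  f.value::STRING AS label          -- one output row per label
FROM classified AS c,
     LATERAL FLATTEN(input => c.labels) AS f;
```

Rows whose labels array is empty or NULL drop out of this result. Passing OUTER => TRUE to FLATTEN keeps them, with a NULL label.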
AI_CLASSIFY: Images¶
Using a single file input:
WITH food_pictures AS (
SELECT
TO_FILE(file_url) AS img
FROM DIRECTORY(@file_stage)
)
SELECT
*,
AI_CLASSIFY(img, ['dessert', 'drink', 'main dish', 'side dish']):labels AS classification
FROM food_pictures;
Using a prompt object constructed with PROMPT():
WITH food_pictures AS (
SELECT
TO_FILE(file_url) AS img
FROM DIRECTORY(@file_stage)
)
SELECT
*,
AI_CLASSIFY(PROMPT('Please help me classify the food within this image {0}', img),
['dessert', 'drink', 'main dish', 'side dish']):labels AS classification
FROM food_pictures;
Process rows with error handling¶
After you set AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR to FALSE, AI_CLASSIFY continues processing the remaining rows when a row encounters a non-fatal error.
The following is an example query:
WITH images AS (
SELECT
TO_FILE(file_url) AS img
FROM DIRECTORY(@file_stage)
)
SELECT AI_CLASSIFY(PROMPT('What is the photo {0} review about', img), ['travel', 'cooking'])
FROM images
LIMIT 2;
The following shows the output of the preceding command. The first row was processed successfully, while the second row returned NULL.
{"labels": ["travel"]}
null
To get detailed information for each error, set return_error_details to TRUE.
ALTER SESSION SET AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR=false;
WITH images AS (
SELECT
TO_FILE(file_url) AS img
FROM DIRECTORY(@file_stage)
)
SELECT AI_CLASSIFY(
prompt => PROMPT('What is the photo {0} review about', img),
classes => ['travel', 'cooking'],
return_error_details => TRUE
)
FROM images
LIMIT 2;
The following is the output of the preceding command. The first row was processed successfully, while the second row returned an error message.
{ "value": {"labels": ["travel"]}, "error": null }
{ "value": null, "error": "invalid image file"}
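Once error details are returned, the "value" and "error" keys can be split into separate columns so that failed files are easy to find. The following is a sketch under two assumptions: that the optional arguments can be passed positionally in the order shown in the Syntax section, and that the stage's directory table exposes the standard relative_path column. The explicit {'output_mode': 'single'} only restates the documented default so that the fourth argument can be supplied.

```sql
ALTER SESSION SET AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR = FALSE;

WITH images AS (
  SELECT
    relative_path,                  -- identifies the file behind each result
    TO_FILE(file_url) AS img
  FROM DIRECTORY(@file_stage)
),
classified AS (
  SELECT
    relative_path,
    AI_CLASSIFY(
      PROMPT('What is the photo {0} review about', img),
      ['travel', 'cooking'],
      {'output_mode': 'single'},    -- documented default, stated explicitly
      TRUE                          -- return_error_details (assumed positional)
    ) AS result
  FROM images
)
SELECT
  relative_path,
  result:value:labels   AS labels,         -- NULL when the row failed
  result:error::STRING  AS error_message   -- NULL when the row succeeded
FROM classified;
```

Adding WHERE result:error IS NOT NULL to the final SELECT would list only the failed files, which can help when deciding which inputs to fix or reprocess.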
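Limitations¶
Snowflake AI functions do not work with FILE objects created from files in the following types of stages:
- Internal stages with encryption mode TYPE = 'SNOWFLAKE_FULL'
- External stages that use any client-side encryption mode:
  - TYPE = 'AWS_CSE'
  - TYPE = 'AZURE_CSE'
- User stages
- Table stages
- Stages with double-quoted names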
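As a sketch of a stage that avoids the restrictions above, the statement below creates a named internal stage with server-side encryption and a directory table. The stage name file_stage is taken from the image examples in this topic; whether these settings fit a given account is an assumption to check against your own security requirements.

```sql
-- Named internal stage, unquoted name, server-side encryption.
CREATE STAGE IF NOT EXISTS file_stage
  DIRECTORY = (ENABLE = TRUE)               -- enables DIRECTORY(@file_stage) queries
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');    -- not 'SNOWFLAKE_FULL'
```

IF NOT EXISTS is used here instead of OR REPLACE so that running the statement does not drop an existing stage and its files.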