Categories: String & binary functions (AI Functions)

AI_CLASSIFY

Note

AI_CLASSIFY is the updated version of CLASSIFY_TEXT (SNOWFLAKE.CORTEX). For the latest functionality, use AI_CLASSIFY.

Classifies text or images into categories that you specify.

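Regional availability

The following table shows the regions where you can use the AI_CLASSIFY function for text and images:

Data type	AWS US West 2 (Oregon)	AWS US East 1 (N. Virginia)	AWS Europe Central 1 (Frankfurt)	AWS Europe West 1 (Ireland)	AWS AP Southeast 2 (Sydney)	AWS AP Northeast 1 (Tokyo)	Azure East US 2 (Virginia)	Azure West Europe (Netherlands)	AWS (Cross-Region)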
TEXT	✔	✔	✔	✔	✔	✔	✔	✔	✔
IMAGE	✔	✔	✔						✔

Syntax

AI_CLASSIFY( <input> , <list_of_categories> [, <config_object>] )


Arguments

Required:

input

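The string, image, or prompt object that you are classifying.

For text classification, the input string is case sensitive. Results might differ depending on capitalization.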

list_of_categories

An array of categories with at least two unique values. The number of categories is restricted only by the token window, but in practice, exceeding twenty categories might reduce classification accuracy. Categories are case sensitive.

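Categories can be simple strings or SQL objects, but all categories in the list must be of the same type. If you use objects, you can provide a description for one or more categories to improve classification accuracy.

For each category, specify the following:

label (required): The name of the category.
description (optional): A description of the category in no more than 25 words.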

Note

Descriptions count as input tokens, which affects the cost of the classification operation. For more information, see Cost considerations.

Optional:

config_object

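Configuration settings, specified as key/value pairs. Supported keys:

task_description: An explanation of the classification task in no more than 50 words. This can help the model understand the context of the classification task and improve accuracy.
output_mode: Set to 'multi' for multi-label classification. Defaults to 'single' for single-label classification.
examples: A list of example objects for few-shot learning. Each example must include the following:
- input: The example text to classify.
- labels: The list of correct categories for the input.
- explanation: An explanation of why the input maps to those categories.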

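Returns

A serialized object. The object's labels field is an array that lists the categories to which the input belongs.

For single-label classification, the labels array has exactly one element. For multi-label classification, the labels array can have multiple elements.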
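
The labels field can be read with the same path syntax that the examples on this page use (:labels). The following is a minimal sketch, not a definitive pattern: the input text and categories are hypothetical, and the cast to VARCHAR assumes you want the single label as a plain string.

```sql
-- Hypothetical input text and categories, used only for illustration.
WITH classified AS (
  SELECT AI_CLASSIFY(
    'I need to renew my passport before the trip',
    ['travel', 'cooking']
  ) AS result
)
SELECT
  result:labels             AS labels_array,  -- the full array of labels
  result:labels[0]::VARCHAR AS first_label    -- first (or only) label as a string
FROM classified;
```

Calling the function once in a common table expression and reading from the stored result avoids invoking AI_CLASSIFY twice for the same input.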

Access control requirements

Users must use a role that has the SNOWFLAKE.CORTEX_USER database role. For more information about this privilege, see Cortex LLM privileges.
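
As a sketch of how the database role might be granted, assuming a hypothetical account role named classify_user_role and a hypothetical user named some_user (neither appears in this page), an administrator could run something like:

```sql
-- classify_user_role and some_user are placeholder names, not part of this page.
USE ROLE ACCOUNTADMIN;

CREATE ROLE IF NOT EXISTS classify_user_role;

-- Give the account role the database role that this section requires.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE classify_user_role;

-- Let a specific user assume that role.
GRANT ROLE classify_user_role TO USER some_user;
```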

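Usage notes

For best results, follow these guidelines:

Use plain English text for input and list_of_categories.
Avoid trying to classify non-prose such as code snippets, logs, or non-English text.
Avoid non-open-source code or formats (for example, proprietary languages or formats) in the text. The underlying language model was not trained on proprietary formats.
Do not use abbreviations, special characters, or jargon in category labels.
Use descriptive categories. Avoid category names such as "Xa4s3" or "category 1".
Use categories that are mutually exclusive.
When the relationship between the input and the categories is ambiguous or complex, a clear task description can improve accuracy.
Adding label descriptions can improve accuracy, especially when labels are ambiguous or require specific selection criteria. Write descriptions that clearly highlight how each label differs from the others.
Each label, description, and example adds to the number of input tokens for every AI_CLASSIFY call, which affects cost.
Examples can help improve accuracy.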
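
The guidelines above can be combined in one call. The following is only a sketch: the support-ticket text, labels, descriptions, and task description are all hypothetical and were chosen to show descriptive, mutually exclusive categories with short descriptions.

```sql
-- All text, labels, and descriptions below are illustrative placeholders.
SELECT AI_CLASSIFY(
  'The app crashes every time I open the settings page',
  [
    {'label': 'bug report',       'description': 'the user describes something that is broken or not working'},
    {'label': 'feature request',  'description': 'the user asks for functionality that does not exist yet'},
    {'label': 'billing question', 'description': 'the user asks about invoices, charges, or payments'}
  ],
  {'task_description': 'Classify customer support messages by the type of request'}
):labels[0]::VARCHAR AS category;
```

Each description here stays under the 25-word limit and the task description under the 50-word limit. Both add input tokens to every call.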

Note

AI_CLASSIFY adds a prompt to your input to generate its response. This increases the token count beyond the text that you've provided.

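Examples

The following examples show how to use the AI_CLASSIFY function, starting with calls that pass only the required arguments.

AI_CLASSIFY: Text

The following example classifies the input text into one of two categories, travel or cooking: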

SELECT AI_CLASSIFY('One day I will see the world', ['travel', 'cooking']);


The preceding command returns the following output:

'{
  "labels": ["travel"]
}';

The following example uses multi-label classification:

SELECT AI_CLASSIFY(
  'One day I will see the world and learn to cook my favorite dishes',
  ['travel', 'cooking', 'reading', 'driving'],
  {'output_mode': 'multi'}
);


The preceding command returns the following output:

'{
  "labels": ["travel", "cooking"]
}';

The following example passes a task description, label descriptions, and a few-shot learning example:

SELECT AI_CLASSIFY(
  'One day I will see the world and learn to cook my favorite dishes',
  [
    {'label': 'travel', 'description': 'content related to traveling'},
    {'label': 'cooking'},
    {'label': 'reading'},
    {'label': 'driving'}
  ],
  {
    'task_description': 'Determine topics related to the given text',
    'output_mode': 'multi',
    'examples': [
      {
        'input': 'i love traveling with a good book',
        'labels': ['travel', 'reading'],
        'explanation': 'the text mentions traveling and a good book which relates to reading'
      }
    ]
  });


The preceding command returns the following output:

'{
  "labels": ["travel", "cooking"]
}';

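The following example creates a table, text_classification_table, with a column of text and a column of candidate categories for that text. AI_CLASSIFY is then called on each row of the table to classify the string in the text column.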

CREATE OR REPLACE TEMPORARY TABLE text_classification_table AS
SELECT 'France' AS input, ['North America', 'Europe', 'Asia'] AS classes
UNION ALL
SELECT 'Singapore', ['North America', 'Europe', 'Asia']
UNION ALL
SELECT 'one day I will see the world', ['travel', 'cooking', 'dancing']
UNION ALL
SELECT 'my lobster bisque is second to none', ['travel', 'cooking', 'dancing'];

SELECT input,
    classes,
    AI_CLASSIFY(input, classes):labels AS classification
FROM text_classification_table;
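
When multi-label mode is used, it can be convenient to get one row per label instead of an array. The following sketch assumes the text_classification_table created above and uses LATERAL FLATTEN to expand the labels array. The output column name label is a hypothetical choice.

```sql
-- One output row per (input, label) pair.
-- Assumes text_classification_table from the preceding example exists.
SELECT
  t.input,
  f.value::VARCHAR AS label
FROM text_classification_table AS t,
  LATERAL FLATTEN(
    input => AI_CLASSIFY(t.input, t.classes, {'output_mode': 'multi'}):labels
  ) AS f;
```

With a plain FLATTEN, rows whose labels array is empty do not appear in the result. Whether that is desirable depends on how unmatched inputs should be handled downstream.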


AI_CLASSIFY: Images

Using a single file as input:

WITH food_pictures AS (
  SELECT
      TO_FILE(file_url) AS img
  FROM DIRECTORY(@file_stage)
)
SELECT
*,
AI_CLASSIFY(img, ['dessert', 'drink', 'main dish', 'side dish']):labels AS classification
FROM food_pictures;


Using a prompt object constructed with PROMPT():

WITH food_pictures AS (
  SELECT
      TO_FILE(file_url) AS img
  FROM DIRECTORY(@file_stage)
)
SELECT
*,
AI_CLASSIFY(PROMPT('Please help me classify the food within this image {0}', img),
  ['dessert', 'drink', 'main dish', 'side dish']):labels AS classification
FROM food_pictures;


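Limitations

Snowflake AI functions do not work with FILE objects created from files in the following types of stages:
- Internal stages with encryption mode TYPE = 'SNOWFLAKE_FULL'
- External stages with any client-side encryption mode:
  - TYPE = 'AWS_CSE'
  - TYPE = 'AZURE_CSE'
- User stages
- Table stages
- Stages with double-quoted names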
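
As an illustration of a stage that avoids the cases listed above, the following sketch creates an internal named stage with server-side encryption and a directory table. It assumes that the @file_stage stage used in the image examples is an internal stage you create yourself. Adjust the name and options to your environment.

```sql
-- Internal named stage (not a user or table stage), with an unquoted name,
-- server-side encryption instead of SNOWFLAKE_FULL, and a directory table
-- so that DIRECTORY(@file_stage) can list the files.
CREATE OR REPLACE STAGE file_stage
  DIRECTORY = (ENABLE = TRUE)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
```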