AI_COMPLETE structured outputs
AI_COMPLETE lets you supply a JSON schema or SQL type literal that completion responses must follow, producing structured output. Structured output reduces the need for post-processing in your AI data pipelines and enables seamless integration with systems that require deterministic responses. AI_COMPLETE verifies each generated token against your structured output definition to ensure that the response conforms to your type structure.
Every model supported by AI_COMPLETE supports structured output, but the most powerful models typically produce higher-quality responses.
Using AI_COMPLETE with type literals
Type literals allow you to define structured output for AI_COMPLETE using SQL types, taking advantage of Snowflake’s built-in mappings between SQL and JSON types. Begin your type literal with the TYPE keyword and use a SQL OBJECT as the top-level type. The properties of your top-level object can be any SQL type with a supported mapping to JSON.
Note
Type literals are supported only for the single string text prompt version of AI_COMPLETE. For more information, see AI_COMPLETE (Single string).
The following example uses a type literal to produce structured output for a prompt. The prompt contains both instructions to the model and the data to process. The response_format type literal produces the model’s response as a JSON object with a top-level note containing a date, address, items_count, and a price array containing prices.
The following is a full response to this query:
Type literal notes and limitations
Specifying a structured output schema as a type literal follows these rules:
- STRING and VARCHAR types are mapped to JSON strings.
- VARCHAR types aren’t guaranteed to produce output of a specific length.
- FIXED types without a scale are mapped to JSON integers. All other numeric types are mapped to JSON numbers.
Type literals have restrictions around supported types:
- The empty object OBJECT() isn’t allowed as a type literal.
- Not all SQL types have a mapping for structured output. These include, but aren’t limited to:
  - VARIANT
  - MAP
  - Date & time data types
The use of an unsupported data type returns an error.
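The mapping rules and restrictions above can be sketched in Python. This is an illustration only: sql_type_to_json_type is a hypothetical helper, not part of any Snowflake library. It assumes that a FIXED type "without a scale" means a scale of 0, that the DATE, TIME, and TIMESTAMP names stand in for the date and time types, and it covers only a handful of type names.

```python
import re

# Types this section lists as having no structured-output mapping.
# DATE/TIME/TIMESTAMP are an assumption standing in for
# "Date & time data types"; the list is not exhaustive.
UNSUPPORTED = {"VARIANT", "MAP", "DATE", "TIME", "TIMESTAMP"}


def sql_type_to_json_type(sql_type: str) -> str:
    """Return the JSON type a SQL type maps to under the rules above."""
    text = sql_type.strip().upper()
    match = re.fullmatch(r"([A-Z_]+)\s*(?:\(([^)]*)\))?", text)
    if match is None:
        raise ValueError(f"cannot parse type: {sql_type!r}")
    name, args = match.group(1), match.group(2)

    if name in UNSUPPORTED:
        raise ValueError(f"{name} has no structured-output mapping")
    if name in {"STRING", "VARCHAR"}:
        # A VARCHAR length, if given, is not enforced on the output.
        return "string"
    if name in {"NUMBER", "DECIMAL", "NUMERIC"}:
        parts = [p.strip() for p in args.split(",")] if args else []
        # Assumption: a missing scale is treated as scale 0.
        scale = int(parts[1]) if len(parts) > 1 else 0
        return "integer" if scale == 0 else "number"
    if name in {"INT", "INTEGER", "BIGINT", "SMALLINT"}:
        return "integer"
    if name in {"FLOAT", "DOUBLE", "REAL"}:
        return "number"
    raise ValueError(f"type not covered by this sketch: {sql_type!r}")


for example in ("VARCHAR(20)", "NUMBER(38, 0)", "NUMBER(10, 2)", "FLOAT"):
    print(example, "->", sql_type_to_json_type(example))
```

The helper raises ValueError for unsupported types, which mirrors the documented behavior that an unsupported data type returns an error.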
Using AI_COMPLETE with JSON schemas
For more control over structured output, use a JSON schema (https://json-schema.org/) as the value for
response_format. The supplied JSON schema defines the structure, data types, and constraints that the generated text
must conform to, including required fields.
For simple tasks, you don’t need to specify any details of the output format, or even instruct the model to “respond in JSON.” For more complex tasks, prompting the model to respond in JSON can improve accuracy; see Optimizing JSON adherence accuracy.
The following illustrates the syntax of an AI_COMPLETE function call that uses a JSON schema to specify the structured
output format. The schema defines a top-level object, properties, with a property_name property of type string;
this field is required in the response.
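As a rough sketch, the structure described above can be written as a Python dictionary. The outer keys are an assumption inferred from the error messages listed under the error conditions (which ask for a type and a schema in the response format object); the "json" value for the type key is likewise assumed, and property_name is the placeholder name used in the description.

```python
import json

# Assumed shape: a "type" key plus a "schema" key holding the JSON schema.
response_format = {
    "type": "json",
    "schema": {
        "type": "object",
        "properties": {
            # Placeholder property; replace with the fields you need.
            "property_name": {"type": "string"},
        },
        # Listing the property here marks it as required in the response.
        "required": ["property_name"],
    },
}

# Serialize to JSON text, for example to embed in a request payload.
payload_fragment = json.dumps(response_format, indent=2)
print(payload_fragment)
```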
Important
For OpenAI (GPT) models, the following requirements apply:
- The additionalProperties (https://json-schema.org/understanding-json-schema/reference/object#additionalproperties) field must be set to false in every node of the schema.
- The required (https://json-schema.org/understanding-json-schema/reference/object#required) field must be included and contain the names of every property in the schema.
Other models do not require these fields, but you might include them anyway so you don’t need a different schema for OpenAI models.
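One way to keep a single schema that also satisfies the OpenAI requirements is to post-process it before use. The following Python sketch is an assumption about how you might do this, not a Snowflake utility: make_strict is a hypothetical helper that treats "every node" as every object node, and it only walks properties, items, anyOf, and $defs.

```python
import copy


def make_strict(schema: dict) -> dict:
    """Return a copy in which every object node sets additionalProperties
    to False and lists all of its properties under required."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched
    _tighten(strict)
    return strict


def _tighten(node) -> None:
    if isinstance(node, list):
        for item in node:
            _tighten(item)
        return
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if node.get("type") == "object" or isinstance(props, dict):
        node["additionalProperties"] = False
        node["required"] = list(props) if isinstance(props, dict) else []
    if isinstance(props, dict):
        for child in props.values():
            _tighten(child)
    # Recurse into other places where subschemas commonly live.
    for key in ("items", "anyOf"):
        _tighten(node.get(key))
    defs = node.get("$defs")
    if isinstance(defs, dict):
        for child in defs.values():
            _tighten(child)


schema = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
            },
        }
    },
}
strict = make_strict(schema)
```

Note that this marks every property as required, so it is only appropriate when the model can always be expected to supply every field.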
SQL examples
The following example is a more complete demonstration of using AI_COMPLETE with a single string input.
Response:
The following example demonstrates how to use the response_format argument to specify a JSON schema for the response and the
show_details argument to return inference metadata.
Response:
Python examples
Note
Structured output is supported in snowflake-ml-python version 1.8.0 and later.
The following example demonstrates how to use the response_format argument to specify a JSON schema for the response.
Response:
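Pydantic example
Pydantic is a data validation and settings management library for Python. This example uses Pydantic to define a schema for the response format. The code performs the following steps:
- Defines a schema using Pydantic
- Converts the Pydantic model to a JSON schema using the model_json_schema method
- Passes the JSON schema to the complete function as the response_format argument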
Note
This example is meant to be run in a Snowsight Python worksheet, which already has a connection to Snowflake. To run it in a different environment, you might need to establish a connection to Snowflake using the Snowflake Connector for Python.
Response:
REST API examples
You can use the Snowflake Cortex LLM REST API to invoke COMPLETE with the LLM of your choice. Below is an example supplying a schema using the Cortex LLM REST API:
Response:
Create a JSON schema definition
To get the best accuracy from COMPLETE structured outputs, follow these guidelines:
- Use the “required” field in the schema to specify required fields. COMPLETE raises an error if a required field cannot be extracted.
  In the following example, the schema directs COMPLETE to find people mentioned in the document. The people field is marked as required to make sure people are identified.
Response:
- Provide detailed descriptions of the fields to be extracted so that the model can more accurately identify them. For example, the following schema includes a description of each of the fields of people: name, age, and isAdult.
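Both guidelines can be illustrated with a small Python sketch. The schema below marks people as required and describes each of its fields; the description wording is invented for illustration, and missing_descriptions is a hypothetical helper for spotting properties that still lack a description.

```python
people_schema = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "description": "Every person mentioned in the document",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Full name as written in the document",
                    },
                    "age": {
                        "type": "number",
                        "description": "Age in years",
                    },
                    "isAdult": {
                        "type": "boolean",
                        "description": "True if the person is 18 or older",
                    },
                },
                "required": ["name", "age", "isAdult"],
            },
        }
    },
    # The people field must be present in every response.
    "required": ["people"],
}


def missing_descriptions(node: dict, path: str = "") -> list:
    """List the dotted paths of properties that have no description."""
    missing = []
    for name, child in node.get("properties", {}).items():
        child_path = f"{path}.{name}" if path else name
        if "description" not in child:
            missing.append(child_path)
        missing.extend(missing_descriptions(child, child_path))
    if isinstance(node.get("items"), dict):
        missing.extend(missing_descriptions(node["items"], path + "[]"))
    return missing


print(missing_descriptions(people_schema))
```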
Using a JSON reference
Schema references solve practical problems when using Cortex COMPLETE Structured Outputs. With references, represented
by $ref, you can define common objects like addresses or prices once, then reuse them throughout the
schema. This way, when you need to update validation logic or add a field, you can change it in one place instead of in
multiple locations.
Using references reduces coding effort, reduces bugs from inconsistent implementations, and makes code reviews simpler. Referenced components create cleaner hierarchies that better represent entity relationships in your data model. As projects grow more complex, this modular approach helps you manage technical debt while maintaining schema integrity.
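Third-party libraries such as Pydantic support the reference mechanism natively in Python, which simplifies working with schemas in code.
The following guidelines apply to the use of references in JSON schemas: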
- Scope limitation: The $ref mechanism is limited to the user’s schema only; external schema references (such as HTTP URLs) are not supported.
- Definition placement: Object definitions should be placed at the top level of the schema, specifically under the definitions or $defs key.
- Enforcement: While the JSON Schema specification recommends using the $defs key for definitions, Snowflake’s validation mechanism strictly enforces this structure. This is an example of a valid $defs object:
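As an illustration of these guidelines, the following Python sketch defines a schema whose address definition sits under a top-level $defs key and is referenced twice through $ref. The address fields are invented for the example, and unresolved_refs is a hypothetical local check, not Snowflake's validator; it assumes definitions live under $defs only.

```python
schema_with_refs = {
    "type": "object",
    # Definitions sit at the top level of the schema, under $defs.
    "$defs": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
            },
            "required": ["street", "city"],
        }
    },
    "properties": {
        # Both properties reuse the single address definition.
        "billing_address": {"$ref": "#/$defs/address"},
        "shipping_address": {"$ref": "#/$defs/address"},
    },
    "required": ["billing_address", "shipping_address"],
}


def unresolved_refs(schema: dict) -> list:
    """Return every $ref that does not point at a top-level $defs entry."""
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"
    bad = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                if not ref.startswith(prefix) or ref[len(prefix):] not in defs:
                    bad.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(schema)
    return bad


print(unresolved_refs(schema_with_refs))
```

A reference to an HTTP URL or to a definition that does not exist shows up in the returned list, which makes it easy to catch out-of-scope references before sending the schema.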
Example using JSON references
This SQL example demonstrates the use of references in a JSON schema.
Response:
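Optimizing JSON adherence accuracy
COMPLETE structured outputs usually don’t require a prompt; the model already understands that its response should conform to the schema you specify. However, task complexity can significantly affect the ability of LLMs to follow a JSON response format. The more complex the task, the more a specific prompt improves the accuracy of the results.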
- Simple tasks, such as text classification, entity extraction, paraphrasing, and summarization tasks that don’t require complex reasoning, generally do not require additional prompting. For smaller, less capable models, just using structured outputs significantly improves JSON adherence accuracy, because any text the model produces that is unrelated to the supplied schema is ignored.
- Medium-complexity tasks include any simple task in which the model is asked for additional reasoning, such as providing its rationale for a classification decision. For these use cases, we recommend adding “Respond in JSON” in the prompt to optimize performance.
- Complex reasoning tasks prompt models to perform more open-ended, ambiguous tasks, such as assessing and scoring the quality of a call based on the relevance, professionalism, and faithfulness of answers. For these use cases, we recommend using the most powerful models, such as Anthropic’s claude-sonnet-4-6 or Mistral AI’s mistral-large2, and adding “Respond in JSON” and details about the schema you want to generate to the prompt.
For the most consistent results, set the temperature option to 0 when you call COMPLETE, regardless of the task or
model.
Tip
To handle possible errors raised by a model, use TRY_COMPLETE rather than COMPLETE.
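Cost considerations
Cortex COMPLETE structured outputs incur compute cost based on the number of tokens processed, but do not incur additional compute cost for verifying each token against the supplied JSON schema. However, the number of tokens processed (and billed) increases with schema complexity. In general, the larger and more complex the supplied schema, the more input and output tokens are consumed. Highly structured responses with deep nesting (for example, hierarchical data) consume more tokens than simple schemas.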
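Limitations
- You cannot use spaces in the keys of the schema.
- The characters allowed in property names are letters, digits, hyphens, and underscores. Names can be at most 64 characters long.
- You cannot address external schemas using $ref or $dynamicRef.
The following constraint keywords are not supported. Using an unsupported constraint keyword results in an error.
| Type | Keywords |
|---|---|
| integer | multipleOf |
| number | multipleOf, minimum, maximum, exclusiveMinimum, exclusiveMaximum |
| string | minLength, maxLength, format |
| array | uniqueItems, contains, minContains, maxContains, minItems, maxItems |
| object | patternProperties, minProperties, maxProperties, propertyNames |
These limitations might be addressed in future releases.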
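These limitations can be checked locally before a schema is submitted. The following Python sketch is a hypothetical pre-flight check, not Snowflake's own validation: check_schema is an invented name, "letters" is assumed to mean ASCII letters, and an external reference is assumed to be any $ref or $dynamicRef that does not start with "#".

```python
import re

# Unsupported constraint keywords, keyed by JSON schema type (from the table above).
UNSUPPORTED_KEYWORDS = {
    "integer": {"multipleOf"},
    "number": {"multipleOf", "minimum", "maximum",
               "exclusiveMinimum", "exclusiveMaximum"},
    "string": {"minLength", "maxLength", "format"},
    "array": {"uniqueItems", "contains", "minContains",
              "maxContains", "minItems", "maxItems"},
    "object": {"patternProperties", "minProperties",
               "maxProperties", "propertyNames"},
}
# Letters (assumed ASCII), digits, hyphens, underscores; at most 64 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_schema(node: dict, path: str = "$") -> list:
    """Return human-readable problems found in a schema node and its children."""
    problems = []
    node_type = node.get("type")
    banned = UNSUPPORTED_KEYWORDS.get(node_type, set()) if isinstance(node_type, str) else set()
    for keyword in sorted(banned.intersection(node)):
        problems.append(f"{path}: unsupported keyword {keyword!r}")
    for ref_key in ("$ref", "$dynamicRef"):
        ref = node.get(ref_key)
        if isinstance(ref, str) and not ref.startswith("#"):
            problems.append(f"{path}: external reference {ref!r}")
    for name, child in node.get("properties", {}).items():
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{path}: invalid property name {name!r}")
        problems.extend(check_schema(child, f"{path}.{name}"))
    if isinstance(node.get("items"), dict):
        problems.extend(check_schema(node["items"], f"{path}[]"))
    for name, child in node.get("$defs", {}).items():
        problems.extend(check_schema(child, f"{path}.$defs.{name}"))
    return problems


candidate = {
    "type": "object",
    "properties": {
        "first name": {"type": "string", "maxLength": 10},
        "age": {"type": "integer"},
    },
}
for problem in check_schema(candidate):
    print(problem)
```

The candidate schema is flagged twice: once for the space in the property name and once for the maxLength constraint on a string.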
Error conditions
| Situation | Example message | HTTP status code |
|---|---|---|
| Request validation failed. The query was cancelled as the model wouldn’t be able to generate a valid response. This can be caused by a malformed request. | please provide a type for the response format object, please provide a schema for the response format object | 400 |
| Input schema validation failed. The query was cancelled as the model wouldn’t be able to generate a valid response. This can be caused by missing required properties in the request payload, by use of unsupported JSON schema features such as constraints, or by improper use of the $ref mechanism (for example, reaching outside the scope of the schema). | | 400 |
| Model output validation failed. The model could not generate a response that matched the schema. | | 422 |