- Categories:
String & binary functions (AI Functions)
AI_EXTRACT
Extracts information from an input string or file.
Syntax
Extract information from an input string:
AI_EXTRACT( <text>, <responseFormat> )
AI_EXTRACT( text => <text>,
responseFormat => <responseFormat> )
Extract information from a file:
AI_EXTRACT( <file>, <responseFormat> )
AI_EXTRACT( file => <file>,
responseFormat => <responseFormat> )
Arguments
text: An input string for extraction.
file: A FILE for extraction.
Supported file formats:
PDF
PNG
PPTX, PPT
EML
DOC, DOCX
JPEG, JPG
HTM, HTML
TEXT, TXT
TIF, TIFF
BMP, GIF, WEBP
MD
The files must be less than 100 MB in size.
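As a rough client-side pre-check, the format list and size limit above can be sketched in Python before a file is staged. This is only an illustration: the helper name `is_extractable` is hypothetical, matching on the file extension is an assumption (the source does not say how the format is detected), and "100 MB" is read here as 100 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    "pdf", "png", "pptx", "ppt", "eml", "doc", "docx", "jpeg", "jpg",
    "htm", "html", "text", "txt", "tif", "tiff", "bmp", "gif", "webp", "md",
}

# Assumption: "100 MB" means 100 * 1024 * 1024 bytes; the source does not define it.
MAX_SIZE_BYTES = 100 * 1024 * 1024


def is_extractable(filename: str, size_bytes: int) -> bool:
    """Return True if the extension is listed and the size is under the limit."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS and size_bytes < MAX_SIZE_BYTES
```

A check like this only filters obvious mismatches early; the function itself remains the authority on what it accepts.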
responseFormat: Information to be extracted, in one of the following response formats:
Simple object schema that maps the label and information to be extracted; for example:
{'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}
An array of strings that contain the information to be extracted; for example:
['What is the last name of the employee?', 'What is the address of the employee?']
An array of arrays that contain two strings (label and the information to be extracted); for example:
[['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
A JSON schema that defines the structure of the extracted information. Supports entity and table extraction. For example:
{ 'schema': { 'type': 'object', 'properties': { 'income_table': { 'description': 'Income for FY2026Q2', 'type': 'object', 'properties': { 'month': { 'description': 'Month', 'type': 'array' }, 'income': { 'description': 'Income', 'type': 'array' } } }, 'title': { 'description': 'What is the title of the document?', 'type': 'string' }, 'employees': { 'description': 'What are the names of employees?', 'type': 'array' } } } }
Note
You can't combine the JSON schema format with other response formats. If responseFormat contains the schema key, you must define all questions within the JSON schema. Additional keys are not supported.
The model only accepts certain shapes of JSON schema. The top-level type must always be an object, which contains independently extracted sub-objects. A sub-object may be a table (an object of lists of strings representing columns), a list of strings, or a string.
String is currently the only supported scalar type.
The description field is optional.
Use the description field to provide context to the model; for example, to help the model locate the right table in a document.
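The shape rules in the note above can be checked before a query is sent. The following Python sketch is one possible reading of those rules, not the service's actual validation: the function names `classify_property` and `check_response_format` are hypothetical, and treating a table as "an object whose properties are all arrays" is an interpretation of the note.

```python
def classify_property(prop: dict) -> str:
    """Classify one sub-object as 'string', 'array', or 'table'."""
    kind = prop.get("type")
    if kind in ("string", "array"):
        return kind
    if kind == "object":
        columns = prop.get("properties", {})
        # Interpretation: a table is an object whose columns are all arrays.
        if columns and all(c.get("type") == "array" for c in columns.values()):
            return "table"
    raise ValueError(f"unsupported sub-object shape: {prop!r}")


def check_response_format(response_format: dict) -> dict:
    """Return {label: kind} for a JSON-schema responseFormat, or raise ValueError."""
    # Per the note, no keys other than 'schema' are supported alongside it.
    if set(response_format) != {"schema"}:
        raise ValueError("responseFormat must contain only the 'schema' key")
    schema = response_format["schema"]
    if schema.get("type") != "object":
        raise ValueError("top-level type must be 'object'")
    return {
        label: classify_property(prop)
        for label, prop in schema.get("properties", {}).items()
    }
```

Applied to the schema shown earlier, this would label income_table as a table, title as a string, and employees as an array.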
Returns
A JSON object containing the extracted information.
Example of an output that includes array, table, and single value extraction:
{
"error": null,
"response": {
"employees": [
"Smith",
"Johnson",
"Doe"
],
"income_table": {
"income": ["$120 678","$130 123","$150 998"],
"month": ["February", "March", "April"]
},
"title": "Financial report"
}
}
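Because tables come back as an object of column arrays, client code often needs to pivot them into rows. The sketch below parses the example output above with Python's standard json module. The helper name `table_rows` is hypothetical, and two assumptions are made that the source does not confirm: a non-null error value signals a failed extraction, and all columns of a table have the same length.

```python
import json

# The example output shown above, as a JSON string.
raw = """
{
  "error": null,
  "response": {
    "employees": ["Smith", "Johnson", "Doe"],
    "income_table": {
      "income": ["$120 678", "$130 123", "$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
"""


def table_rows(table: dict) -> list:
    """Pivot {column: [values]} into a list of {column: value} rows."""
    columns = list(table)
    # zip stops at the shortest column, so uneven columns are truncated.
    return [dict(zip(columns, values)) for values in zip(*(table[c] for c in columns))]


result = json.loads(raw)
# Assumption: a non-null "error" means the extraction failed.
if result["error"] is not None:
    raise RuntimeError(f"extraction failed: {result['error']}")

rows = table_rows(result["response"]["income_table"])
```

Each element of rows then pairs one month with its income value, which is usually easier to load into a table than parallel arrays.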
Access control requirements
Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information about granting this privilege, see Cortex LLM privileges.
Usage notes
You can't use both the text and file parameters in the same function call.
You can either ask questions in natural language or describe the information to be extracted (such as city, street, ZIP code); for example:
{'address': 'City, street, ZIP', 'name': 'First and last name'}
The following languages are supported:
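Arabic
Bengali
Burmese
Cebuano
Chinese
Czech
Dutch
English
French
German
Hebrew
Hindi
Indonesian
Italian
Japanese
Khmer
Korean
Lao
Malay
Persian
Polish
Portuguese
Russian
Spanish
Tagalog
Thai
Turkish
Urdu
Vietnamese
Documents must be no more than 125 pages long.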
In a single AI_EXTRACT call, you can ask a maximum of 100 questions for entity extraction, and a maximum of 10 questions for table extraction.
A table extraction question is equal to 10 entity extraction questions. For example, you can ask 4 table extraction questions and 60 entity extraction questions in a single AI_EXTRACT call.
The maximum output length for entity extraction is 512 tokens per question. For table extraction, the model returns answers that are a maximum of 4096 tokens.
Client-side encrypted stages are not supported.
Confidence scores are not supported.
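The question limits in the usage notes above can be read as a single budget of 100 entity-equivalent questions, with each table question costing 10. That reading is inferred from the "4 table questions and 60 entity questions" example, not stated as a formula in the source. A minimal Python sketch of the check, with a hypothetical helper name:

```python
MAX_ENTITY_QUESTIONS = 100  # per-call limit for entity extraction
MAX_TABLE_QUESTIONS = 10    # per-call limit for table extraction
TABLE_WEIGHT = 10           # one table question counts as 10 entity questions


def fits_question_budget(entity_questions: int, table_questions: int) -> bool:
    """Return True if the combined question count stays within the per-call limits."""
    if entity_questions < 0 or table_questions < 0:
        raise ValueError("question counts must be non-negative")
    # Inferred rule: entity questions plus weighted table questions share one budget.
    weighted_total = entity_questions + TABLE_WEIGHT * table_questions
    return (
        table_questions <= MAX_TABLE_QUESTIONS
        and weighted_total <= MAX_ENTITY_QUESTIONS
    )
```

Under this reading, 4 table questions with 60 entity questions fits exactly, while adding one more entity question would not.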
Examples
Extraction from an input string
The following example extracts information from the input text:
SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => {'name': 'What is the first name of the employee?', 'city': 'What is the address of the employee?'}
);
The following example extracts and parses information from the input text:
SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
);
Extraction from a file
The following example extracts information from the document.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files','document.pdf'),
  responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
);
The following example extracts information from all files in a directory on a stage:
Note
Ensure that the directory table is enabled. For more information, see Manage directory tables.
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is this document?',
    'How would you classify this document?'
  ]
) FROM DIRECTORY (@db.schema.files);
The following example extracts the title value from the report.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'title': {
          'description': 'What is the title of document?',
          'type': 'string'
        }
      }
    }
  }
);
The following example extracts the employees array from the report.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);
The following example extracts the income_table table from the report.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'properties': {
            'month': { 'type': 'array' },
            'income': { 'type': 'array' }
          }
        }
      }
    }
  }
);
The following example extracts a table (income_table), a single value (title), and an array (employees) from the report.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'properties': {
            'month': { 'type': 'array' },
            'income': { 'type': 'array' }
          }
        },
        'title': {
          'description': 'What is the title of document?',
          'type': 'string'
        },
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);
Regional availability
Legal notices
For legal notices, see Snowflake AI and ML.