Categories: File functions (AISQL)

AI_TRANSCRIBE

Transcribes text from an audio or video file with optional timestamps and speaker labels. AI_TRANSCRIBE supports numerous languages, and audio can contain more than one language. Timestamps and speaker labels are extracted based on the specified timestamp granularity, as shown in the table below.


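Timestamp granularity	Result
Default	The entire audio file is transcribed as a single segment.
Word	Each transcribed word has its own timestamp.
Speaker	Each change of speaker is marked with who is speaking and a timestamp.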

Syntax

AI_TRANSCRIBE( <audio_file> [ , <options> ] [, <return_error_details> ] )

Arguments

Required:

audio_file: A FILE type object that represents an audio file. Use the TO_FILE function to create a reference to a staged file.

Optional:

options

An OBJECT value that contains zero or more of the following fields.

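timestamp_granularity: A string that specifies the desired timestamp granularity. Possible values:
- "word": The file is transcribed as a series of words, each with its own timestamp.
- "speaker": The file is transcribed as a series of conversational "turns," each with its own timestamp and speaker label.
If this field is not specified, the entire file is transcribed by default as a single segment without timestamps.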

return_error_details

A BOOLEAN flag that indicates whether to return error details in case of error. When set to TRUE, the function returns an OBJECT that contains the value and the error message, one of which is NULL depending on whether the function succeeded or failed. See Error behavior for details.
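
To show how the positional arguments combine, the following Python sketch assembles the text of an AI_TRANSCRIBE call. The helper build_transcribe_call, the stage @audio_stage, and the file path are hypothetical. The sketch assumes that TO_FILE is called with a stage name and a relative path, that options is written as a Snowflake OBJECT constant such as {'timestamp_granularity': 'speaker'}, and that an empty object {} can hold the options position when only return_error_details is needed.

```python
from typing import Optional

# Values documented for the timestamp_granularity field.
VALID_GRANULARITIES = ("word", "speaker")


def sql_string(value: str) -> str:
    """Quote a value as a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_transcribe_call(
    stage: str,
    relative_path: str,
    granularity: Optional[str] = None,
    return_error_details: bool = False,
) -> str:
    """Return the text of an AI_TRANSCRIBE call (hypothetical helper)."""
    if granularity is not None and granularity not in VALID_GRANULARITIES:
        raise ValueError(f"timestamp_granularity must be 'word' or 'speaker', not {granularity!r}")
    # First argument: a FILE reference. Assumption: the TO_FILE(stage, relative_path) form.
    args = [f"TO_FILE({sql_string(stage)}, {sql_string(relative_path)})"]
    if granularity is not None:
        # Second argument: the options OBJECT with the requested granularity.
        args.append("{" + sql_string("timestamp_granularity") + ": " + sql_string(granularity) + "}")
    elif return_error_details:
        # Assumption: an empty OBJECT keeps the flag in the third position.
        args.append("{}")
    if return_error_details:
        args.append("TRUE")
    return "AI_TRANSCRIBE(" + ", ".join(args) + ")"


# Prints: AI_TRANSCRIBE(TO_FILE('@audio_stage', 'calls/call_01.mp3'), {'timestamp_granularity': 'speaker'})
print(build_transcribe_call("@audio_stage", "calls/call_01.mp3", granularity="speaker"))
```

Rejecting any granularity other than "word" or "speaker" before the query runs mirrors the "Invalid options object" error listed under Troubleshooting.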

Returns

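A string that contains a JSON representation of the transcription result. The JSON object contains the following fields:

"audio_duration": The total duration of the audio file, in seconds.
"text": The transcription of the complete audio file, provided when the timestamp_granularity field is not specified.
"segments": An array of segments, provided when the timestamp_granularity field is set to "word" or "speaker". Each segment is a JSON object with the following fields:
- "start": The start time of the segment, in seconds.
- "end": The end time of the segment, in seconds.
- "text": The transcribed text of the segment.
- "speaker_label": The speaker label of the segment, provided when the timestamp_granularity field is set to "speaker". Labels take the form "SPEAKER_00", "SPEAKER_01", and so on, assigned in the order in which speakers are detected in the audio file.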
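
As a sketch of consuming this result on the client side, the following Python code parses the returned string, rebuilds the full transcript from either shape, and totals speaking time per speaker_label. The sample result is hypothetical illustration data, not real output, and joining segment texts with single spaces is an assumption about how segments should be concatenated.

```python
import json
from collections import defaultdict
from typing import Dict


def transcript_text(result_json: str) -> str:
    """Return the full transcript from either result shape."""
    result = json.loads(result_json)
    if "text" in result:
        return result["text"]  # default granularity: one block of text
    # "word" or "speaker" granularity. Assumption: join segment texts with spaces.
    return " ".join(segment["text"] for segment in result.get("segments", []))


def speaker_talk_time(result_json: str) -> Dict[str, float]:
    """Sum (end - start), in seconds, for each speaker_label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in json.loads(result_json).get("segments", []):
        label = segment.get("speaker_label")
        if label is not None:  # present only with "speaker" granularity
            totals[label] += segment["end"] - segment["start"]
    return dict(totals)


# Hypothetical speaker-granularity result, for illustration only.
sample = json.dumps({
    "audio_duration": 9.5,
    "segments": [
        {"start": 0.0, "end": 4.0, "text": "Hello, thanks for calling.", "speaker_label": "SPEAKER_00"},
        {"start": 4.5, "end": 7.0, "text": "Hi, I have a question.", "speaker_label": "SPEAKER_01"},
        {"start": 7.0, "end": 9.5, "text": "Sure, go ahead.", "speaker_label": "SPEAKER_00"},
    ],
})

print(transcript_text(sample))
print(speaker_talk_time(sample))  # {'SPEAKER_00': 6.5, 'SPEAKER_01': 2.5}
```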

Error behavior

By default, if AI_TRANSCRIBE can't process the input, the function returns NULL. If the query processes multiple rows, rows with errors return NULL and don't prevent the query from completing.

The return value on error depends on the return_error_details argument, as shown in the following table:

`return_error_details`	Return value	Description
FALSE or not passed	NULL	No error details are returned; a row that fails returns NULL.
TRUE	OBJECT with `value` and `error` fields	`value`: A VARCHAR value containing the transcription result, or NULL if an error occurred. `error`: A VARCHAR value that contains the error message if an error occurred, or NULL if the function succeeded.
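
A minimal client-side sketch of handling the TRUE case follows. It assumes the OBJECT has already been converted to a Python dict (for example with json.loads on the column value) and that a NULL row arrives as None; the names unwrap_transcription and TranscriptionError are hypothetical.

```python
import json
from typing import Any, Dict, Optional


class TranscriptionError(Exception):
    """Raised when AI_TRANSCRIBE reports that it could not process a file."""


def unwrap_transcription(response: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the parsed transcription from an error-details OBJECT, or raise."""
    if response is None:
        # Without return_error_details, a failed row is simply NULL.
        raise TranscriptionError("no result; set return_error_details to TRUE to get the error message")
    if response.get("error") is not None:
        # On failure, value is NULL and error holds the message.
        raise TranscriptionError(response["error"])
    # On success, error is NULL and value holds the JSON transcription string.
    return json.loads(response["value"])


ok = {"value": '{"audio_duration": 3.0, "text": "Hello."}', "error": None}
print(unwrap_transcription(ok))  # {'audio_duration': 3.0, 'text': 'Hello.'}
```

Raising an exception per failed row is one design choice; a batch job could instead collect the error strings and continue, which matches how a multirow query keeps going when individual rows fail.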

For more information about error handling for AI functions, see Snowflake Cortex AI Function: Multirow error handling improvements.

Access control requirements

Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. See Cortex LLM privileges for more information on this role.

Usage notes

For a list of supported languages, see Supported languages.

Supported languages are automatically detected. A file can contain multiple languages, each of which is recognized and transcribed. For accurate language detection, speech must begin within the first five seconds of the file.
AI_TRANSCRIBE supports the following audio and video file formats:

Audio	FLAC, MP3, MP4, OGG, WAV, WEBM
Video	MKV, MP4, OGV, WEBM

Video files must contain at least one audio track in FLAC, MP3, OPUS, VORBIS, or WAV format.

Factors such as sample rate, bit depth, and number of channels do not affect transcription, but if they are too high, they might make the file too large to process. Internally, AI_TRANSCRIBE uses monophonic audio at 16 kHz and resamples input files that are not already in this format.
The maximum audio file size is 700 MB.
When the timestamp granularity is set to "word" or "speaker", the maximum audio duration is 60 minutes. Without a timestamp granularity, the maximum duration is 120 minutes.
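
The size, duration, and format limits above can be checked before a file is staged. The following Python sketch is hypothetical (preflight_problems is not a Snowflake API) and assumes the caller already knows the file size and audio duration, for example from a separate media tool. The extension check is only a rough proxy: it cannot confirm that a video file contains a supported audio track, and the post-resampling size limit is not modeled because the internal sample format beyond 16 kHz mono is not documented here.

```python
import os
from typing import List, Optional

MAX_FILE_BYTES = 700 * 1024 * 1024  # 700 MB = 734,003,200 bytes
MAX_SECONDS_WITH_TIMESTAMPS = 60 * 60  # "word" or "speaker" granularity
MAX_SECONDS_WITHOUT_TIMESTAMPS = 120 * 60  # no timestamp granularity
# Extensions of the documented audio and video formats (a rough proxy only).
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".ogg", ".wav", ".webm", ".mkv", ".ogv"}


def preflight_problems(
    filename: str,
    size_bytes: int,
    duration_seconds: float,
    granularity: Optional[str] = None,
) -> List[str]:
    """Return human-readable reasons the file would likely be rejected (empty if none)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension {extension!r}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the maximum is {MAX_FILE_BYTES}")
    # The duration limit depends on whether timestamps are requested.
    if granularity in ("word", "speaker"):
        limit = MAX_SECONDS_WITH_TIMESTAMPS
    else:
        limit = MAX_SECONDS_WITHOUT_TIMESTAMPS
    if duration_seconds > limit:
        problems.append(f"audio is {duration_seconds:.2f} s; the maximum is {limit} s")
    return problems


print(preflight_problems("call_01.mp3", 52_000_000, 6052.10, granularity="speaker"))
# ['audio is 6052.10 s; the maximum is 3600 s']
```

The same 6052.10-second file passes the duration check when no granularity is requested, because the limit then rises to 7200 seconds.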


Examples

For examples, see AI audio examples.

Troubleshooting

When the function fails, it reports an error. The following table lists common error messages, the situations that cause them, and how to resolve them:


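Error message	Situation and resolution
Invalid options object	The value provided for the `timestamp_granularity` field, if any, must be "word" or "speaker".
Server not responding	The audio file could not be retrieved, possibly because the scoped URL expired.
File too large. Max size is 734,003,200 bytes, file exceeds this limit.	The provided audio file exceeds the maximum file size.
Invalid file format. Only ["flac", "mp3", "ogg", "wav", "webm"] files are supported, or WebM file does not contain an audio stream.	The audio file is not in one of the supported formats listed in the error message. WebM files support multiple media types, so make sure the file contains an audio stream. If the file is in a supported format, check whether it is corrupted.
File would be too large after resampling to 16000 Hz. Expected size 3,355,444,448,000.0 bytes.	After resampling to 16 kHz, the provided audio file is too large. If the provided audio has a lower sample rate, its resampled size is larger than the original and might exceed the maximum allowed file size.
"Audio duration too long: 6052.10 seconds. Maximum allowed: 3600 seconds." or "Audio duration too long: 7335.28 seconds. Maximum allowed: 7200 seconds."	The provided audio file is too long. If you use a timestamp granularity, the maximum duration is 60 minutes (3600 seconds); otherwise, it is 120 minutes (7200 seconds).
Unsupported language detected	The audio file contains a language that AI_TRANSCRIBE does not support.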

Regional availability

AI_TRANSCRIBE is available in the following regions:

AWS US West 2 (Oregon)
AWS US East 1 (N. Virginia)
AWS EU Central 1 (Frankfurt)
Azure East US 2 (Virginia)

Legal notices

See Snowflake AI and ML.