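Cortex AI Functions: Audio
Cortex AI Audio provides advanced LLM-powered audio processing capabilities, including:
- Transcription: Converts spoken language to text.
- Speaker identification: Determines who is speaking in each part of a multi-speaker audio file.
- Timestamp extraction: Identifies the timestamp of each spoken word.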
These capabilities are available through the AI_TRANSCRIBE function. Because AI_TRANSCRIBE is managed and hosted inside Snowflake, you can easily integrate audio processing into your data workflows without onerous setup or infrastructure management.
Note
The AI_TRANSCRIBE function also processes audio tracks in video files.
AI_TRANSCRIBE
AI_TRANSCRIBE is a fully managed SQL function that transcribes audio and video files stored in a stage, extracting text, timestamps, and speaker information. See Create stage for media files for information on creating a stage suitable for storing files for processing by AI_TRANSCRIBE.
Under the hood, AI_TRANSCRIBE orchestrates optimized AI models for transcription and speaker diarization, processing audio files of up to two hours in length. AI_TRANSCRIBE is horizontally scalable, allowing efficient batch processing by processing multiple files at the same time. Audio can be processed directly from object storage to avoid unnecessary data movement.
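By default, AI_TRANSCRIBE converts audio files into clean, readable text. You can also specify a timestamp granularity to extract timestamps for each word or for each change of speaker. Word-level timestamps are useful for applications such as subtitles, or for letting users jump to a specific part of the audio by clicking a word in the transcript. Speaker-level timestamps help you understand who said what in meetings, interviews, or phone calls.
| Timestamp granularity mode | Result |
|---|---|
| Default | Transcribes the entire audio file as a single segment |
| Word | Provides a timestamp for each transcribed word |
| Speaker | Identifies who is speaking, with a timestamp, at each change of speaker |
Supported languages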
AI_TRANSCRIBE supports the following languages, which are automatically detected. Files can contain multiple supported languages.
Note
Language detection requires audio to begin within the first five seconds of the file. For best results, trim excess silence before uploading.
- Arabic
- Bulgarian
- Cantonese
- Catalan
- Chinese
- Czech
- Dutch
- English
- French
- German
- Greek
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Latvian
- Malay
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Serbian
- Slovenian
- Spanish
- Swedish
- Thai
- Turkish
- Ukrainian
Supported media formats
AI_TRANSCRIBE supports the following audio and video file formats:
| Media type | Supported formats |
|---|---|
| Audio | FLAC, MP3, MP4, OGG, WAV, WEBM |
| Video | MKV, MP4, OGV, WEBM |
Video files must contain at least one audio track in FLAC, MP3, OPUS, VORBIS, or WAV format.
Examples
Text transcription
The following example transcribes an audio file stored in the financial_consultation stage, returning a text transcript of the entire file. The TO_FILE function converts the staged file to a file reference.
Response:
Word-level segmentation with timestamps
Set the timestamp granularity to “word” to extract precise timestamps for every word spoken, enabling searchable, navigable transcripts. Note that this audio file is in Spanish.
Response:
Note
The output is truncated for brevity. The full output contains a segment for each word in the audio file.
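Word-level timestamps map naturally onto subtitle cues. The following Python sketch groups word segments into SRT cues. It assumes the parsed response is a JSON object with a `segments` array whose items carry `start`, `end` (in seconds), and `text` fields; that shape, the sample data, and the helper names `srt_timestamp` and `words_to_srt` are illustrative assumptions, so check them against the actual response before relying on this.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(transcript: dict, words_per_cue: int = 7) -> str:
    """Group word-level segments into numbered SRT cues.

    Assumes transcript["segments"] is a list of dicts with
    "start", "end", and "text" keys (hypothetical response shape).
    """
    segments = transcript["segments"]
    cues = []
    for i in range(0, len(segments), words_per_cue):
        chunk = segments[i:i + words_per_cue]
        text = " ".join(seg["text"].strip() for seg in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word-level output, shortened for illustration.
sample = {
    "segments": [
        {"start": 0.0, "end": 0.4, "text": "Hola"},
        {"start": 0.5, "end": 0.9, "text": "buenos"},
        {"start": 1.0, "end": 1.5, "text": "días"},
    ]
}

if __name__ == "__main__":
    print(words_to_srt(sample, words_per_cue=2))
```

Grouping a fixed number of words per cue is the simplest choice; a production subtitle generator would more likely break cues on pauses or punctuation.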
Speaker identification
Set the timestamp granularity to “speaker” to detect, separate, and identify unique speakers in conversations or meetings. This example uses an audio file with two speakers, one speaking English and the other Spanish.
Response:
Note
The output is truncated for brevity. The full output contains a segment for each conversational “turn” in the audio file.
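Speaker-level segments can be post-processed into a readable script or into per-speaker talk time. The following Python sketch assumes each segment carries `start`, `end`, `text`, and a `speaker_label` field; the field names, the sample conversation, and the helper names are assumptions for illustration, not a confirmed response schema.

```python
from collections import defaultdict


def talk_time_by_speaker(transcript: dict) -> dict:
    """Sum the duration (in seconds) of every turn, per speaker label."""
    totals = defaultdict(float)
    for seg in transcript["segments"]:
        totals[seg["speaker_label"]] += seg["end"] - seg["start"]
    return dict(totals)


def format_turns(transcript: dict) -> str:
    """Render one line per turn: start time, speaker label, and text."""
    return "\n".join(
        f"[{seg['start']:.1f}s] {seg['speaker_label']}: {seg['text'].strip()}"
        for seg in transcript["segments"]
    )


# Hypothetical speaker-level output with two speakers.
conversation = {
    "segments": [
        {"start": 0.0, "end": 4.0, "speaker_label": "SPEAKER_00",
         "text": "Hello, how can I help you today?"},
        {"start": 4.0, "end": 10.0, "speaker_label": "SPEAKER_01",
         "text": "Hola, tengo una pregunta sobre mi cuenta."},
        {"start": 10.0, "end": 12.5, "speaker_label": "SPEAKER_00",
         "text": "Of course."},
    ]
}

if __name__ == "__main__":
    print(format_turns(conversation))
    print(talk_time_by_speaker(conversation))
```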
Use with other AI Functions
Call transcript analysis
You can pass the output of AI_TRANSCRIBE to other AI Functions for further processing. For example, you can use AI_SUMMARIZE to summarize the transcription, or AI_CLASSIFY to classify the content of the transcription. This example uses AI_SENTIMENT and AI_COMPLETE to analyze the text transcribed from customer call audio and provide sentiment on four dimensions and an assessment of the agent.
Note
AI_SENTIMENT analyzes text only and does not take vocal characteristics such as tone of voice into account.
AI_SENTIMENT response:
AI_COMPLETE response:
Video transcript analysis
The following example transcribes a video file stored in the podcast_videos_S3 stage,
Response:
Once you have the transcript, you can use AI_COMPLETE to perform additional analysis. This example identifies retail brands mentioned in the conversation for use in advertising or sponsorship analytics.
Response:
Cost considerations
Billing for all AI Functions is based on token consumption. For transcription, each second of audio processed is 50 tokens, regardless of language or segmentation method. A full hour of audio is therefore 180,000 tokens. Assuming that processing a million tokens costs 1.3 credits, and that Snowflake credits cost US $3 each, each hour of audio processed costs about US $0.702. This estimate is subject to change. For current pricing information, see the Snowflake Service Consumption Table.
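The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the 50 tokens per second figure comes from the text, while the 1.3 credits per million tokens and US $3 per credit are the same illustrative assumptions used in the estimate and should be replaced with current values from the Snowflake Service Consumption Table. The function names are hypothetical.

```python
TOKENS_PER_AUDIO_SECOND = 50        # stated rate for transcription
CREDITS_PER_MILLION_TOKENS = 1.3    # assumed rate, as in the estimate above
USD_PER_CREDIT = 3.0                # assumed credit price, as in the estimate above


def transcription_tokens(audio_seconds: float) -> float:
    """Tokens billed for a given amount of audio."""
    return audio_seconds * TOKENS_PER_AUDIO_SECOND


def estimate_cost_usd(
    audio_seconds: float,
    credits_per_million: float = CREDITS_PER_MILLION_TOKENS,
    usd_per_credit: float = USD_PER_CREDIT,
) -> float:
    """Estimated cost in US dollars under the assumed rates."""
    credits = transcription_tokens(audio_seconds) / 1_000_000 * credits_per_million
    return credits * usd_per_credit


if __name__ == "__main__":
    one_hour = 3600
    print(transcription_tokens(one_hour))            # 180000
    print(round(estimate_cost_usd(one_hour), 3))     # 0.702
```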
Note
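AI_TRANSCRIBE has a minimum billable duration of 1 minute. Files shorter than 1 minute are still processed, but are billed as 1 minute. To process large numbers of short audio files efficiently, consider batching them into a single file and using timestamps to identify the start and end of each original file in the resulting transcript.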
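The effect of the one-minute minimum on short clips can be sketched in Python. The example assumes that files longer than one minute are billed by their actual duration at 50 tokens per second, and that batched clips are concatenated back to back with no gaps; both are simplifying assumptions, and the helper names are hypothetical.

```python
TOKENS_PER_AUDIO_SECOND = 50
MINIMUM_BILLED_SECONDS = 60


def billed_seconds(duration_seconds: float) -> float:
    """Apply the one-minute minimum to a single file."""
    return max(duration_seconds, MINIMUM_BILLED_SECONDS)


def billed_tokens(durations: list) -> float:
    """Total tokens billed when each duration is submitted as its own file."""
    return sum(billed_seconds(d) for d in durations) * TOKENS_PER_AUDIO_SECOND


def clip_offsets(durations: list) -> list:
    """Start and end offsets of each clip inside the concatenated file.

    Assumes clips are joined in order with no padding between them.
    """
    offsets = []
    cursor = 0.0
    for d in durations:
        offsets.append((cursor, cursor + d))
        cursor += d
    return offsets


clips = [10.0] * 100                         # one hundred 10-second clips
separate = billed_tokens(clips)              # each clip billed as 60 seconds
batched = billed_tokens([sum(clips)])        # one 1000-second file

if __name__ == "__main__":
    print(separate, batched)                 # 300000.0 50000.0
    print(clip_offsets([10.0, 5.0, 20.0]))
```

The offsets returned by `clip_offsets` can then be matched against word-level or speaker-level timestamps to attribute each part of the transcript to its original clip.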