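AI Observability in Snowflake Cortex
Use AI Observability in Snowflake Cortex to evaluate and trace your generative AI applications. With AI Observability, you can make your applications more trustworthy and transparent. You can measure the performance of your AI applications by running systematic evaluations, and then use the results of those evaluations to iterate on your application configurations and optimize performance. You can also use AI Observability to log application traces for debugging.
Use AI Observability to benchmark performance so that your applications are more reliable and you have greater confidence when you deploy them to production.
AI Observability has the following features: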
- Evaluations: Use AI Observability to systematically evaluate the performance of your generative AI applications and agents using the LLM-as-a-judge technique. You can use metrics, such as accuracy, latency, usage, and cost, to quickly iterate on your application configurations and optimize performance.
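- Comparisons: Compare multiple evaluations side by side and assess the quality and accuracy of the responses. You can analyze responses across different LLMs, prompts, and inference configurations to identify the best configuration for production deployment.
- Tracing: Trace every step of application execution, including input prompts, retrieved context, tool use, and LLM inference. Use traces to debug individual records and to optimize the accuracy, latency, and cost of your application.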
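To make the comparison feature concrete, here is a minimal, stdlib-only sketch that averages per-record metric scores for each run and ranks the runs on one metric. The run names, metric names, and scores are hypothetical placeholders, and this is not the product's comparison logic; it only shows the kind of side-by-side summary that a comparison gives you.

```python
from statistics import mean

# Hypothetical per-record scores (0.0 to 1.0) for two runs of the same application.
runs = {
    "v1-run": {"answer_relevance": [0.6, 0.8, 0.7], "groundedness": [0.9, 0.5, 0.7]},
    "v2-run": {"answer_relevance": [0.9, 0.8, 1.0], "groundedness": [0.8, 0.9, 0.7]},
}


def summarize(run_scores):
    """Average each metric across all records of one run."""
    return {metric: mean(scores) for metric, scores in run_scores.items()}


def rank_runs(runs, metric):
    """Return run names sorted from best to worst on a single metric."""
    summaries = {name: summarize(scores) for name, scores in runs.items()}
    return sorted(summaries, key=lambda name: summaries[name][metric], reverse=True)


for name, scores in runs.items():
    print(name, summarize(scores))
print("best on answer_relevance:", rank_runs(runs, "answer_relevance")[0])
```

Ranking on a single metric is a deliberate simplification; in practice you would weigh quality metrics against latency and cost before picking a configuration.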
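You can use AI Observability to evaluate a variety of task types, such as retrieval-augmented generation (RAG) and summarization. For example, the context relevance score can help you assess the quality of the search results retrieved for a user query. You can use the answer relevance and groundedness scores to check the relevance and truthfulness of the final response based on the retrieved context.
For summarization tasks, you can measure the factual correctness and comprehensiveness of LLM-generated summaries against the original input, which helps you avoid frequent hallucinations caused by the prompts and LLMs in your generative AI application.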
To get started, learn about the Key concepts, and then take a quick walkthrough with the AI Observability Tutorial. You can then use the information in Evaluate AI applications for an in-depth walkthrough.
To review a specific concept, see the Snowflake AI Observability Reference. To query AI_OBSERVABILITY_EVENTS with SQL for a Cortex Agent (pass CORTEX AGENT as agent_type) or for an External Agent application (pass EXTERNAL AGENT as agent_type), see Monitor Cortex Agent requests, GET_AI_OBSERVABILITY_EVENTS (SNOWFLAKE.LOCAL), and External Agent commands.
Visibility of unredacted raw fields in monitoring and in the results of observability user-defined table functions is controlled by the READ UNREDACTED AI OBSERVABILITY EVENTS TABLE account privilege. This privilege does not apply to Cortex Agent evaluation runs or to the External Agent evaluations experience. For more information, see Account Privilege READ UNREDACTED AI OBSERVABILITY EVENTS TABLE and Monitor Cortex Agent requests.
Access control and prerequisites
Before you start using AI Observability, do the following:
- To create and execute runs, your role must be granted the following roles or privileges. For more information, see Required privileges:
  - CORTEX_USER database role
  - CREATE EXTERNAL AGENT privilege on the schema
  - CREATE TASK privilege on the schema
  - EXECUTE TASK global privilege
- Install the following TruLens Python packages in your Python project:
  - trulens-core
  - trulens-connectors-snowflake
  - trulens-providers-cortex
  The packages that you use in your Python project should be version 2.1.2 or later.
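As a sketch, the privileges listed above can be granted with standard Snowflake GRANT statements. The helper below only builds the SQL text. The role, database, and schema names are hypothetical placeholders, and the exact statement for CREATE EXTERNAL AGENT is assumed to follow the usual `GRANT <privilege> ON SCHEMA` pattern, so confirm every statement against Required privileges before you run it.

```python
import re

# Simple unquoted Snowflake identifiers: a letter or underscore, then letters, digits, _ or $.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check(name):
    """Reject names that would need quoting, to keep the generated SQL simple."""
    if not _IDENT.match(name):
        raise ValueError(f"not a simple unquoted identifier: {name!r}")
    return name


def ai_observability_grants(role, database, schema):
    """Build GRANT statements for the roles and privileges that runs require (sketch)."""
    role, database, schema = _check(role), _check(database), _check(schema)
    qualified_schema = f"{database}.{schema}"
    return [
        f"GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE {role};",
        # Assumed syntax, by analogy with other schema-level CREATE privileges.
        f"GRANT CREATE EXTERNAL AGENT ON SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT CREATE TASK ON SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT EXECUTE TASK ON ACCOUNT TO ROLE {role};",
    ]


# Hypothetical names; run the printed statements with a role that can grant them.
for stmt in ai_observability_grants("EVAL_ROLE", "EVAL_DB", "EVAL_SCHEMA"):
    print(stmt)
```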
TruLens is the platform that Snowflake uses to track your applications. For more information, see the TruLens documentation (https://trulens.org/getting_started).
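To confirm the version requirement from Python, you can inspect the installed distributions with the standard library's `importlib.metadata`. This sketch compares only the numeric release segment against 2.1.2 and deliberately ignores pre-release tags, so treat it as a quick sanity check rather than a full PEP 440 comparison. The packages themselves are typically installed with `pip install trulens-core trulens-connectors-snowflake trulens-providers-cortex`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("trulens-core", "trulens-connectors-snowflake", "trulens-providers-cortex")
MINIMUM = (2, 1, 2)


def parse_release(text):
    """Leading numeric release segment, padded to 3 parts: '2.1' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Simplification: '2.1.2rc1' is treated as (2, 1, 2).
    return parts + (0,) * max(0, 3 - len(parts))


def check_trulens_packages(minimum=MINIMUM):
    """Map each required package to (installed version or None, meets minimum)."""
    report = {}
    for name in REQUIRED:
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_release(installed) >= minimum)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_trulens_packages().items():
        print(f"{name}: {installed or 'not installed'} -> {'OK' if ok else 'needs 2.1.2 or later'}")
```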
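Key concepts
Application
An application is an end-to-end generative AI application built from multiple components, such as LLMs, tools (for example, search retrievers or APIs), and other custom logic. For example, an application can be a RAG pipeline that chains together a retriever, a reranker, and LLMs. You can enable AI Observability for applications that run in any environment, such as Snowflake, the cloud, or on-premises.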
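The following sketch shows the shape of such an application: a retriever, a reranker, and an LLM call chained into one RAG pipeline. Every component is a hypothetical stand-in (keyword overlap instead of a search service, a stub instead of an LLM), so the code only illustrates how the components compose into the end-to-end unit that you evaluate; it does not use the TruLens SDK.

```python
import re
from dataclasses import dataclass


def _words(text):
    """Lowercase word set used by the stand-in retriever and reranker."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class RagApp:
    """Hypothetical RAG application: retriever -> reranker -> LLM."""

    documents: list
    top_k: int = 2

    def retrieve(self, query):
        # Stand-in retriever: keep documents that share any word with the query.
        q = _words(query)
        return [doc for doc in self.documents if q & _words(doc)]

    def rerank(self, query, candidates):
        # Stand-in reranker: sort by word overlap with the query, keep top_k.
        q = _words(query)
        ranked = sorted(candidates, key=lambda doc: len(q & _words(doc)), reverse=True)
        return ranked[: self.top_k]

    def generate(self, query, context):
        # Stand-in for an LLM call; a real application would call a model here.
        return f"Answer to {query!r} based on {len(context)} passage(s)."

    def answer(self, query):
        # The end-to-end application: the unit you would evaluate and trace.
        return self.generate(query, self.rerank(query, self.retrieve(query)))


docs = [
    "Snowflake stores data in tables",
    "Cortex provides LLM functions",
    "Tasks run on a schedule",
]
app = RagApp(docs)
print(app.answer("Which Snowflake feature provides LLM functions?"))
```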
External Agent
Applications are represented in Snowflake as External Agent objects. An External Agent object is used to store application and evaluation metadata (such as the application name, version name, or run name). It does not store the application code, application definition, execution traces, or evaluation results. While the application can be hosted in any environment (such as Snowflake, cloud, or on-premises), the execution traces and evaluation results are stored in an event table in your Snowflake account. For more information, see Observability data.
In addition to storing application and evaluation metadata, the External Agent object is also used to govern access to the traces and evaluation results for the application. For more information, see Required privileges.
The TruLens SDK automatically creates External Agent objects when you register an application (for example, using TruApp(), TruChain, TruGraph, or TruLlama). Running an evaluation can also create an External Agent if one does not already exist for the specified application name.
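To illustrate this get-or-create behavior, here is a toy in-memory model of a schema that holds External Agent metadata. The class and method names are hypothetical and correspond to neither the TruLens SDK nor Snowflake SQL. The point is that registering or evaluating an application reuses an existing object with the same name, and that the object holds only metadata (version and run names), never application code, traces, or evaluation results.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalAgentMetadata:
    """Hypothetical record: metadata only, never code, traces, or results."""

    name: str
    versions: set = field(default_factory=set)
    runs: set = field(default_factory=set)


class Schema:
    """Toy schema that tracks External Agent objects by application name."""

    def __init__(self):
        self.external_agents = {}

    def get_or_create_agent(self, app_name):
        # Reuse the existing object if one already exists for this application name.
        if app_name not in self.external_agents:
            self.external_agents[app_name] = ExternalAgentMetadata(app_name)
        return self.external_agents[app_name]

    def register_version(self, app_name, version_name):
        # Registering an application creates the object if it is missing.
        self.get_or_create_agent(app_name).versions.add(version_name)

    def start_run(self, app_name, run_name):
        # Running an evaluation also creates the object if it is missing.
        self.get_or_create_agent(app_name).runs.add(run_name)


schema = Schema()
schema.register_version("support_bot", "v1")
schema.start_run("support_bot", "v1-run")
schema.start_run("faq_bot", "first-run")  # no object yet, so one is created
print(sorted(schema.external_agents))     # ['faq_bot', 'support_bot']
```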
You can also manage external agents using SQL commands. For more information, see External Agent commands.
Important
External Agent objects share a namespace with model objects. You cannot create an external agent with the same name as an existing model in the same schema, and vice versa. If a name collision occurs (for example, when an evaluation and a model share the same name), you must rename or drop the conflicting object before proceeding.
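The shared namespace can be pictured as one name table per schema that both object types draw from. This toy sketch (hypothetical names, not Snowflake's implementation) shows why creating an external agent with the same name as an existing model fails until the conflicting object is dropped or renamed. It assumes unquoted identifiers, which Snowflake resolves to uppercase, so names that differ only in case collide.

```python
class SchemaNamespace:
    """Toy schema namespace shared by models and external agents."""

    def __init__(self):
        self._objects = {}  # normalized name -> object kind

    @staticmethod
    def _key(name):
        # Assumes unquoted identifiers, which Snowflake resolves to uppercase.
        return name.upper()

    def create(self, kind, name):
        key = self._key(name)
        if key in self._objects:
            existing = self._objects[key]
            raise ValueError(f"{kind} {name!r} collides with existing {existing} {key}")
        self._objects[key] = kind

    def drop(self, name):
        self._objects.pop(self._key(name), None)


ns = SchemaNamespace()
ns.create("MODEL", "churn_eval")
try:
    ns.create("EXTERNAL AGENT", "churn_eval")  # same name as the model: rejected
except ValueError as err:
    print(err)
ns.drop("churn_eval")                        # resolve the conflict first
ns.create("EXTERNAL AGENT", "churn_eval")    # now succeeds
```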
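Version
An application can have multiple versions. Each version represents a different implementation; for example, versions can use different retrievers, prompts, LLMs, or inference configurations.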
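One way to keep versions comparable is to describe each version as an immutable configuration, so that the differences between two versions are explicit. In this hypothetical sketch, two versions differ only in the LLM and in the number of retrieved passages; the model names are placeholders, not specific Cortex models.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AppVersion:
    """Hypothetical, immutable description of one application version."""

    name: str
    llm: str  # placeholder model names below
    prompt_template: str
    retriever_top_k: int = 4
    temperature: float = 0.0

    def diff(self, other):
        """Settings (other than the version name) that differ between two versions."""
        return {
            f.name: (getattr(self, f.name), getattr(other, f.name))
            for f in fields(self)
            if f.name != "name" and getattr(self, f.name) != getattr(other, f.name)
        }


v1 = AppVersion("v1", llm="model-a", prompt_template="Context: {context}\nQuestion: {question}")
v2 = replace(v1, name="v2", llm="model-b", retriever_top_k=8)
print(v1.diff(v2))  # {'llm': ('model-a', 'model-b'), 'retriever_top_k': (4, 8)}
```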
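Dataset
A dataset represents a set of inputs. You can also configure it to include a set of expected outputs (ground truth) for testing the application. With a dataset, you can invoke the application to do the following:
- Generate outputs.
- Capture traces.
- Compute evaluation metrics.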
You can use a dataset containing both the inputs and the generated outputs to compute the evaluation metrics without invoking the application. For a list of fields supported in the dataset, see Dataset and attributes.
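As a sketch of the two ways to use a dataset, each row below carries an input, an optional ground-truth output, and an optional pre-generated output. The field names are hypothetical; the supported fields are listed in Dataset and attributes. Rows that already have an output can be evaluated directly, while the remaining rows need the application to be invoked first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRow:
    """Hypothetical dataset row; real field names are in Dataset and attributes."""

    input_text: str
    ground_truth: Optional[str] = None  # expected output, if you have one
    output: Optional[str] = None        # pre-generated application output, if any


def needs_invocation(rows):
    """True if any row lacks an output, so the application must be called."""
    return any(row.output is None for row in rows)


def invoke_missing(rows, app):
    """Fill in missing outputs by calling a (hypothetical) app callable."""
    return [
        row if row.output is not None
        else DatasetRow(row.input_text, row.ground_truth, app(row.input_text))
        for row in rows
    ]


rows = [
    DatasetRow("What is a run?", ground_truth="An evaluation job."),
    DatasetRow("What is a version?", output="A different implementation of the app."),
]
print(needs_invocation(rows))   # True
filled = invoke_missing(rows, lambda q: f"stub answer to {q}")
print(needs_invocation(filled)) # False
```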
Run
A run is an evaluation job. It computes evaluation metrics by using the dataset and application version that you specify.
A run has an invocation stage and a computation stage. The invocation stage triggers the application to generate the output and corresponding traces. The computation stage computes the evaluation metrics specified for the run. Multiple computations can be performed to add new metrics to an existing run. For the list of statuses associated with the execution of a run, see Runs.
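The two stages can be sketched as a small state machine: invocation produces one output per input, and each computation adds a metric to the same run. Outputs can also be supplied up front, in which case no invocation is needed. The status strings and metric functions are hypothetical stand-ins; see Runs for the real statuses.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical evaluation run: one invocation stage, one or more computations."""

    name: str
    inputs: list
    outputs: list = field(default_factory=list)  # may be supplied up front
    metrics: dict = field(default_factory=dict)  # metric name -> per-record scores
    status: str = "CREATED"                      # hypothetical status strings

    def invoke(self, app):
        # Invocation stage: call the application once per input to get outputs.
        self.outputs = [app(x) for x in self.inputs]
        self.status = "INVOCATION_COMPLETED"

    def compute(self, metric_name, metric_fn):
        # Computation stage: may be repeated to add new metrics to the same run.
        if len(self.outputs) != len(self.inputs):
            raise RuntimeError("no outputs yet: invoke the app or supply outputs")
        self.metrics[metric_name] = [
            metric_fn(x, y) for x, y in zip(self.inputs, self.outputs)
        ]
        self.status = "COMPUTATION_COMPLETED"


run = Run("v1-run", inputs=["hi", "hello there"])
run.invoke(lambda text: text.upper())
run.compute("length_ratio", lambda x, y: len(y) / len(x))
run.compute("is_upper", lambda x, y: float(y.isupper()))  # second metric, same run
print(run.status, sorted(run.metrics))
```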
Metrics
Evaluation metrics are scores that you use to assess generative AI application performance based on your own criteria. These metrics use LLMs to grade outputs and provide detailed scoring information. For a comprehensive list of metrics and their definitions, see Evaluation metrics.
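The LLM-as-a-judge technique can be sketched as follows: format a grading prompt, ask a judge model for a score on a fixed scale, parse the score, and normalize it. Here the judge is a stub function so the example runs offline. The prompt wording, the 0 to 3 scale, and the normalization to 0 to 1 are assumptions for illustration, not the prompts or scales that AI Observability uses; for server-side evaluations, the judge is an LLM on Cortex AI.

```python
import re

# Hypothetical grading prompt; the real metric prompts are defined by AI Observability.
JUDGE_PROMPT = (
    "Rate from 0 to 3 how relevant the answer is to the question.\n"
    "Question: {question}\nAnswer: {answer}\nReply with 'Score: <n>'."
)


def parse_score(reply, max_score=3):
    """Extract 'Score: n' from the judge's reply and normalize it to 0..1."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        raise ValueError(f"judge reply has no score: {reply!r}")
    return min(int(match.group(1)), max_score) / max_score


def answer_relevance(question, answer, judge):
    """Grade one record with a judge callable that maps a prompt to a reply."""
    reply = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    return parse_score(reply)


def stub_judge(prompt):
    # Stand-in for an LLM call: full marks if the answer shares a word with the question.
    question = prompt.split("Question: ")[1].split("\n")[0]
    answer = prompt.split("Answer: ")[1].split("\n")[0]
    overlap = set(question.lower().split()) & set(answer.lower().split())
    return "Score: 3" if overlap else "Score: 0"


print(answer_relevance("What is a run?", "A run is an evaluation job.", stub_judge))
```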
Traces
Traces are comprehensive records that capture the inputs, outputs, and intermediate steps of the interactions with an LLM application. Traces provide a detailed view of the application’s execution. Use traces to analyze and understand the model’s behavior at each stage. You can compare the traces of different application versions to identify improvements, debug issues, and verify intended performance. For information about accessing traces associated with each record, see Evaluate AI applications.
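Conceptually, a trace is a tree of timed spans: one root span for the request and child spans for intermediate steps such as retrieval and generation. This stdlib-only sketch records such spans with a context manager so that you can see the data shape; the span names and attributes are hypothetical. In practice, the TruLens SDK captures traces for you.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy tracer: records nested spans with attributes and durations."""

    def __init__(self):
        self.spans = []   # finished spans, in completion order
        self._stack = []  # names of currently open spans

    @contextmanager
    def span(self, name, **attributes):
        parent = self._stack[-1] if self._stack else None
        record = {"name": name, "parent": parent, **attributes}
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record  # the caller can attach outputs to the span
        finally:
            record["duration_s"] = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(record)


tracer = Tracer()
with tracer.span("query", input="What is a run?") as root:
    with tracer.span("retrieve") as s:
        s["output"] = ["A run is an evaluation job."]
    with tracer.span("generate") as s:
        s["output"] = "A run is an evaluation job."
    root["output"] = "A run is an evaluation job."

for span in tracer.spans:
    print(span["name"], "parent:", span["parent"], f"{span['duration_s']:.6f}s")
```

Comparing the same span (for example, `generate`) across the traces of two application versions is one simple way to see where latency or output quality changed.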
Pricing
AI Observability uses LLM judges to compute the evaluation metrics. For server-side evaluations, LLMs on Cortex AI are used as LLM judges. The LLM judges are invoked via the COMPLETE (SNOWFLAKE.CORTEX) function to perform evaluations. You incur charges for the Cortex Complete function calls. The LLM used to perform the evaluations determines how much you’re charged. Additionally, you’re charged the following:
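- Warehouse charges for managing the evaluation tasks
- Warehouse charges for the queries that compute the evaluation metrics
- Storage charges for the evaluation results
- Warehouse charges for retrieving evaluation results when you view them in Snowsight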
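For a rough cost estimate, multiply the tokens processed by the judge LLM by that model's rate and add the warehouse usage. Every rate and count below is a hypothetical placeholder, not a Snowflake price; take actual rates from the Snowflake Service Consumption Table and your own usage. The sketch also assumes one judge call per record per metric (some metrics may call the judge more often) and leaves out storage charges for brevity.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical inputs for a rough evaluation cost estimate, in credits."""

    records: int
    metrics: int
    tokens_per_judge_call: int
    judge_credits_per_million_tokens: float  # depends on the judge LLM
    warehouse_credits_per_hour: float
    warehouse_hours: float  # task management, metric queries, Snowsight retrieval


def estimate_credits(c):
    # Assumption: one judge call per record per metric.
    judge_calls = c.records * c.metrics
    judge_tokens = judge_calls * c.tokens_per_judge_call
    judge_credits = judge_tokens / 1_000_000 * c.judge_credits_per_million_tokens
    warehouse_credits = c.warehouse_credits_per_hour * c.warehouse_hours
    return {
        "judge": judge_credits,
        "warehouse": warehouse_credits,
        "total": judge_credits + warehouse_credits,
    }


example = CostInputs(
    records=200,
    metrics=3,
    tokens_per_judge_call=1_500,
    judge_credits_per_million_tokens=2.0,
    warehouse_credits_per_hour=1.0,
    warehouse_hours=0.25,
)
print(estimate_credits(example))
```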