Use Snowsight to set up data quality checks

This topic describes how to use Snowsight to set up data quality checks. You can use the following strategies to set up data quality checks:

  • Use Cortex Data Quality to set up quality checks
  • Set up quality checks manually

For an introduction to concepts of data quality checks, see Core concepts of data quality checks.

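Use Cortex Data Quality to set up quality checks

Cortex Data Quality uses AI to suggest data quality checks based on the characteristics of your metadata. If you accept these suggestions, Snowflake regularly checks your data for quality issues to identify potential risks.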

Cortex Data Quality uses the Snowflake Cortex AI_COMPLETE function to suggest data quality checks. Because it runs inside Snowflake Cortex, your enterprise data and metadata stay inside Snowflake. Cortex Data Quality also respects Snowflake access control and bases its suggestions only on data that you can access.

To use Cortex Data Quality to set up data quality checks:

  1. Sign in to Snowsight.

  2. In the navigation menu, select Catalog » Database Explorer, and then select the object.

  3. Select the Data Quality tab.

  4. Select Monitoring.

  5. Do one of the following:

    • If this is the first time you are setting up quality checks, select Get started.
    • If you are setting up additional quality checks, select Add quality check, and then select Suggested quality checks.

  6. Review the suggested data quality checks. To change the criteria that determine whether data passes a quality check, edit the contents of the What should the result be? column.

  7. Select the quality checks that you want to implement, and then select Apply.
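
After you apply the suggestions, you might want to confirm from a worksheet which data metric functions (DMFs) are now associated with the object. The following is a minimal sketch, not part of the Snowsight workflow itself. It assumes a hypothetical table named my_db.my_schema.orders and that the accepted checks are attached to the table as DMFs.

```sql
-- my_db.my_schema.orders is a hypothetical placeholder; replace it with your own object.
-- Lists every DMF associated with the table, along with its schedule.
SELECT *
  FROM TABLE(
    my_db.INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
      REF_ENTITY_NAME   => 'my_db.my_schema.orders',
      REF_ENTITY_DOMAIN => 'TABLE'
    )
  );
```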

For more information about Cortex Data Quality, see More information about Cortex Data Quality.

Set up quality checks manually

To manually create data quality checks based on your knowledge of the data:

  1. Sign in to Snowsight.

  2. In the navigation menu, select Catalog » Database Explorer, and then select the object.

  3. Select the Data Quality tab.

  4. Select Monitoring.

  5. Do one of the following:

    • If this is the first time you are setting up quality checks, select Start manually.
    • If you are setting up additional quality checks, select Add quality check, and then select Build checks manually.

  6. In the Set up a quality check dialog, select the type of check that you want to create.

  7. Configure the criteria that determine if data passes the quality check, and then select Save.
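
For readers who prefer SQL, a roughly comparable check can be sketched with the ALTER TABLE command. This is an illustration under stated assumptions, not a description of what Snowsight runs behind the dialog. The table my_db.my_schema.orders, the column customer_id, and the expectation name no_null_customer_ids are hypothetical, and the EXPECTATION clause should be verified against the ALTER TABLE reference for your account.

```sql
-- Hypothetical table, column, and expectation name; replace them with your own.

-- Set a schedule first so that the DMF has a run frequency.
ALTER TABLE my_db.my_schema.orders
  SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Associate the system DMF NULL_COUNT with a column.
-- The expectation defines the pass criterion: the check passes when the count is 0.
ALTER TABLE my_db.my_schema.orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
    ON (customer_id)
    EXPECTATION no_null_customer_ids (VALUE = 0);
```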

Tip

If you want to enable anomaly detection so that Snowflake can automatically detect data quality issues based on the historical volume and freshness of your data, either use Cortex Data Quality and accept its suggestions for anomaly detection or set up anomaly detection manually.

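Adjust how often quality checks run

The schedule of a table or view determines how often the data metric functions (DMFs) that power the data quality checks run. A schedule can be time-based or triggered by updates to the table.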

Note

You can’t use Snowsight to adjust the schedule until you have added at least one quality check. You can use an ALTER <object> command to set the schedule for a table or view at any time.

  1. Sign in to Snowsight.
  2. In the navigation menu, select Catalog » Database Explorer, and then select the object.
  3. Select the Data Quality tab.
  4. Select Monitoring.
  5. Select Settings.
  6. Specify how often you want the DMF to run:
    • To run the DMF at a regular interval of one day or less, select Interval-based timing and select the interval from the drop-down list.
    • To run the DMF on a custom schedule, select Select schedule and set the schedule.
    • To run the DMF whenever there is a DML change to the table (for example, when a row is added), select Trigger-based execution.
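
The three options above correspond loosely to the values of the DATA_METRIC_SCHEDULE parameter that you can set with ALTER TABLE. The following sketch assumes a hypothetical table named my_db.my_schema.orders. Run only one of the statements, because each one replaces the previous schedule.

```sql
-- my_db.my_schema.orders is a hypothetical placeholder; replace it with your own object.

-- Interval-based: run the DMFs every 60 minutes.
ALTER TABLE my_db.my_schema.orders
  SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Custom schedule: run the DMFs every day at 08:00 UTC, using a cron expression.
ALTER TABLE my_db.my_schema.orders
  SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';

-- Trigger-based: run the DMFs whenever a DML operation changes the table.
ALTER TABLE my_db.my_schema.orders
  SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
```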

More information about Cortex Data Quality

The following sections provide additional information about Cortex Data Quality.

Required LLMs

Cortex Data Quality won’t work unless the CORTEX_MODELS_ALLOWLIST account parameter allows the mistral-7b and llama3.1-8b models within the account. By default, both models are allowed. For more information about setting this parameter, see Account-level allowlist parameter.
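
To check whether the allowlist has been changed from its default, you can inspect the parameter from a worksheet. The following is a sketch that assumes you can use the ACCOUNTADMIN role. The ALTER ACCOUNT statement replaces the entire current value, so include any other models that your account relies on.

```sql
-- Show the current value of the allowlist.
SHOW PARAMETERS LIKE 'CORTEX_MODELS_ALLOWLIST' IN ACCOUNT;

-- If the list is restricted and is missing the required models, an administrator can set it.
-- This example allows only the two models named above; extend the list as needed.
USE ROLE ACCOUNTADMIN;

ALTER ACCOUNT SET CORTEX_MODELS_ALLOWLIST = 'mistral-7b,llama3.1-8b';
```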

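Access control requirements

Administrators with the ACCOUNTADMIN role have all of the privileges required to use Cortex to suggest data quality checks.

Other users must have the following privileges and roles:

  • OWNERSHIP privilege on the table
  • EXECUTE DATA METRIC FUNCTION privilege on the account
  • SNOWFLAKE.DATA_METRIC_USER database role
  • SNOWFLAKE.CORTEX_USER database role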
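
As an illustration, the account-level privilege and the two database roles in the list above could be granted to a custom role as follows. The role name dq_admin_role and the table name are hypothetical. The sketch assumes that the role already owns the table, or that you transfer ownership separately, and that it already has the privileges it needs to use the database and schema.

```sql
USE ROLE ACCOUNTADMIN;

-- dq_admin_role is a hypothetical role name created for this example.
CREATE ROLE IF NOT EXISTS dq_admin_role;

GRANT EXECUTE DATA METRIC FUNCTION ON ACCOUNT TO ROLE dq_admin_role;
GRANT DATABASE ROLE SNOWFLAKE.DATA_METRIC_USER TO ROLE dq_admin_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE dq_admin_role;

-- Optional: transfer ownership of a hypothetical table if the role does not own it yet.
-- GRANT OWNERSHIP ON TABLE my_db.my_schema.orders
--   TO ROLE dq_admin_role COPY CURRENT GRANTS;
```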

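Restrict access

By default, the CORTEX_USER database role is granted to the PUBLIC role, which means that every user has it. If you don’t want all users to be able to use Snowflake Cortex features, you can revoke this database role from the PUBLIC role and then grant it to specific roles.

To prevent users from using Cortex to suggest quality checks, revoke the CORTEX_USER database role from the PUBLIC role by running the following commands. Be sure to use the ACCOUNTADMIN role.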

USE ROLE ACCOUNTADMIN;

REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER
  FROM ROLE PUBLIC;

You can now selectively provide access by granting the CORTEX_USER database role to specific roles. In the following example, use the ACCOUNTADMIN role and grant the user some_user the CORTEX_USER database role through the account role cortex_access_role, which you create for this purpose.

USE ROLE ACCOUNTADMIN;

CREATE ROLE cortex_access_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_access_role;

GRANT ROLE cortex_access_role TO USER some_user;

You can also grant the CORTEX_USER database role to an existing role.

Cost considerations

The cost of using Cortex Data Quality includes the following:

  • Costs associated with the COMPLETE (SNOWFLAKE.CORTEX) function. These charges appear on a bill as AI-Services, which includes all uses of Snowflake Cortex.
  • Compute cost of the default warehouse that runs Snowsight.
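
To get a rough view of the AI-Services charges, you can query the account usage metering data. The following sketch assumes that your role can read the SNOWFLAKE.ACCOUNT_USAGE schema. Because the AI-Services category includes all uses of Snowflake Cortex, the result is an upper bound rather than the cost of Cortex Data Quality alone.

```sql
-- Daily AI-Services credits for the last 30 days.
-- Covers every Snowflake Cortex feature in the account, not only Cortex Data Quality.
SELECT usage_date,
       service_type,
       credits_used,
       credits_billed
  FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
 WHERE service_type = 'AI_SERVICES'
   AND usage_date >= DATEADD(day, -30, CURRENT_DATE())
 ORDER BY usage_date;
```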

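Legal notices

As described earlier on this page, Cortex Data Quality uses third-party models and/or services.

The Data Classification of inputs and outputs is shown in the following table.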

Input Data Classification | Output Data Classification | Designation
Usage Data | Usage Data | Preview AI Features [1]

For additional information about the use of AI, see Snowflake AI and ML.