Introduction to data quality checks
Data quality checks in Snowflake continuously validate the health of your data. By providing automated, consistent validation, these checks help you comply with regulatory standards, meet service-level agreements through accurate metrics, and build credibility in data-driven decisions. Cortex Data Quality uses AI to suggest data quality checks based on characteristics of your metadata and usage patterns. This removes the need to define checks manually and speeds up setup, while your data stays securely inside Snowflake. Once configured, quality checks run automatically on your chosen schedule and report violations so you can take corrective action.
Get started
Snowflake provides a web interface to set up data quality checks and monitor the results of these checks.
To get started, do one of the following:
To set up data quality checks for your data, see Use Snowsight to set up data quality checks.
To monitor the results of your existing data quality checks, see Monitor data quality checks in Snowsight.
Core concepts of data quality checks
- Data metric function (DMF)
A DMF measures an attribute of your data such as how many NULL values exist in a column or how often a table is being updated. The DMF returns a value based on the current state of your data, but doesn't define whether that value constitutes a data quality issue; a DMF is a building block of a data quality check.
Snowflake provides system DMFs to measure common metrics without requiring configuration. For a list of the system DMFs that are available for various dimensions, see System data metric functions.
If there isn't a system DMF for the metric that you want to monitor, you can define a custom DMF. To learn how to create a custom DMF, see Custom data metric functions.
- Expectations
An expectation is combined with a DMF to create a data quality check. When a DMF returns a value, it's compared to the expectation's definition to determine whether data passed or failed the check. Return values that fail the check are reported as expectation violations so you can take appropriate action.
If you use Snowsight to create a data quality check, you choose the DMF and define the expectation at the same time. You can also use SQL to work with expectations directly.
- Anomaly detection
Anomaly detection uses historical data to automatically detect when a DMF return value is above or below a predicted range. Currently, Snowflake can automatically detect anomalies in the volume and freshness of your data. For more information, see Detect data quality anomalies.
- DMF schedule
The DMF schedule for a table or view determines how often a DMF runs. Because a DMF powers a data quality check, the DMF schedule determines how often the quality check is performed. By default, the DMF schedule runs a DMF once every hour. To adjust the schedule for a table or view, see Adjust how often quality checks run.
The DMF schedule doesn't affect how often Snowflake checks whether there is an anomaly.
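The concepts above can be sketched in SQL. This is a minimal illustration, not a definitive setup: the table my_db.my_schema.orders, its columns customer_id and amount, the custom DMF negative_amount_count, and the expectation name no_null_customer_ids are all hypothetical, and the EXPECTATION clause is written as I understand the syntax, so verify it against the SQL reference before relying on it.

```sql
-- Hypothetical table used throughout: my_db.my_schema.orders
-- (columns customer_id and amount are assumptions for illustration).

-- DMF schedule: run this table's DMFs every 30 minutes
-- instead of the default hourly schedule.
ALTER TABLE my_db.my_schema.orders
  SET DATA_METRIC_SCHEDULE = '30 MINUTE';

-- System DMF + expectation = data quality check.
-- NULL_COUNT measures how many NULLs exist in customer_id; the
-- expectation states that any value other than 0 is a violation.
ALTER TABLE my_db.my_schema.orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
    ON (customer_id)
    EXPECTATION no_null_customer_ids (VALUE = 0);

-- Custom DMF (hypothetical): counts rows with a negative amount.
-- It only returns a number; it doesn't decide whether that number
-- is a problem. That is the job of an expectation.
CREATE OR REPLACE DATA METRIC FUNCTION my_db.my_schema.negative_amount_count(
  arg_t TABLE(arg_c1 NUMBER)
)
RETURNS NUMBER
AS
$$
  SELECT COUNT(*)
  FROM arg_t
  WHERE arg_c1 < 0
$$;

-- Associate the custom DMF with the amount column of the same table.
ALTER TABLE my_db.my_schema.orders
  ADD DATA METRIC FUNCTION my_db.my_schema.negative_amount_count
    ON (amount);
```

The schedule is set on the table rather than on an individual DMF, which matches the description above: every DMF associated with the table runs on that table's schedule.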
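Supported table kinds
You can set a DMF on the following kinds of table objects:
Dynamic tables
Event tables
External tables
Apache Iceberg™ tables
Materialized views
Tables (CREATE TABLE), including temporary and transient tables
Views
You can't set a DMF on a hybrid table or a stream object.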
Cost considerations
The DMFs that power data quality checks use serverless compute resources that incur costs. For pricing, see the Snowflake Service Consumption Table.
The credits consumed by the serverless compute resources are listed under the "Data Quality Monitoring" category on your monthly bill. These credits include compute consumed by all system or user-defined data quality metrics that you use. You are not billed for creating a DMF.
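You are billed only when scheduled DMFs are computed on an object. You aren't charged for unscheduled use of a data metric function, such as calling a DMF with a SELECT statement.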
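As an illustration of an unscheduled call of the kind described above, the following sketch invokes a system DMF directly in a SELECT statement. The table my_db.my_schema.orders and the column customer_id are hypothetical.

```sql
-- Ad hoc (unscheduled) call: evaluates NULL_COUNT once, right now,
-- against the hypothetical customer_id column. No DMF association
-- or schedule on the table is required for this form.
SELECT SNOWFLAKE.CORE.NULL_COUNT(
  SELECT customer_id
  FROM my_db.my_schema.orders
);
```

A call like this can be a convenient way to preview what a DMF returns before you attach it to a table.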
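The logging infrastructure consolidates metric output into the event table. Consumption by the logging service appears as "Logging" on your monthly bill.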
Tip
To track consumption related to quality checks, you can query the following views:
DATA_QUALITY_MONITORING_USAGE_HISTORY to track your credit consumption related to using DMFs in your account.
METERING_DAILY_HISTORY to track the daily credits consumed for an account in your organization. The service_type column specifies DATA_QUALITY_MONITORING.
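The two views in the tip can be queried along these lines. This is a sketch under assumptions: it assumes DATA_QUALITY_MONITORING_USAGE_HISTORY is read from the SNOWFLAKE.ACCOUNT_USAGE schema and METERING_DAILY_HISTORY from SNOWFLAKE.ORGANIZATION_USAGE, and that your role has access to both. The 30-day window is arbitrary.

```sql
-- Daily credits consumed by DMFs in this account over the last 30 days.
-- Assumption: the view lives in SNOWFLAKE.ACCOUNT_USAGE.
SELECT
  DATE_TRUNC('day', start_time) AS usage_day,
  SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY usage_day
ORDER BY usage_day;

-- Daily data quality credits per account in the organization.
-- Assumption: the view lives in SNOWFLAKE.ORGANIZATION_USAGE.
SELECT
  account_name,
  usage_date,
  credits_used
FROM SNOWFLAKE.ORGANIZATION_USAGE.METERING_DAILY_HISTORY
WHERE service_type = 'DATA_QUALITY_MONITORING'
ORDER BY usage_date, account_name;
```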
Replication
For information about replication and DMFs, see Replication of data metric functions (DMFs).
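Limitations
Note the following limitations when you use DMFs:
You can have a maximum of 10,000 total associations of DMFs on objects per account. Each instance of setting a DMF on a table or view counts as one association.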
Data sharing: You can't grant privileges on a DMF to a share or set a DMF on a shared table or view.
Setting a DMF on an object tag isn't supported.
You can't set a DMF on objects in a reader account.
Trial accounts don't support this feature.
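Because each DMF set on a table or view counts toward the association limit, it can help to see which associations an object already has. The sketch below assumes the INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES table function is available to your role; the table name is hypothetical.

```sql
-- List the DMFs currently associated with one (hypothetical) table.
-- Each returned row corresponds to one association.
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
    REF_ENTITY_NAME   => 'my_db.my_schema.orders',
    REF_ENTITY_DOMAIN => 'table'
  )
);
```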