Introduction to Data Quality and data metric functions

Data Quality uses data metric functions (DMFs), which include Snowflake-provided system DMFs and user-defined DMFs, to monitor the state and integrity of your data. You can use DMFs to measure key metrics, such as, but not limited to, freshness and counts that measure duplicates, NULLs, rows, and unique values.

About Data Quality and DMFs

Data Quality focuses on knowing the state and integrity of your data, which includes data freshness and accuracy with respect to true data values compared to null values or blank fields in a column, to make data-driven decisions. You can measure the quality of your data by using DMFs. Snowflake provides built-in system DMFs in the SNOWFLAKE.CORE schema to measure common metrics without having to define them. You can also define your own custom DMFs to fine-tune your data quality measurements more precisely, and these DMFs are stored in the database and schema of your choice.

Whether you use system DMFs, custom DMFs, or both, after you assign a DMF to a table or view, Snowflake records the results of scheduling the DMF in a dedicated event table for data metric functions. You can specify the frequency for how often the DMF is called. For example, you can schedule the DMFs on a particular table to run three times daily. You can modify the frequency as needed based on your own internal data quality requirements. All DMFs that are set on the table follow the same schedule.

After you schedule the DMFs to run, you can configure alerts to notify you when changes to data quality occur. By combining the DMF and alert functionality, you can have consistent threshold notifications for data quality on the tables that you measure. These insights enhance your data governance posture by enabling the following:

  • Data stewards to know the current state of their data based on a particular metric.

  • Data engineers to take immediate action on important tables and views.

  • Platform administrators to ensure data quality monitoring is done with cost, consistency, and performance.

The data quality workflow of defining, measuring, and monitoring data can then be applied to additional workloads.

For information, see:

Supported table kinds

You can set a DMF on the following kinds of table objects:

  • Dynamic table

  • Event table

  • External table

  • Apache Iceberg™ table

  • Materialized view

  • Table (CREATE TABLE), including temporary and transient tables

  • View

You cannot set a DMF on a hybrid table or a stream object.

Billing and pricing

Snowflake creates and manages the virtual warehouse objects to support this feature. You can use this feature without having to provision or use any additional virtual warehouse resources outside of your normal SQL query workflow.

Calling a DMF uses serverless compute resources as described in more detail in the Snowflake Service Consumption Table. The table lists the credit pricing per cloud region and Snowflake Edition.

The credits that you use are listed in the “Data Quality Monitoring” category on your monthly bill. These credits include compute consumed by all system or user-defined data quality metrics that you use. You are not billed for creating a DMF.

  • Billing occurs only when a scheduled DMF is computed on an object. You are not billed for unscheduled data metric function usage, such as calling a DMF with a SELECT statement.

  • The logging infrastructure consolidates metric outputs in the event table. Consumption incurred by the logging service shows up on your monthly bill as “Logging.”

For more information, see Serverless credit usage.

Tip

You can query the DATA_QUALITY_MONITORING_USAGE_HISTORY to track your credit consumption related to using DMFs in your account.

Benefits

Using DMFs enhances your data quality efforts and provides these benefits:

Facilitate compliance:

By knowing the state of your data, it’s easier to demonstrate how you adhere to compliance and regulatory standards. This decreases your risk and enhances your data governance posture.

Meet service-level agreements (SLAs):

Accurate data metrics, such as freshness, help to meet SLAs between data providers, data consumers, and clients.

Credibility:

DMFs provide data validation, which facilitates trustworthy data-driven decision making.

Consistency:

Using system DMFs and repeatedly using proper custom DMFs leads to consistent data quality evaluation over time. This enhances the credibility of your data.

Optimize for specific use cases:

Custom DMFs enable data engineers to design accurate metrics to measure the data, which leads to more accurate optimizations for targeted applications of the data.

Automated measurement:

Calling the DMF is automated after you assign the DMF to a table or view, and specify the schedule for the DMF to run. No additional work is needed to actively measure your data quality. For more information, see Schedule your DMFs to run and View the DMF results

Performance management:

Snowflake evaluates how you use DMFs and optimizes the query evaluation to provide optimal query performance.

Considerations

  • For information about replication and DMFs, see Replication of data metric functions (DMFs).

  • To set a DMF on a table, the table owner role must be granted the global (account-level) EXECUTE DATA METRIC FUNCTION privilege. Therefore, the table owner role should be a user-defined custom role that you created with the CREATE ROLE command or a system role, such as the SYSADMIN role.

    You cannot grant global privileges to database roles because database roles are scoped to the database in which they exist. If you have a table that is owned by a database role and you want to set a DMF on that table, you must transfer the OWNERSHIP privilege of the table to a custom role or system role. For more information, see GRANT OWNERSHIP.

Limitations

Note the following limitations when using DMFs:

  • Setting DMFs on objects:

    You can only have 10,000 total associations of DMFs on objects per account. Each instance of setting a DMF on a table or view counts as one association.

  • Data sharing:

    You cannot grant privileges on a DMF to share or set a DMF on a shared table or view.

  • Setting a DMF on an object tag is not supported.

  • Cloning:

    • Using the CLONE operation and the CREATE TABLE … LIKE operation does not result in DMF assignments on the target object.

    • If you clone a database or schema that contains DMFs, the DMFs are cloned to the target database or schema.

  • You cannot set a DMF on objects in a reader account.

  • Trial accounts do not support this feature.

Managing DMFs

These sections summarize information on how to manage DMFs:

Additionally, you can grant application roles to manage access to the results of DMFs. For more information, see Managing access to the DMF results

DMF privileges

Snowflake supports the following privileges to use with a DMF.

The USAGE privilege on the parent database and schema are required to perform operations on any object in a schema.

Privilege

Object

Usage

CREATE DATA METRIC FUNCTION

Schema

Enables creating a new DMF in a schema.

USAGE

Data metric function

Enables calling the DMF.

EXECUTE DATA METRIC FUNCTION

Account

Enables using serverless compute resources when calling a DMF.

OWNERSHIP

Data metric function

Transfers ownership of the DMF, which grants full control over the DMF. Required to alter most properties of the DMF.

Additionally, if you’d like to manage the assignment of a DMF on a table or view, use a role with these privileges:

Privilege

Object

Notes

OWNERSHIP

Table or view

OWNERSHIP is a special privilege on an object that is automatically granted to the role that created the object, but can also be transferred using the GRANT OWNERSHIP command to a different role by the owning role (or any role with the MANAGE GRANTS privilege).

USAGE

Data metric function

This privilege is for user-defined DMFs.

For system DMFs, grant the SNOWFLAKE.DATA_METRIC_USER database role to the table owner role.

USAGE

Database, schema

These objects are the database and schema that store the user-defined DMF.

EXECUTE DATA METRIC FUNCTION

Account

This privilege enables you to control which roles have access to serverless compute resources.

DMF command reference

Snowflake supports the following commands to work with DMFs:

Additionally, use the ALTER TABLE and the ALTER VIEW commands to do the following:

  • Add or drop a data metric function on a column.

  • Add or drop a data metric function on the table or view itself.

  • Schedule the data metric function to run.

For representative examples, see Working with data metric functions.

When using the GRANT <privileges> and REVOKE <privileges> commands, you can:

  • Grant privileges to an account role or a database role. Other role types, such as application roles, are not supported.

  • Specify any supported DMF privilege or ALL PRIVILEGES.

You can use the GRANT OWNERSHIP command to transfer ownership to an account role.

The GRANT and REVOKE commands require you to specify the arguments of the user-defined DMF that you create. For example:

GRANT USAGE ON FUNCTION
  governance.dmfs.count_positive_numbers(TABLE(NUMBER, NUMBER, NUMBER))
  TO data_engineer;
Copy

Summary of DMF commands, operations, and privileges

The following table summarizes the relationship between DMF privileges and DDL operations.

The USAGE privilege on the parent database and schema are required to perform operations on any object in a schema.

In addition to the requirements in the table, if you’re using a system DMF, use the ACCOUNTADMIN role or a role that is granted IMPORTED PRIVILEGES on the shared SNOWFLAKE database.

Operation

Privilege required

Use a system DMF.

See System data metric functions.

Create a DMF with the CREATE DATA METRIC FUNCTION command.

A role with the CREATE DATA METRIC FUNCTION privilege on the schema.

Modify a DMF with an ALTER FUNCTION command.

A role with the OWNERSHIP privilege on the DMF.

Add a DMF or drop a DMF from a table or view with an ALTER TABLE or ALTER VIEW command.

A role with the following privileges:

  • OWNERSHIP on the table

  • USAGE on the DMF

  • EXECUTE DATA METRIC FUNCTION on the account

Identify metric references by calling the DATA_METRIC_FUNCTION_REFERENCES function.

Both of the following:

  • A role with the USAGE privilege on the DMF.

  • A role with the SELECT privilege on the table or view on which the DMF is added.

Describe a DMF with a DESCRIBE FUNCTION command.

A role with the USAGE privilege on the DMF.

Drop a DMF from the current or specified schema with a DROP FUNCTION command.

A role with the OWNERSHIP privilege on the DMF.

List data metric functions with a SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command.

A role with the USAGE privilege on the DMF.

Call a DMF manually using SELECT.

A role with the global EXECUTE DATA METRIC FUNCTION privilege or a role with the USAGE privilege on the DMF.

View DMF results.

For more information, see View the DMF results.

Language: English