Remediation of data quality issues

Data quality metrics (DMFs) let you identify how many records in a table might contain quality issues. For example, the SNOWFLAKE.CORE.NULL_COUNT can identify how many records contain a NULL value in a specific column.

To help you fix these possible quality issues, you can call the SYSTEM$DATA_METRIC_SCAN system function to return the individual records identified by the DMF as containing data that failed a data quality check. For example, if you pass the NULL_COUNT DMF into the SYSTEM$DATA_METRIC_SCAN function as an argument, then you can obtain the actual records that contain a NULL value, not just the number of records that contain a NULL value.

Supported DMFs

The SYSTEM$DATA_METRIC_SCAN function accepts a DMF as an argument to return the records identified by the DMF as containing problematic data. The following system DMFs can be used as arguments:

  • SNOWFLAKE.CORE.NULL_COUNT

  • SNOWFLAKE.CORE.NULL_PERCENT

  • SNOWFLAKE.CORE.BLANK_COUNT

  • SNOWFLAKE.CORE.BLANK_PERCENT

  • SNOWFLAKE.CORE.DUPLICATE_COUNT

Limitations and considerations

  • You cannot use custom DMFs as an argument of the SYSTEM$DATA_METRIC_SCAN function.

  • If a table is protected by a policy, such as a masking policy or row access policy, the SYSTEM$DATA_METRIC_SCAN function might return unexpected or incomplete data because results depend on the user’s role when executing the function.

Calling the SYSTEM$DATA_METRIC_SCAN function

When you call the SYSTEM$DATA_METRIC_SCAN function, it analyses a table with a DMF to identify possible data quality issues. You must pass in the following arguments to the SYSTEM$DATA_METRIC_SCAN function: the name of the table, the DMF, and any arguments being passed to the DMF to help identify problematic records. For example, given that the SNOWFLAKE.CORE.NULL_COUNT system metric function returns the total number of NULL values in a particular column, the following returns the rows of the employeesTable table that have NULL values in the SSN column in the version of the table that existed at a specific time in the past.

SELECT *
  FROM TABLE(SYSTEM$DATA_METRIC_SCAN(
    REF_ENTITY_NAME  => '<governance.sch.employeesTable>'
    ,METRIC_NAME  => '<snowflake.core.null_count>'
    ,ARGUMENT_NAME => '<SSN>'
    ,AT_TIMESTAMP => '<2024-08-28 02:00:00 -0700>'
  ));
Copy

To check the results of a DMF evaluation on the table or view in the past, you can pass the AT_TIMESTAMP argument. The AT_TIMESTAMP argument allows you to use Time Travel to cast the timestamp string to return only those records that existed in the table at the ‘2024-08-28 02:00:00 -0700’ timestamp.

Using SYSTEM$DATA_METRIC_SCAN to fix data

The SYSTEM$DATA_METRIC_SCAN function is a table function that returns a set of rows. The output of the function can be used within a DML statement to take action on the records that have been identified as containing data that failed a data quality check.

Suppose you want to replace blank values in the email column of the t table with NULL values. Because the BLANK_COUNT data metric function identifies blank values, you could run the following statement:

UPDATE T
  SET email = null
  WHERE T.ID IN (SELECT ID FROM TABLE(SYSTEM$DATA_METRIC_SCAN(
    REF_ENTITY_NAME => '<t>'
    ,METRIC_NAME => '<snowflake.core.blank_count>'
    ,ARGUMENT_NAME => '<email>'
  )));
Copy
Language: English