Classify sensitive data automatically

Automatic sensitive data classification is a serverless feature that enables the automatic detection and tagging of sensitive data. The feature continuously monitors tables within a specific schema and classifies their columns using native and custom classification categories.

Automatic sensitive data classification lets data engineers and stewards do the following:

  • Demonstrate how automatically classifying tables meets internal governance and compliance needs.

  • Ensure sensitive data is properly tagged.

  • Ensure the right access controls are in place to protect the sensitive data.

Get started

The basic workflow to automatically classify sensitive data consists of the following:

  1. Create a classification profile that controls how often sensitive data in a schema is automatically classified, including whether system tags should be automatically applied after classification.

  2. Optionally, use the classification profile to map user-defined tags to system tags so a column with sensitive data can be associated with a user-defined tag based on its classification. You can add the tag mapping while creating the classification profile or after creating it.

  3. Optionally, add a custom classifier to the classification profile so sensitive data can be automatically classified with user-defined semantic and privacy categories. You can add custom classifiers while creating the classification profile or after creating it.

  4. Set the classification profile on a schema so that tables in the schema get automatically classified.

For end-to-end examples of this workflow, see Examples.

About classification profiles

A data engineer creates a classification profile by creating an instance of the CLASSIFICATION_PROFILE class to define the criteria that are used to automatically classify tables in a schema. These criteria include:

  • How long a table should exist before automatically classifying it.

  • How long before previously classified tables should be reclassified.

  • Whether system and custom tags should be set on columns after the automatic classification.

  • A mapping between system classification tags and user-defined object tags so the user-defined tags can be applied automatically.

When the data engineer assigns the classification profile to a schema, sensitive data in the tables of the schema is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple schemas, or create multiple classification profiles to set different classification criteria for different schemas.

Automatic classification requires access to the raw data in a table, including data in columns that have a masking policy assigned. Snowflake preserves the intent of regulating access to protected data by using an internal role to classify the data. This internal role can access data protected by a masking policy, but it is not accessible to users.

For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see Examples.

About tag mapping

You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. This tag mapping allows a column with sensitive data to be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.

Because user-defined object tags can have a masking policy associated with them, you can use a tag map to enable automatic tag-based masking, which facilitates column protection aligned with the SEMANTIC_CATEGORY system tag.
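As a minimal sketch of tag-based masking, the statements below create the tag used in this topic's tag maps, define a masking policy, and attach the policy to the tag. The policy name pii_mask matches the later example, but the PII_READER role and the masking logic are illustrative assumptions, not something the feature prescribes.

```sql
-- Hypothetical setup: tag_db.sch.pii is the tag name used in this topic's tag maps.
CREATE TAG IF NOT EXISTS tag_db.sch.pii;

-- Illustrative policy: only sessions with the (assumed) PII_READER role see raw values.
CREATE MASKING POLICY IF NOT EXISTS tag_db.sch.pii_mask
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag so that string columns carrying the tag are masked.
ALTER TAG tag_db.sch.pii SET MASKING POLICY tag_db.sch.pii_mask;
```

Because this policy is declared for STRING values, it would protect only tagged columns of that data type; a separate policy per data type would be needed to cover others.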

Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag, the string value of that tag, and the semantic categories to which the tag is being mapped. After the tag map is associated with a classification profile and you automatically classify tables in a schema, the tag is assigned to the columns that correspond to the semantic categories.

The following is an example of a tag map:

'tag_map': {
  'column_tag_map': [
    {
      'tag_name':'tag_db.sch.pii',
      'tag_value':'Highly Confidential',
      'semantic_categories':[
        'NAME',
        'NATIONAL_IDENTIFIER'
      ]
    },
    {
      'tag_name': 'tag_db.sch.pii',
      'tag_value':'Confidential',
      'semantic_categories': [
        'EMAIL'
      ]
    }
  ]
}

Based on this mapping, if the classification process determines that a column contains email addresses, the tag_db.sch.pii = 'Confidential' tag is set on that column.

If your tag map includes multiple JSON objects that map tags, tag values, and category values, the order of the JSON objects determines which tag and value to set on the column if there is a conflict. Specify the JSON objects in the desired assignment order from left to right, or top to bottom if you are formatting JSON.

Tip

Each object in the 'column_tag_map' field has only one required key: 'tag_name'. If you don’t specify a value for the user-defined tag, the classification process applies the recommended SEMANTIC_CATEGORY tag’s value.
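To illustrate the tip, a tag map entry that omits 'tag_value' might look like the following sketch. It reuses the profile and tag names from the examples in this topic; the EMAIL category is chosen only for illustration.

```sql
-- No 'tag_value' is given, so per the tip the tag's value
-- follows the recommended SEMANTIC_CATEGORY value.
CALL my_classification_profile!SET_TAG_MAP(
  {'column_tag_map':[
    {
      'tag_name':'tag_db.sch.pii',
      'semantic_categories':['EMAIL']
    }]});
```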

If there is a conflict between a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see Troubleshooting.

View results of automatic classification

You can view the results of automatic classification in the following ways:

  • Call the SYSTEM$GET_CLASSIFICATION_RESULT stored procedure. For example:

    CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
    
  • Use a role that is granted the SNOWFLAKE.GOVERNANCE_VIEWER database role to query the DATA_CLASSIFICATION_LATEST view. For example:

    SELECT * FROM snowflake.account_usage.data_classification_latest;
    
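Beyond the two options above, one way to confirm which tags ended up on a table's columns is the TAG_REFERENCES_ALL_COLUMNS table function in the database's Information Schema. This sketch assumes that automatic tagging is enabled in the profile and reuses the mydb.sch.t1 table name from the first option.

```sql
-- List every tag set on the columns of mydb.sch.t1,
-- including tags applied by classification.
SELECT column_name, tag_database, tag_schema, tag_name, tag_value
  FROM TABLE(
    mydb.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('mydb.sch.t1', 'table')
  )
  ORDER BY column_name, tag_name;
```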

Limitations

  • Classification profiles cannot be set on a reader account.

  • Only one classification profile can be set on a schema.

  • The same classification profile cannot be set on more than 10,000 schemas.

  • A maximum of 10,000 tables can be classified in a schema.
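To see how close a schema is to the table limit, a rough check such as the following can help. It is only a sketch: mydb is the example database used in this topic, and counting only base tables is an assumption about what the limit covers.

```sql
-- Count base tables per schema in mydb, largest schemas first.
SELECT table_schema, COUNT(*) AS table_count
  FROM mydb.INFORMATION_SCHEMA.TABLES
  WHERE table_type = 'BASE TABLE'
  GROUP BY table_schema
  ORDER BY table_count DESC;
```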

Access control

This section describes the privileges and roles that let you work with classification profiles and enable automatic sensitive data classification.

| Task | Required privilege/role | Notes |
| --- | --- | --- |
| Create a classification profile | SNOWFLAKE.CLASSIFICATION_ADMIN database role | For information about granting this database role to other roles, see Using SNOWFLAKE database roles. |
| | CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE on Schema | You need this privilege on the schema where you want to create the classification profile instance. |
| Call methods on a classification profile instance | <classification_profile>!PRIVACY_USER instance role | For information about granting this instance role to other roles, see Instance roles. |
| Set the classification profile on a schema | EXECUTE AUTO CLASSIFICATION on Account | This privilege is granted to the ACCOUNTADMIN role by default; that role can grant the privilege to other roles. |
| | MODIFY on Schema | You need this privilege on the schema that contains the table that you want to automatically classify. |
| | APPLY TAG on Account | |
| List classification profiles | <classification_profile>!PRIVACY_USER instance role | |
| Drop classification profiles | OWNERSHIP on classification profile instance | |

For an example of granting these privileges and database roles to the role of a data engineer, see Basic example: Automatically classifying tables in a schema.
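The instance role is granted separately from the account-level and schema-level privileges. The following sketch follows the general pattern for granting instance roles of a class; the data_steward role is hypothetical, and the profile name is the one used in the examples in this topic.

```sql
-- Let a second role call methods on an existing classification profile.
-- data_steward is an assumed role name used only for illustration.
GRANT SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ROLE
  mydb.sch.my_classification_profile!PRIVACY_USER
  TO ROLE data_steward;
```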

Cost of automatically classifying sensitive data

Automatic sensitive data classification consumes credits because it uses serverless compute resources to classify tables in the schema. For more information about pricing for this consumption, see Table 5 in the Snowflake Service Consumption Table.

To determine how much was spent on automatically classifying sensitive data, query the following views in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas:

METERING_HISTORY view (ACCOUNT_USAGE)

Lets you retrieve the hourly cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

SELECT
  service_type,
  start_time,
  end_time,
  entity_id,
  name,
  credits_used_compute,
  credits_used_cloud_services,
  credits_used,
  budget_id
  FROM snowflake.account_usage.metering_history
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

METERING_DAILY_HISTORY view (ACCOUNT_USAGE and ORGANIZATION_USAGE)

Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

SELECT
  service_type,
  usage_date,
  credits_used_compute,
  credits_used_cloud_services,
  credits_used
  FROM snowflake.account_usage.metering_daily_history
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

USAGE_IN_CURRENCY_DAILY view (ORGANIZATION_USAGE)

Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. Use this view to determine the cost in currency, not credits.
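A query against this view could look like the following sketch. Apart from SERVICE_TYPE, which is named above, the column names (usage_date, account_name, currency, usage_in_currency) are assumptions about the view's schema and should be checked against the view's reference before use.

```sql
-- Daily cost of automatic classification in currency, per account.
SELECT
  usage_date,
  account_name,
  currency,
  SUM(usage_in_currency) AS cost_in_currency
  FROM snowflake.organization_usage.usage_in_currency_daily
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION'
  GROUP BY usage_date, account_name, currency
  ORDER BY usage_date;
```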

Examples

Basic example: Automatically classifying tables in a schema

Complete these steps to automatically classify the tables in a schema:

  1. As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema.

    USE ROLE ACCOUNTADMIN;
    
    GRANT USAGE ON DATABASE mydb TO ROLE data_engineer;
    GRANT MODIFY ON SCHEMA mydb.sch TO ROLE data_engineer;
    GRANT SELECT ON TABLE mydb.sch.t1 TO ROLE data_engineer;
    
    GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer;
    GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer;
    
    GRANT EXECUTE AUTO CLASSIFICATION ON ACCOUNT TO ROLE data_engineer;
    GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
    
  2. Switch to the data engineer role:

    USE ROLE data_engineer;
    
  3. Create the classification profile as an instance of the CLASSIFICATION_PROFILE class:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 1,
          'maximum_classification_validity_days': 30,
          'auto_tag': true
        });
    
  4. Call the DESCRIBE method on the instance to confirm its properties:

    SELECT my_classification_profile!DESCRIBE();
    
  5. Set the classification profile instance on the schema, which starts the background process of monitoring tables in the schema and automatically classifying them for sensitive data.

    ALTER SCHEMA mydb.sch
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
  6. Call the SYSTEM$GET_CLASSIFICATION_RESULT stored procedure to obtain the results of the automatic classification.

    CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
    
  7. If you no longer need to automatically classify tables in a schema, unset the classification profile from the schema:

    ALTER SCHEMA mydb.sch UNSET CLASSIFICATION_PROFILE;
    
  8. Drop any classification profiles that are not needed using the DROP CLASSIFICATION_PROFILE command.
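As a sketch of the final step, the statements below list the existing profiles and then drop the one created in step 3. The fully qualified class name in the SHOW and DROP statements mirrors the CREATE statement above and is an assumption about the exact command syntax.

```sql
-- List the classification profiles visible to the current role.
SHOW SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE;

-- Drop the profile created in step 3.
DROP SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE mydb.sch.my_classification_profile;
```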

Example: Using a tag map and custom classifiers

  1. As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema and set tags on columns.

  2. Create the classification profile.

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 1,
          'maximum_classification_validity_days': 30,
          'auto_tag': true
        });
    
  3. Call the SET_TAG_MAP method on the instance to add a tag map to the classification profile. This allows custom tags to be automatically applied on columns that contain sensitive data.

    CALL my_classification_profile!SET_TAG_MAP(
      {'column_tag_map':[
        {
          'tag_name':'tag_db.sch.pii',
          'tag_value':'sensitive',
          'semantic_categories':['NAME']
        }]});
    

    Alternatively, you could have added this tag map when you created the classification profile.

  4. Call the SET_CUSTOM_CLASSIFIERS method to add custom classifiers to the classification profile. This allows sensitive data to be automatically classified with user-defined semantic and privacy categories.

    CALL my_classification_profile!SET_CUSTOM_CLASSIFIERS(
      {
        'medical_codes': medical_codes!list(),
        'finance_codes': finance_codes!list()
      });
    

    Alternatively, you could have added the custom classifiers when you created the classification profile.

  5. Call the DESCRIBE method on the instance to confirm that the tag map and custom classifiers have been added to the classification profile.

    SELECT my_classification_profile!DESCRIBE();
    
  6. Set the classification profile instance on the schema.

    ALTER SCHEMA mydb.sch
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
  7. Attach a masking policy to the tag_db.sch.pii tag to enable tag-based masking.

    ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
    

Example: Testing a classification profile before enabling automatic classification

  1. As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema and set tags on columns.

  2. Create the classification profile with a tag map and custom classifiers:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile2(
      {
        'minimum_object_age_for_classification_days':1,
        'auto_tag':true,
        'tag_map': {
          'column_tag_map':[
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'highly sensitive',
              'semantic_categories':['NAME','NATIONAL_IDENTIFIER']
            },
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'sensitive',
              'semantic_categories':['EMAIL','MEDICAL_CODE']
            }
          ]
        },
        'custom_classifiers': {
          'medical_codes': medical_codes!list(),
          'finance_codes': finance_codes!list()
        }
      }
    );
    
  3. Call the SYSTEM$CLASSIFY stored procedure to test the tag mappings on the table1 table before enabling automatic classification.

    CALL SYSTEM$CLASSIFY(
     'mydb.sch.table1',
     'mydb.sch.my_classification_profile2'
    );
    

    The tags key in the output contains the details about whether the tag was set (true if set, false otherwise), the name of the tag that was set, and the value of the tag:

    {
      "classification_profile_config": {
        "classification_profile_name": "mydb.sch.my_classification_profile2"
      },
      "classification_result": {
        "EMAIL": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "EMAIL",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "EMAIL"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        },
        "FIRST_NAME": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "NAME",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "NAME"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "highly sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        }
      }
    }
    
  4. Having verified that automatic classification based on the classification profile will have the desired result, set the classification profile instance on the schema.

    ALTER SCHEMA mydb.sch
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile2';
    

Troubleshooting

Automatic classification errors are persisted in the default event table of the account. You can use the following query to access the error messages, where log_db.log_schema.log_table stands for the name of that event table:

SELECT
  record_type,
  record:severity_text::string log_level,
  parse_json(value) error_message
  FROM log_db.log_schema.log_table
  WHERE record_type = 'LOG'
    AND scope:name = 'snow.automatic_sensitive_data_classification'
  ORDER BY log_level;