Classify sensitive data automatically
Automatic sensitive data classification is a serverless feature that enables the automatic detection and tagging of sensitive data. The feature continuously monitors tables within a specific schema and classifies their columns using native and custom classification categories.
Automatic sensitive data classification lets data engineers and stewards do the following:
- Demonstrate how automatically classifying tables meets internal governance and compliance needs.
- Ensure sensitive data is properly tagged.
- Ensure the right access controls are in place to protect the sensitive data.
Get started
The basic workflow to automatically classify sensitive data consists of the following:
1. Create a classification profile that controls how often sensitive data in a schema is automatically classified, including whether system tags should be automatically applied after classification.
2. Optionally, use the classification profile to map user-defined tags to system tags so a column with sensitive data can be associated with a user-defined tag based on its classification. You can add the tag mapping while creating the classification profile or after creating it.
3. Optionally, add a custom classifier to the classification profile so sensitive data can be automatically classified with user-defined semantic and privacy categories. You can add custom classifiers while creating the classification profile or after creating it.
4. Set the classification profile on a schema so that tables in the schema get automatically classified.
For end-to-end examples of this workflow, see Examples.
About classification profiles
A data engineer creates a classification profile by creating an instance of the CLASSIFICATION_PROFILE class to define the criteria that are used to automatically classify tables in a schema. These criteria include:
- How long a table should exist before automatically classifying it.
- How long before previously classified tables should be reclassified.
- Whether system and custom tags should be set on columns after the automatic classification.
- A mapping between system classification tags and user-defined object tags so the user-defined tags can be applied automatically.
When the data engineer assigns the classification profile to a schema, sensitive data in the tables of the schema is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple schemas, or can create multiple classification profiles if different schemas need different classification criteria.
Automatic classification requires access to the raw data in a table, including data in columns that have a masking policy assigned. To preserve the intent of regulating access to protected data, Snowflake uses an internal role to automatically classify data. The internal role can access data protected by a masking policy, but this role is not accessible to users.
For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see Examples.
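As a rough sketch of the reuse described above, the same profile can be set on more than one schema with the ALTER SCHEMA syntax shown in Examples. The schema names mydb.sales and mydb.hr are hypothetical, and the profile mydb.sch.my_classification_profile is assumed to already exist.

```sql
-- Hypothetical schemas; the classification profile is assumed to exist in mydb.sch.
ALTER SCHEMA mydb.sales SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
ALTER SCHEMA mydb.hr SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
```

If the two schemas needed different criteria, you would instead create a second profile and set that one on the second schema, because only one profile can be set on a schema at a time.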
About tag mapping
You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. This tag mapping allows a column with sensitive data to be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.
Because user-defined object tags can have a masking policy associated with them, you can use a tag map to enable automatic tag-based masking, which facilitates column protection aligned with the SEMANTIC_CATEGORY system tag.
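The following is a minimal sketch of how a masking policy might be attached to a user-defined tag so that columns tagged through the tag map are masked automatically. The tag name matches the tag map example in this section; the policy name pii_mask, the role name PII_READER, and the masking logic are assumptions made for illustration only.

```sql
-- Hypothetical tag, policy, and role names, used only for illustration.
CREATE TAG IF NOT EXISTS tag_db.sch.pii;

-- Return the real value only to the (assumed) PII_READER role; mask it for everyone else.
CREATE OR REPLACE MASKING POLICY tag_db.sch.pii_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_READER' THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag; string columns that carry the tag are then masked.
ALTER TAG tag_db.sch.pii SET MASKING POLICY tag_db.sch.pii_mask;
```

This policy only covers string columns. A tag-based policy protects a column when the column's data type matches the policy's signature, so other data types would need their own policies on the same tag.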
Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag, the string value of that tag, and the semantic categories to which the tag is being mapped. After the tag map is associated with a classification profile and you automatically classify tables in a schema, the tag is assigned to the columns that correspond to the semantic categories.
The following is an example of a tag map:
'tag_map': {
  'column_tag_map': [
    {
      'tag_name': 'tag_db.sch.pii',
      'tag_value': 'Highly Confidential',
      'semantic_categories': [
        'NAME',
        'NATIONAL_IDENTIFIER'
      ]
    },
    {
      'tag_name': 'tag_db.sch.pii',
      'tag_value': 'Confidential',
      'semantic_categories': [
        'EMAIL'
      ]
    }
  ]
}
Based on this mapping, if the classification process determines that a column contains email addresses, the tag_db.sch.pii = 'Confidential' tag is set on that column.
If your tag map includes multiple JSON objects that map tags, tag values, and category values, the order of the JSON objects determines which tag and value to set on the column if there is a conflict. Specify the JSON objects in the desired assignment order from left to right, or top to bottom if you are formatting JSON.
Tip
Each object in the 'column_tag_map' field has only one required key: 'tag_name'. If you don’t specify a value for the user-defined tag, the classification process applies the recommended SEMANTIC_CATEGORY tag’s value.
If there is a conflict between a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see Troubleshooting.
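To check which tags ended up on a column after classification, you can query the tag assignments directly. The sketch below assumes a table mydb.sch.t1 with a hypothetical column named email and the tag tag_db.sch.pii from the tag map above.

```sql
-- Return the value of the user-defined tag on one column (NULL if the tag is not set).
SELECT SYSTEM$GET_TAG('tag_db.sch.pii', 'mydb.sch.t1.email', 'COLUMN');

-- List every tag assigned to the columns of the table, including system tags.
SELECT column_name, tag_database, tag_schema, tag_name, tag_value
  FROM TABLE(mydb.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('mydb.sch.t1', 'table'));
```

Running a check like this before enabling automatic classification can also reveal manually assigned tags that might conflict with the tag map.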
View results of automatic classification
You can view the results of automatic classification in the following ways:
Call the SYSTEM$GET_CLASSIFICATION_RESULT stored procedure. For example:
CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
Use a role that is granted the SNOWFLAKE.GOVERNANCE_VIEWER database role to query the DATA_CLASSIFICATION_LATEST view. For example:
SELECT * FROM snowflake.account_usage.data_classification_latest;
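The view covers the whole account, so in practice you would probably narrow the query to one schema. The following sketch assumes that the view exposes the columns database_name, schema_name, table_name, last_classified_on, and result; check the view's reference page if the names differ in your account.

```sql
-- Column names are assumptions about the view's schema; adjust them if they differ.
SELECT database_name, schema_name, table_name, last_classified_on, result
  FROM snowflake.account_usage.data_classification_latest
  WHERE database_name = 'MYDB'
    AND schema_name = 'SCH'
  ORDER BY last_classified_on DESC;
```

Account Usage views can lag behind recent activity, so a table that was just classified might not appear immediately.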
Limitations
- Classification profiles cannot be set on a reader account.
- Only one classification profile can be set on a schema.
- The same classification profile cannot be set on more than 10,000 schemas.
- A maximum of 10,000 tables can be classified in a schema.
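To see how close a schema is to the per-schema table limit, a simple count from the Information Schema is one option. This sketch assumes that the limit applies to base tables; the database and schema names are the ones used in the examples on this page.

```sql
-- Count base tables in schema SCH of database MYDB (assumes the limit refers to base tables).
SELECT COUNT(*) AS table_count
  FROM mydb.information_schema.tables
  WHERE table_schema = 'SCH'
    AND table_type = 'BASE TABLE';
```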
Access control
This section describes the privileges and roles that let you work with classification profiles and enable automatic sensitive data classification.
Task | Required privilege/role | Notes |
---|---|---|
Create a classification profile | SNOWFLAKE.CLASSIFICATION_ADMIN database role | For information about granting this database role to other roles, see Using SNOWFLAKE database roles. |
Create a classification profile | CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE on Schema | You need this privilege on the schema where you want to create the classification profile instance. |
Call methods on a classification profile instance | <classification_profile>!PRIVACY_USER instance role | For information about granting this instance role to other roles, see Instance roles. |
Set the classification profile on a schema | EXECUTE AUTO CLASSIFICATION on Account | This privilege is granted to the ACCOUNTADMIN role by default; that role can grant the privilege to other roles. |
Set the classification profile on a schema | MODIFY on Schema | You need this privilege on the schema that contains the tables that you want to automatically classify. |
Set the classification profile on a schema | APPLY TAG on Account | |
List classification profiles | <classification_profile>!PRIVACY_USER instance role | |
Drop classification profiles | OWNERSHIP on classification profile instance | |
For an example of granting these privileges and database roles to the role of a data engineer, see Basic example: Automatically classifying tables in a schema.
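The instance role in the table is granted with the class-qualified form of GRANT. The following sketch assumes a profile named mydb.sch.my_classification_profile and a hypothetical account role named data_steward that needs to call methods on the profile.

```sql
-- data_steward is a hypothetical role; the profile is assumed to exist in mydb.sch.
GRANT SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ROLE mydb.sch.my_classification_profile!PRIVACY_USER
  TO ROLE data_steward;
```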
Cost of automatically classifying sensitive data
Automatic sensitive data classification consumes credits as it uses serverless compute resources to classify tables in the schema. For more information about pricing for this consumption, see Table 5 in the Snowflake Service Consumption Table.
You can query views in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas to determine how much was spent on automatically classifying sensitive data. To monitor credit consumption, query the following views:
- METERING_HISTORY view (ACCOUNT_USAGE)
  Lets you retrieve the hourly cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

  SELECT
    service_type,
    start_time,
    end_time,
    entity_id,
    name,
    credits_used_compute,
    credits_used_cloud_services,
    credits_used,
    budget_id
  FROM snowflake.account_usage.metering_history
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

- METERING_DAILY_HISTORY view (ACCOUNT_USAGE and ORGANIZATION_USAGE)
  Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

  SELECT
    service_type,
    usage_date,
    credits_used_compute,
    credits_used_cloud_services,
    credits_used
  FROM snowflake.account_usage.metering_daily_history
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

- USAGE_IN_CURRENCY_DAILY (ORGANIZATION_USAGE)
  Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. Use this view to determine the cost in currency, not credits.
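The USAGE_IN_CURRENCY_DAILY view can be queried in the same way as the metering views. The following sketch assumes that the view exposes the columns usage_date, account_name, currency, and usage_in_currency alongside service_type; verify the column names against the view's reference page before relying on it.

```sql
-- Daily cost in currency per account; column names other than service_type are assumptions.
SELECT usage_date, account_name, currency, SUM(usage_in_currency) AS cost
  FROM snowflake.organization_usage.usage_in_currency_daily
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION'
  GROUP BY usage_date, account_name, currency
  ORDER BY usage_date;
```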
Examples
Basic example: Automatically classifying tables in a schema
Complete these steps to automatically classify a table in the schema:
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema.
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON DATABASE mydb TO ROLE data_engineer;
GRANT MODIFY ON SCHEMA mydb.sch TO ROLE data_engineer;
GRANT SELECT ON TABLE mydb.sch.t1 TO ROLE data_engineer;
GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer;
GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer;
GRANT EXECUTE AUTO CLASSIFICATION ON ACCOUNT TO ROLE data_engineer;
GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
Switch to the data engineer role:
USE ROLE data_engineer;
Create the classification profile as an instance of the CLASSIFICATION_PROFILE class:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile(
  {
    'minimum_object_age_for_classification_days': 1,
    'maximum_classification_validity_days': 30,
    'auto_tag': true
  });
Call the DESCRIBE method on the instance to confirm its properties:
SELECT my_classification_profile!DESCRIBE();
Set the classification profile instance on the schema, which starts the background process of monitoring tables in the schema and automatically classifying them for sensitive data.
ALTER SCHEMA mydb.sch SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
Call the SYSTEM$GET_CLASSIFICATION_RESULT stored procedure to obtain the results of the automatic classification.
CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
If you no longer need to automatically classify tables in a schema, unset the classification profile from the schema:
ALTER SCHEMA mydb.sch UNSET CLASSIFICATION_PROFILE;
Drop any classification profiles that are not needed using the DROP CLASSIFICATION_PROFILE command.
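The cleanup step above can be sketched as follows. The statements follow the general SHOW and DROP pattern for class instances and assume the profile created earlier in this example.

```sql
-- List the classification profiles that the current role can see.
SHOW SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE;

-- Drop the profile from this example; requires OWNERSHIP on the instance.
DROP SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE mydb.sch.my_classification_profile;
```

Unsetting the profile from every schema that uses it before dropping it is the safer order.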
Example: Using a tag map and custom classifiers
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema and set tags on columns.
Create the classification profile.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile(
  {
    'minimum_object_age_for_classification_days': 1,
    'maximum_classification_validity_days': 30,
    'auto_tag': true
  });
Call the SET_TAG_MAP method on the instance to add a tag map to the classification profile. This allows custom tags to be automatically applied on columns that contain sensitive data.
CALL my_classification_profile!SET_TAG_MAP(
  {
    'column_tag_map': [
      {
        'tag_name': 'tag_db.sch.pii',
        'tag_value': 'sensitive',
        'semantic_categories': ['NAME']
      }
    ]
  });
Alternatively, you could have added this tag map when you created the classification profile.
Call the SET_CUSTOM_CLASSIFIERS method to add custom classifiers to the classification profile. This allows sensitive data to be automatically classified with user-defined semantic and privacy categories.
CALL my_classification_profile!SET_CUSTOM_CLASSIFIERS(
  {
    'medical_codes': medical_codes!list(),
    'finance_codes': finance_codes!list()
  });
Alternatively, you could have added the custom classifiers when you created the classification profile.
Call the DESCRIBE method on the instance to confirm that the tag map and custom classifiers have been added to the classification profile.
SELECT my_classification_profile!DESCRIBE();
Set the classification profile instance on the schema.
ALTER SCHEMA mydb.sch SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
Attach a masking policy to the tag_db.sch.pii tag to enable tag-based masking.
ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
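The example above assumes that the custom classifiers medical_codes and finance_codes already exist. As a minimal sketch, a classifier such as medical_codes might be created as follows. The category name MEDICAL_CODE, the regular expressions, the description, and the threshold are illustrative assumptions, and the ADD_REGEX argument order (semantic category, privacy category, value regex, column name regex, description, threshold) should be checked against the custom classifier reference.

```sql
-- Create an empty custom classifier instance.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER medical_codes();

-- Register a pattern for ICD-10-style codes; all argument values here are illustrative.
CALL medical_codes!ADD_REGEX(
  'MEDICAL_CODE',                          -- semantic category assigned on a match
  'IDENTIFIER',                            -- privacy category
  '[A-TV-Z][0-9][0-9AB][.]?[0-9A-TV-Z]{0,4}', -- regex matched against column values
  'ICD.*',                                 -- regex matched against the column name
  'Identifies ICD-10 medical codes in a column',
  0.8                                      -- share of values that must match
);

-- Inspect the registered categories; this is the value passed to SET_CUSTOM_CLASSIFIERS.
SELECT medical_codes!LIST();
```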
Example: Testing a classification profile before enabling automatic classification
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema and set tags on columns.
Create the classification profile with a tag map and custom classifiers:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile(
  {
    'minimum_object_age_for_classification_days': 1,
    'auto_tag': true,
    'tag_map': {
      'column_tag_map': [
        {
          'tag_name': 'tag_db.sch.pii',
          'tag_value': 'highly sensitive',
          'semantic_categories': ['NAME', 'NATIONAL_IDENTIFIER']
        },
        {
          'tag_name': 'tag_db.sch.pii',
          'tag_value': 'sensitive',
          'semantic_categories': ['EMAIL', 'MEDICAL_CODE']
        }
      ]
    },
    'custom_classifiers': {
      'medical_codes': medical_codes!list(),
      'finance_codes': finance_codes!list()
    }
  });
Call the SYSTEM$CLASSIFY stored procedure to test the tag mappings on the table1 table before enabling automatic classification.
CALL SYSTEM$CLASSIFY(
  'db.sch.table1',
  'db.sch.my_classification_profile'
);
The tags key in the output contains the details about whether the tag was set (true if set, false otherwise), the name of the tag that was set, and the value of the tag:
{
  "classification_profile_config": {
    "classification_profile_name": "db.schema.my_classification_profile"
  },
  "classification_result": {
    "EMAIL": {
      "alternates": [],
      "recommendation": {
        "confidence": "HIGH",
        "coverage": 1,
        "details": [],
        "privacy_category": "IDENTIFIER",
        "semantic_category": "EMAIL",
        "tags": [
          { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "EMAIL" },
          { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" },
          { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "sensitive" }
        ]
      },
      "valid_value_ratio": 1
    },
    "FIRST_NAME": {
      "alternates": [],
      "recommendation": {
        "confidence": "HIGH",
        "coverage": 1,
        "details": [],
        "privacy_category": "IDENTIFIER",
        "semantic_category": "NAME",
        "tags": [
          { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "NAME" },
          { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" },
          { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "highly sensitive" }
        ]
      },
      "valid_value_ratio": 1
    }
  }
}
Having verified that automatic classification based on the classification profile will have the desired result, set the classification profile instance on the schema.
ALTER SCHEMA mydb.sch SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
Troubleshooting
Automatic classification errors are persisted in the default event table of the account. You can use the following query to access the error messages, where log_db.log_schema.log_table is a placeholder for the name of that event table:
SELECT
  record_type,
  record:severity_text::string AS log_level,
  parse_json(value) AS error_message
FROM log_db.log_schema.log_table
WHERE record_type = 'LOG'
  AND scope:name = 'snow.automatic_sensitive_data_classification'
ORDER BY log_level;
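On a busy account it can help to restrict the query to recent entries. The following variation is a sketch that assumes the event table has the standard TIMESTAMP column; the table name is still a placeholder, and the seven-day window is an arbitrary choice.

```sql
-- Show classification log entries from the last 7 days, newest first.
SELECT
  timestamp,
  record:severity_text::string AS log_level,
  parse_json(value) AS error_message
FROM log_db.log_schema.log_table  -- placeholder: replace with your account's event table
WHERE record_type = 'LOG'
  AND scope:name = 'snow.automatic_sensitive_data_classification'
  AND timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY timestamp DESC;
```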