Use SQL to set up sensitive data classification¶
The following sections describe how to use SQL to set up the automatic classification of sensitive data within a database. If you want to use a web interface to set up sensitive data classification, see 使用 Trust Center 设置敏感数据分类.
The basic workflow for using SQL to classify sensitive data consists of the following steps:
Create a classification profile that controls what happens during sensitive data classification.
Set the classification profile on a database or schema to automatically classify tables in the entity.
有关该工作流程的端到端示例,请参阅 示例。
关于分类配置文件¶
A classification profile defines the criteria that are used to automatically classify tables in a database. This criteria includes:
表在自动分类前应存在多长时间。
之前已分类的表应在多长时间内重新分类。
分类后,是否在列上自动设置系统和自定义标签。您可以决定是让 Snowflake 自动应用推荐的标签,还是选择先查看系统建议的标签分配,然后再由您自行应用。
在系统分类标签和用户定义的对象标签之间 映射,以便自动应用用户定义的标签。
Whether custom classifiers are used to classify data.
When a data engineer assigns the classification profile to a database, sensitive data in the tables that belong to the database is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple databases, or create multiple classification profiles to set different classification criteria for different databases.
To use SQL to create a classification profile, run the CREATE CLASSIFICATION_PROFILE command to create an instance of the CLASSIFICATION_PROFILE class.
For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see 示例.
关于标签映射¶
You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. With this tag mapping, a column with sensitive data can be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.
Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified
as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag,
the string value of that tag, and the semantic categories to which the tag is being mapped.
以下是标签映射示例:
'tag_map': {
'column_tag_map': [
{
'tag_name':'tag_db.sch.pii',
'tag_value':'Highly Confidential',
'semantic_categories':[
'NAME',
'NATIONAL_IDENTIFIER'
]
},
{
'tag_name': 'tag_db.sch.pii',
'tag_value':'Confidential',
'semantic_categories': [
'EMAIL'
]
}
]
}
根据此映射,如果您有一列电子邮件地址,且分类过程确定该列包含这些地址,则会在包含电子邮件地址的列上设置 tag_db.sch.pii = 'Confidential' 标签。
如果您的标签映射包括多个映射标签、标签值和类别值的 JSON 对象,则 JSON 对象的顺序将决定在发生冲突时在列上设置哪个标签和值。按所需的赋值顺序从左到右指定 JSON 对象,如果是格式化 JSON,则从上到下。
小技巧
column_tag_map 字段中的每个对象只有一个必填键:tag_name。如果省略 tag_value 和 semantic_categories 键,用户定义的标签将应用于 SEMANTIC_CATEGORY 系统标签所应用的每一列,并且用户定义的标签值将与给定列的 SEMANTIC_CATEGORY 标签值相匹配。
If there is a conflict with a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see 敏感数据分类故障排除.
Classify data using a subset of native semantic categories¶
By default, Snowflake classifies data into its native semantic categories whenever it identifies sensitive data. A semantic category represents a type of data, such as email addresses, credit card numbers, or social security numbers.
You can configure the classification profile to limit which types of data (semantic categories) to classify as sensitive. Snowflake classifies data only if it belongs to the subset of semantic categories that you specify in the profile.
The snowflake_semantic_categories key in a classification profile's configuration object defines the list of semantic categories
that you want classified. You can specify which data to classify in two ways:
Classify by semantic category: Specify categories like NAME or EMAIL to classify all data of that type, regardless of location.
Classify by semantic category and country: For categories with country-specific subcategories (like TAX_IDENTIFIER or PASSPORT), you can use the
country_codeskey to classify only data from specific countries. For a list of two-letter country codes and supported semantic subcategories, see 敏感数据分类的原生语义类别.
You can mix both approaches in the same classification profile. The following examples demonstrate each variation:
Example 1: Classify specific semantic categories¶
This example configures a classification profile so data is classified only if Snowflake identifies it as belonging to the semantic categories NAME and NATIONAL_IDENTIFIER. All other types of data are not classified.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
my_classification_profile(
{
'minimum_object_age_for_classification_days': 0,
'snowflake_semantic_categories':
[
{'category': 'NAME'},
{'category': 'NATIONAL_IDENTIFIER'}
]
});
Example 2: Classify a category with specific country codes¶
For semantic categories that have country-specific subcategories, you can use the country_codes key to limit classification to
specific countries. The country_codes value is a two-letter country code.
This example classifies data only if Snowflake identifies that it is a tax identifier in Italy (IT) or France (FR):
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
my_classification_profile(
{
'minimum_object_age_for_classification_days': 0,
'snowflake_semantic_categories':
[
{
'category': 'TAX_IDENTIFIER',
'country_codes': ['IT', 'FR']
}
]
});
Example 3: Combine global and country-specific categories¶
You can combine semantic categories that do not have country-specific subcategories with categories that do. This example specifies that Snowflake always classify data belonging to the NAME category and classify the PASSPORT category if the data pertains to United States passports:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
my_classification_profile(
{
'minimum_object_age_for_classification_days': 0,
'snowflake_semantic_categories':
[
{
'category': 'NAME'
},
{
'category': 'PASSPORT',
'country_codes': ['US']
}
]
});
在数据库上设置分类配置文件¶
Implement sensitive data classification by setting a classification profile on a database. After you set the classification profile on the database, all tables and views within that database are automatically monitored by sensitive data classification.
您还可以在架构上设置分类。如果您在数据库中存在的架构设置了分类配置文件,而该架构中也与分类配置文件相关联,则在该架构上设置的配置文件将会覆盖数据库设置的配置文件。
To set a classification profile, use an ALTER DATABASE or ALTER SCHEMA command to set
the CLASSIFICATION_PROFILE parameter. For example, to set a classification profile my_profile so all tables and views in the my_db
database are monitored by sensitive data classification, run the following command:
ALTER DATABASE my_db
SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_profile';
访问控制¶
Here are the privileges and roles that let you work with classification profiles and enable sensitive data classification.
任务 |
所需权限/角色 |
备注 |
|---|---|---|
创建分类配置文件 |
SNOWFLAKE.CLASSIFICATION_ADMIN 数据库角色 |
有关向其他角色授予此数据库角色的信息,请参阅 使用 SNOWFLAKE 数据库角色。 |
对架构的 CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE 权限 |
对于要创建分类配置文件实例的架构,您需要拥有对该架构的此权限。 |
|
对数据库和架构的 USAGE 权限 |
You need privileges on the schema where you want to create the classification profile instance. |
|
在数据库/架构上设置分类配置文件 |
以下其中一项:
|
默认情况下,数据库/架构所有者拥有 EXECUTE AUTO CLASSIFICATION 权限。 |
对架构数据库的任何权限 |
如果在架构上设置分类配置文件,则至少需要对包含该架构的数据库具有一项权限。 |
|
对数据库/架构的任何权限 |
对于包含要自动分类的表的架构,您至少需要对数据库/架构拥有一项权限。EXECUTE AUTO CLASSIFICATION 权限符合这一要求。 |
|
以下其中一项:
|
有关向其他角色授予 PRIVACY_USER 实例角色的信息,请参阅 实例角色。 |
|
对账户的 APPLY TAG 权限 |
||
在分类配置文件实例上调用 方法 |
<classification_profile>!PRIVACY_USER 实例角色 |
有关向其他角色授予此实例角色的信息,请参阅 实例角色。 |
列出分类配置文件 |
<classification_profile>!PRIVACY_USER 实例角色 |
|
删除分类配置文件 |
对分类配置文件实例的 OWNERSHIP 权限 |
有关向数据工程师角色授予这些权限和数据库角色的示例,请参阅 基本示例:自动对数据库中的表进行分类。
示例¶
基本示例:自动对数据库中的表进行分类¶
完成以下步骤,自动对数据库中的表进行分类:
作为管理员,向数据工程师授予所需的 角色和权限,以自动对数据库中的表进行分类。
USE ROLE ACCOUNTADMIN; GRANT USAGE ON DATABASE mydb TO ROLE data_engineer; GRANT EXECUTE AUTO CLASSIFICATION ON DATABASE mydb TO ROLE data_engineer; GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer; GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer; GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
切换到数据工程师角色:
USE ROLE data_engineer;
将分类配置文件 创建为 CLASSIFICATION_PROFILE 类的实例:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': false });
对实例调用 DESCRIBE 方法以确认实例的属性:
SELECT my_classification_profile!DESCRIBE();
在架构上设置分类配置文件实例,启动后台进程,以监控架构中的表,并自动对其进行敏感数据分类。
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
备注
在架构上设置分类配置文件与 Snowflake 开始对架构进行分类之间会有一小时的延迟。
等待一小时后,调用 SYSTEM$GET_CLASSIFICATION_RESULT 存储过程,以获取自动分类的结果。
CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
示例:使用标签图和自定义分类器¶
作为管理员,向数据工程师授予所需的 角色和权限,以自动对数据库中的表进行分类并为列设置标签。
创建分类配置文件。
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': false });
对实例调用 SET_TAG_MAP 方法,将 标签图 添加到分类配置文件。这样就可以在包含敏感数据的列上自动应用自定义标签。
CALL my_classification_profile!SET_TAG_MAP( {'column_tag_map':[ { 'tag_name':'my_db.sch1.pii', 'tag_value':'sensitive', 'semantic_categories':['NAME'] }]});
或者,您还可以在创建分类配置文件时添加此标签图。
调用 SET_CUSTOM_CLASSIFIERS 方法,将 自定义分类器 添加到分类配置文件。这样就能根据用户定义的语义和隐私类别对敏感数据进行自动分类。
CALL my_classification_profile!set_custom_classifiers( { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() });
或者,您还可以在创建分类配置文件时添加自定义分类器。
对实例调用 DESCRIBE 方法,确认标签图和自定义分类器已添加到分类配置文件。
SELECT my_classification_profile!DESCRIBE();
在数据库上设置分类配置文件实例。
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
将掩码策略附加到
tag_db.sch.pii标签,启用基于标签的掩码。ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
示例:在启用自动分类前测试分类配置文件¶
作为管理员,向数据工程师授予所需的 角色和权限,以自动对架构中的表进行分类并为列设置标签。
使用标签图和自定义分类器创建分类配置文件:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days':0, 'auto_tag':true, 'tag_map': { 'column_tag_map':[ { 'tag_name':'tag_db.sch.pii', 'tag_value':'highly sensitive', 'semantic_categories':['NAME','NATIONAL_IDENTIFIER'] }, { 'tag_name':'tag_db.sch.pii', 'tag_value':'sensitive', 'semantic_categories':['EMAIL','MEDICAL_CODE'] } ] }, 'classify_views': false, 'custom_classifiers': { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() } } );
在启用自动分类之前,调用 SYSTEM$CLASSIFY 存储过程以测试
table1表上的标记映射。CALL SYSTEM$CLASSIFY( 'db.sch.table1', 'db.sch.my_classification_profile' );
输出中的
tags键包含关于标签是否已设置的详细信息(如果已设置,则为true,否则为false)、已设置标签的名称以及标签值:{ "classification_profile_config": { "classification_profile_name": "db.schema.my_classification_profile" }, "classification_result": { "EMAIL": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "EMAIL", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "EMAIL" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "sensitive" } ] }, "valid_value_ratio": 1 }, "FIRST_NAME": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "NAME", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "NAME" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "highly sensitive" } ] }, "valid_value_ratio": 1 } } }
在验证基于分类配置文件的自动分类会产生预期结果后,在数据库上设置分类配置文件实例。
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';