Use SQL to set up sensitive data classification¶
The following sections describe how to use SQL to set up the automatic classification of sensitive data within a database. If you want to use a web interface to set up sensitive data classification, see 使用 Trust Center 设置敏感数据分类.
The basic workflow for using SQL to classify sensitive data consists of the following steps:
Create a classification profile that controls what happens during sensitive data classification.
Set the classification profile on a database or schema to automatically classify tables in the entity.
有关该工作流程的端到端示例,请参阅 示例。
关于分类配置文件¶
A classification profile defines the criteria that are used to automatically classify tables in a database. This criteria includes:
表在自动分类前应存在多长时间。
之前已分类的表应在多长时间内重新分类。
分类后,是否在列上自动设置系统和自定义标签。您可以决定是让 Snowflake 自动应用推荐的标签,还是选择先查看系统建议的标签分配,然后再由您自行应用。
在系统分类标签和用户定义的对象标签之间 映射,以便自动应用用户定义的标签。
Whether custom classifiers are used to classify data.
When a data engineer assigns the classification profile to a database, sensitive data in the tables that belong to the database is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple databases, or create multiple classification profiles to set different classification criteria for different databases.
To use SQL to create a classification profile, run the CREATE CLASSIFICATION_PROFILE command to create an instance of the CLASSIFICATION_PROFILE class.
For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see 示例.
关于标签映射¶
You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. With this tag mapping, a column with sensitive data can be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.
Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified
as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag,
the string value of that tag, and the semantic categories to which the tag is being mapped.
以下是标签映射示例:
'tag_map': {
'column_tag_map': [
{
'tag_name':'tag_db.sch.pii',
'tag_value':'Highly Confidential',
'semantic_categories':[
'NAME',
'NATIONAL_IDENTIFIER'
]
},
{
'tag_name': 'tag_db.sch.pii',
'tag_value':'Confidential',
'semantic_categories': [
'EMAIL'
]
}
]
}
根据此映射,如果您有一列电子邮件地址,且分类过程确定该列包含这些地址,则会在包含电子邮件地址的列上设置 tag_db.sch.pii = 'Confidential' 标签。
如果您的标签映射包括多个映射标签、标签值和类别值的 JSON 对象,则 JSON 对象的顺序将决定在发生冲突时在列上设置哪个标签和值。按所需的赋值顺序从左到右指定 JSON 对象,如果是格式化 JSON,则从上到下。
小技巧
column_tag_map 字段中的每个对象只有一个必填键:tag_name。如果省略 tag_value 和 semantic_categories 键,用户定义的标签将应用于 SEMANTIC_CATEGORY 系统标签所应用的每一列,并且用户定义的标签值将与给定列的 SEMANTIC_CATEGORY 标签值相匹配。
If there is a conflict with a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see 敏感数据分类故障排除.
Set a classification profile on a database¶
Implement sensitive data classification by setting a classification profile on a database. After you set the classification profile on the database, all tables and views within that database are automatically monitored by sensitive data classification.
You can also set a classification on a schema. If you set a classification profile on a schema that exists within a database that is also associated with a classification profile, the profile set on the schema overrides the profile set on the database.
To set a classification profile, use an ALTER DATABASE or ALTER SCHEMA command to set
the CLASSIFICATION_PROFILE parameter. For example, to set a classification profile my_profile so all tables and views in the my_db
database are monitored by sensitive data classification, run the following command:
ALTER DATABASE my_db
SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_profile';
访问控制¶
Here are the privileges and roles that let you work with classification profiles and enable sensitive data classification.
任务 |
所需权限/角色 |
备注 |
|---|---|---|
创建分类配置文件 |
SNOWFLAKE.CLASSIFICATION_ADMIN 数据库角色 |
有关向其他角色授予此数据库角色的信息,请参阅 使用 SNOWFLAKE 数据库角色。 |
对架构的 CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE 权限 |
对于要创建分类配置文件实例的架构,您需要拥有对该架构的此权限。 |
|
USAGE on database and schema |
You need privileges on the schema where you want to create the classification profile instance. |
|
在数据库/架构上设置分类配置文件 |
以下其中一项:
|
默认情况下,数据库/架构所有者拥有 EXECUTE AUTO CLASSIFICATION 权限。 |
Any privilege on schema's database |
If setting a classification profile on a schema, you need at least one privilege on the database that contains that schema. |
|
对数据库/架构的任何权限 |
对于包含要自动分类的表的架构,您至少需要对数据库/架构拥有一项权限。EXECUTE AUTO CLASSIFICATION 权限符合这一要求。 |
|
以下其中一项:
|
有关向其他角色授予 PRIVACY_USER 实例角色的信息,请参阅 实例角色。 |
|
对账户的 APPLY TAG 权限 |
||
在分类配置文件实例上调用 方法 |
<classification_profile>!PRIVACY_USER 实例角色 |
有关向其他角色授予此实例角色的信息,请参阅 实例角色。 |
列出分类配置文件 |
<classification_profile>!PRIVACY_USER 实例角色 |
|
删除分类配置文件 |
对分类配置文件实例的 OWNERSHIP 权限 |
有关向数据工程师角色授予这些权限和数据库角色的示例,请参阅 Basic example: Automatically classifying tables in a database。
示例¶
Basic example: Automatically classifying tables in a database¶
Complete these steps to automatically classify a table in the database:
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database.
USE ROLE ACCOUNTADMIN; GRANT USAGE ON DATABASE mydb TO ROLE data_engineer; GRANT EXECUTE AUTO CLASSIFICATION ON DATABASE mydb TO ROLE data_engineer; GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer; GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer; GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
切换到数据工程师角色:
USE ROLE data_engineer;
将分类配置文件 创建为 CLASSIFICATION_PROFILE 类的实例:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': true });
对实例调用 DESCRIBE 方法以确认实例的属性:
SELECT my_classification_profile!DESCRIBE();
在架构上设置分类配置文件实例,启动后台进程,以监控架构中的表,并自动对其进行敏感数据分类。
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
备注
在架构上设置分类配置文件与 Snowflake 开始对架构进行分类之间会有一小时的延迟。
等待一小时后,调用 SYSTEM$GET_CLASSIFICATION_RESULT 存储过程,以获取自动分类的结果。
CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
示例:使用标签图和自定义分类器¶
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database and set tags on columns.
创建分类配置文件。
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': true });
对实例调用 SET_TAG_MAP 方法,将 标签图 添加到分类配置文件。这样就可以在包含敏感数据的列上自动应用自定义标签。
CALL my_classification_profile!SET_TAG_MAP( {'column_tag_map':[ { 'tag_name':'my_db.sch1.pii', 'tag_value':'sensitive', 'semantic_categories':['NAME'] }]});
或者,您还可以在创建分类配置文件时添加此标签图。
调用 SET_CUSTOM_CLASSIFIERS 方法,将 自定义分类器 添加到分类配置文件。这样就能根据用户定义的语义和隐私类别对敏感数据进行自动分类。
CALL my_classification_profile!set_custom_classifiers( { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() });
或者,您还可以在创建分类配置文件时添加自定义分类器。
对实例调用 DESCRIBE 方法,确认标签图和自定义分类器已添加到分类配置文件。
SELECT my_classification_profile!DESCRIBE();
Set the classification profile instance on the database.
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
将掩码策略附加到
tag_db.sch.pii标签,启用基于标签的掩码。ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
示例:在启用自动分类前测试分类配置文件¶
作为管理员,向数据工程师授予所需的 角色和权限,以自动对架构中的表进行分类并为列设置标签。
使用标签图和自定义分类器创建分类配置文件:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days':0, 'auto_tag':true, 'tag_map': { 'column_tag_map':[ { 'tag_name':'tag_db.sch.pii', 'tag_value':'highly sensitive', 'semantic_categories':['NAME','NATIONAL_IDENTIFIER'] }, { 'tag_name':'tag_db.sch.pii', 'tag_value':'sensitive', 'semantic_categories':['EMAIL','MEDICAL_CODE'] } ] }, 'classify_views': true 'custom_classifiers': { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() } } );
在启用自动分类之前,调用 SYSTEM$CLASSIFY 存储过程以测试
table1表上的标记映射。CALL SYSTEM$CLASSIFY( 'db.sch.table1', 'db.sch.my_classification_profile' );
输出中的
tags键包含关于标签是否已设置的详细信息(如果已设置,则为true,否则为false)、已设置标签的名称以及标签值:{ "classification_profile_config": { "classification_profile_name": "db.schema.my_classification_profile" }, "classification_result": { "EMAIL": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "EMAIL", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "EMAIL" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "sensitive" } ] }, "valid_value_ratio": 1 }, "FIRST_NAME": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "NAME", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "NAME" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "highly sensitive" } ] }, "valid_value_ratio": 1 } } }
Having verified that automatic classification based on the classification profile will have the desired result, set the classification profile instance on the database.
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';