Use SQL to set up sensitive data classification

The following sections describe how to use SQL to set up the automatic classification of sensitive data within a database. If you want to use a web interface to set up sensitive data classification, see 使用 Trust Center 设置敏感数据分类.

The basic workflow for using SQL to classify sensitive data consists of the following steps:

  1. Create a classification profile that controls what happens during sensitive data classification.

  2. Set the classification profile on a database or schema to automatically classify tables in the entity.

有关该工作流程的端到端示例,请参阅 示例

关于分类配置文件

A classification profile defines the criteria that are used to automatically classify tables in a database. This criteria includes:

  • 表在自动分类前应存在多长时间。

  • 之前已分类的表应在多长时间内重新分类。

  • 分类后,是否在列上自动设置系统和自定义标签。您可以决定是让 Snowflake 自动应用推荐的标签,还是选择先查看系统建议的标签分配,然后再由您自行应用。

  • 在系统分类标签和用户定义的对象标签之间 映射,以便自动应用用户定义的标签。

  • Whether custom classifiers are used to classify data.

When a data engineer assigns the classification profile to a database, sensitive data in the tables that belong to the database is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple databases, or create multiple classification profiles to set different classification criteria for different databases.

To use SQL to create a classification profile, run the CREATE CLASSIFICATION_PROFILE command to create an instance of the CLASSIFICATION_PROFILE class.

For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see 示例.

关于标签映射

You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. With this tag mapping, a column with sensitive data can be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.

Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag, the string value of that tag, and the semantic categories to which the tag is being mapped.

以下是标签映射示例:

'tag_map': {
  'column_tag_map': [
    {
      'tag_name':'tag_db.sch.pii',
      'tag_value':'Highly Confidential',
      'semantic_categories':[
        'NAME',
        'NATIONAL_IDENTIFIER'
      ]
    },
    {
      'tag_name': 'tag_db.sch.pii',
      'tag_value':'Confidential',
      'semantic_categories': [
        'EMAIL'
      ]
    }
  ]
}
Copy

根据此映射,如果您有一列电子邮件地址,且分类过程确定该列包含这些地址,则会在包含电子邮件地址的列上设置 tag_db.sch.pii = 'Confidential' 标签。

如果您的标签映射包括多个映射标签、标签值和类别值的 JSON 对象,则 JSON 对象的顺序将决定在发生冲突时在列上设置哪个标签和值。按所需的赋值顺序从左到右指定 JSON 对象,如果是格式化 JSON,则从上到下。

小技巧

column_tag_map 字段中的每个对象只有一个必填键:tag_name。如果省略 tag_valuesemantic_categories 键,用户定义的标签将应用于 SEMANTIC_CATEGORY 系统标签所应用的每一列,并且用户定义的标签值将与给定列的 SEMANTIC_CATEGORY 标签值相匹配。

If there is a conflict with a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see 敏感数据分类故障排除.

Set a classification profile on a database

Implement sensitive data classification by setting a classification profile on a database. After you set the classification profile on the database, all tables and views within that database are automatically monitored by sensitive data classification.

You can also set a classification on a schema. If you set a classification profile on a schema that exists within a database that is also associated with a classification profile, the profile set on the schema overrides the profile set on the database.

To set a classification profile, use an ALTER DATABASE or ALTER SCHEMA command to set the CLASSIFICATION_PROFILE parameter. For example, to set a classification profile my_profile so all tables and views in the my_db database are monitored by sensitive data classification, run the following command:

ALTER DATABASE my_db
  SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_profile';
Copy

访问控制

Here are the privileges and roles that let you work with classification profiles and enable sensitive data classification.

任务

所需权限/角色

备注

创建分类配置文件

SNOWFLAKE.CLASSIFICATION_ADMIN 数据库角色

有关向其他角色授予此数据库角色的信息,请参阅 使用 SNOWFLAKE 数据库角色

对架构的 CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE 权限

对于要创建分类配置文件实例的架构,您需要拥有对该架构的此权限。

USAGE on database and schema

You need privileges on the schema where you want to create the classification profile instance.

在数据库/架构上设置分类配置文件

以下其中一项:

  • 对账户的 EXECUTE AUTO CLASSIFICATION 权限

  • 对数据库/架构的 EXECUTE AUTO CLASSIFICATION 权限

默认情况下,数据库/架构所有者拥有 EXECUTE AUTO CLASSIFICATION 权限。

Any privilege on schema's database

If setting a classification profile on a schema, you need at least one privilege on the database that contains that schema.

对数据库/架构的任何权限

对于包含要自动分类的表的架构,您至少需要对数据库/架构拥有一项权限。EXECUTE AUTO CLASSIFICATION 权限符合这一要求。

以下其中一项:

  • 对分类配置文件实例的 OWNERSHIP 权限。

  • 分类配置文件上的 <classification_profile>!PRIVACY_USER 实例角色。

有关向其他角色授予 PRIVACY_USER 实例角色的信息,请参阅 实例角色

对账户的 APPLY TAG 权限

在分类配置文件实例上调用 方法

<classification_profile>!PRIVACY_USER 实例角色

有关向其他角色授予此实例角色的信息,请参阅 实例角色

列出分类配置文件

<classification_profile>!PRIVACY_USER 实例角色

删除分类配置文件

对分类配置文件实例的 OWNERSHIP 权限

有关向数据工程师角色授予这些权限和数据库角色的示例,请参阅 Basic example: Automatically classifying tables in a database

示例

Basic example: Automatically classifying tables in a database

Complete these steps to automatically classify a table in the database:

  1. As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database.

    USE ROLE ACCOUNTADMIN;
    
    GRANT USAGE ON DATABASE mydb TO ROLE data_engineer;
    GRANT EXECUTE AUTO CLASSIFICATION ON DATABASE mydb TO ROLE data_engineer;
    
    GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer;
    GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer;
    
    GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
    
    Copy
  2. 切换到数据工程师角色:

    USE ROLE data_engineer;
    
    Copy
  3. 将分类配置文件 创建为 CLASSIFICATION_PROFILE 类的实例:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 0,
          'maximum_classification_validity_days': 30,
          'auto_tag': true,
          'classify_views': true
        });
    
    Copy
  4. 对实例调用 DESCRIBE 方法以确认实例的属性:

    SELECT my_classification_profile!DESCRIBE();
    
    Copy
  5. 在架构上设置分类配置文件实例,启动后台进程,以监控架构中的表,并自动对其进行敏感数据分类。

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy

    备注

    在架构上设置分类配置文件与 Snowflake 开始对架构进行分类之间会有一小时的延迟。

  6. 等待一小时后,调用 SYSTEM$GET_CLASSIFICATION_RESULT 存储过程,以获取自动分类的结果。

    CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
    
    Copy

示例:使用标签图和自定义分类器

  1. As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database and set tags on columns.

  2. 创建分类配置文件。

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 0,
          'maximum_classification_validity_days': 30,
          'auto_tag': true,
          'classify_views': true
        });
    
    Copy
  3. 对实例调用 SET_TAG_MAP 方法,将 标签图 添加到分类配置文件。这样就可以在包含敏感数据的列上自动应用自定义标签。

    CALL my_classification_profile!SET_TAG_MAP(
      {'column_tag_map':[
        {
          'tag_name':'my_db.sch1.pii',
          'tag_value':'sensitive',
          'semantic_categories':['NAME']
        }]});
    
    Copy

    或者,您还可以在创建分类配置文件时添加此标签图。

  4. 调用 SET_CUSTOM_CLASSIFIERS 方法,将 自定义分类器 添加到分类配置文件。这样就能根据用户定义的语义和隐私类别对敏感数据进行自动分类。

    CALL my_classification_profile!set_custom_classifiers(
      {
        'medical_codes': medical_codes!list(),
        'finance_codes': finance_codes!list()
      });
    
    Copy

    或者,您还可以在创建分类配置文件时添加自定义分类器。

  5. 对实例调用 DESCRIBE 方法,确认标签图和自定义分类器已添加到分类配置文件。

    SELECT my_classification_profile!DESCRIBE();
    
    Copy
  6. Set the classification profile instance on the database.

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy
  7. 将掩码策略附加到 tag_db.sch.pii 标签,启用基于标签的掩码。

    ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
    
    Copy

示例:在启用自动分类前测试分类配置文件

  1. 作为管理员,向数据工程师授予所需的 角色和权限,以自动对架构中的表进行分类并为列设置标签。

  2. 使用标签图和自定义分类器创建分类配置文件:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile(
      {
        'minimum_object_age_for_classification_days':0,
        'auto_tag':true,
        'tag_map': {
          'column_tag_map':[
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'highly sensitive',
              'semantic_categories':['NAME','NATIONAL_IDENTIFIER']
            },
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'sensitive',
              'semantic_categories':['EMAIL','MEDICAL_CODE']
            }
          ]
        },
        'classify_views': true
        'custom_classifiers': {
          'medical_codes': medical_codes!list(),
          'finance_codes': finance_codes!list()
        }
      }
    );
    
    Copy
  3. 在启用自动分类之前,调用 SYSTEM$CLASSIFY 存储过程以测试 table1 表上的标记映射。

    CALL SYSTEM$CLASSIFY(
     'db.sch.table1',
     'db.sch.my_classification_profile'
    );
    
    Copy

    输出中的 tags 键包含关于标签是否已设置的详细信息(如果已设置,则为 true,否则为 false)、已设置标签的名称以及标签值:

    {
      "classification_profile_config": {
        "classification_profile_name": "db.schema.my_classification_profile"
      },
      "classification_result": {
        "EMAIL": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "EMAIL",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "EMAIL"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        },
        "FIRST_NAME": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "NAME",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "NAME"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "highly sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        }
      }
    }
    
  4. Having verified that automatic classification based on the classification profile will have the desired result, set the classification profile instance on the database.

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy
语言: 中文