Use SQL to set up sensitive data classification

The following sections describe how to use SQL to set up the automatic classification of sensitive data within a database. If you want to use a web interface to set up sensitive data classification, see 使用 Trust Center 设置敏感数据分类.

The basic workflow for using SQL to classify sensitive data consists of the following steps:

  1. Create a classification profile that controls what happens during sensitive data classification.

  2. Set the classification profile on a database or schema to automatically classify tables in the entity.

有关该工作流程的端到端示例,请参阅 示例

关于分类配置文件

A classification profile defines the criteria that are used to automatically classify tables in a database. This criteria includes:

  • 表在自动分类前应存在多长时间。

  • 之前已分类的表应在多长时间内重新分类。

  • 分类后,是否在列上自动设置系统和自定义标签。您可以决定是让 Snowflake 自动应用推荐的标签,还是选择先查看系统建议的标签分配,然后再由您自行应用。

  • 在系统分类标签和用户定义的对象标签之间 映射,以便自动应用用户定义的标签。

  • Whether custom classifiers are used to classify data.

When a data engineer assigns the classification profile to a database, sensitive data in the tables that belong to the database is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple databases, or create multiple classification profiles to set different classification criteria for different databases.

To use SQL to create a classification profile, run the CREATE CLASSIFICATION_PROFILE command to create an instance of the CLASSIFICATION_PROFILE class.

For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see 示例.

关于标签映射

You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. With this tag mapping, a column with sensitive data can be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.

Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag, the string value of that tag, and the semantic categories to which the tag is being mapped.

以下是标签映射示例:

'tag_map': {
  'column_tag_map': [
    {
      'tag_name':'tag_db.sch.pii',
      'tag_value':'Highly Confidential',
      'semantic_categories':[
        'NAME',
        'NATIONAL_IDENTIFIER'
      ]
    },
    {
      'tag_name': 'tag_db.sch.pii',
      'tag_value':'Confidential',
      'semantic_categories': [
        'EMAIL'
      ]
    }
  ]
}
Copy

根据此映射,如果您有一列电子邮件地址,且分类过程确定该列包含这些地址,则会在包含电子邮件地址的列上设置 tag_db.sch.pii = 'Confidential' 标签。

如果您的标签映射包括多个映射标签、标签值和类别值的 JSON 对象,则 JSON 对象的顺序将决定在发生冲突时在列上设置哪个标签和值。按所需的赋值顺序从左到右指定 JSON 对象,如果是格式化 JSON,则从上到下。

小技巧

column_tag_map 字段中的每个对象只有一个必填键:tag_name。如果省略 tag_valuesemantic_categories 键,用户定义的标签将应用于 SEMANTIC_CATEGORY 系统标签所应用的每一列,并且用户定义的标签值将与给定列的 SEMANTIC_CATEGORY 标签值相匹配。

If there is a conflict with a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see 敏感数据分类故障排除.

Classify data using a subset of native semantic categories

By default, Snowflake classifies data into its native semantic categories whenever it identifies sensitive data. A semantic category represents a type of data, such as email addresses, credit card numbers, or social security numbers.

You can configure the classification profile to limit which types of data (semantic categories) to classify as sensitive. Snowflake classifies data only if it belongs to the subset of semantic categories that you specify in the profile.

The snowflake_semantic_categories key in a classification profile's configuration object defines the list of semantic categories that you want classified. You can specify which data to classify in two ways:

  • Classify by semantic category: Specify categories like NAME or EMAIL to classify all data of that type, regardless of location.

  • Classify by semantic category and country: For categories with country-specific subcategories (like TAX_IDENTIFIER or PASSPORT), you can use the country_codes key to classify only data from specific countries. For a list of two-letter country codes and supported semantic subcategories, see 敏感数据分类的原生语义类别.

You can mix both approaches in the same classification profile. The following examples demonstrate each variation:

Example 1: Classify specific semantic categories

This example configures a classification profile so data is classified only if Snowflake identifies it as belonging to the semantic categories NAME and NATIONAL_IDENTIFIER. All other types of data are not classified.

CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  my_classification_profile(
    {
      'minimum_object_age_for_classification_days': 0,
      'snowflake_semantic_categories':
        [
          {'category': 'NAME'},
          {'category': 'NATIONAL_IDENTIFIER'}
        ]
    });
Copy

Example 2: Classify a category with specific country codes

For semantic categories that have country-specific subcategories, you can use the country_codes key to limit classification to specific countries. The country_codes value is a two-letter country code.

This example classifies data only if Snowflake identifies that it is a tax identifier in Italy (IT) or France (FR):

CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  my_classification_profile(
    {
      'minimum_object_age_for_classification_days': 0,
      'snowflake_semantic_categories':
        [
          {
            'category': 'TAX_IDENTIFIER',
            'country_codes': ['IT', 'FR']
          }
        ]
    });
Copy

Example 3: Combine global and country-specific categories

You can combine semantic categories that do not have country-specific subcategories with categories that do. This example specifies that Snowflake always classify data belonging to the NAME category and classify the PASSPORT category if the data pertains to United States passports:

CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  my_classification_profile(
    {
      'minimum_object_age_for_classification_days': 0,
      'snowflake_semantic_categories':
        [
          {
            'category': 'NAME'
          },
          {
            'category': 'PASSPORT',
            'country_codes': ['US']
          }
        ]
    });
Copy

在数据库上设置分类配置文件

Implement sensitive data classification by setting a classification profile on a database. After you set the classification profile on the database, all tables and views within that database are automatically monitored by sensitive data classification.

您还可以在架构上设置分类。如果您在数据库中存在的架构设置了分类配置文件,而该架构中也与分类配置文件相关联,则在该架构上设置的配置文件将会覆盖数据库设置的配置文件。

To set a classification profile, use an ALTER DATABASE or ALTER SCHEMA command to set the CLASSIFICATION_PROFILE parameter. For example, to set a classification profile my_profile so all tables and views in the my_db database are monitored by sensitive data classification, run the following command:

ALTER DATABASE my_db
  SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_profile';
Copy

访问控制

Here are the privileges and roles that let you work with classification profiles and enable sensitive data classification.

任务

所需权限/角色

备注

创建分类配置文件

SNOWFLAKE.CLASSIFICATION_ADMIN 数据库角色

有关向其他角色授予此数据库角色的信息,请参阅 使用 SNOWFLAKE 数据库角色

对架构的 CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE 权限

对于要创建分类配置文件实例的架构,您需要拥有对该架构的此权限。

对数据库和架构的 USAGE 权限

You need privileges on the schema where you want to create the classification profile instance.

在数据库/架构上设置分类配置文件

以下其中一项:

  • 对账户的 EXECUTE AUTO CLASSIFICATION 权限

  • 对数据库/架构的 EXECUTE AUTO CLASSIFICATION 权限

默认情况下,数据库/架构所有者拥有 EXECUTE AUTO CLASSIFICATION 权限。

对架构数据库的任何权限

如果在架构上设置分类配置文件,则至少需要对包含该架构的数据库具有一项权限。

对数据库/架构的任何权限

对于包含要自动分类的表的架构,您至少需要对数据库/架构拥有一项权限。EXECUTE AUTO CLASSIFICATION 权限符合这一要求。

以下其中一项:

  • 对分类配置文件实例的 OWNERSHIP 权限。

  • 分类配置文件上的 <classification_profile>!PRIVACY_USER 实例角色。

有关向其他角色授予 PRIVACY_USER 实例角色的信息,请参阅 实例角色

对账户的 APPLY TAG 权限

在分类配置文件实例上调用 方法

<classification_profile>!PRIVACY_USER 实例角色

有关向其他角色授予此实例角色的信息,请参阅 实例角色

列出分类配置文件

<classification_profile>!PRIVACY_USER 实例角色

删除分类配置文件

对分类配置文件实例的 OWNERSHIP 权限

有关向数据工程师角色授予这些权限和数据库角色的示例,请参阅 基本示例:自动对数据库中的表进行分类

示例

基本示例:自动对数据库中的表进行分类

完成以下步骤,自动对数据库中的表进行分类:

  1. 作为管理员,向数据工程师授予所需的 角色和权限,以自动对数据库中的表进行分类。

    USE ROLE ACCOUNTADMIN;
    
    GRANT USAGE ON DATABASE mydb TO ROLE data_engineer;
    GRANT EXECUTE AUTO CLASSIFICATION ON DATABASE mydb TO ROLE data_engineer;
    
    GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer;
    GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer;
    
    GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
    
    Copy
  2. 切换到数据工程师角色:

    USE ROLE data_engineer;
    
    Copy
  3. 将分类配置文件 创建为 CLASSIFICATION_PROFILE 类的实例:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 0,
          'maximum_classification_validity_days': 30,
          'auto_tag': true,
          'classify_views': false
        });
    
    Copy
  4. 对实例调用 DESCRIBE 方法以确认实例的属性:

    SELECT my_classification_profile!DESCRIBE();
    
    Copy
  5. 在架构上设置分类配置文件实例,启动后台进程,以监控架构中的表,并自动对其进行敏感数据分类。

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy

    备注

    在架构上设置分类配置文件与 Snowflake 开始对架构进行分类之间会有一小时的延迟。

  6. 等待一小时后,调用 SYSTEM$GET_CLASSIFICATION_RESULT 存储过程,以获取自动分类的结果。

    CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
    
    Copy

示例:使用标签图和自定义分类器

  1. 作为管理员,向数据工程师授予所需的 角色和权限,以自动对数据库中的表进行分类并为列设置标签。

  2. 创建分类配置文件。

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
      my_classification_profile(
        {
          'minimum_object_age_for_classification_days': 0,
          'maximum_classification_validity_days': 30,
          'auto_tag': true,
          'classify_views': false
        });
    
    Copy
  3. 对实例调用 SET_TAG_MAP 方法,将 标签图 添加到分类配置文件。这样就可以在包含敏感数据的列上自动应用自定义标签。

    CALL my_classification_profile!SET_TAG_MAP(
      {'column_tag_map':[
        {
          'tag_name':'my_db.sch1.pii',
          'tag_value':'sensitive',
          'semantic_categories':['NAME']
        }]});
    
    Copy

    或者,您还可以在创建分类配置文件时添加此标签图。

  4. 调用 SET_CUSTOM_CLASSIFIERS 方法,将 自定义分类器 添加到分类配置文件。这样就能根据用户定义的语义和隐私类别对敏感数据进行自动分类。

    CALL my_classification_profile!set_custom_classifiers(
      {
        'medical_codes': medical_codes!list(),
        'finance_codes': finance_codes!list()
      });
    
    Copy

    或者,您还可以在创建分类配置文件时添加自定义分类器。

  5. 对实例调用 DESCRIBE 方法,确认标签图和自定义分类器已添加到分类配置文件。

    SELECT my_classification_profile!DESCRIBE();
    
    Copy
  6. 在数据库上设置分类配置文件实例。

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy
  7. 将掩码策略附加到 tag_db.sch.pii 标签,启用基于标签的掩码。

    ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
    
    Copy

示例:在启用自动分类前测试分类配置文件

  1. 作为管理员,向数据工程师授予所需的 角色和权限,以自动对架构中的表进行分类并为列设置标签。

  2. 使用标签图和自定义分类器创建分类配置文件:

    CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile(
      {
        'minimum_object_age_for_classification_days':0,
        'auto_tag':true,
        'tag_map': {
          'column_tag_map':[
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'highly sensitive',
              'semantic_categories':['NAME','NATIONAL_IDENTIFIER']
            },
            {
              'tag_name':'tag_db.sch.pii',
              'tag_value':'sensitive',
              'semantic_categories':['EMAIL','MEDICAL_CODE']
            }
          ]
        },
        'classify_views': false,
        'custom_classifiers': {
          'medical_codes': medical_codes!list(),
          'finance_codes': finance_codes!list()
        }
      }
    );
    
    Copy
  3. 在启用自动分类之前,调用 SYSTEM$CLASSIFY 存储过程以测试 table1 表上的标记映射。

    CALL SYSTEM$CLASSIFY(
     'db.sch.table1',
     'db.sch.my_classification_profile'
    );
    
    Copy

    输出中的 tags 键包含关于标签是否已设置的详细信息(如果已设置,则为 true,否则为 false)、已设置标签的名称以及标签值:

    {
      "classification_profile_config": {
        "classification_profile_name": "db.schema.my_classification_profile"
      },
      "classification_result": {
        "EMAIL": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "EMAIL",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "EMAIL"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        },
        "FIRST_NAME": {
          "alternates": [],
          "recommendation": {
            "confidence": "HIGH",
            "coverage": 1,
            "details": [],
            "privacy_category": "IDENTIFIER",
            "semantic_category": "NAME",
            "tags": [
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.semantic_category",
                "tag_value": "NAME"
              },
              {
                "tag_applied": true,
                "tag_name": "snowflake.core.privacy_category",
                "tag_value": "IDENTIFIER"
              },
              {
                "tag_applied": true,
                "tag_name": "tag_db.sch.pii",
                "tag_value": "highly sensitive"
              }
            ]
          },
          "valid_value_ratio": 1
        }
      }
    }
    
  4. 在验证基于分类配置文件的自动分类会产生预期结果后,在数据库上设置分类配置文件实例。

    ALTER DATABASE mydb
     SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
    
    Copy