使用动态数据掩码

本主题说明如何在 Snowflake 中配置和使用动态数据掩码。

To learn more about using a masking policy with a tag, see Tag-based masking policies.

使用动态数据掩码

下面列出了在 Snowflake 中配置和使用动态数据掩码的简要步骤:

  1. 向安全或隐私官的自定义角色授予掩码策略管理权限。
  2. 将该自定义角色授予相应的用户。
  3. 安全或隐私官创建和定义掩码策略,并将其应用于包含敏感数据的列。
  4. 在 Snowflake 中执行查询。请注意以下事项:
    • Snowflake 会动态重写查询,并将掩码策略 SQL 表达式应用于列。
    • 对于在掩码策略中指定的列,查询中出现该列的每个位置(例如投影、join 谓词、where 子句谓词、order by 和 group by)都会发生列重写。
    • Users see masked data based on the execution context conditions defined in the masking policies. For more information on the execution context in Dynamic Data Masking policies, see Advanced Column-level Security topics.

对从 Apache Spark™ 查询的 Apache Iceberg 表强制执行动态数据掩码策略

Snowflake supports enforcing dynamic data masking policies on Apache Iceberg tables that you query from Apache Spark™ through Snowflake Horizon Catalog. For more information, see Enforce data protection policies when querying Apache Iceberg™ tables from Apache Spark™.

第 1 步:授予自定义角色掩码策略权限

A security or privacy officer should serve as the masking policy administrator (i.e. custom role: MASKING_ADMIN) and have the privileges to define, manage, and apply masking policies to columns.

Snowflake provides the following privileges to grant to a security or privacy officer for Column-level Security masking policies:

PrivilegeObjectDescription
CREATE MASKING POLICYSchemaThis privilege controls who can create masking policies.
APPLY MASKING POLICYAccount

This privilege controls who can [un]set masking policies on columns and is granted to the ACCOUNTADMIN role by default.
This privilege only allows applying a masking policy to a column and does not provide any additional table privileges described in Access control privileges.

APPLYMasking policy

Optional. This policy-level privilege can be used by a policy owner to decentralize the [un]set operations of a given masking policy on columns to the object owners (i.e. the role that has the OWNERSHIP privilege on the object).
Snowflake supports discretionary access control where object owners are also considered data stewards.
If the policy administrator trusts the object owners to be data stewards for protected columns, then the policy administrator can use this privilege to decentralize applying the policy [un]set operations.

The following example creates the MASKING_ADMIN role and grants masking policy privileges to that role.

Create a masking policy administrator custom role:

use role useradmin;
CREATE ROLE masking_admin;

Grant privileges to masking_admin role:

use role securityadmin;
GRANT CREATE MASKING POLICY on SCHEMA <db_name.schema_name> to ROLE masking_admin;
GRANT APPLY MASKING POLICY on ACCOUNT to ROLE masking_admin;

Allow table_owner role to set or unset the ssn_mask masking policy (optional):

GRANT APPLY ON MASKING POLICY ssn_mask to ROLE table_owner;

Where:

  • {db_name.schema_name}

    Specifies the identifier for the schema for which the privilege should be granted.

For more information, see:

第 2 步:向用户授予自定义角色

Grant the MASKING_ADMIN custom role to a user serving as the security or privacy officer.

GRANT ROLE masking_admin TO USER jsmith;

第 3 步:创建掩码策略

使用 MASKING_ADMIN 角色创建掩码策略并将其应用于列。

在此代表性示例中,具有 ANALYST 角色的用户将看到未掩码的值。没有 ANALYST 角色的用户将看到完整的掩码。

CREATE OR REPLACE MASKING POLICY email_mask AS (val string) RETURNS string ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '*********'
  END;

Tip

If you want to update an existing masking policy and need to see the current definition of the policy, call the GET_DDL function or run the DESCRIBE MASKING POLICY command.

第 4 步:将掩码策略应用于表或视图列

These examples assume that a masking policy is not applied to the table column when the table is created and the view column when the view is created. You can optionally apply a masking policy to a table column when you create the table with a CREATE TABLE statement or a view column with a CREATE VIEW statement.

执行以下语句,以将策略应用于表列或视图列。

-- apply masking policy to a table column

ALTER TABLE IF EXISTS user_info MODIFY COLUMN email SET MASKING POLICY email_mask;

-- apply the masking policy to a view column

ALTER VIEW user_info_v MODIFY COLUMN email SET MASKING POLICY email_mask;

第 5 步:查询 Snowflake 中的数据

在 Snowflake 中执行两个不同的查询(一个使用 ANALYST 角色,另一个使用其他角色),以验证没有 ANALYST 角色的用户是否看到完整的掩码。

-- using the ANALYST role

USE ROLE analyst;
SELECT email FROM user_info; -- should see plain text value

-- using the PUBLIC role

USE ROLE PUBLIC;
SELECT email FROM user_info; -- should see full data mask

具有可记忆函数的掩码策略

This example uses a memoizable function to cache the result of a query on the mapping table that determines whether a role is authorized to view PII data. A data engineer uses a masking policy to protect the columns in the table.

下面的过程将引用这些对象:

  • A table that contains PII data, employee_data:

    +----------+-------------+---------------+
    | USERNAME |     ID      | PHONE_NUMBER  |
    +----------+-------------+---------------+
    | JSMITH   | 12-3456-89  | 1555-523-8790 |
    | AJONES   | 12-0124-32  | 1555-125-1548 |
    +----------+-------------+---------------+
  • A mapping table that determines whether a particular role is authorized to view data, auth_role_t:

    +---------------+---------------+
    | ROLE          | IS_AUTHORIZED |
    +---------------+---------------+
    | DATA_ENGINEER | TRUE          |
    | DATA_STEWARD  | TRUE          |
    | IT_ADMIN      | TRUE          |
    | PUBLIC        | FALSE         |
    +---------------+---------------+

完成以下步骤,创建一个可调用带实参的可记忆函数的掩码策略:

  1. Create a memoizable function that queries the mapping table. The function returns an array of roles based on the value of the is_authorized column:

    CREATE FUNCTION is_role_authorized(arg1 VARCHAR)
    RETURNS BOOLEAN
    MEMOIZABLE
    AS
    $$
      SELECT ARRAY_CONTAINS(
     arg1::VARIANT,
     (SELECT ARRAY_AGG(role) FROM auth_role WHERE is_authorized = TRUE)
      )
    $$;
  2. Call the memoizable function to cache the query results. In this example, pass the value TRUE as the argument value because the resultant array serves as the source of allowed roles to access the data protected by the masking policy:

    SELECT is_role_authorized(IT_ADMIN);
    +---------------------------------------------+
    |         is_role_authorized(IT_ADMIN)        |
    +---------------------------------------------+
    |                    TRUE                     |
    +---------------------------------------------+
  3. Create a masking policy to protect the id column. The policy calls the memoizable function to determine whether the role used to query the table is authorized to see the data in the protected column:

    CREATE OR REPLACE MASKING POLICY empl_id_mem_mask
    AS (val VARCHAR) RETURNS VARCHAR ->
    CASE
      WHEN is_role_authorized(CURRENT_ROLE()) THEN val
      ELSE NULL
    END;
  4. Set the masking policy on the table with an ALTER TABLE … ALTER COLUMN command:

    ALTER TABLE employee_data MODIFY COLUMN id
      SET MASKING POLICY empl_id_mem_mask;
  5. 查询表以测试策略:

    USE ROLE data_engineer;
    SELECT * FROM employee_data;

该查询返回未掩码数据。

However, if you switch roles to the PUBLIC role and repeat the query in this step, the values in the id are replaced with NULL.

其他掩码策略示例

以下是可在动态数据掩码策略正文中使用的其他代表性示例。

Allow a production account to see unmasked values and all other accounts (e.g. development, test) to see masked values.

case
  when current_account() in ('<prod_account_identifier>') then val
  else '*********'
end;

Return NULL for unauthorized users:

case
  when current_role() IN ('ANALYST') then val
  else NULL
end;

Return a static masked value for unauthorized users:

CASE
  WHEN current_role() IN ('ANALYST') THEN val
  ELSE '********'
END;

Return a hash value using SHA2 , SHA2_HEX for unauthorized users. Using a hashing function in a masking policy may result in collisions; therefore, exercise caution with this approach. For more information, see Advanced Column-level Security topics.

CASE
  WHEN current_role() IN ('ANALYST') THEN val
  ELSE sha2(val) -- return hash of the column value
END;

Apply a partial mask or full mask:

CASE
  WHEN current_role() IN ('ANALYST') THEN val
  WHEN current_role() IN ('SUPPORT') THEN regexp_replace(val,'.+\@','*****@') -- leave email domain unmasked
  ELSE '********'
END;

Using timestamps.

case
  WHEN current_role() in ('SUPPORT') THEN val
  else date_from_parts(0001, 01, 01)::timestamp_ntz -- returns 0001-01-01 00:00:00.000
end;

Important

Currently, Snowflake does not support different input and output data types in a masking policy, such as defining the masking policy to target a timestamp and return a string (e.g. ***MASKED***); the input and output data types must match.

A workaround is to cast the actual timestamp value with a fabricated timestamp value. For more information, see DATE_FROM_PARTS and CAST , ::.

Using a UDF:

CASE
  WHEN current_role() IN ('ANALYST') THEN val
  ELSE mask_udf(val) -- custom masking function
END;

On variant data:

CASE
   WHEN current_role() IN ('ANALYST') THEN val
   ELSE OBJECT_INSERT(val, 'USER_IPADDRESS', '****', true)
END;

Using a custom entitlement table. Note the use of EXISTS in the WHEN clause. Always use EXISTS when including a subquery in the masking policy body. For more information on subqueries that Snowflake supports, see Working with Subqueries.

CASE
  WHEN EXISTS
    (SELECT role FROM <db>.<schema>.entitlement WHERE mask_method='unmask' AND role = current_role()) THEN val
  ELSE '********'
END;

Using DECRYPT on previously encrypted data with either ENCRYPT or ENCRYPT_RAW, with a passphrase on the encrypted data:

case
  when current_role() in ('ANALYST') then DECRYPT(val, $passphrase)
  else val -- shows encrypted value
end;

Using a <JavaScript UDF on JSON (VARIANT):

In this example, a JavaScript UDF masks location data in a JSON string. It is important to set the data type as VARIANT in the UDF and the masking policy. If the data type in the table column, UDF, and masking policy signature do not match, Snowflake returns an error message because it cannot resolve the SQL.

-- Flatten the JSON data

create or replace table <table_name> (v variant) as
select value::variant
from @<table_name>,
  table(flatten(input => parse_json($1):stationLocation));

-- JavaScript UDF to mask latitude, longitude, and location data

CREATE OR REPLACE FUNCTION full_location_masking(v variant)
  RETURNS variant
  LANGUAGE JAVASCRIPT
  AS
  $$
    if ("latitude" in V) {
      V["latitude"] = "**latitudeMask**";
    }
    if ("longitude" in V) {
      V["longitude"] = "**longitudeMask**";
    }
    if ("location" in V) {
      V["location"] = "**locationMask**";
    }

    return V;
  $$;

  -- Grant UDF usage to ACCOUNTADMIN

  grant ownership on function FULL_LOCATION_MASKING(variant) to role accountadmin;

  -- Create a masking policy using JavaScript UDF

  create or replace masking policy json_location_mask as (val variant) returns variant ->
    CASE
      WHEN current_role() IN ('ANALYST') THEN val
      else full_location_masking(val)
      -- else object_insert(val, 'latitude', '**locationMask**', true) -- limited to one value at a time
    END;

Using the GEOGRAPHY data type:

In this example, a masking policy uses the TO_GEOGRAPHY function to convert all GEOGRAPHY data in a column to a fixed point, the longitude and latitude for Snowflake in San Mateo, California, for users whose CURRENT_ROLE is not ANALYST.

create masking policy mask_geo_point as (val geography) returns geography ->
  case
    when current_role() IN ('ANALYST') then val
    else to_geography('POINT(-122.35 37.55)')
  end;

Set the masking policy on a column with the GEOGRAPHY data type and set the GEOGRAPHY_OUTPUT_FORMAT value for the session to GeoJSON:

alter table mydb.myschema.geography modify column b set masking policy mask_geo_point;
alter session set geography_output_format = 'GeoJSON';
use role public;
select * from mydb.myschema.geography;

Snowflake returns the following:

---+--------------------+
 A |         B          |
---+--------------------+
 1 | {                  |
   |   "coordinates": [ |
   |     -122.35,       |
   |     37.55          |
   |   ],               |
   |   "type": "Point"  |
   | }                  |
 2 | {                  |
   |   "coordinates": [ |
   |     -122.35,       |
   |     37.55          |
   |   ],               |
   |   "type": "Point"  |
   | }                  |
---+--------------------+

The query result values in column B depend on the GEOGRAPHY_OUTPUT_FORMAT parameter value for the session. For example, if the parameter value is set to WKT, Snowflake returns the following:

alter session set geography_output_format = 'WKT';
select * from mydb.myschema.geography;

---+----------------------+
 A |         B            |
---+----------------------+
 1 | POINT(-122.35 37.55) |
 2 | POINT(-122.35 37.55) |
---+----------------------+

For examples using other context functions and role hierarchy, see Advanced Column-level Security topics.

后续主题: