自由格式 SQL 查询

数据提供商可以允许通过模板或自由格式查询向分析运行者公开其数据。当数据提供商在数据集上启用自由格式查询时,任何有权访问该数据产品的分析运行者都可以在其环境中针对该数据集运行 SQL 查询。

分析运行者和数据提供商必须都已加入 Collaboration,数据才能变为可用状态。

概述

以下是在 Clean Room 中针对数据运行自由格式查询的步骤:

数据提供商

  1. Register a data offering that contains one or more datasets where allowed_analyses: template_and_freeform_sql is specified.

如果数据提供商想要将 Snowflake 策略应用于数据集中的列,则必须在注册数据之前创建这些策略,并将策略与数据产品规范中的列相关联。

  1. 以标准方式将数据产品链接到 Collaboration 中。

分析运行者

After the collaboration is installed on their account, the analysis runner calls VIEW_DATA_OFFERINGS. If there is a value in the freeform_sql_view_name column, the dataset can be queried directly against the view named in that column.

Any policies listed in freeform_sql_column_policies are applied to the data by the collaboration. Any policies applied directly to the source data by the data provider are enforced, but won’t be shown in that column.

以下章节提供了有关数据提供商和分析步骤的详细信息。

注册自由格式查询数据集(数据提供商)

以下步骤显示了如何在数据产品注册期间启用自由格式查询:

  1. Specify allowed_analyses: template_and_freeform_sql in the collaboration specification. This enables the dataset to be queried using either a template or free-form query.

    ...
    datasets:
    - alias: customers_view
      data_object_fqn: PROVIDER_DB.DATA_SCH.CUSTOMERS
      object_class: custom
      allowed_analyses: template_and_freeform_sql
      schema_and_template_policies:
     HASHED_EMAIL:
       category: join_standard
       column_type: hashed_email_b64_encoded
    ...

    Only the columns listed under schema_and_template_policies are available for querying via templates or free-form queries.

  2. 如果您想在自由格式查询中应用 Snowflake 策略而不将其应用于源数据,请执行以下步骤:

    1. 以标准方式创建 Snowflake 策略。不要将它们应用于您的表。

      CREATE OR REPLACE AGGREGATION POLICY PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
        AS () RETURNS AGGREGATION_CONSTRAINT ->
          AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);

创建 Collaboration 的角色必须对数据库、架构和策略对象具有 USAGE 权限。

这些策略是动态链接的;您对这些策略所做的任何更改都会立即影响使用这些策略的任何数据集,即使该数据产品已经注册并链接。

  1. Assign your policies in the data offering specification under the freeform_sql_policies field. Important: All column names used under freeform_sql_policies must use the auto-generated column name if the column has been renamed. Renaming affects only join-standard category columns.

这些策略不会直接应用于源表,仅应用于由 Collaboration 注册的视图。

schema_and_template_policies:
  HASHED_EMAIL:                                  # Source column name.
    category: join_standard
    column_type: hashed_email_b64_encoded        # Column is renamed to the column_type value.
  STATUS:
    category: passthrough
  AGE_BAND:
    category: passthrough
  DAYS_ACTIVE:
    category: passthrough
  INCOME_BRACKET:
    category: passthrough
freeform_sql_policies:          # Apply agg, join, and masking policies created by the data owner to these columns.
  aggregation_policy:
    name: PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
    entity_keys:
- HASHED_EMAIL_B64_ENCODED
  join_policy:
    name: PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY
    columns:
- HASHED_EMAIL_B64_ENCODED    # This is the renamed column.
  masking_policies:
    - name: PROVIDER_DB.DATA_SCH.MASK_INCOME_POLICY
columns:
  - INCOME_BRACKET
  1. 通过调用 REGISTER_DATA_OFFERING 以标准方式注册数据产品。

运行自由格式查询(分析运行者)

When an analysis runner calls VIEW_DATA_OFFERINGS, if a value appears in the freeform_sql_view_name column, the free-form SQL view can be queried directly, without using a template. All Snowflake policies applied to the source table or defined in the data offering’s freeform_sql_policies section are enforced in the queries.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);
ColumnValue
TEMPLATE_VIEW_NAMEdata_provider.provider_customers_V1.customers
TEMPLATE_JOIN_COLUMNShashed_email_b64_encoded
ANALYSIS_ALLOWED_COLUMNSSTATUS, AGE_BAND, DAYS_ACTIVE, INCOME_BRACKET
ACTIVATION_ALLOWED_COLUMNS
FREEFORM_SQL_VIEW_NAMESFDCR_FREEFORM_SQL_DEMO.FREEFORM_SQL.DATA_PROVIDER_PROVIDER_CUSTOMERS_V1_CUSTOMERS
FREEFORM_SQL_COLUMN_POLICIES
{
  "aggregation_policy": {"entity_keys": ["HASHED_EMAIL_B64_ENCODED"]},
  "masking_policy": {"columns": ["INCOME_BRACKET"]},
  "join_policy": {"columns": ["HASHED_EMAIL_B64_ENCODED"]},
  "no_policy": {"columns": ["DAYS_ACTIVE", "AGE_BAND", "STATUS"]}
}
SHARED_BYdata_provider
SHARED_WITH["data_consumer"]
DATA_OFFERING_IDprovider_customers_V1

You must use the value from freeform_sql_view_name, not the value from template_view_name.

SELECT status, COUNT(*) AS customer_count
  FROM SFDCR_FREEFORM_SQL_DEMO.FREEFORM_SQL.DATA_PROVIDER_PROVIDER_CUSTOMERS_V1_CUSTOMERS AS t
  GROUP BY status
  ORDER BY customer_count DESC;

示例:双方协作

以下示例演示了一个双方 Collaboration,其中一方(“提供商”)是 Collaboration 所有者和使用者的数据提供商。另一方(“使用者”)是分析运行者,可以运行模板并使用提供商提供的数据,还可以根据数据提供商规范中定义的策略对数据运行自由格式 SQL 查询。

To run this example, you must have two separate accounts with Snowflake Data Clean Rooms installed.

You can either download the files and upload them to your Snowflake account, or copy and paste the example code into worksheets in two separate accounts by using Snowsight.

Download the source SQL files, and then upload them into two separate accounts that have Snowflake Data Clean Rooms installed: