教程:开始使用协作 Clean Room (API)

简介

本教程面向希望使用 Snowflake Data Clean Room API 创建和使用协作 Clean Room 的开发者。您将完成一个双账户场景,其中两个协作者会共享数据、注册模板并运行分析。

您将学习以下内容

本教程将为您展示操作方法:

  • 在 Clean Room 注册表中注册数据产品和模板。
  • 使用 YAML 协作规范创建协作。
  • 从第二个账户加入协作。
  • 将模板和数据产品关联到现有协作。
  • 以具有不同权限的不同协作者身份运行分析。

运行本教程的要求

  • Two Snowflake accounts, Enterprise Edition or higher, each with the Snowflake Data Clean Rooms environment installed. If clean rooms isn’t installed, see Installing the Snowflake Data Clean Rooms environment.
  • SAMOOHA_APP_ROLE 必须授予每个账户中的用户。

Note

本教程需要 两个独立的 Snowflake 账户。您将在一个账户中运行 Alice 的步骤,在另一个账户中运行 Bob 的步骤。每个部分标题都会指示要使用的账户。

The tutorial includes code snippets with <placeholders> that you should replace with the appropriate values.

Collaboration 基础知识

协作 Clean Room 允许多方安全地共享和分析数据,而不会相互暴露原始数据。Collaboration 由 YAML 定义,会列出协作者、他们的数据以及各方可以执行的操作的规范。

本教程中使用的关键概念:

  • 协作规范:定义协作者及其别名、角色、数据产品和模板的 YAML 文档。
  • 协作角色:每个协作者都被分配到一个或多个角色:
    • 所有者:创建和管理协作。每个协作有且仅有一个所有者。
    • 数据提供商:贡献其他协作者可在分析中使用的数据产品。
    • Analysis runner: Runs templates against the shared data. Each analysis runner has a list of data providers and templates available to use.

在本教程中:

  • Alice 是协作的 所有者数据提供商分析运行者

  • Bob数据提供商分析运行者

  • 数据产品:由数据提供商关联到协作的表。

  • 模板:分析运行者对数据产品执行的已注册 JinjaSQL 查询。

在本教程中,Alice 和 Bob 各贡献一个数据产品。Alice 注册了一个只有 Alice 才能运行的模板。Bob 注册了一个 Alice 和 Bob 都可以运行的模板。两个模板都在一个共享列上联接两个协作者的数据。

Alice:注册资源

Alice 的账户 中运行以下步骤,以设置会话环境并创建示例数据:

USE WAREHOUSE APP_WH;
USE ROLE SAMOOHA_APP_ROLE;

-- Secondary roles must be disabled to call link_data_offerings.
USE SECONDARY ROLES NONE;

-- Create sample data for Alice.
CREATE DATABASE IF NOT EXISTS ALICE_DB;
CREATE SCHEMA IF NOT EXISTS ALICE_DB.ALICE_SCH;
CREATE OR REPLACE TABLE ALICE_DB.ALICE_SCH.ALICE_DATA AS
  SELECT * FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS LIMIT 100;

Note

In real-world usage, we recommend assigning more fine-grained privileges to your users instead of using the top-level SAMOOHA_APP_ROLE role.

接下来,您将注册要在协作中使用的数据产品和模板。

注册数据产品

数据产品是一个已注册的数据集,具有控制协作者如何使用数据的列级策略。您将根据您创建的示例数据创建数据产品。

注册 Alice 的数据产品,以便将其包含在协作规范中。数据产品由 YAML 数据产品规范定义。

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_DATA_OFFERING(
    $$
    api_version: 2.0.0
    spec_type: data_offering
    version: v1
    name: alice_customer_data
    datasets:
     - alias: customer_list
       data_object_fqn: ALICE_DB.ALICE_SCH.ALICE_DATA
       object_class: custom
       allowed_analyses: template_only
       schema_and_template_policies:
         hashed_email:
           category: join_standard
           column_type: hashed_email_b64_encoded
         status:
           category: passthrough
    $$
    );

数据产品规范定义了以下属性:

  • datasets: A list of tables or views to share.
  • alias: A short name used to reference this dataset within this spec and by templates.
  • allowed_analyses: Restricts usage to templates only (no free-form SQL).
  • schema_and_template_policies: Defines the format, naming, and availability of columns from this data source. The data offering exposes only two columns from your source table: hashed_email (which must be used as a join column) and status.

保存来自响应的数据产品 ID。创建协作规范时将需要它。

-- Save the ID.
SET alice_data_offering_id = '<alice_data_offering_id>';

-- View your registered data offerings.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.VIEW_REGISTERED_DATA_OFFERINGS();

注册模板

模板是 JinjaSQL 分析运行者在协作中执行的查询。注册一个模板,该模板在哈希电子邮件列上联接两个表,并计算按状态分组的匹配项:

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
$$
api_version: 2.0.0
spec_type: template
name: alice_only_template
version: v1
type: sql_analysis
description: Joins two tables on hashed email and counts matches grouped by status.
template:
  SELECT T.status, COUNT(*)
    FROM IDENTIFIER( {{ source_table[0] }} ) AS T
      JOIN IDENTIFIER( {{ source_table[1] }} ) AS T1
      ON T.hashed_email_b64_encoded = T1.hashed_email_b64_encoded
    GROUP BY T.status;
$$);

The template uses two tables source_table[0] and source_table[1]. These are data offerings present in the collaboration. The analysis runner passes in the names of the tables to use when they run the analysis.

Note

This tutorial uses T and T1 as table aliases for simplicity. In production templates, you should use the standard aliases p, p1, p2, and so on, which are required for Snowflake Data Clean Room policy enforcement.

保存响应中的模板 ID:

-- Save the ID.
SET alice_template_id = '<alice_only_template_id>';

-- View all registered templates.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.VIEW_REGISTERED_TEMPLATES();

Alice:创建协作

Alice 的账户 中继续。

现在资源已注册,Alice 创建协作。协作由列出协作者及其角色、数据产品和模板的 YAML 规范定义。

使用协作规范调用 INITIALIZE。在运行之前请仔细查看 YAML:

-- Replace the <...> placeholders with the appropriate values.
-- Get your account data sharing ID:
--   SELECT CURRENT_ORGANIZATION_NAME() || '.' || CURRENT_ACCOUNT_NAME();

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.INITIALIZE(
$$
api_version: 2.0.0
spec_type: collaboration
name: api_tutorial_collaboration
owner: alice
collaborator_identifier_aliases:
  alice: <alice_account_data_sharing_id>
  bob: <bob_account_data_sharing_id>
analysis_runners:
  alice:
    data_providers:
      alice:
        data_offerings:
        - id: <alice_data_offering_id>
      bob:
        data_offerings: []
    templates:
    - id: <alice_only_template_id>
  bob:
    data_providers:
      alice:
        data_offerings:
        - id: <alice_data_offering_id>
      bob:
        data_offerings: []
$$,
'APP_WH'
);

了解协作规范

The collaboration specification uses the aliases defined in the collaborator_identifier_aliases section to refer to all collaborators.

协作定义了以下角色和关系:

  • analysis_runners lists the collaborators that can run analyses in this collaboration. Only collaborators listed at the top level here can run analyses. The list of analysis runners cannot be modified after the collaboration is created.

  • 每个分析运行者条目都具有以下元素:

    • data_providers: Only these data providers can supply data to this analysis runner. This list cannot be modified later.
    • data_offerings: Only these data offerings from the listed data providers can supply data to this analysis runner. The data offerings list can be updated after a collaboration is created.
  • Only the templates listed for an analysis runner can be used. The template list can be modified later. Notice that this collaboration currently shares the alice_only_template with Alice.

等待协作的创建和联接

INITIALIZE creates and joins the owner to the collaboration. This process is asynchronous. Call GET_STATUS until Alice’s status is JOINED:

SET collaboration_name = '<collaboration_name>';

-- Check status. Repeat until the status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

-- Verify the collaboration is visible.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();

启用模板自动审批

每当协作者要求在协作中与您共享模板时,所有指定的接收者都必须批准该请求,然后才能共享模板。本教程不涉及审批流程,因此请运行以下代码以自动批准所有请求:

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.ENABLE_TEMPLATE_AUTO_APPROVAL(
  $collaboration_name
);

Bob:加入协作

切换到 Bob 的账户 并运行以下步骤。

设置会话环境:

USE WAREHOUSE APP_WH;
USE ROLE SAMOOHA_APP_ROLE;

-- Secondary roles must be disabled to call join or link_data_offering.
USE SECONDARY ROLES NONE;

查看协作邀请,然后加入:

-- See which collaborations you are invited to, or have joined.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();

-- Add the SOURCE_NAME and OWNER_ACCOUNT values from the response.
SET collaboration_name = '<collaboration_name>';
SET collaborator_data_sharing_id = '<alice_account_data_sharing_id>';

-- Review and join the collaboration.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.REVIEW(
  $collaboration_name,
  $collaborator_data_sharing_id
);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.JOIN($collaboration_name);

Joining is asynchronous. Call GET_STATUS until Bob’s status is JOINED:

-- Check status. Repeat until the status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

Bob:关联模板

Bob 的账户 中继续。

Bob 创建并注册模板。此模板根据哈希电子邮件列联接两个表,并返回两个表中状态的交叉表。模板采用 JinjaSQL 编写;模板规范采用 YAML 编写,其中包含嵌入的 JinjaSQL 模板。

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
    $$
    api_version: 2.0.0
    spec_type: template
    name: bob_shared_template
    version: v1
    type: sql_analysis
    description: Cross-tabulates statuses from two tables joined on hashed email.
    template:
      SELECT T.status AS status_1, T1.status AS status_2, COUNT(*) AS match_count
        FROM IDENTIFIER({{ source_table[0] }}) AS T
          JOIN IDENTIFIER({{ source_table[1] }}) AS T1
          ON T.hashed_email_b64_encoded = T1.hashed_email_b64_encoded
        GROUP BY T.status, T1.status
        ORDER BY match_count DESC;
    $$
);
SET bob_template_id = '<bob_shared_template_id>';

现在请求将模板关联到协作,并与 Alice 和 Bob 共享。(请注意,您必须明确请求与自己共享模板;模板不会自动与注册它们的账户共享。)

由于 Alice 启用了模板自动批准功能,因此 Alice 会自动批准请求:

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.ADD_TEMPLATE_REQUEST(
  $collaboration_name,
  $bob_template_id,
  ['alice', 'bob']
);

模板应该会很快获得批准并添加。

-- View the status of update requests.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_UPDATE_REQUESTS($collaboration_name);

Bob:关联数据产品

Bob 的账户 中继续。

创建示例数据并注册数据产品:

-- Create sample data.
CREATE DATABASE IF NOT EXISTS BOB_DB;
CREATE SCHEMA IF NOT EXISTS BOB_DB.BOB_SCH;
CREATE OR REPLACE TABLE BOB_DB.BOB_SCH.BOB_DATA AS
  SELECT * FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS_2 LIMIT 100;

-- Register Bob's data offering.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_DATA_OFFERING(
    $$
    api_version: 2.0.0
    spec_type: data_offering
    version: v1
    name: bob_customer_data
    datasets:
     - alias: my_customer_list
       data_object_fqn: BOB_DB.BOB_SCH.BOB_DATA
       object_class: custom
       allowed_analyses: template_only
       schema_and_template_policies:
         hashed_email:
           category: join_standard
           column_type: hashed_email_b64_encoded
         status:
           category: passthrough
    $$
);
SET bob_data_offering_id = '<bob_data_offering_id>';

The collaboration specification lists bob as a potential data provider for both alice and bob. Link the data offering into the collaboration, and share it with both alice and bob:

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LINK_DATA_OFFERING(
  $collaboration_name,
  $bob_data_offering_id,
  ['alice', 'bob']
);

-- Verify that both data offerings are now available.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS(
  $collaboration_name
);

与其他协作者共享数据时,无需获得批准。

Alice:运行分析

切换回 Alice 的账户

Alice runs the alice_only_template, which is available only to Alice. The template joins Alice’s data offering with Bob’s data offering on the hashed email column and groups results by status.

首先,查看可用的数据产品和模板:

-- View available data offerings.
-- Note the view names in the TEMPLATE_VIEW_NAME column; you need these for the analysis spec.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS(
  $collaboration_name
);

-- View available templates.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_TEMPLATES(
  $collaboration_name
);

Now, run the analysis. Replace the placeholders with actual values. Replace the source_tables names with view names from VIEW_DATA_OFFERINGS.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
  $collaboration_name,
    $$
    api_version: 2.0.0
    spec_type: analysis
    description: Alice runs the alice_only_template with both data offerings.
    template: '<alice_only_template_id>'
    template_configuration:
      view_mappings:
        source_tables:
          - '<alice_data_offering_view_name>'
          - '<bob_data_offering_view_name>'
    $$
  );

结果显示了 Alice 数据中按状态分组的匹配记录数。

Bob:运行分析

切换到 Bob 的账户

Bob runs the bob_shared_template, which is available to both collaborators. This template cross-tabulates the statuses from both tables.

Replace the source_tables placeholders with actual view names from VIEW_DATA_OFFERINGS.

-- View available data offerings and templates.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS(
  $collaboration_name
);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_TEMPLATES(
  $collaboration_name
);

-- Run the analysis.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
    $collaboration_name,
    $$
    api_version: 2.0.0
    spec_type: analysis
    description: Bob runs the bob_shared_template with both data offerings.
    template: '<bob_shared_template>'
    template_configuration:
      view_mappings:
        source_tables:
          - '<alice_data_offering_view_name>'
          - '<bob_data_offering_view_name>'
    $$
);

结果显示两个数据产品的状态的交叉表,以及每个组合的匹配记录数。

Bob cannot run alice_only_template because Alice did not include Bob as a permitted user for that template in the collaboration specification. Try running it to see what happens.

Alice:清理资源

切换到 Alice 的账户,清理所有使用的资源。

Tearing down a collaboration is a multi-step process. Call TEARDOWN, wait for the status to reach LOCAL_DROP_PENDING, and then call TEARDOWN again:

-- Start the teardown process.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.TEARDOWN($collaboration_name);

-- Check status. Repeat until the status is LOCAL_DROP_PENDING.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

-- Complete the teardown.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.TEARDOWN($collaboration_name);

-- Clean up the sample database.
DROP DATABASE IF EXISTS ALICE_DB;

切换到 Bob 的账户并清理示例数据库:

DROP DATABASE IF EXISTS BOB_DB;

Note

解除协作不会从任一账户的注册表中删除已注册的模板或数据产品。

摘要

在本教程中,您学习了如何:

  • 在 Clean Room 注册表中注册模板和数据产品。
  • 使用定义协作者、角色、数据和模板的 YAML 规范创建协作。
  • 从第二个账户加入协作。
  • 使用更新请求,将模板和数据产品关联到现有协作。
  • 以具有不同访问权限级别的不同协作者身份运行分析。

后续步骤