Basic multi-party collaboration

Introduction

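This topic describes the steps for creating a basic multi-party collaboration. It shows how to register templates and data offerings, how to add data to the initial version of a collaboration, and how collaborators can add resources after the collaboration is created. It also shows how to run queries in a collaboration using templates and data resources.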

Basic clean room collaboration workflow

The following is a basic multi-party clean room collaboration scenario:

  1. The collaboration owner registers any templates or data offerings that they want to appear in the initial configuration of the collaboration.

  2. The owner optionally asks any intended collaborators to register templates or data offerings that they want to appear in the initial configuration of the collaboration. Collaborators then give the resource IDs of the registered items to the owner.

  3. The owner creates a collaboration. The collaboration is defined by a collaboration YAML spec that lists the collaborators, their collaboration roles, and all resources that should be present in the initial version of the collaboration.

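    • The set of collaborators and their collaboration roles is fixed when the collaboration is created.
    • If their collaboration roles allow it, collaborators can add other resources after the collaboration is created.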
    • If your collaboration shares data with users in other cloud hosting regions, the sharer must enable Cross-Cloud Auto-Fulfillment on their account.
  4. Collaborators review and join the collaboration.

  5. Collaborators can then optionally link additional resources to the collaboration, such as templates and data offerings, depending on their collaboration roles. Additional resources can be added to a collaboration at any time.

  6. Analysis runners can run any templates assigned to them in the collaboration, using any data available to them in the collaboration. The analysis runner bears the cost of the analysis. Templates can be designed either to return query results in the response or to activate results to the caller or another collaborator.

The following sections describe each step in detail.

Creating a collaboration

To create a collaboration, you design a collaboration spec that defines all the collaborators and their collaboration roles.

If you want to make resources available in a collaboration as soon as it is created, the collaboration owner registers and links those resources before creating the collaboration, and includes the resource IDs in the collaboration spec.

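If the owner wants to use resources provided by collaborators, the owner can also ask those users to register their resources and send the owner the resource IDs to include in the collaboration spec. The owner must also indicate in the collaboration spec which resources aren't linked yet but can be linked in the future.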

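The owner then calls INITIALIZE to start creating the collaboration. By default, INITIALIZE also automatically joins the owner to the collaboration. This is an asynchronous process, so the owner must call GET_STATUS repeatedly until the status is JOINED.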

The following code snippet shows how to create and join a collaboration.

  CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.INITIALIZE(
    $$
    api_version: 2.0.0
    spec_type: collaboration
    name: my_first_collaboration
    owner: alice
    collaborator_identifier_aliases:
      alice: example_com.acct_abc
      bob: another_example.acct_xyz
    analysis_runners:
      bob:
        data_providers:
          alice:
            data_offerings: []  # alice has not provided data to bob, but can do so in the future.
          bob:
            data_offerings: [customers_v1]  # bob has registered a data offering and made it available to himself.
        templates: []   # No templates available yet for bob.
      alice:
        data_providers:
          alice:
            data_offerings: []
          bob:
            data_offerings: []
        templates: []
    $$,
    'APP_WH'            -- Use this warehouse for initialization.
  );                    --  XSMALL or SMALL warehouses are recommended for initialization.
  SET collaboration_name = 'my_first_collaboration';

  -- INITIALIZE automatically joins the owner. Check status until JOINED.
  CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

  -- Collaboration is visible here when it's joined.
  CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();

About the script:

  • The collaboration consists of two collaborators, with the aliases alice and bob. You can use a full data-sharing ID anywhere you use an alias, but that is much less user-friendly.
  • alice is the owner.
  • Both alice and bob are analysis runners.
  • Both alice and bob are data providers to each other.
  • If you are a data provider, you must include the data_offerings field. It can list data offerings, or it can be an empty list, which means that there are no data offerings now but some can be added later.
  • alice isn't providing data to bob or to herself yet, but can do so later (the empty data_offerings lists under analysis_runners.bob.data_providers.alice and analysis_runners.alice.data_providers.alice).
  • bob has already registered a data offering, customers_v1, and provided it to himself in the initial collaboration (analysis_runners.bob.data_providers.bob).
  • bob isn't providing data to alice yet, but can do so later (analysis_runners.alice.data_providers.bob).
  • Neither alice nor bob has templates available yet, but templates can be assigned later (the empty templates list under each analysis runner). The templates field is optional for an analysis runner: if you omit it during initialization, collaborators can still assign templates to that analysis runner later.

Linking resources to a collaboration

Depending on their collaboration roles, collaborators can link resources into a collaboration or remove resources that are already linked into it. Linking a resource into a collaboration has two steps:

  1. The resource owner creates a resource definition spec for the resource and uses it to register the resource in their account. You can register the resource in your account’s default registry, or use a custom registry.
  2. A collaborator links the resource into a collaboration. Resources can be linked into a collaboration either when the collaboration is created, by hard-coding the resource ID into the YAML definition used to create the collaboration, or after the collaboration is created and joined, by calling the appropriate procedure to link the resource into the collaboration.

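After a resource is linked, the specified collaborators can use it. Some resource types (such as templates) can be linked by any collaborator; others (such as data offerings) can be linked only by users with the data provider collaboration role. Note, however, that resources you contribute don't take effect in a collaboration until you have joined it.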

If you share data with users in other cloud hosting regions, the sharer must enable Cross-Cloud Auto-Fulfillment on their account.

Resources are available only to the collaborators specified by the collaboration spec.

Note

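Updates to an existing collaboration (such as linking or removing resources) are asynchronous and take some time to complete. Call VIEW_UPDATE_REQUESTS to see the status of an update. Using a resource before it is fully available can cause inconsistent behavior.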
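As a minimal sketch, after linking or removing a resource you might check the pending update as follows. It assumes that VIEW_UPDATE_REQUESTS lives in the same COLLABORATION schema as the other procedures on this page and takes the collaboration name as its only argument; check the API reference for the exact signature.

```sql
-- Assumption: VIEW_UPDATE_REQUESTS takes the collaboration name as its only argument.
SET collaboration_name = 'my_first_collaboration';

-- Re-run this call until the link or removal request is reported as complete,
-- and only then use the resource in an analysis.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_UPDATE_REQUESTS($collaboration_name);
```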

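Resources are versioned; however, creating a new resource with a new version doesn't remove earlier versions from the collaboration. A resource is uniquely named by combining the user-provided name and version (and, for data offerings, the provider's alias).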

To learn more about using resources in your collaboration, see Resources.

Reviewing and joining a collaboration

You must join a collaboration before you can share resources and run analyses in it.

  • The creator joins automatically when calling INITIALIZE if auto_join_warehouse is provided. If auto_join_warehouse isn’t provided, the creator calls JOIN after INITIALIZE is complete.
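  • Non-creators call REVIEW, then JOIN.
    • REVIEW returns an overview of the collaboration and its resources. You need to call REVIEW only once, except in the cross-region case described below.
    • JOIN installs the collaboration clean room in your account and joins the collaboration.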

INITIALIZE and JOIN are both asynchronous and take several minutes to complete. Call GET_STATUS to see when each step is complete.

Important

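If your account is in a different cloud hosting region than the collaboration owner's account, REVIEW triggers an additional asynchronous setup step. Call REVIEW repeatedly until it returns a success response, which indicates that setup is complete.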

Joining is an asynchronous process; call GET_STATUS to see when your status is listed as JOINED.
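The join sequence for a non-creator can be sketched as follows. The signatures of REVIEW and JOIN are assumptions here: both are shown taking only the collaboration name, like GET_STATUS in the earlier example, but the actual procedures may take additional arguments (for example, a warehouse, as INITIALIZE does). Check the API reference before running this.

```sql
SET collaboration_name = 'my_first_collaboration';

-- 1. Review the collaboration and its resources.
--    If your account is in a different cloud hosting region than the owner's,
--    repeat this call until it returns a success response.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.REVIEW($collaboration_name);

-- 2. Install the collaboration clean room in your account and join.
--    Assumption: JOIN takes only the collaboration name.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.JOIN($collaboration_name);

-- 3. JOIN is asynchronous: repeat this call until your status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

-- 4. Once you have joined, the collaboration appears in this list.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();
```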

Running analyses

If you have the analysis runner role in a collaboration, you can run analyses on the data sources shared with you in that collaboration.

Collaborations support two types of queries:

  • Template analyses. These queries run a template (a templated JinjaSQL statement) linked into the collaboration. Templates can be either analysis templates, which return results immediately to you, or activation templates, which save results to the Snowflake account of a designated participant.
  • Free-form SQL queries. If allowed by a data provider, you can access specified data offerings using SQL when signed in with your collaborator credentials. You run SQL queries directly, without calling a Collaboration API procedure, by accessing the fully qualified view name exposed by the collaboration.

The analysis runner bears the cost of running the analysis.

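The collaboration spec determines whether you can run templates, activate results, or run free-form SQL queries. It describes your capabilities and the data and templates available to you.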

Note

Columns from the data sources might have new names when exposed to the template or user. See Source column renaming to learn how and when source columns are renamed. Templates and user-provided arguments (such as a join column name) must use the final name, not the original name, if the column is renamed.

The following sections describe each of these analysis types in more detail.

Running an analysis from a template

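To run an analysis from a template, view the list of templates you can run and the list of data offerings you can use, then call RUN, passing in your values either as individual parameters or as a YAML-formatted analysis spec.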

Tables that you pass into the source_tables field in the run configuration populate the source_table parameter in the template. The template’s my_table parameter is not populated or used unless you are using Snowflake Standard Edition with your own data.
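To make the mapping from source_tables to source_table concrete, here is a hypothetical body for a template like the sales_join_template used in the following example. It assumes common JinjaSQL template conventions: the tables listed in source_tables arrive as the source_table array in the same order, each template argument is referenced by name, and the sqlsafe filter inserts a column name as raw SQL. The table order, the HASHED_EMAIL join column, and the query itself are illustrative assumptions, not the real template; for brevity it uses four of the five arguments.

```sql
-- Hypothetical JinjaSQL template body; NOT the actual sales_join_template.
-- Assumptions: source_table[0] is the conversions table and source_table[1]
-- is the impressions table (the order in which source_tables lists them),
-- and HASHED_EMAIL is a join column present in both tables.
-- Argument values (for example PURCHASE_ID) must be the final, possibly
-- renamed, column names.
SELECT
  p.{{ publisher_campaign_name | sqlsafe }} AS campaign_name,
  p.{{ publisher_device_type | sqlsafe }} AS device_type,
  COUNT(DISTINCT p.{{ publisher_impression_id | sqlsafe }}) AS impressions,
  COUNT(DISTINCT c.{{ conv_purchase_id | sqlsafe }}) AS purchases
FROM IDENTIFIER({{ source_table[0] }}) AS c
JOIN IDENTIFIER({{ source_table[1] }}) AS p
  ON c.HASHED_EMAIL = p.HASHED_EMAIL
GROUP BY 1, 2
```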

Note

Resource installation is asynchronous. If a template was just installed, it can take a short while before it is available to run. If the template includes a code spec, it can take additional time before the template is available. See how to determine when a code spec is available.

The following example lists data offerings and templates that the user can access, then runs an analysis using the sales_join_template template (which is assumed to be listed by VIEW_TEMPLATES), passing in five named arguments to the template.

-- See which data offerings are available.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);

-- See which templates you can run.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_TEMPLATES($collaboration_name);

-- Pass in the arguments in analysis YAML format.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
  $collaboration_name,
  $$
    api_version: 2.0.0
    spec_type: analysis
    name: My_analysis
    description: Sales results Q2 2025
    template: sales_join_template

    template_configuration:
      view_mappings:
        source_tables:
          -  user1_alias.data_offering_v1.table_1
          -  user2_alias.another_data_offering_v1.table_2
      arguments:                                            # The template defines conv_purchase_id and the other four arguments.
         conv_purchase_id: PURCHASE_ID                      # You must examine a template to see which arguments it supports.
         conv_purchase_amount: PURCHASE_AMOUNT
         publisher_impression_id: IMPRESSION_ID
         publisher_campaign_name: CAMPAIGN_NAME
         publisher_device_type: DEVICE_TYPE
  $$ );

Enabling and running free-form SQL queries on data

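A data provider can grant an analysis runner permission to run free-form SQL queries against a data offering. The analysis runner can then query the data offering directly with SQL, without calling a template.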

To learn more about free-form SQL queries, see Free-form SQL queries.
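Because free-form queries are plain SQL, no Collaboration API procedure is involved. A minimal sketch, assuming that the data provider has enabled free-form SQL on a data offering; the view name below is a hypothetical placeholder for the fully qualified view name that the collaboration exposes.

```sql
-- Hypothetical fully qualified view name; replace it with the name
-- the collaboration exposes for the data offering.
SET offering_view = 'MY_COLLAB_DB.SHARED.CUSTOMERS_V1';

-- Run while signed in with your collaborator credentials.
SELECT COUNT(*) AS row_count
FROM IDENTIFIER($offering_view);
```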

Running analyses with your own data on Standard Edition

If you use Standard Edition, you can run an analysis in the standard way. However, you can’t link data into the collaboration to share with other users. The only way to pass your own datasets into a template is to use the technique described here.

To use your own data in a collaboration on Snowflake Standard Edition, do the following:

  1. Register your data offering by calling REGISTER_DATA_OFFERING.
  2. Call LINK_LOCAL_DATA_OFFERING to link your data into the collaboration for you to use. No other collaborators can see or access data linked locally.
  3. Use the data offering ID when you call RUN:
    • If you are using the parameterized version of RUN, pass your data offering IDs to the local_template_view_names parameter.
    • If you are using the YAML version of RUN, provide your data offering IDs in the local_view_mappings.my_tables stanza of the request.

Tip

local_template_view_names and local_view_mappings.my_tables populate the my_table parameter in the template.
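Inside a template, a locally linked table is therefore read from my_table rather than source_table. A hypothetical fragment, assuming the same JinjaSQL conventions as other templates (array indexing, IDENTIFIER around the table name) and an assumed HASHED_EMAIL join column; it is not the body of mytemplate_v1.

```sql
-- Hypothetical JinjaSQL template fragment.
-- my_table[0]: first entry of local_view_mappings.my_tables
--              (or of local_template_view_names in the parameterized RUN).
-- source_table[0]: first entry of view_mappings.source_tables.
-- HASHED_EMAIL is an assumed join column present in both tables.
SELECT COUNT(DISTINCT mine.HASHED_EMAIL) AS overlap_count
FROM IDENTIFIER({{ my_table[0] }}) AS mine
JOIN IDENTIFIER({{ source_table[0] }}) AS theirs
  ON mine.HASHED_EMAIL = theirs.HASHED_EMAIL
```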

The following example shows how to run a template using the YAML version of RUN. This example includes the my_tables field, which lists data offerings that you linked by calling LINK_LOCAL_DATA_OFFERING.

-- See what data offerings are available. Your own local data will be listed here as well.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);

-- Pass in the arguments in analysis YAML format.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
  $collaboration_name,
  $$
    api_version: 2.0.0
    spec_type: analysis
    name: my_analysis
    description: Cross-purchase results for Q4 2025
    template: mytemplate_v1

    template_configuration:
      view_mappings:
        source_tables:
          - ADVERTISER1.ADVERTISER_DATA_V1.CUSTOMERS
          - PUBLISHER.ADVERTISER_DATA_V1.CUSTOMERS
      local_view_mappings:
        my_tables:
          - PARTNER.MY_DATA_V1.MY_CUSTOMERS # Populate my_table array with my own table.
      arguments:  # Template arguments, as name: value pairs
         conv_purchase_id: PURCHASE_ID
         conv_purchase_amount: PURCHASE_AMOUNT
         publisher_impression_id: IMPRESSION_ID
         publisher_campaign_name: CAMPAIGN_NAME
         publisher_device_type: DEVICE_TYPE
  $$ );

Activating results

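If the data provider and the collaboration spec allow it, you can save analysis results to your own Snowflake account or to a designated collaborator's Snowflake account. A template either activates results or returns them immediately, but not both.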

To learn more about activation, see Activating query results.

Leaving or deleting a collaboration

  • Non-owners leave a collaboration by calling LEAVE. Any data offerings they have provided will be removed from the collaboration. You can’t rejoin a collaboration after leaving it.
  • The collaboration owner can't leave a collaboration, because ownership can't be transferred. The owner can delete the collaboration for all collaborators by calling TEARDOWN.

Both procedures are asynchronous. Call GET_STATUS to monitor the status, and when GET_STATUS shows the status as LOCAL_DROP_PENDING, call LEAVE or TEARDOWN again.
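The leave sequence for a non-owner can be sketched as below. It assumes that LEAVE, like GET_STATUS, takes the collaboration name as its only argument; an owner would follow the same pattern with TEARDOWN in place of LEAVE. Remember that you can't rejoin a collaboration after leaving it.

```sql
SET collaboration_name = 'my_first_collaboration';

-- Start leaving the collaboration (non-owners only).
-- Assumption: LEAVE takes the collaboration name as its only argument.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LEAVE($collaboration_name);

-- Repeat until the status is LOCAL_DROP_PENDING.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);

-- Call LEAVE a second time to finish leaving.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LEAVE($collaboration_name);
```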

Examples

The following SQL examples show how to create and run a basic collaboration:

Two-party collaboration example

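The following example demonstrates a two-party collaboration. One party, named "alice", is the collaboration creator, a data provider to herself and to "bob", and an analysis runner. "bob" is a data provider to himself and to "alice", and is also an analysis runner.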

The example demonstrates the following operations:

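  • Create a collaboration.
  • Register templates and data offerings.
  • Link templates and data offerings when the collaboration is created.
  • Join the collaboration.
  • Link additional resources to an existing collaboration.
  • Run an analysis.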

To run this example, you must have two separate accounts with Snowflake Data Clean Rooms installed.

You can either download the files and upload them to your Snowflake account, or copy and paste the example code into worksheets in two separate accounts by using Snowsight.

Download the source SQL files, then upload them into two separate accounts that have Snowflake Data Clean Rooms installed:

Single-party collaboration example

This example shows how to create and use a collaboration when you have only a single account for testing.

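The example shows how to create a collaboration with data offerings and templates, add more data offerings and templates after the collaboration is created, and run an analysis.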

You can either download the file and upload it to your Snowflake account, or copy and paste the example code into a worksheet by using Snowsight.

Download the source SQL file, then upload it into a Snowflake account that has Snowflake Data Clean Rooms installed: