Provider-run analysis
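The default clean room configuration allows only consumers to run analyses in a clean room. However, a provider can ask a consumer for permission to run templates against the consumer's data in a specific clean room. Provider-run analysis can be enabled and run using either the clean rooms UI or code.
Note
A provider can run an analysis in a clean room in two ways: as a standard template, which returns results each time it is queried, or as an activation, which saves the results to a file in the provider's account. If you need to persist data for yourself, export data to a third party, or refine results from a large dataset, it is usually best to activate results to your account. If you want to rerun a template with new parameters or data and do not need to keep the results, the standard provider-run query described here is the right approach.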
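Supported templates
Provider-run analysis is enabled at the template level for a given clean room. The following templates support provider-run analysis:
Overlap and segmentation analysis
SQL query (UI only)
Custom templates (API only)
Billing details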
Provider-run analyses are run in the consumer's account, and consumers are billed for a provider-run analysis. To stop incurring additional costs from provider analyses, the consumer must uninstall the clean room.
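A consumer can estimate the credits used by the provider over the past N days by running the following query, specifying the number of previous days as a negative number: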
-- Estimate the number of credits consumed in the past 5 days.
SELECT * FROM TABLE(SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.LIBRARY.PRA_CONSUMPTION_UDTF(-5));
Choosing and limiting the warehouse size and type
Clean rooms use auto-scaling logic based on dataset sizes to choose a warehouse for your analysis. However, the provider can explicitly choose a warehouse size using the API.
A consumer can limit the size and type of warehouses available to the provider when running a given template. Limiting warehouse sizes can be done only in the API, not the UI.
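Implementing provider-run analysis
Important
If the consumer and provider are in different cloud regions, Cross-Cloud Auto-Fulfillment must be enabled in both accounts and in both clean rooms.
Here are the steps to enable provider-run analysis in a new clean room:
The provider creates and configures a clean room using one of the supported templates.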
In the Share Clean Room step of clean room configuration, the provider turns on Enable run analysis & query next to their own account to enable them to run any templates in this clean room that support provider analysis.
This setting cannot be changed after a clean room is created; if you want to change permission for a specific account to run queries in a published clean room, you must delete the clean room and create a new one.
The consumer joins and configures the clean room as normal for all templates in the clean room, including any templates that support provider analysis. If the consumer does not want to enable a provider to run a specific template, they can omit required details for that template.
When the consumer joins the clean room, they are warned before joining that provider-run analysis is enabled for that clean room.
The consumer can run queries as soon as the clean room is joined, but there is a delay of up to 30 minutes before the provider can run the template. This setup delay is only for the initial join step; if the provider later adds other provider-run templates, the provider can run them as soon as the consumer configures their clean room for that template.
The clean room is now available for both provider-run analyses (after the initial setup delay) and consumer-run analyses (no delay).
The consumer is billed for all analyses in this clean room, whether run by the provider or consumer.
The provider can enable and disable both provider- and consumer-run analysis in a clean room by making the appropriate API calls.
However, any time the provider changes the provider-run analysis setting, the clean room must be re-installed by all consumers for the change to take effect. Because it can be difficult to force all collaborators to re-install a clean room, it is more reliable for the provider to delete a published, shared clean room when changing the analysis permissions, then create a new clean room with the desired permissions.
Important
Any templates run by the provider require column names or aliases for all columns generated in the results. If a column is an
aggregation function (SUM(*)) or calls a custom function (cleanroom.my_function(p.hashed_email)), you must explicitly
provide an alias for the column name: SELECT SUM(*) AS TOTAL FROM mydb.mysch.T;
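Here is the general process for creating a new clean room that allows provider-run analysis:
Provider
Create and configure the clean room, data, and policies in the standard way.
Add consumers in the standard way.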
Enable provider-run analysis for specific consumer accounts in the clean room by calling provider.enable_provider_run_analysis. Call this procedure only after adding consumers to a clean room, but before any consumer installs the clean room. Each consumer account must approve this request separately, or their data will not be accessible for provider-run analyses in this clean room.
Publish the clean room.
Tell the consumer that the clean room is published, the name of the clean room, and which templates you want to run in it.
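The provider step that enables provider-run analysis can be sketched as follows. This is a minimal sketch, not a definitive call: the clean room name and account locator are placeholders, and the argument order (clean room name, then an array of consumer account locators) is an assumption that should be checked against the provider API reference.

```sql
-- Hypothetical clean room name; substitute your own.
SET cleanroom_name = 'my_provider_run_cleanroom';

-- Assumed signature: (clean room name, array of consumer account locators).
-- Run this after adding consumers and before any consumer installs the clean room.
CALL samooha_by_snowflake_local_db.provider.enable_provider_run_analysis(
  $cleanroom_name,
  ['<CONSUMER_ACCOUNT_LOCATOR>']
);
```

Each listed consumer still has to approve the request on their side before the provider can run anything against their data.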
Consumer
Install the clean room and link in your data in the standard way.
Set join and column policies on your data. If you do not set both policies in a clean room, the provider cannot run a template using your data. This is unlike consumer-run analyses, where total absence of a policy means that all columns are approved for that policy type.
Allow provider analysis for specific templates in the clean room by calling either consumer.enable_templates_for_provider_run (for multiple templates) or consumer.approve_template (for one template).
If the provider changes a template after the consumer approves it, the consumer must approve the template again. Until the template is re-approved, the old cached version of the approved template will be run by the provider.
(Optional) Provider-run analyses happen in the consumer's account, and are billed to the consumer. If you want to limit the warehouse type or sizes available to a provider when running a template, call consumer.set_provider_run_configuration.
Tell the provider that you have installed the clean room and approved provider-run analysis.
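The consumer's approval step might look like the following sketch. The clean room name and template name are placeholders. The argument list (clean room name, array of template names, and a boolean that controls differential privacy) is an assumption and should be verified against the consumer API reference before use.

```sql
-- Hypothetical clean room name; substitute the name the provider gave you.
SET cleanroom_name = 'my_provider_run_cleanroom';

-- Assumed signature: (clean room name, array of template names,
-- flag enabling differential privacy for provider-run queries).
CALL samooha_by_snowflake_local_db.consumer.enable_templates_for_provider_run(
  $cleanroom_name,
  ['<TEMPLATE_NAME>'],
  FALSE
);
```

Only templates named in the array are approved. Any template that the provider later changes has to be approved again.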
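Provider
After the consumer installs the clean room, data sharing between the consumer and provider accounts must be enabled so that the analysis can access the consumer's data. The process depends on whether the provider and consumer are in the same cloud region or in different cloud regions:
If the provider and consumer are in the same cloud region, the provider calls provider.mount_request_logs_for_all_consumers. If a new consumer account installs the clean room later and you want to use its data in this template, you must run this procedure again to access that data.
If the provider and consumer are in different cloud regions, both the provider and the consumer must enable Cross-Cloud Auto-Fulfillment.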
Run the analysis by calling provider.submit_analysis_request with the template name, the table names, and the template arguments. You can optionally specify a warehouse size and type, as shown later in this topic.
Save the response ID, which is required to check the status and results of the analysis.
Check the status of the analysis by calling provider.check_analysis_status. When the status is reported as COMPLETED, call provider.get_analysis_result to get the analysis results.
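The provider steps above can be sketched end to end as follows, for the same-region case. The argument order for provider.submit_analysis_request follows the example elsewhere in this topic. The arguments shown for provider.mount_request_logs_for_all_consumers, provider.check_analysis_status, and provider.get_analysis_result are assumptions, and all names in angle brackets are placeholders.

```sql
-- Hypothetical session variables; substitute your own values.
SET cleanroom_name = 'my_provider_run_cleanroom';
SET consumer_locator_id = '<CONSUMER_ACCOUNT_LOCATOR>';
SET template_name = '<TEMPLATE_NAME>';

-- Same-region case: give the provider access to consumer request logs.
-- Assumed signature: (clean room name).
CALL samooha_by_snowflake_local_db.provider.mount_request_logs_for_all_consumers(
  $cleanroom_name
);

-- Submit the analysis. The template arguments here mirror the
-- overlap-style example in this topic and depend on the template used.
CALL samooha_by_snowflake_local_db.provider.submit_analysis_request(
  $cleanroom_name,
  $consumer_locator_id,
  $template_name,
  ['<PROVIDER_DB.SCHEMA.TABLE>'],   -- provider tables
  ['<CONSUMER_DB.SCHEMA.TABLE>'],   -- consumer tables
  object_construct(
    'dimensions', ['c.REGION_CODE'],
    'measure_type', ['AVG'],
    'measure_column', ['c.DAYS_ACTIVE']
  )
);

-- Copy the ID returned by the previous call.
SET request_id = '<REQUEST_ID_FROM_RESPONSE>';

-- Assumed signature: (clean room name, request ID, consumer account locator).
CALL samooha_by_snowflake_local_db.provider.check_analysis_status(
  $cleanroom_name, $request_id, $consumer_locator_id
);

-- Call only after the status is COMPLETED. Same assumed signature.
CALL samooha_by_snowflake_local_db.provider.get_analysis_result(
  $cleanroom_name, $request_id, $consumer_locator_id
);
```

Because the analysis runs asynchronously in the consumer's account, the status call may need to be repeated until it reports COMPLETED.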
Tip
If the provider-run analysis setting is changed for a clean room, the clean room must be uninstalled and then reinstalled for the change to take effect.
Specifying warehouse limits and choosing a warehouse type
Here is how a consumer sets a warehouse size and type limitation, and how a provider chooses a warehouse when running an analysis:
The consumer calls consumer.set_provider_run_configuration and specifies which warehouse sizes and types a provider can use for a specified template.
CALL samooha_by_snowflake_local_db.consumer.set_provider_run_configuration(
  $cleanroom_name,
  {
    $template1: {
      'warehouse_type': 'STANDARD',
      'warehouse_size': ['MEDIUM', 'LARGE']
    }
  });
The provider calls provider.view_warehouse_sizes_for_template to see which warehouse sizes and types are permitted for provider-run analyses on that template.
CALL samooha_by_snowflake_local_db.provider.view_warehouse_sizes_for_template(
  $cleanroom_name,
  $template_name,
  $consumer_account_loc);
The provider specifies which size and type warehouse to use in their analysis run request. A provider can specify only warehouse sizes and types that the consumer permits for that template.
CALL samooha_by_snowflake_local_db.provider.submit_analysis_request(
  $cleanroom_name,
  $consumer_locator_id,
  $template1,
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  object_construct(
    'dimensions', ['c.REGION_CODE'],
    'measure_type', ['AVG'],
    'measure_column', ['c.DAYS_ACTIVE'],
    'warehouse_type', 'STANDARD',  -- Any other value would cause the request to fail.
    'warehouse_size', 'LARGE'      -- Only MEDIUM and LARGE supported.
  ));
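Install and run the code example
You can download and install a complete working clean room example that enables and runs provider-run analysis. To run this example, you need two Snowflake accounts in the same organization and cloud hosting region, both with the clean rooms environment installed.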
Install the notebook in both your provider and consumer accounts.
To upload a notebook, do the following:
In the navigation menu, select Projects » Notebooks.
Select + Notebook » Import .ipynb file.
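Select the .ipynb file that you downloaded.
Name the file as needed, and select a database and schema.
Keep the default warehouse, APP_WH.
Select Create.
Open the notebook in the provider account and complete the provider section to create the clean room.
Open the notebook in the consumer account and complete the consumer section to install and configure the clean room and run the template.
Tip
The following procedures manage which party can run analyses in a clean room: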
Consumer-run analysis (allowed by default): Changes are applied immediately.
provider.enable_consumer_run_analysis
provider.disable_consumer_run_analysis
Provider-run analysis (disabled by default): Changes require reinstallation by the consumer.
provider.enable_provider_run_analysis (requires the consumer to approve by calling consumer.enable_templates_for_provider_run)
provider.disable_provider_run_analysis