Provider-run analysis
Overview
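The default clean room configuration allows only the consumer to run analyses in a clean room. However, a provider can ask a consumer for permission to run templates on the consumer's data in a specific clean room. You can enable and run provider-run analyses using either the clean rooms UI or code.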
The following diagram shows the data flow and the main components in a basic provider-run analysis:
In a basic provider-run analysis, the consumer and provider both link their data into the clean room. Source data is linked into the clean room as private views in the account where the data lives.
When the provider runs an analysis, the provider's data is shared with the clean room app in the consumer's account. The analysis runs on the consumer's account.
The encrypted results are temporarily written to the consumer DB in the consumer's account.
The encrypted results are copied to the analysis results back share on the provider's account (also called the governance back share) and decrypted. Because the analysis runs on the consumer's account, the consumer is billed for the analysis.
For more information, see Snowflake Data Clean Rooms: Installed objects.
Templates that support provider-run analyses
The following templates support provider-run analyses:
Audience Overlap & Segmentation
SQL Query (UI only)
Custom templates (API only)
Billing and cost details
Provider-run analyses run in the consumer's account, and consumers are billed for a provider-run analysis. To stop incurring costs from provider-run analyses, the consumer must uninstall the clean room.
A consumer can estimate the number of credits consumed by the provider within the last N days by executing the following query. Specify the number of previous days as a negative number.
-- Estimate the number of credits consumed in the past 5 days.
SELECT * FROM TABLE(SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.LIBRARY.PRA_CONSUMPTION_UDTF(-5));
When a provider runs an analysis in the clean rooms UI, the clean room uses auto-scaling logic based on dataset sizes to choose a warehouse for your analysis.
When a provider runs an analysis using the API, the provider can explicitly choose a warehouse size and type. The consumer can limit the warehouse sizes and types available to the provider when running a given template.
General notes
Providers can activate results to their own account using the UI or the API, or to third-party providers using the UI. For information about how to enable activation and view results, see Implementing activation in a clean room.
If the consumer and provider are in different cloud regions, Cross-Cloud Auto-Fulfillment must be enabled in both accounts and for the clean room in each account.
Provider-run cross-cloud queries can take some time to run, because the provider's source data must be replicated from the provider to the consumer, and the query results from the consumer back to the provider, across cloud regions.
Any template run by the provider must define a column name or alias for every column generated in the results. If a column is aggregated (for example, SUM(col1)) or calls a custom function (for example, cleanroom.my_function(p.hashed_email)), you must explicitly specify an alias for the column, as shown here:
SELECT SUM(col1) AS TOTAL FROM my_db.my_sch.T;  -- Correct
SELECT SUM(col1) FROM my_db.my_sch.T;           -- Error: aggregated column needs an explicit alias.
Provider-run analyses in the UI
Here is how to enable provider-run analysis in a new clean room using the clean rooms UI:
The provider creates and configures a clean room, using one of the supported templates. Configure the clean room up to the Share Clean Room step.
In the Share Clean Room step of clean room configuration, the provider selects Enable run analysis & query next to their own account to enable them to run all templates in this clean room that support provider-run analysis.
This setting cannot be changed after a clean room is created; if you want to change permission for a specific account to run queries in a published clean room, you must delete the clean room and create a new one.
The consumer joins and configures the clean room as is usual for all templates in the clean room, including any templates that support provider analysis. If the consumer does not want to enable a provider to run a specific template, they can omit required details for that template.
Before joining the clean room, the consumer is warned that provider-run analysis is enabled for that clean room.
The consumer can run queries as soon as the clean room is joined, but there is a delay of up to 30 minutes before the provider can run the template. This setup delay occurs only during the initial join step. If the provider later adds other provider-run templates, the provider can run them as soon as the consumer configures their clean room for that template.
After the join step completes, the clean room is available for both provider-run analyses and consumer-run analyses.
Important:
Providers must wait about 10 minutes after the consumer installs the clean room before they can run an analysis. The delay is for additional background configuration required for provider-run analyses.
The consumer is billed for all analyses in this clean room, whether run by the provider or consumer.
Provider-run analyses in the API
Here is how to enable provider-run analysis in a new clean room using the clean rooms API:
Provider
Create and configure the clean room, data, and policies in the standard way.
Add consumers in the standard way.
Enable provider-run analysis for specific consumer accounts in the clean room by calling provider.enable_provider_run_analysis.
Important:
You must call provider.enable_provider_run_analysis after adding consumers to a clean room, but before any consumer installs the clean room. Each consumer account must approve this request before its data is accessible for provider-run analyses in this clean room.
Any time the provider changes the provider-run analysis setting for a clean room, all consumers must reinstall the clean room for the change to take effect. Because it can be difficult to get all collaborators to reinstall a clean room, it is more reliable for the provider to delete a published, shared clean room when changing the analysis permissions, and then create a new clean room with the desired permissions.
Publish the clean room.
Let the consumer know that the clean room is published, the name of the clean room, and the templates you want to run in the clean room.
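The provider-side API steps above can be sketched in SQL as follows. This is a minimal sketch, not a complete setup script. It assumes the clean room already exists with its data and policies configured and the consumer added. The clean room name and account locator are hypothetical placeholders. It also assumes that enable_provider_run_analysis takes the clean room name and an array of consumer account locators. Publishing is not shown.

```sql
-- Hypothetical placeholder values: replace with your own.
SET cleanroom_name = 'my_provider_run_cleanroom';
SET consumer_locator_id = 'ABC12345';

-- Enable provider-run analysis for the listed consumer accounts.
-- Call this after adding consumers and before any consumer installs the clean room.
-- Assumed signature: (clean room name, array of consumer account locators).
CALL samooha_by_snowflake_local_db.provider.enable_provider_run_analysis(
  $cleanroom_name,
  ARRAY_CONSTRUCT($consumer_locator_id)
);
```

Each consumer listed here still has to approve templates on their side before the provider can run them.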
Consumer
Install the clean room and link in your data in the standard way.
Set any join and column policies needed on your data.
Allow provider analysis for specific templates in the clean room by calling either consumer.enable_templates_for_provider_run (for multiple templates) or consumer.approve_template (for one template).
Note:
If the provider changes a template after the consumer approves it, the consumer must approve the template again. Until the template is re-approved, the provider continues to run the old, cached version of the approved template.
(Optional) Provider-run analyses are billed to the consumer. A consumer can limit the warehouse types or sizes available for provider-run analyses. For details, see Restricting warehouse size and type limits.
Tell the provider that you have installed the clean room and approved provider-run analysis.
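On the consumer side, approving several templates at once might look like the following minimal sketch. The clean room name is a hypothetical placeholder, and prod_overlap_analysis is used only as an example template name. The third argument (a differential-privacy flag) is an assumption about the procedure's signature. Check the consumer API reference for the parameters your version expects.

```sql
-- Hypothetical placeholder: use the clean room name the provider sent you.
SET cleanroom_name = 'my_provider_run_cleanroom';

-- Approve one or more templates for provider-run analysis.
-- Assumed signature: (clean room name, array of template names, differential-privacy flag).
CALL samooha_by_snowflake_local_db.consumer.enable_templates_for_provider_run(
  $cleanroom_name,
  ARRAY_CONSTRUCT('prod_overlap_analysis'),  -- example template name
  FALSE
);
```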
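Provider
After the consumer installs the clean room, you must enable data sharing between the consumer and provider accounts so that the analysis can access the consumer's data. How you do this depends on whether the provider and consumer are in the same cloud region or in different cloud regions: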
If the provider and consumer are in the same cloud region, the provider calls provider.mount_request_logs_for_all_consumers once. If a new consumer account installs the clean room later and you want to use their data in this template, you must call this procedure again to access that data.
If the provider and consumer are in different cloud regions, the provider and consumer must enable Cross-Cloud Auto-Fulfillment. When a provider runs an analysis across regions, the query can take some time to complete, because query data is sent from the provider's region to the consumer's region and back.
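For the same-region case, the setup is a single call. This sketch assumes that mount_request_logs_for_all_consumers takes only the clean room name. The name shown is a hypothetical placeholder.

```sql
-- Hypothetical placeholder clean room name.
SET cleanroom_name = 'my_provider_run_cleanroom';

-- Same cloud region only: give the provider access to the consumers' request logs.
-- Run this again whenever a new consumer account installs the clean room.
CALL samooha_by_snowflake_local_db.provider.mount_request_logs_for_all_consumers($cleanroom_name);
```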
Call provider.view_warehouse_sizes_for_template to see whether the consumer has limited the type and size of warehouse used for the analysis. If the consumer has limited your warehouse selection, you must provide supported warehouse_type and warehouse_size values in your analysis request in the next step. If the consumer has not specified warehouse limits, those fields are optional in your request. For more information, see Restricting warehouse size and type limits.
Run the analysis by calling provider.submit_analysis_request with the template name, the table names, and the template arguments. If the consumer has specified limits on warehouse sizes or types, you must also specify the warehouse size and type in your request.
Save the request ID returned by provider.submit_analysis_request. You need this ID to check the status and results of the analysis.
Check the status of the analysis by calling provider.check_analysis_status. When the status is reported as COMPLETED, call provider.get_analysis_result to get the analysis results.
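The submit, save, check, and fetch steps above can be chained in one SQL session, as in the following sketch. It makes several assumptions:

* The clean room name, consumer locator, and template name are hypothetical placeholders.
* The table names and analysis arguments are copied from the sample-data example in this topic and must match what your template expects.
* No warehouse limits are set. If the consumer has set limits, add warehouse_type and warehouse_size.
* check_analysis_status and get_analysis_result take (clean room name, request ID, consumer account locator) in that order.

RESULT_SCAN(LAST_QUERY_ID()) reads the value returned by the preceding CALL.

```sql
-- Hypothetical placeholder values.
SET cleanroom_name = 'my_provider_run_cleanroom';
SET consumer_locator_id = 'ABC12345';
SET template_name = 'my_template';

-- Submit the analysis request; the procedure returns a request ID.
CALL samooha_by_snowflake_local_db.provider.submit_analysis_request(
  $cleanroom_name,
  $consumer_locator_id,
  $template_name,
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  OBJECT_CONSTRUCT(
    'dimensions', ['c.REGION_CODE'],
    'measure_type', ['AVG'],
    'measure_column', ['c.DAYS_ACTIVE']
  )
);

-- Save the request ID returned by the previous CALL in a session variable.
SET request_id = (SELECT $1 FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())));

-- Re-run until the reported status is COMPLETED.
-- Assumed argument order: (clean room name, request ID, consumer account locator).
CALL samooha_by_snowflake_local_db.provider.check_analysis_status(
  $cleanroom_name, $request_id, $consumer_locator_id);

-- Fetch the results once the analysis is COMPLETED (same assumed argument order).
CALL samooha_by_snowflake_local_db.provider.get_analysis_result(
  $cleanroom_name, $request_id, $consumer_locator_id);
```

Keeping the request ID in a session variable means the status check can be re-run as often as needed without copying the ID by hand.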
Restricting warehouse size and type limits
Here is how a consumer sets a warehouse size and type limitation, and how a provider chooses a warehouse size and type when running an analysis:
The consumer calls consumer.set_provider_run_configuration and specifies which warehouse sizes and types a provider can use for a specified template.
CALL samooha_by_snowflake_local_db.consumer.set_provider_run_configuration(
  $cleanroom_name,
  {
    $template_name: {
      'warehouse_type': 'STANDARD',
      'warehouse_size': ['MEDIUM', 'LARGE']
    }
  }
);
The provider calls provider.view_warehouse_sizes_for_template to see which warehouse sizes and types are permitted for provider-run analyses on that template.
CALL samooha_by_snowflake_local_db.provider.view_warehouse_sizes_for_template(
  $cleanroom_name,
  $template_name,
  $consumer_account_loc
);
The provider specifies which supported warehouse size and type to use in their analysis run request.
CALL samooha_by_snowflake_local_db.provider.submit_analysis_request(
  $cleanroom_name,
  $consumer_locator_id,
  $template_name,
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
  object_construct(
    'dimensions', ['c.REGION_CODE'],
    'measure_type', ['AVG'],
    'measure_column', ['c.DAYS_ACTIVE'],
    'warehouse_type', 'STANDARD',  -- Any other value would cause the request to fail.
    'warehouse_size', 'LARGE'      -- Only MEDIUM and LARGE are supported.
  )
);
Install and run the code example
You can download and install a complete running example to create and run a provider-run analysis. To run this example, you need two Snowflake accounts in the same organization and cloud hosting region with the Snowflake Data Clean Room environment installed.
Install the notebook in both your provider and consumer accounts.
To upload a notebook, do the following:
In the navigation menu, select Projects » Notebooks.
Select + Notebook » Import .ipynb file.
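Select the .ipynb file that you downloaded.
Name the file as you like, and select a database and schema.
Keep the default warehouse, APP_WH.
Select Create.
Open the notebook in the provider account and complete the provider section to create the clean room.
Open the notebook in the consumer account and complete the consumer section to install and configure the clean room and run the template.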
Tip:
The following procedures manage which party can run analyses in a clean room environment:
Consumer-run analysis (allowed by default): Changes are applied immediately.
provider.enable_consumer_run_analysis
provider.disable_consumer_run_analysis
Provider-run analysis (disabled by default): Changes require reinstallation by the consumer.
provider.enable_provider_run_analysis (requires the consumer to approve by calling consumer.enable_templates_for_provider_run)
provider.disable_provider_run_analysis
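Turning provider-run analysis off might look like the following sketch. It assumes that disable_provider_run_analysis takes the same arguments as enable_provider_run_analysis: a clean room name and an array of consumer account locators. The values shown are hypothetical placeholders.

```sql
-- Hypothetical placeholder values.
SET cleanroom_name = 'my_provider_run_cleanroom';
SET consumer_locator_id = 'ABC12345';

-- Disable provider-run analysis for the listed consumers.
-- Takes effect only after the consumer reinstalls the clean room.
-- Assumed signature: (clean room name, array of consumer account locators).
CALL samooha_by_snowflake_local_db.provider.disable_provider_run_analysis(
  $cleanroom_name,
  ARRAY_CONSTRUCT($consumer_locator_id)
);
```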