自由格式 SQL 查询¶
数据提供商可以允许通过模板或自由格式查询向分析运行者公开其数据。当数据提供商在数据集上启用自由格式查询时,任何有权访问该数据产品的分析运行者都可以在其环境中针对该数据集运行 SQL 查询。
分析运行者和数据提供商必须都已加入 Collaboration,数据才能变为可用状态。
概述¶
以下是在 Clean Room 中针对数据运行自由格式查询的步骤:
数据提供商
注册包含一个或多个指定了
allowed_analyses: template_and_freeform_sql的数据集的数据产品。如果数据提供商想要将 Snowflake 策略应用于数据集中的列,则必须在注册数据之前创建这些策略,并将策略与数据产品规范中的列相关联。
以标准方式将数据产品链接到 Collaboration 中。
分析运行者
在 Collaboration 安装到其账户后,分析运行者将调用 VIEW_DATA_OFFERINGS。如果 freeform_sql_view_name 列中有值,则可以直接针对该列中命名的视图查询数据集。
freeform_sql_column_policies 中列出的任何策略都由 Collaboration 应用于数据。数据提供商直接应用于源数据的任何策略都会被强制执行,但不会显示在该列中。
以下章节提供了有关数据提供商和分析步骤的详细信息。
注册自由格式查询数据集(数据提供商)¶
以下步骤显示了如何在数据产品注册期间启用自由格式查询:
在 Collaboration 规范中指定
allowed_analyses: template_and_freeform_sql。这将启用使用模板或自由格式查询来查询数据集。... datasets: - alias: customers_view data_object_fqn: PROVIDER_DB.DATA_SCH.CUSTOMERS object_class: custom allowed_analyses: template_and_freeform_sql schema_and_template_policies: HASHED_EMAIL: category: join_standard column_type: hashed_email_b64_encoded ...
只有
schema_and_template_policies下列出的列可用于通过模板或自由格式查询进行查询。如果您想在自由格式查询中应用 Snowflake 策略而不将其应用于源数据,请执行以下步骤:
以标准方式创建 Snowflake 策略。不要将它们应用于您的表。
CREATE OR REPLACE AGGREGATION POLICY PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY AS () RETURNS AGGREGATION_CONSTRAINT -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);
创建 Collaboration 的角色必须对数据库、架构和策略对象具有 USAGE 权限。
这些策略是动态链接的;您对这些策略所做的任何更改都会立即影响使用这些策略的任何数据集,即使该数据产品已经注册并链接。
在
freeform_sql_policies字段下的数据产品规范中分配您的策略。重要提示:如果列已被重命名,则freeform_sql_policies下使用的所有列名称必须使用 自动生成的列名称。重命名仅影响 join-standard 类别列。这些策略不会直接应用于源表,仅应用于由 Collaboration 注册的视图。
schema_and_template_policies: HASHED_EMAIL: # Source column name. category: join_standard column_type: hashed_email_b64_encoded # Column is renamed to the column_type value. STATUS: category: passthrough AGE_BAND: category: passthrough DAYS_ACTIVE: category: passthrough INCOME_BRACKET: category: passthrough freeform_sql_policies: # Apply agg, join, and masking policies created by the data owner to these columns. aggregation_policy: name: PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY entity_keys: - HASHED_EMAIL_B64_ENCODED join_policy: name: PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY columns: - HASHED_EMAIL_B64_ENCODED # This is the renamed column. masking_policies: - name: PROVIDER_DB.DATA_SCH.MASK_INCOME_POLICY columns: - INCOME_BRACKET
通过调用 REGISTER_DATA_OFFERING 以标准方式注册数据产品。
运行自由格式查询(分析运行者)¶
当分析运行者调用 VIEW_DATA_OFFERINGS 时,如果 freeform_sql_view_name 列中出现值,则可以直接查询自由格式 SQL 视图,而无需使用模板。所有应用于源表或在 :ref:` 数据产品的 <label-dcr_collaboration_data_yaml>` freeform_sql_policies 部分中定义的 Snowflake 策略都会在查询中强制执行。
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);
列 |
值 |
|---|---|
TEMPLATE_VIEW_NAME |
|
TEMPLATE_JOIN_COLUMNS |
|
ANALYSIS_ALLOWED_COLUMNS |
|
ACTIVATION_ALLOWED_COLUMNS |
|
FREEFORM_SQL_VIEW_NAME |
|
FREEFORM_SQL_COLUMN_POLICIES |
{
"aggregation_policy": {"entity_keys": ["HASHED_EMAIL_B64_ENCODED"]},
"masking_policy": {"columns": ["INCOME_BRACKET"]},
"join_policy": {"columns": ["HASHED_EMAIL_B64_ENCODED"]},
"no_policy": {"columns": ["DAYS_ACTIVE", "AGE_BAND", "STATUS"]}
}
|
SHARED_BY |
|
SHARED_WITH |
|
DATA_OFFERING_ID |
|
您必须使用来自 freeform_sql_view_name 的值,而不是 来自 template_view_name 的值。
SELECT status, COUNT(*) AS customer_count
FROM SFDCR_FREEFORM_SQL_DEMO.FREEFORM_SQL.DATA_PROVIDER_PROVIDER_CUSTOMERS_V1_CUSTOMERS AS t
GROUP BY status
ORDER BY customer_count DESC;
示例:双方协作¶
以下示例演示了一个双方 Collaboration,其中一方(“提供商”)是 Collaboration 所有者和使用者的数据提供商。另一方(“使用者”)是分析运行者,可以运行模板并使用提供商提供的数据,还可以根据数据提供商规范中定义的策略对数据运行自由格式 SQL 查询。
要运行此示例,您必须拥有两个已安装 Snowflake Data Clean Rooms 的不同账户。
您可以下载文件并将其上传到您的 Snowflake 账户,也可以使用 Snowsight 将示例代码复制并粘贴到两个单独账户的工作表中。
下载源 SQL 文件,然后将它们上传到两个已安装 Snowflake Data Clean Rooms 的独立账户:
-- ============================================================================
-- Free-form SQL Collaboration Demo: Data Provider
-- ============================================================================
-- This example demonstrates a Snowflake Data Clean Rooms collaboration using
-- freeform SQL policies. The data provider creates a sample dataset with
-- Snowflake aggregation, join, and masking policies, registers a data offering
-- that permits freeform SQL queries, creates a template, and initializes a
-- collaboration with one other collaborator (data_consumer).
--
-- For more information, see:
-- docs.snowflake.com/user-guide/cleanrooms/free-form-sql.rst
-- docs.snowflake.com/user-guide/cleanrooms/spec-reference
-- ============================================================================
-- ============================================================================
-- SETUP: Create sample database, schema, table, and policies.
-- ============================================================================
USE ROLE SAMOOHA_APP_ROLE;
USE WAREHOUSE APP_WH;
-- You can't use secondary roles with most collaboration procedures.
USE SECONDARY ROLES NONE;
CREATE DATABASE IF NOT EXISTS PROVIDER_DB;
CREATE SCHEMA IF NOT EXISTS PROVIDER_DB.DATA_SCH;
-- Create a table with 300 rows from the sample CUSTOMERS table.
CREATE OR REPLACE TABLE PROVIDER_DB.DATA_SCH.CUSTOMERS AS
SELECT HASHED_EMAIL, STATUS, AGE_BAND, REGION_CODE, DAYS_ACTIVE, INCOME_BRACKET
FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS
LIMIT 300;
-- Create an aggregation policy that requires a minimum group size of 5.
CREATE OR REPLACE AGGREGATION POLICY PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
AS () RETURNS AGGREGATION_CONSTRAINT ->
AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);
-- Create an inactive join policy. You will modify this later.
CREATE OR REPLACE JOIN POLICY PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY
AS () RETURNS JOIN_CONSTRAINT ->
JOIN_CONSTRAINT(JOIN_REQUIRED => FALSE);
-- Create a masking policy that replaces the original value with a fixed string.
CREATE OR REPLACE MASKING POLICY PROVIDER_DB.DATA_SCH.MASK_INCOME_POLICY
AS (val STRING) RETURNS STRING ->
'***MASKED***';
-- ============================================================================
-- Register a data offering with freeform SQL policies.
-- ============================================================================
-- The data offering enables freeform SQL queries (template_and_freeform_sql)
-- and attaches three Snowflake policies to protect data in freeform queries:
-- * Aggregation policy on hashed_email: enforces a minimum group size of 5.
-- * Join policy on hashed_email: requires joins to include this column.
-- * Masking policy on income_bracket: masks the column value in query results.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_DATA_OFFERING(
$$
api_version: 2.0.0
spec_type: data_offering
version: V1
name: provider_customers
description: Customer dataset with freeform SQL policies.
datasets:
- alias: customers
data_object_fqn: PROVIDER_DB.DATA_SCH.CUSTOMERS
object_class: custom
allowed_analyses: template_and_freeform_sql
schema_and_template_policies:
HASHED_EMAIL:
category: join_standard
column_type: hashed_email_b64_encoded
STATUS:
category: passthrough
AGE_BAND:
category: passthrough
DAYS_ACTIVE:
category: passthrough
INCOME_BRACKET:
category: passthrough
freeform_sql_policies:
aggregation_policy:
name: PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
entity_keys:
- HASHED_EMAIL_B64_ENCODED
join_policy:
name: PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY
columns:
- HASHED_EMAIL_B64_ENCODED
masking_policies:
- name: PROVIDER_DB.DATA_SCH.MASK_INCOME_POLICY
columns:
- INCOME_BRACKET
$$
);
-- Save the data offering ID returned by the registration call.
SET data_offering_id = '<data_offering_id>';
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.VIEW_REGISTERED_DATA_OFFERINGS();
-- ============================================================================
-- Register a template with a simple one-table query.
-- ============================================================================
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
$$
api_version: 2.0.0
spec_type: template
name: status_summary
version: V1
type: sql_analysis
description: Returns a count of customers grouped by status.
template:
SELECT status, COUNT(*) AS customer_count
FROM IDENTIFIER({{ source_table[0] }})
GROUP BY status
ORDER BY customer_count DESC;
$$
);
-- Save the template ID returned by the registration call.
SET template_id = '<template_id>';
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.VIEW_REGISTERED_TEMPLATES();
-- ============================================================================
-- Create the collaboration.
-- ============================================================================
-- Replace the <...> placeholders with the appropriate values.
-- Get your account data sharing ID with:
-- SELECT CURRENT_ORGANIZATION_NAME() || '.' || CURRENT_ACCOUNT_NAME();
-- In this collaboration, the consumer can run templated and free-form queries
-- against the provider's data. The provider/owner isn't an analysis runner.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.INITIALIZE(
$$
api_version: 2.0.0
spec_type: collaboration
name: freeform_sql_demo
owner: data_provider
collaborator_identifier_aliases:
data_provider: <provider_account_data_sharing_id>
data_consumer: <consumer_account_data_sharing_id>
analysis_runners:
data_consumer:
data_providers:
data_provider:
data_offerings:
- id: <data_offering_id>
templates:
- id: <template_id>
$$,
'APP_WH'
);
SET collaboration_name = 'freeform_sql_demo';
-- INITIALIZE automatically joins the owner. Repeat until status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);
-- Verify that the collaboration is visible.
-- Collaboration spec is in COLLABORATION_SPEC column.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS() ->>
SELECT * FROM $1 WHERE "SOURCE_NAME" = $collaboration_name;
-- SWITCH TO data_consumer account to join and run analyses.
-- Update the join policy associated with HASHED_EMAIL_B64_ENCODED.
-- All queries on that data offering now require joins on HASHED_EMAIL_B64_ENCODED.
-- Re-run any of the previously successful free-form queries and they will fail.
ALTER JOIN POLICY PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY SET BODY ->
JOIN_CONSTRAINT(JOIN_REQUIRED => TRUE);
-- ============================================================================
-- CLEANUP: Delete the collaboration, registered resources, and sample data.
-- ============================================================================
-- Teardown is a multi-step process. Call TEARDOWN, then wait for GET_STATUS
-- to report LOCAL_DROP_PENDING, then call TEARDOWN again.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.TEARDOWN($collaboration_name);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);
-- When GET_STATUS reports LOCAL_DROP_PENDING, call TEARDOWN again to complete.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.TEARDOWN($collaboration_name);
-- ============================================================================
-- Free-form SQL Collaboration Demo: Data Consumer
-- ============================================================================
-- This example demonstrates joining a Snowflake Data Clean Rooms collaboration
-- as an analysis runner. The data consumer joins a collaboration created by
-- the data provider, views available templates and data offerings, runs an
-- analysis using the provider's template, and then runs several free-form SQL
-- queries directly against the data-offering views.
--
-- The data offering in this collaboration has three free-form SQL policies:
-- * Aggregation policy (hashed_email): minimum group size of 5.
-- * Join policy (hashed_email): joins must include this column. Currently inactive.
-- * Masking policy (income_bracket): values are replaced with '***MASKED***'.
--
-- For more information, see:
-- docs.snowflake.com/user-guide/cleanrooms/free-form-sql.rst
-- docs.snowflake.com/user-guide/cleanrooms/spec-reference
-- ============================================================================
-- ============================================================================
-- Join the collaboration
-- ============================================================================
USE ROLE SAMOOHA_APP_ROLE;
USE WAREHOUSE APP_WH;
-- You can't use secondary roles with most collaboration procedures.
USE SECONDARY ROLES NONE;
-- View available collaborations. Look for the collaboration created by the data provider.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();
-- Use the SOURCE_NAME column value from the response to VIEW_COLLABORATIONS().
SET collaboration_name = 'freeform_sql_demo';
-- Use the OWNER_ACCOUNT column value from the response to VIEW_COLLABORATIONS().
SET collaborator_data_sharing_id = '<provider_data_sharing_id>';
-- Review the collaboration spec before joining.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.REVIEW($collaboration_name, $collaborator_data_sharing_id);
-- Join the collaboration. Joining is asynchronous; call GET_STATUS until JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.JOIN($collaboration_name);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_name);
-- ============================================================================
-- View available templates and data offerings
-- ============================================================================
-- View data offerings shared with you in this collaboration.
-- Set a variable to use in future queries.
-- Note that the view name used by templates != the view name used for free-form SQL queries.
-- Templates use the TEMPLATE_VIEW_NAME value.
-- Free-form queries use the FREEFORM_SQL_VIEW_NAME value.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);
SET template_view_name = '<template_view_name>';
SET freeform_view_name = '<freeform_view_name>';
-- View templates available to you in this collaboration.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_TEMPLATES($collaboration_name);
-- ============================================================================
-- Run an analysis using the provider's template
-- ============================================================================
-- Replace the placeholders with the template name/version from VIEW_TEMPLATES
-- and the view name from VIEW_DATA_OFFERINGS.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
$collaboration_name,
$$
api_version: 2.0.0
spec_type: analysis
description: Count customers grouped by status.
template: '<status_summary_template_name_and_version>'
template_configuration:
view_mappings:
source_tables:
- '<template_view_name>'
$$
);
-- ============================================================================
-- Free-form SQL queries: Queries that SUCCEED
-- ============================================================================
-- The following queries run directly against the data-offering view.
-- Query 1: Count customers grouped by status.
-- Succeeds because the aggregation produces groups larger than 5.
SELECT status, COUNT(*) AS customer_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY status
ORDER BY customer_count DESC;
-- Query 2: Count customers grouped by age_band.
-- Succeeds because the aggregation produces groups larger than 5.
SELECT age_band, COUNT(*) AS customer_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY age_band
ORDER BY age_band;
-- Query 3: Select income_bracket to demonstrate the masking policy.
-- The query succeeds, but income_bracket values are replaced with '***MASKED***'
-- because the masking policy is applied to this column.
SELECT income_bracket, COUNT(*) AS customer_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY income_bracket;
-- Query 4: Combine masked and unmasked columns.
-- income_bracket is masked; status and age_band are not.
SELECT status, age_band, income_bracket, COUNT(*) AS customer_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY status, age_band, income_bracket
ORDER BY customer_count DESC;
-- Query 5: Group by a high-cardinality column.
-- Succeeds, but shows no values for hashed_email_b64_encoded because
-- grouping by hashed_email_b64_encoded produces groups of 1.
SELECT hashed_email_b64_encoded, COUNT(*) AS row_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY hashed_email_b64_encoded;
-- ============================================================================
-- Free-form SQL queries: Queries that FAIL
-- ============================================================================
-- Query 6: Select individual rows without aggregation.
-- FAILS because the aggregation policy requires a minimum group size of 5.
SELECT hashed_email_b64_encoded, status, age_band
FROM IDENTIFIER( $freeform_view_name ) AS t
LIMIT 10;
-- Query 8: Select a column not listed in the data offering.
-- FAILS because region_code is not included in schema_and_template_policies,
-- so it is not exposed in the data-offering view, although it is present in the source data.
SELECT region_code, COUNT(*) AS customer_count
FROM IDENTIFIER( $freeform_view_name ) AS t
GROUP BY region_code;
-- SWITCH TO provider account, update the JOIN policy, and re-run the successful
-- queries, which will now fail.