Design custom templates

关于 Clean Room 模板

Clean room templates are written in JinjaSQL (https://github.com/sripathikrishnan/jinjasql). JinjaSQL is an extension to the Jinja templating language. A JinjaSQL template evaluates to a SQL statement when run in a clean room. The JinjaSQL templating language provides logic statements and run-time variable replacement, which enables the template to be customized at run time. For example, a user can provide table and column names when they run the template, and the template can adjust itself based on the values passed in.

模板一般有两种类型:

  • Analysis templates, which evaluate to a SQL DQL statement (a SELECT statement) that returns query results immediately to the template runner.

  • Activation templates, which are used to activate results to a Snowflake account, rather than showing results in the immediate environment. An activation template is very similar to an analysis template with a few extra requirements, and it evaluates to a DDL statement (CREATE TABLE).

Creating, sharing, and running a custom template

Any collaborator can register and share templates with specific analysis runners in a collaboration.

Let's start by looking at a simple SQL query, and how it would be written as a template.

1. The JinjaSQL template

Here is a simple SQL query that joins two tables by email and shows the overlap count per city:

SELECT COUNT(*), city FROM table_1
  INNER JOIN table_2
  ON table_1.hashed_email = table_2.hashed_email
  GROUP BY city;

Here is how that query would look as a JinjaSQL template that allows the caller to choose the JOIN and GROUP BY columns, as well as the tables used. The template includes some filters that enforce Snowflake Data Clean Room policies.

SELECT COUNT(*), IDENTIFIER({{ group_by_col | column_policy }})
  FROM IDENTIFIER({{ source_table[0] }}) AS p1
  INNER JOIN IDENTIFIER({{ source_table[1] }}) AS p2
  ON IDENTIFIER({{ p1_join_col | join_policy }}) = IDENTIFIER({{ p2_join_col | join_policy }})
  GROUP BY IDENTIFIER({{ group_by_col | column_policy }});

模板备注:

  • Values within {{ double bracket pairs }} are variables. The values are populated by the caller.

  • group_by_col, source_table, p1_join_col, and p2_join_col are all variables populated by the caller. These variables have arbitrary names chosen by the template designer.

  • source_table is a standard Snowflake-defined variable. This variable defines the views to use in the query. These views are datasets within data offerings that are linked into the clean room. Collaborators can list available datasets by calling VIEW_DATA_OFFERINGS.

  • A dataset must be aliased as lowercase p if you want to enforce Snowflake Data Clean Room policies on it. If a template uses multiple datasets, the first is p or p1, and additional datasets are indexed as p2, p3, and so on.

  • 所有列名和表名都需要 IDENTIFIER,因为 {{双括号}} 中的变量计算为字符串字面量,而其并非有效标识符。

  • JinjaSQL filters are applied to columns to enforce Snowflake Data Clean Room policies on the column. Snowflake implements custom filters join_policy and column_policy, which verify whether a column complies with join or column policies in the clean room respectively, and fail the query if it doesn't. A filter is applied to a column name as {{ column_name | filter_name }}.

所有这些要点将在后面详细讨论。

2. The Collaboration template

A template is added to a collaboration by embedding it in a YAML specification and registering it, then linking it.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
  $$
  api_version: 2.0.0
  spec_type: template
  name: my_test_template
  version: 2026_01_12_V1
  type: sql_analysis
  description: A test template
  methodology: Join on single column with a single group by value
  parameters:
  - name: source_tables
    description: Tables from both sides which can be listed in any order, aliased with p1 or p2
    required: true
  - name: p1_join_col
    description: Column to join on from first table specified under source_tables, aliased with p1
    required: true
  - name: p2_join_col
    description: Column to join on from second table specified under source_tables, , aliased with p2
    required: true
  - name: group_by_col
    description: Column which results should be grouped group aliased with respective table p1 or p2
    required: true

  template:
    SELECT COUNT(*), IDENTIFIER({{ group_by_col | column_policy }})
    FROM IDENTIFIER({{ source_table[0] }}) AS p1
    INNER JOIN IDENTIFIER({{ source_table[1] }}) AS p2
    ON IDENTIFIER({{ p1_join_col | join_policy }}) = IDENTIFIER({{ p2_join_col | join_policy }})
    GROUP BY IDENTIFIER({{ group_by_col | column_policy }});

$$);

You must request to share a template with a given analysis runner, who can accept or reject the request. Additionally, all data providers for that analysis runner must accept the request for the template to be shared.

-- Request to share template with only Collaborator3.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.ADD_TEMPLATE_REQUEST(
  $collaboration_name,
  $template_id,
  ['Collaborator3']
);

3. Running the template

Here is how an analysis runner might run this template in code. Note how column names are qualified by the table aliases declared in the template.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN( $collaboration_name,
$$
api_version: 2.0.0
spec_type: analysis
name: example_run
description: Example run for template
template: $template_id

template_configuration:
  view_mappings:
    source_tables:
      - collaborator_1.data_offering_1.dataset_1
      - collaborator_2.data_offering_2.dataset_2
  arguments:
     p1_join_col: p1.hashed_email
     p2_join_col: p2.hashed_email
     group_by_col: p2.device_type

$$ );

开发自定义模板

Clean Room 模板就是 JinjaSQL 模板。要创建模板,您应该熟悉以下主题:

  • Jinja templating basics (https://jinja.palletsprojects.com/en/stable/)

  • The JinjaSQL extension to Jinja (https://github.com/sripathikrishnan/jinjasql).

You can use Cortex Code to validate the SQL output of your JinjaSQL templates based on variable inputs that should be provided. See example prompts below that you can copy into Cortex Code to get final SQL outputs you can test:

示例:

Resolve the following Jinja template into SQL based on the variables defined:

Jinja Template:
 SELECT IDENTIFIER({{ col1 | column_policy }}), IDENTIFIER({{ col2 | column_policy }})
  FROM IDENTIFIER({{ source_table[0] }}) AS p1
  JOIN IDENTIFIER({{ source_table[1] }}) AS p2
  ON  IDENTIFIER({{ p1_join_col | join_policy }}) = IDENTIFIER({{ p2_join_col | join_policy }})
  {% if where_phrase %} WHERE {{ where_phrase | sqlsafe }}{% endif %};

Variable Inputs:
source_table: SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS, SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS
col1: p1.status
col2: p1.age_band
p1_join_col: p1.hashed_email
p2_join_col: p2.hashed_email
where_phrase: p1.household_size > 2

The rendered template looks like this:

SELECT IDENTIFIER('p1.status'), IDENTIFIER('p1.age_band')
FROM IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p1
JOIN IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p2
ON  IDENTIFIER('p1.hashed_email') = IDENTIFIER('p2.hashed_email')
WHERE p1.household_size > 2;

Try running the SQL statement above in your environment to see if it works and gets the expected results.

Then test your template without a WHERE clause:

Resolve the following Jinja template into SQL based on the variables defined:

Jinja Template:
 SELECT IDENTIFIER({{ col1 | column_policy }}), IDENTIFIER({{ col2 | column_policy }})
  FROM IDENTIFIER({{ source_table[0] }}) AS p1
  JOIN IDENTIFIER({{ source_table[1] }}) AS p2
  ON  IDENTIFIER({{ p1_join_col | join_policy }}) = IDENTIFIER({{ p2_join_col | join_policy }})
  {% if where_phrase %} WHERE {{ where_phrase | sqlsafe }}{% endif %};

Variable Inputs:
source_table: SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS, SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS
col1: p1.status
col2: p1.age_band
p1_join_col: p1.hashed_email
p2_join_col: p2.hashed_email

Rendered template:

SELECT IDENTIFIER('p1.status'), IDENTIFIER('p1.age_band')
FROM IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p1
JOIN IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p2
ON  IDENTIFIER('p1.hashed_email') = IDENTIFIER('p2.hashed_email');

Add the template into your clean room, and test with an analysis run spec.

数据保护

Templates can access only datasets linked into the clean room by collaborators.

Collaborators specify join, column, and activation policies on their datasets to ensure that only those columns can be used as an input for a template variable.

重要

The template must include the appropriate JinjaSQL policy filter on a column for the policy to be applied.

自定义模板语法

Snowflake Data Clean Room 支持 V3 JinjaSQL,并提供如前所述的一些扩展。

本部分包括以下主题:

模板命名规则

When creating a template, names must contain only letters, numbers, or underscores. Template names are assigned in the template specification's name field when you register the template.

有效名称示例:

  • my_template

  • activation_template_1

无效名称示例:

  • my template - 不允许存在空格

  • my_template! - Special characters not allowed

模板变量

Template callers can pass in values to template variables. JinjaSQL syntax enables variable binding for any variable name within {{ double_brackets }}, but Snowflake reserves a few variable names that you shouldn't override, as described below.

小心

All variables, whether Snowflake-defined or custom, are populated by the user and should be treated with appropriate caution. Analysis templates must resolve to a single SELECT statement (activation templates resolve to a script block). Remember that all variables are passed in by the caller.

Snowflake 定义的变量

All clean room templates have access to the following global variables defined by Snowflake, but passed in by the analysis runner:

source_table:

A zero-based string array of tables and views from data offerings linked into the collaboration via LINK_DATA_OFFERING that can be used by the template.

Example: SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS p;

my_table:

In a Collaboration clean room, my_table is used only by Snowflake Standard Edition users. For these users, my_table is a zero-based string array of datasets that the analysis runner linked by calling LINK_LOCAL_DATA_OFFERING.

Example: SELECT col1 FROM IDENTIFIER({{ my_table[0] }}) AS c;

自定义变量

Template creators can include arbitrary variables in a template that can be populated by the analysis runner. These variables can have any Jinja-compliant name except for the Snowflake-defined variables or table alias names. You should provide guidance in the parameter section of the template for required and optional variables.

自定义变量可由模板访问,如这里所示的自定义变量 max_income

SELECT income FROM my_db.my_sch.customers WHERE income < {{ max_income }};

Analysis runners pass variables when calling RUN as defined in the analysis run spec.

正确解析变量

传入模板的字符串值解析为最终模板中的字符串字面量。如果绑定变量处理不当,可能会导致 SQL 解析或逻辑错误:

  • SELECT {{ my_col }} FROM p; - This resolves to SELECT 'my_col' from p; which simply returns the string "my_col" - probably not what you want.

  • SELECT age FROM {{ source_table[0] }} AS p; - This resolves to SELECT age FROM 'somedb.somesch.source_table' AS p;, which causes a parsing error because a table must be an identifier, not a literal string.

  • SELECT age FROM IDENTIFIER({{ source_table[0] }}) AS p {{ where_clause }}; - Passing in "WHERE age < 50" evaluates to SELECT age FROM mytable AS p 'WHERE age < 50';, which is a parsing error because of the literal string WHERE clause.

因此,在适当情况下,您必须解析变量。下面介绍如何在模板中正确解析变量:

Resolving table and column names

指定表或列名称的变量必须通过以下两种方式之一转换为模板中的标识符:

  • IDENTIFIER: For example: SELECT IDENTIFIER({{ my_column }}) FROM p;

  • sqlsafe (https://github.com/sripathikrishnan/jinjasql?tab=readme-ov-file#sql-safe-strings): This JinjaSQL filter resolves identifier strings to SQL text. An equivalent statement to the previous bullet is SELECT {{ my_column | sqlsafe }} FROM p;

Your particular usage dictates when to use IDENTIFIER or sqlsafe. For example, p.{{ my_column | sqlsafe }} can't easily be rewritten using IDENTIFIER.

Resolving dynamic SQL

当有字符串变量用作字面量 SQL 时,例如 WHERE 子句,请使用模板中的 sqlsafe 筛选器。例如:

SELECT age FROM IDENTIFIER({{ source_table[0] }}) AS p WHERE {{ where_clause }};

If a user passes in "age < 50" to where_clause, the query would resolve to SELECT age FROM sometable AS p WHERE 'age < 50'; which is invalid SQL because of the literal string WHERE condition. In this case, you should use the sqlsafe filter:

SELECT age FROM IDENTIFIER( {{ source_table[0] }} ) as p {{ where_clause | sqlsafe }};

必填表别名

At the top level of your query, all source_table datasets must be aliased as p, and all my_table datasets must be aliased as c, in order for Snowflake to validate join and column policies correctly in the query. Any column that must be verified against join or column policies must be qualified with the lowercase p or c table alias.

If you use multiple source_table or my_table datasets in your query, add a numeric, sequential 1-based suffix to each table alias after the first. So: p or p1, p2, p3, and so on for the first, second, and third source_table datasets, and c or c1, c2, c3, and so on for the first, second, and third my_table datasets. The p or c index should be sequential without gaps (that is, create the aliases p1, p2, and p3, not p1, p2, and p4).

示例

SELECT p1.col1 FROM IDENTIFIER({{ source_table[0] }}) AS p1
UNION
SELECT p2.col1 FROM IDENTIFIER({{ source_table[1] }}) AS p2;

Custom clean room template filters

Snowflake 支持所有 标准 Jinja 筛选器 (https://jinja.palletsprojects.com/en/stable/templates/#builtin-filters) 和大多数标准 JinjaSQL 筛选器 (https://github.com/search?q=repo%3Asripathikrishnan%2Fjinjasql+self.env.filters+path%3Ajinjasql%2Fcore.py&type=code&path+jinjasql%2Fcore.py=),以及一些扩展:

join_policy:

Succeeds if the column is in the join policy of the data owner; fails otherwise. See 将数据保护策略应用于数据产品.

column_policy:

Succeeds if the column is in the column policy of the data owner; fails otherwise. See 将数据保护策略应用于数据产品.

activation_policy:

Succeeds if the column is in the activation policy of the data owner; fails otherwise. See 将数据保护策略应用于数据产品.

join_and_column_policy:

Succeeds if the column is in the join or column policy of the data owner; fails otherwise. See 将数据保护策略应用于数据产品.

identifier:

This JinjaSQL filter is not supported by Snowflake templates.

小技巧

JinjaSQL 语句是从左到右评估:

  • {{ my_col | column_policy }} 正确

  • {{ my_col | sqlsafe | column_policy }} 正确

  • {{ column_policy | my_col }} 不正确

  • {{ my_col | column_policy | sqlsafe }} Incorrect: column_policy will be checked against the my_col value as a string, which is an error.

强制实施 Clean Room 策略

Clean rooms don't automatically check clean room policies against columns used in a template. If you want to enforce a policy against a column:

  • You must apply the appropriate policy filter to that column in the template. For example:

FROM IDENTIFIER({{ source_table[0] }}) AS p1
JOIN IDENTIFIER({{ source_table[1] }}) AS p2
  ON IDENTIFIER({{ p1_join_col | join_policy }}) = IDENTIFIER({{ p2_join_col | join_policy }})

Policies are checked only against columns of tables referenced in a source_table variable, which refer to views shared within the clean room. Policies are not checked against columns of tables referenced in a my_table variable, which are local tables not shared within the clean room.

Note that column names can't be ambiguous when testing policies. So if you have columns with the same name in two tables, you must qualify the column name in order to test the policy against that column.

Access considerations and best practices

A template is always executed in context to the clean room application role. A collaborator does not have direct access to any data within the clean room that is restricted to template access only; all access is through the native application roles and the template outputs.

As best practice, you should follow the below for templates you create or use in a clean room:

  • Ensure a policy filter is applied any time a column variable is used in a template, so that collaborator policies are respected.

  • Wrap user-provided variables with IDENTIFIER() when possible to strengthen templates against SQL injection attacks.

激活模板

A template can also be used to save query results to a table outside of the clean room; this is called activation. An activation template is an analysis template with the following additional requirements:

  • 激活模板是计算结果为 SQL 脚本块的 JinjaSQL 语句,可以是简单的 SELECT 语句,这点与分析模板不同。

  • Activation templates must create an internal table in the clean room to store results. The table generated by the template must have the prefix cleanroom.activation_data_, for example: cleanroom.activation_data_my_results

  • All columns in the internal results table should have the value activation_allowed: TRUE in their data offering specification.

  • The script block should end with a RETURN statement that returns the name of the generated table without the cleanroom.activation_data_ prefix, for example: RETURN 'my_results'.

  • The template itself has no naming requirements.

Here is an example activation template specification:

api_version: 2.0.0
spec_type: template
name: my_activation_template
version: v0
type: sql_activation
description: Activation template that creates segment data
template: |
  BEGIN
      CREATE OR REPLACE TABLE cleanroom.activation_data_analysis_results AS
      SELECT
          {{ group_by_column | sqlsafe }} AS bucket_label,
          {{ activation_column | sqlsafe | activation_policy }} AS activation_label,
          COUNT(DISTINCT {{ join_column | sqlsafe }}) AS overlap_count
      FROM IDENTIFIER({{ source_table[0] }}) AS p
      GROUP BY {{ group_by_column | sqlsafe }},
               {{ activation_column | sqlsafe }};
      RETURN 'analysis_results';
  END;
parameters:
  - name: join_column
    description: Join column name
    required: true
    default: "p.IP_ADDRESS"
  - name: group_by_column
    description: Group by column name
    required: true
    default: "p.CAMPAIGN_NAME"
  - name: activation_column
    description: Activation column name
    required: true
    default: "p.DEVICE_TYPE"

Learn how to implement activation in a collaboration: Activating query results.

后续步骤

掌握模板系统后,请阅读使用模板类型实现 Clean Room 的详细说明:

  • Activation templates create a results table after a successful run and is shared outside of the clean room. Depending on the collaboration specification, the results table can be shared to the analysis runner or other collaborators.

  • Code bundles are used to upload custom Python UDFs and UDTFs into a collaboration. Templates in the collaboration can run these functions to perform complex data actions.

  • Internal tables are used to store intermediary or persistent results, which can be used downstream to support multistep workflows. These tables are accessible to templates or custom uploaded code inside the clean room.

更多信息

  • Jinja 文档 (https://jinja.palletsprojects.com/en/stable/)

  • JinjaSQL 文档 (https://github.com/sripathikrishnan/jinjasql)