Upload and use custom functions in Collaboration Clean Rooms¶
Introduction¶
Any collaborator can upload custom Python UDFs and UDTFs into a collaboration. Templates in the collaboration can run these functions to perform complex data actions. Common usage includes machine learning or customized data manipulation within a query. Your uploaded code can import and use packages from an approved bundle of Python packages (https://repo.anaconda.com/pkgs/snowflake/) and the Snowpark API packages.
Note
Python is the only coding language supported for custom UDFs.
The following sections show you how to upload and use a custom function.
Define and use custom code bundles¶
Here is how to upload and use a custom function:
The code submitter:
Creates and registers the code by calling REGISTER_CODE_SPEC.
The code can be inline in the spec, or linked from a stage.
Creates a template that references the code bundle spec by ID in the template’s
code_specsarray. Add this field as a peer of the template and parameters fields as shown in this example:parameters: - name: <parameter_name> description: <parameter_description> required: <true_or_false> default: <default_value> type: <data_type> code_specs: # Optional: List of code bundles used by this template - <code_spec_id> # One or more code spec IDs. template: | <template_content>
Registers the template and then links the template into the collaboration.
Note
Snowflake scans uploaded code for security issues. If a security issue is found, the code, and the containing template, will not be added to the collaboration.
The analysis runner:
Runs the template in the standard way by calling
RUN.
Create and register the code bundle spec¶
The first step in uploading custom code is to create and register the code bundle spec.
Custom functions are defined in a YAML code bundle spec. Each code bundle exposes one or more functions that can be called by a template. The code bundle spec can either include the code in the spec inline, or link to code that lives on a Snowflake stage.
A collaborator registers a spec by calling REGISTRY.REGISTER_CODE_SPEC, which returns the bundle ID. Any collaborator with any role can register and link a code bundle.
After the code bundle is linked into the collaboration, that code bundle is visible to anyone in the collaboration who can access a template that links the code bundle. Call VIEW_CODE_SPECS to list accessible code bundles in a collaboration.
Anyone who can see a code bundle in a collaboration can see and use it in their own templates in that collaboration. Any inline code can be viewed by any member of the collaboration, but staged artifact code can’t be viewed by collaborators.
The following code bundle spec that exposes a single Python UDF called normalize_value, which calls the normalize function defined in that spec:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_CODE_SPEC(
$$
api_version: 2.0.0
spec_type: code_spec
name: custom_udf
version: v1
functions:
- name: normalize_value
type: UDF
language: PYTHON
handler: normalize
arguments:
- name: value
type: FLOAT
returns: FLOAT
code_body: |
def normalize(value):
return value / 100.0
$$
);
Create and register the calling template¶
After the code spec is registered, the collaborator then registers a template that uses this code bundle. To use a code bundle, add the bundle spec ID in the template’s code_specs field.
A template calls a custom function using the syntax cleanroom.spec_name$function_name. Note the literal . and $ name scoping marks.
Note
Use the spec name, not the spec ID, to reference a function in your template.
In the following example, a template uses function normalize_value from the code bundle custom_udf:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
$$
api_version: 2.0.0
spec_type: template
name: normalization_template
version: v1
type: sql_analysis
code_specs:
- custom_udf_v1 -- Imports the code bundle.
template: |
SELECT cleanroom.custom_udf$normalize_value(metric, 0, 100) -- Calls the UDF.
AS normalized
FROM {{ source_tables[0] }}
$$
);
Add the template to a collaboration¶
Add the template that calls your function to the collaboration in the standard way. For more information, see Templates.
Snowflake validates and uploads to the collaboration when the calling template is added to a collaboration. Snowflake scans uploaded code for security issues before installing the code.
The following example shows a request to add a template to an existing collaboration:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.ADD_TEMPLATE_REQUEST(
'my_collaboration',
'normalization_template_v1',
['consumer']
);
Submit a new version of your code bundle¶
Every registered code spec must have a unique name + version across all registries in your account. A template loads a specific name and version of a code spec. If you want to create or consume a new version of your code, you must submit a new version of the template that references the new code version in the code_specs field. You don’t need to change the template body. For example:
Step 1: Consume version 1 of the code bundle:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
$$
api_version: 2.0.0
spec_type: template
name: normalization_template
version: v1
type: sql_analysis
code_specs:
- custom_udf_v1 -- Bundle ID includes the version number.
template: |
SELECT cleanroom.custom_udf$normalize_value(metric, 0, 100) -- Calls the UDF.
AS normalized
FROM {{ source_tables[0] }}
$$
);
Step 2: Update and register the new version of your code bundle, and then update your template to use the new version:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(
$$
api_version: 2.0.0
spec_type: template
name: normalization_template
version: v2 -- Update the template version.
type: sql_analysis
code_specs:
- custom_udf_v2 -- Use the new code bundle.
template: |
SELECT cleanroom.custom_udf$normalize_value(metric, 0, 100) -- No change needed here.
AS normalized
FROM {{ source_tables[0] }}
$$
);
Notice that function names don’t include the version, so you don’t need to change the calling code in the template body when you upload a new version of a function.
Code bundle specification¶
This specification defines a bundle of one or more code functions or procedures that can be called by a template.
A code bundle spec has a maximum of 5 functions plus procedures.
Identifiers in the code bundle spec have the following general requirements:
Names: Must be valid Snowflake identifiers that start with a letter and contain only alphanumeric characters and underscores.
Quoted identifiers: Double-quoted identifiers are supported for names with special characters.
Case sensitivity: Unquoted identifiers are case-insensitive; quoted identifiers preserve case.
api_version: 2.0.0 # Required: Must be "2.0.0"
spec_type: code_spec # Required: Must be "code_spec"
name: <identifier> # Required: Unique name of this code bundle.
version: <version_id> # Required: Alphanumeric with underscores (max 20 chars)
description: <description_text> # Optional: Description (max 1,000 chars)
artifacts: # Optional: Staged files for import
- alias: <identifier> # One or more artifact items...
stage_path: <stage_path> # Required: Full stage path. See below for additional requirements.
description: <description_text> # Optional: Description (max 500 chars)
content_hash: <sha256_hash> # Optional: SHA-256 hash for integrity verification
functions: # Required if no procedures defined
- name: <identifier> # One or more functions...
type: UDF | UDTF # Required: Function type
language: PYTHON # Required: Currently only PYTHON supported
runtime_version: <python_version> # Optional: Python runtime (3.10 - 3.14)
handler: <handler> # Required: Handler function
arguments: # Optional: One or more function arguments
- name: <arg_name> # Argument name
type: <sql_type> # Snowflake SQL type of this argument
returns: <sql_type> # Required: Snowflake return type
packages: # Optional: Package dependencies
- <package_name> # One or more package items...
imports: # Optional: Artifact aliases to import
- <artifact_alias> # One or more import items...
code_body: | # Optional: Inline Python code (max 12 MB)
<inline_python_code>
description: <description_text> # Optional: Description of this function.
procedures: # Required if no functions defined
- name: <identifier> # One or more procedure items...
language: PYTHON # Required: Currently only PYTHON supported
runtime_version: <python_version> # Optional: Python runtime version
handler: <handler> # Required: Handler function
arguments: # Optional: One or more procedure arguments
- name: <arg_name> # Argument name
type: <sql_type> # Snowflake SQL type of this argument
returns: <sql_type> # Optional: Return type
packages: # Optional: Package dependencies
- <package_name> # One or more package items...
imports: # Optional: Artifact aliases to import
- <artifact_alias> # One or more import items...
code_body: | # Optional: Inline Python code
# inline python_code ...
description: <description_text> # Optional: Description of this procedure.
api_versionThe version of the Collaboration API used. Must be
2.0.0.spec_typeSpecification type identifier. Must be
code_spec.name: identifierA unique name for this code bundle spec within this registry. Must be a valid Snowflake identifier with a maximum of 75 characters. This is used as the last name segment when calling the function in a template:
cleanroom.code_spec_name$function_nameversion: version_idCustom version identifier. Must be alphanumeric with underscores, maximum 20 characters.
description: description_text(Optional)A description of the code bundle spec (maximum 1,000 characters).
artifacts(Optional)A list of staged files or packages that can be imported by your functions or procedures, and optionally exposed via handler functions. Maximum of 5 per spec.
alias: identifierAn alias for referencing this artifact in imports. When referencing this alias within this spec, use the bare alias name rather than
cleanroom.spec_name$alias; that is, use the bare function name to reference another function in this spec.stage_path: stage_pathFull stage path to the artifact file; For example,
@DB.SCHEMA.STAGE/path/file.whl.
The stage must be internal. External stages are not supported.
The stage must have DIRECTORY enabled: The stage containing artifacts must have
DIRECTORY = TRUEset.Stage path format: Must follow
@[DB.]SCHEMA.STAGE/path/to/file.extformat.No path traversal: Stage paths can’t contain
..or\.This artifact must exist: The file must exist at the specified stage path when the code bundle is registered.
The stage must have SNOWFLAKE_SSE server-side encryption enabled. When creating or altering the stage, set
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE').If you push, delete, or update a staged code file, you must call
ALTER STAGE stage name REFRESHto ensure that the collaboration has the latest information from the stage. Code updates are supported only before you register the code spec, as this is when the version is assigned and the hash checksum calculated.
description: description_text(Optional)A description of the artifact (maximum 500 characters).
content_hash: sha256_hash(Optional)SHA-256 hash for integrity verification (64 hex characters).
functions(Required if no procedures are defined)A list of UDF or UDTF definitions.
name identifierThe function name to expose to the calling template. Must be a valid Snowflake identifier.
typeThe function type. One of
UDForUDTF.languageThe function language. Currently only
PYTHONis supported.runtime_version: python_version(Optional)Python runtime version to use. Supported versions:
3.10to3.14.handler: handlerThe name of the handler function in the function code to call when
nameis called.arguments(Optional)Function arguments as a list of name-type pairs. Types must be valid Snowflake SQL types.
returns: sql_typeThe return type. For UDFs, use a SQL type such as
STRINGorFLOAT. For UDTFs, useTABLE(column_definitions).packages(Optional)A list of packages used by this code. This can be any of these Anaconda Python packages (https://repo.anaconda.com/pkgs/snowflake/) or these Snowpark API packages. For example:
snowflake-snowpark-python,numpy.imports(Optional)A list of artifacts to import. These must be aliases from the artifacts list in this spec.
code_body(Optional)Inline Python code. Mutually exclusive with staged imports. Maximum size is 12 MB.
description: description_text(Optional)A description of the function (maximum 500 characters).
procedures(Required if no functions defined)A list of stored procedure definitions. Fields are similar to
functions, except there is notypefield.
API reference¶
The following procedures are used to manage custom code bundles in a collaboration:
REGISTER_CODE_SPEC¶
- Schema:
REGISTRY
Registers a code bundle. This stores the code in the clean rooms environment in the REGISTRY.CODE_SPECS table. After a code spec is registered, it can be used by a template.
Every code spec registered must have a unique name + version across all registries in your account.
Syntax¶
REGISTER_CODE_SPEC( ['<registry_name>' ,] <code_spec> )
Arguments¶
registry_name(Optional)Name of a custom registry in which to register this code spec. If not specified, registers the code bundle in the default account registry.
code_specCode bundle spec definition in YAML format, as a string.
Returns¶
The generated code bundle spec ID.
Examples¶
Register a code bundle in the default registry:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_CODE_SPEC(
$$
api_version: 2.0.0
spec_type: code_spec
name: custom_udf
version: v1
description: Custom UDF for data normalization
functions:
- name: normalize_value
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: normalize
arguments:
- name: value
type: FLOAT
- name: min_val
type: FLOAT
- name: max_val
type: FLOAT
returns: FLOAT
code_body: |
def normalize(value, min_val, max_val):
if max_val == min_val:
return 0.0
return (value - min_val) / (max_val - min_val)
$$
);
Register a code bundle in a custom registry:
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_CODE_SPEC(
'my_custom_registry',
$$
api_version: 2.0.0
spec_type: code_spec
name: custom_udf
version: v1
description: Custom UDF for data normalization
functions:
- name: normalize_value
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: normalize
arguments:
- name: value
type: FLOAT
- name: min_val
type: FLOAT
- name: max_val
type: FLOAT
returns: FLOAT
code_body: |
def normalize(value, min_val, max_val):
if max_val == min_val:
return 0.0
return (value - min_val) / (max_val - min_val)
$$
);
Access requirements¶
If you’re not using the SAMOOHA_APP_ROLE role, you must use a role that was granted any of the following privileges.
To register a code spec in the default registry:
GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('REGISTER CODE SPEC', 'role name')
To register items in a custom registry:
You have read and write privileges on any custom registry that you created yourself.
To access a custom registry created by another user, you need
GRANT_PRIVILEGE_ON_OBJECT_TO_ROLE('REGISTER', 'REGISTRY', 'MY_REGISTRY', 'role name').
VIEW_REGISTERED_CODE_SPECS¶
- Schema:
REGISTRY
Lists all code bundle specs registered by this role in the local account registry.
Syntax¶
VIEW_REGISTERED_CODE_SPECS( [ '<registry_name>' ] )
Arguments¶
registry_name(Optional)Name of a custom registry to list code specs from. If not specified, lists code specs from the default account registry.
Returns¶
A table that lists the details of all code bundles that you have registered in this account. The table includes the following columns:
CODE_SPEC_ID: ID of the code bundle spec.NAME: Code bundle spec name.VERSION: Code bundle spec version.CODE_SPEC: Full YAML specification of the code bundle spec.
Example¶
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.VIEW_REGISTERED_CODE_SPECS();
Access requirements¶
If you’re not using the SAMOOHA_APP_ROLE role, you must use a role that was granted any of the following privileges.
To see items in the default registry:
GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('VIEW REGISTERED CODE SPECS', 'role name')GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('REVIEW COLLABORATION', 'role name')GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('CREATE COLLABORATION', 'role name')
To see items in a specific registry:
You have read and write privileges on any custom registry that you created yourself.
To access a custom registry created by another user, you need
GRANT_PRIVILEGE_ON_OBJECT_TO_ROLE('READ', 'REGISTRY', 'MY_REGISTRY', 'role name').
VIEW_CODE_SPECS¶
- Schema:
COLLABORATION
Returns all code bundle specs that are referenced by any template that you created or can run in the specified collaboration.
Syntax¶
VIEW_CODE_SPECS( <collaboration_name> )
Arguments¶
collaboration_nameID of the collaboration.
Returns¶
A table that lists the code bundles available in the specified collaboration. The table includes the following columns:
CODE_SPEC_ID: ID of this code bundle spec.CODE_SPEC: Full YAML specification of the code bundle spec.SHARED_BY: Collaborator alias that shared the code bundle spec.
Example¶
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_CODE_SPECS(
$collaboration_id
);
Access requirements¶
If you’re not using the SAMOOHA_APP_ROLE role, you must use a role that was granted any of the following privileges:
GRANT_PRIVILEGE_ON_OBJECT_TO_ROLE('VIEW CODE SPECS', 'COLLABORATION', 'collaboration name', 'role name')GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('REVIEW COLLABORATION', 'role name')GRANT_PRIVILEGE_ON_ACCOUNT_TO_ROLE('CREATE COLLABORATION', 'role name')
Example specs¶
Inline UDF with code body¶
A simple UDF with inline Python code:
api_version: 2.0.0
spec_type: code_spec
name: string_utils
version: v1
description: String utility functions
functions:
- name: clean_string
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: clean
arguments:
- name: input_str
type: STRING
returns: STRING
description: Removes leading/trailing whitespace and converts to lowercase
code_body: |
def clean(input_str):
if input_str is None:
return None
return input_str.strip().lower()
- name: extract_domain
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: extract
arguments:
- name: email
type: STRING
returns: STRING
description: Extracts domain from email address
code_body: |
def extract(email):
if email is None or '@' not in email:
return None
return email.split('@')[1]
UDTF (User-Defined Table Function)¶
This example YAML defines a UDTF that returns multiple rows:
api_version: 2.0.0
spec_type: code_spec
name: tokenizer
version: v1
description: Text tokenization UDTF
functions:
- name: tokenize_text
type: UDTF
language: PYTHON
runtime_version: "3.10"
handler: Tokenizer
arguments:
- name: text
type: STRING
- name: delimiter
type: STRING
returns: TABLE(token STRING, position INTEGER)
description: Splits text into tokens and returns each with its position
code_body: |
class Tokenizer:
def process(self, text, delimiter):
if text is None:
return
tokens = text.split(delimiter if delimiter else ' ')
for i, token in enumerate(tokens):
yield (token.strip(), i)
Staged artifact with wheel package¶
Be sure to read the stage_path documentation requirements for linking to staged code in your code spec.
This example YAML uses a staged Python wheel package:
api_version: 2.0.0
spec_type: code_spec
name: ml_scoring
version: v2
description: ML scoring functions using custom library
artifacts:
- alias: ml_lib
stage_path: "@MY_DB.PUBLIC.CODE_STAGE/libs/ml_scoring_lib-1.0.0-py3-none-any.whl"
description: Custom ML scoring library
content_hash: "a1b2c3d4e5f6..."
functions:
- name: predict_score
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: ml_scoring_lib.predictor.predict
arguments:
- name: features
type: ARRAY
returns: FLOAT
packages:
- numpy
- scikit-learn
imports:
- ml_lib
description: Predicts score using trained ML model
Stored procedure¶
This example YAML defines a stored procedure for data processing:
api_version: 2.0.0
spec_type: code_spec
name: data_processor
version: v1
description: Data processing procedures
procedures:
- name: aggregate_metrics
language: PYTHON
runtime_version: "3.10"
handler: process
arguments:
- name: table_name
type: STRING
- name: group_column
type: STRING
returns: STRING
packages:
- snowflake-snowpark-python
description: Aggregates metrics by specified column
code_body: |
def process(session, table_name, group_column):
df = session.table(table_name)
result = df.group_by(group_column).count()
result.write.mode("overwrite").save_as_table("aggregated_results")
return f"Aggregated {df.count()} rows into aggregated_results"
Multiple Python files as staged artifacts¶
Be sure to read the stage_path documentation requirements for linking to staged code in your code spec.
This example YAML uses multiple staged Python source files:
api_version: 2.0.0
spec_type: code_spec
name: analytics_suite
version: v3
description: Analytics suite with multiple modules
artifacts:
- alias: utils
stage_path: "@MY_DB.PUBLIC.CODE_STAGE/analytics/utils.py"
description: Utility functions
- alias: transformers
stage_path: "@MY_DB.PUBLIC.CODE_STAGE/analytics/transformers.py"
description: Data transformation functions
- alias: validators
stage_path: "@MY_DB.PUBLIC.CODE_STAGE/analytics/validators.py"
description: Validation functions
functions:
- name: transform_and_validate
type: UDF
language: PYTHON
runtime_version: "3.10"
handler: transformers.transform_validate
arguments:
- name: data
type: OBJECT
returns: OBJECT
imports:
- utils
- transformers
- validators
description: Transforms and validates input data
Troubleshooting code bundles¶
- Error:
CodeSpecAlreadyExistsException- Cause:
Code bundle spec with same name and version already registered.
- Solution:
Use a different version or update the existing version.
- Error:
SpecValidationError- Cause:
YAML doesn’t conform to schema.
- Solution:
Check required fields and format.
- Error:
CodeSpecStageNotAccessibleError- Cause:
Stage referenced in artifact isn’t accessible.
- Solution:
Grant access to stage or verify stage exists.
- Error:
CodeSpecArtifactNotFoundAtStageError- Cause:
File not found at specified stage path.
- Solution:
Upload file to stage before registering.
- Error:
StageDirectoryNotEnabledError- Cause:
Stage doesn’t have DIRECTORY enabled.
- Solution:
Enable directory on the stage:
ALTER STAGE ... SET DIRECTORY = (ENABLE = TRUE)
- Error:
CodeSpecNotFoundForOwnerException- Cause:
Template references unregistered code bundle spec.
- Solution:
Register code bundle spec before registering template.