Upload and run custom functions in a clean room

Overview

You can upload custom Python UDFs and UDTFs into your clean room and run them from your templates to perform complex data actions. These actions include machine learning or customized data manipulation within a query, as part of a single-step or multi-step flow. Python is the only coding language supported for custom UDFs.

Your uploaded code can import and use packages from an approved bundle of Python packages (https://repo.anaconda.com/pkgs/snowflake/) and the Snowpark API.

Both providers and consumers can upload custom Python code into a clean room, although the process is different for providers and consumers. Each bundle of uploaded code can define multiple functions that call each other, but a bundle exposes only one handler function. This handler function can be called by templates created or run by anyone who uses the clean room. If the code creates internal tables, these tables can be accessed as described in Designing multi-step flows.

You can’t view uploaded code, even your own code. Uploaded code can’t be deleted, but it can be updated.

This page shows you how to upload and run custom Python UDFs and UDTFs as a provider or a consumer.

Tip

For background information about how to develop your own Python UDFs in a clean room, see the following topics:

Updating custom functions

You can upload or overwrite an existing function that you uploaded, but you can’t delete an existing function.

Uploading a function with the exact same signature as one that you previously uploaded will overwrite the existing function. The signature is the case-insensitive function name of the external handler, and the data types of all parameters, in the same order. Parameter names don’t matter. You can’t overwrite a function uploaded by another account.

Because the signature must match when you update a function, you cannot change the signature of an existing function: if you upload the function foo(name VARIANT age INTEGER) and then upload the function foo(name VARIANT age FLOAT), the second function will be added to the clean room in addition to the first, because the argument types differ.

Provider-submitted code

Provider-submitted functions can be uploaded as inline code or from a Snowflake stage. Both techniques are covered here.

Your uploaded code can natively import and use packages from an approved set of Python packages (https://repo.anaconda.com/pkgs/snowflake/). If you need a non-default package, you must use Snowpark Container Services in a clean room to host your code.

Tip

After updating provider-written code, you should update the default release directive and then call provider.create_or_update_cleanroom_listing to propagate the changes to consumers. If you do not call provider.create_or_update_cleanroom_listing, your default version will not update for consumers that are currently using the clean room.

Here is a high level view of how a provider adds code to a clean room:

  1. The provider creates and configures the clean room in the normal way.

  2. The provider uploads code by calling provider.load_python_into_cleanroom. You can either upload your code inline directly within that procedure, or upload a code file to a stage, then provide the stage location to that procedure.

    Although your code can include multiple functions, only one handler is exposed for each upload. If you expose multiple functions to templates, upload each handler by calling provider.load_python_into_cleanroom.

  3. After each successful code upload, a new patch version of the clean room is generated. You must then increase the default version by calling provider.set_default_release_directive with the new patch number. If the clean room is exposed externally, security checks are run before installing your code, and you must call provider.view_cleanroom_scan_status to confirm that security checks have passed before incrementing the default version.

  4. You create and upload a custom template that calls your code. The template calls the handler function using the cleanroom scope, that is: cleanroom.my_function. For example, a template that calls a custom simple_add function that you uploaded might look like this:

    SELECT cleanroom.simple_add(1, 2), cleanroom.simple_add({{ price | sqlsafe | int }}, {{ tax | sqlsafe | int }})
    
    Copy
  5. The consumer runs your template the same way as any other template.

    Tip

    If the consumer encounters a mount error when they install a clean room with custom code, this can indicate a syntax error in the code.

You can find code examples demonstrating this flow in the provider-written code example section.

Important notes about versioning

Every time the provider uploads a function, it increases the patch number (and there is a limit of 99 patch numbers). Therefore, do your best to test and debug your code thoroughly before adding it to the clean room to reduce version updates during development.

If you do update a patch number, customers using the clean room UI might need to refresh the page to see the change. Customers using the API should see changes immediately, but there can be a delay, depending on the available resources. Learn more about clean room versioning.

Uploading provider-written inline functions

You can upload the code inline in the code parameter of provider.load_python_into_cleanroom. Here is an example of uploading a simple function inline:

CALL samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
$cleanroom_name,
'simple_add',                         -- Name used to call the UDF from a template.
['first INTEGER', 'second INTEGER'],  -- Arguments of the UDF, specified as '<variable_name> <SQL type>' pairs.
['numpy', 'pandas'],                  -- Packages imported by the UDF.
'INTEGER',                            -- SQL return type of UDF.
'add_two',                            -- Handler function in your code called when external name is called.
$$
import numpy as np   # Not used, but you can load supported packages.
import pandas as pd

def add_two(first, second):
    return first + second
$$
);
Copy

The calling template calls cleanroom.simple_add to call this function. The provider examples demonstrate how to upload inline code.

Uploading provider-written functions from a stage

You can upload Python files to a clean room stage and reference the stage when you call provider.load_python_into_cleanroom. Loading code from a stage allows you to develop the code in your local system in an editor, avoid copy/paste errors loading it inline, and also have better versioning control. Note that you can upload multiple files in one procedure call, but only one handler function is exposed for each upload.

Code is loaded from a stage into the clean room when you call load_python_into_cleanroom; later changes to the code on the stage are not propagated to the clean room.

To upload your UDF to a stage:

  1. Create your .py file and make it available in a location where you can upload it to a Snowsight stage.

  2. To get the name of the stage for your clean room, call provider.get_stage_for_python_files($cleanroom_name). This stage is accessible by the clean room – you cannot use an arbitrary stage that you create.

  3. Upload the .py file to the stage for your clean room. There are several ways to do this, including using the CLI, Snowsight, or language-specific drivers.

  4. Call provider.load_python_into_cleanroom with the stage location, handler, external name, arguments, and return type. Templates in your clean room can now call the function.

The following example code shows how to load code into a clean room from a stage.

-- Save the following code as reverser.py:
--import numpy as np
--def main(some_string):
--  '''Return the reverse of a string plus a random number 1-10'''
--  return some_string[::-1] + str(np.random.randint(1,10))

-- Get the stage for your clean room.
CALL samooha_by_snowflake_local_db.provider.get_stage_for_python_files($cleanroom_name);

-- Save the file to the stage. Here is how to do it by using the Snowflake CLI
PUT file://~/reverser.py <STAGE_NAME> overwrite=True auto_compress=False;

-- Load the code from the stage into the clean room.
CALL samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
    $cleanroom_name,
    'reverse', -- Name used to call the function
    ['some_string  STRING'], -- Arguments and SQL types
    ['numpy'],               -- Any required packages
    ['/reverser.py'],        -- Relative path to file on stage
    'STRING',                -- Return type
    'reverser.main'          -- <FILE_NAME>.<FUNCTION_NAME>
);

-- Uploading code, even from a stage, increases the patch number.
CALL samooha_by_snowflake_local_db.provider.set_default_release_directive(
  $cleanroom_name, 'V1_0', <NEW_PATCH_NUMBER>);

-- Upload a template that calls the function.
CALL samooha_by_snowflake_local_db.provider.add_custom_sql_template(
    $cleanroom_name,
    $udf_template_name,
    $$
    SELECT
      p.status,
      cleanroom.reverse(p.status)
    FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS AS p
    LIMIT 100;
    $$
);

-- Switch to the consumer account and run the template to see the results.
Copy

The provider examples demonstrate uploading code from a stage.

Provider-written code examples

The following examples demonstrate adding provider-written UDFs and UDTFs to a clean room.

Download the following examples and then upload them as worksheet files in your Snowflake account. You need separate accounts for the provider and consumer, each with the clean rooms API installed. Replace the information as noted in the sample files.

Consumer-submitted code

A consumer can submit UDF or UDTF code and call it from a custom template that they submit. Consumer-uploaded code is bundled into a single procedure along with a custom template, and uploaded in a single procedure call. The consumer code is tied directly to that template and cannot be called by other templates.

To upload code as a consumer, you should understand custom template syntax and how to submit a consumer-defined template.

Here is an overview of the steps to upload custom consumer code:

  1. The provider creates the clean room in the standard way and then invites the consumer.

  2. The consumer installs and configures the clean room in the standard way.

  3. The consumer prepares a template. The template calls the UDF or UDTF within the cleanroom namespace. For example, to call the consumer-defined calculate_tax function, a simple template might look like the following snippet:

    SELECT {{ cleanroom.calculate_tax(p.cost) }} AS Tax FROM my_db.my_sch.sales AS p;
    
    Copy
  4. The consumer prepares their Python code. We recommend using double quotation marks (“”) rather than single quotation marks (’) in your code to avoid extra escaping needed later. Your code can reference a bundle of selected Python libraries (https://repo.anaconda.com/pkgs/snowflake/).

  5. The consumer passes their Python code into consumer.generate_python_request_template. The procedure returns the Python code as a stored procedure, with a placeholder for the custom JinjaSQL template. There are several multi-line strings in the template that use $$ as multi-line delimiters.

  6. Replace the template placeholder in the output from generate_python_request_template with your JinjaSQL template.

  7. In the combined template, escape any single quotes like this: \'. This is because single quotes will be used as the outermost delimiter for the entire multi-line procedure string. For example:

      BEGIN
    
      CREATE OR REPLACE FUNCTION CLEANROOM.custom_compare(min_status STRING, max_status STRING, this_status STRING)
      RETURNS boolean
      LANGUAGE PYTHON
      RUNTIME_VERSION = 3.10
      PACKAGES = (\'numpy\')
    
      HANDLER = \'custom_compare\'
      AS $$
      import numpy as np
    
      def custom_compare(min_status:str, max_status:str, this_status:str):
        statuses = [\'MEMBER\', \'SILVER\', \'GOLD\', \'PLATINUM\']
        return ((statuses.index(this_status) >= statuses.index(min_status)) &
                (statuses.index(this_status) <= statuses.index(max_status)))
      $$;
    
      -- Custom template
      LET SQL_TEXT varchar := $$
      SELECT
        c.status,
        c.hashed_email
      FROM IDENTIFIER( {{ my_table[0] }} ) as c
      WHERE cleanroom.custom_compare({{ min_status }}, {{ max_status }}, c.status);
      $$;
    
      LET RES resultset := (EXECUTE IMMEDIATE :SQL_TEXT);
      RETURN TABLE(RES);
    
      END;
    
    Copy
  8. Call consumer.create_template_request with your combined template. Use single quotation marks ‘’ instead of double dollar sign delimiters $$…$$ around the code you provide for stored procedure in the template_definition argument. For example:

    CALL samooha_by_snowflake_local_db.consumer.create_template_request(
      $cleanroom_name,
      $template_name,
      '
    BEGIN
    
    -- First, define the Python UDF.
    CREATE OR REPLACE FUNCTION CLEANROOM.custom_compare(min_status STRING, max_status STRING, this_status STRING)
    RETURNS boolean
    LANGUAGE PYTHON
    RUNTIME_VERSION = 3.10
    PACKAGES = (\'numpy\')
    
    HANDLER = \'custom_compare\'
    AS $$
    import numpy as np
    
    def custom_compare(min_status:str, max_status:str, this_status:str):
      statuses = [\'MEMBER\', \'SILVER\', \'GOLD\', \'PLATINUM\']
      return ((statuses.index(this_status) >= statuses.index(min_status)) &
              (statuses.index(this_status) <= statuses.index(max_status)))
        $$;
    
    -- Then define and execute the SQL query.
    LET SQL_TEXT varchar := $$
    SELECT
      c.status,
      c.hashed_email
    FROM IDENTIFIER( {{ my_table[0] }} ) as c
    WHERE cleanroom.custom_compare({{ min_status }}, {{ max_status }}, c.status);
    $$;
    
    -- Execute the query and then return the result.
    LET RES resultset := (EXECUTE IMMEDIATE :SQL_TEXT);
    RETURN TABLE(RES);
    
    END;
    ');
    
    Copy
  9. The consumer and provider continue with the standard consumer-defined template flow:

    1. The provider views (provider.list_pending_template_requests) and then approves the template request (approve_template_request)

    2. The consumer checks the request status (consumer.list_template_requests), and when the status is APPROVED, runs the template (consumer.run_analysis).

Consumer-written code examples

The following examples demonstrate adding provider-written UDFs to a clean room.

Download the following examples and then upload them as worksheet files in your Snowflake account. You need separate accounts for the provider and consumer, each with the clean rooms API installed. Replace the information as noted in the sample files:

Language: English