Code bundles
Any collaborator can bundle custom Python procedures, UDFs, or UDTFs with collaboration templates. Templates in turn reference the bundled code to perform complex data actions in the collaboration. Common uses include machine learning and customized data manipulation within a query. Your uploaded code can import and use packages from an approved bundle of Python packages (https://repo.anaconda.com/pkgs/snowflake/) and the Snowpark API.
Custom code can be called only through a template; it cannot be called directly.
Note
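Code bundles support only the Python programming language.
The following sections show you how to upload and use code bundles.
Implement a custom code bundle
Here is how to upload and use a code bundle:
Code submitter: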
- Creates and registers the code by calling REGISTER_CODE_SPEC. The code can be inline in the spec, or linked from a stage.
- Creates a template that references the code bundle spec by ID in the template’s code_specs array. Add this field as a peer of the template and parameters fields as shown in this example:
- Registers the template and then links the template into the collaboration.
Analysis runner:
- Runs the template in the standard way by calling RUN.
Important
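Before any uploaded bundle is deployed to a clean room, Snowflake runs security checks on it. If the security checks fail, the template and its bundled code are not deployed and cannot be used.
To confirm that a template with a code bundle has been deployed and is available to use, follow these steps:
Find the name of the clean room application where you are trying to deploy the code bundle: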
Check the upgrade_state value in the DESCRIBE APPLICATION response. When the upgrade state is COMPLETE, the security checks have passed and the new template and bundle are available to use. Pass in the application name returned by the command in the previous step using SQL like the following example:
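Once the response is in hand, the same check can be sketched client-side in Python. This is a minimal sketch that assumes the DESCRIBE APPLICATION output has been fetched as (property, value) pairs; the function name and the sample rows are illustrative and not part of any Snowflake API.

```python
from typing import Iterable, Optional, Tuple


def upgrade_complete(rows: Iterable[Tuple[str, Optional[str]]]) -> bool:
    """Return True when the upgrade_state property equals COMPLETE.

    `rows` is assumed to hold (property, value) pairs taken from a
    DESCRIBE APPLICATION response; adjust the unpacking if your client
    returns additional columns.
    """
    for prop, value in rows:
        if prop.lower() == "upgrade_state":
            return (value or "").upper() == "COMPLETE"
    # Property not found: treat the bundle as not yet available.
    return False


# Hypothetical sample rows, for illustration only.
sample = [("name", "SFDCR_MY_CLEANROOM"), ("upgrade_state", "COMPLETE")]
print(upgrade_complete(sample))
```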
Create and register a code bundle spec
The first step in uploading custom code is to create and register a code bundle spec.
Custom functions are defined in a YAML code bundle spec. Each code bundle exposes one or more functions that can be called by a template. The code bundle spec can either include the code in the spec inline, or link to code that lives on a Snowflake stage.
A collaborator registers a spec by calling REGISTRY.REGISTER_CODE_SPEC, which returns the bundle ID.
After the template that references the code bundle is linked into the collaboration, that code bundle is visible to anyone in the collaboration who can access a template that links the code bundle. Call VIEW_CODE_SPECS to list accessible code bundles in a collaboration.
Anyone who can see a code bundle in a collaboration can see and use it in their own templates in that collaboration. Inline code can be viewed by any member of the collaboration, but staged artifact code cannot be viewed by collaborators. To verify code integrity, collaborators need to ensure that the content_hash of each referenced artifact matches.
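One way to compare a local artifact against a published content_hash is to hash the file yourself. The sketch below assumes the hash is a hex-encoded SHA-256 digest of the file bytes, which the text above does not state; confirm the algorithm and encoding before relying on it. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_content_hash(path: Path, expected_hex: str) -> bool:
    """Compare the local digest with the expected value (ASSUMED SHA-256 hex)."""
    return file_sha256(path) == expected_hex.strip().lower()
```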
The following code bundle spec exposes a single Python UDF called normalize_value, which calls the normalize function defined in that spec:
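The Python handler that such a spec wraps can be quite small. The function below is a hypothetical body for normalize, assuming min-max scaling with explicit bounds; the logic and signature of the original spec may differ.

```python
def normalize(value: float, min_value: float, max_value: float) -> float:
    """Scale `value` into the range [0, 1] using min-max normalization.

    Hypothetical handler: the three-argument signature is an assumption
    made for illustration.
    """
    if max_value == min_value:
        # Avoid division by zero when the range is empty.
        return 0.0
    return (value - min_value) / (max_value - min_value)
```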
Create and register the calling template
After the code spec is registered, the collaborator registers a template that uses this code bundle. To use a code bundle, add the bundle spec ID to the template’s code_specs field. Adding this template to the collaboration also makes the bundled code available in the collaboration.
A template calls a custom function using the syntax cleanroom.spec_name$function_name. Note the literal . and $ name scoping marks.
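To make the scoping marks concrete, the small helper below assembles a reference in that form. It is only an illustration: the helper name is hypothetical, and the identifier check is an assumption rather than a documented rule.

```python
import re

# ASSUMPTION: names are simple identifiers; the real naming rules may differ.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def function_reference(spec_name: str, function_name: str) -> str:
    """Build a reference of the form cleanroom.spec_name$function_name."""
    for part in (spec_name, function_name):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"not a simple identifier: {part!r}")
    # Literal "." after cleanroom, literal "$" between spec and function.
    return f"cleanroom.{spec_name}${function_name}"


print(function_reference("custom_udf", "normalize_value"))
# prints: cleanroom.custom_udf$normalize_value
```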
Note
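Reference functions in the template by spec name, not spec ID. This lets you quickly update the version of a code bundle without changing every reference to that bundle in the template.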
In the following example, a template uses function normalize_value from the code bundle custom_udf:
Add the template to the collaboration
Add the template that calls your function to the collaboration in the standard way. For more information, see Templates.
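When the calling template is added to the collaboration, Snowflake validates it and uploads it to the collaboration. The following example shows a request to add a template to an existing collaboration: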
Note
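Installing a template with a code bundle triggers Snowflake security checks and publishes a new patch of the underlying clean room. The template is unavailable until this process completes and the patch is installed successfully.
To check the progress of the patch installation, follow these steps: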
- Find the name of the clean room application. Typically, this will be SFDCR_<clean room name>, but you can search to be sure:
- Check the status of the patch install. Wait until upgrade_state is COMPLETE in the following query:
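If the wait is scripted, a polling loop with a timeout is one option. The sketch below is deliberately independent of any Snowflake client: get_state is a hypothetical callable that you supply, for example one that runs the status query and returns the current upgrade_state string. The default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_upgrade(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the state is COMPLETE; return False if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state().upper() == "COMPLETE":
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between polls; `sleep` is injectable so tests can skip the delay.
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.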
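Code versioning
Every registered code spec must have a unique name + version combination across all registries in the account. A template loads a code spec with a specific name and version. To create or use a new version of the code, you must submit a new version of the template that references the new code version in its code_specs field. You do not need to change the template body. For example:
Step 1: Use version 1 of the code bundle:
Step 2: Update and register a new version of the code bundle, then update the template to use the new version:
Note that the function name does not include the version, so you do not need to change the calling code in the template body when you upload a new version of a function.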
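The effect of a version bump can be modeled with plain dictionaries. In the sketch below, the name and version keys, the spec ID strings, and the SQL body are all placeholders invented for illustration; only the code_specs and template fields come from the text above.

```python
import copy
from typing import Any, Dict, List

# Hypothetical template; the IDs and the SQL body are placeholders.
template_v1: Dict[str, Any] = {
    "name": "my_template",
    "version": "1",
    "code_specs": ["<ID of custom_udf version 1>"],
    "template": "SELECT cleanroom.custom_udf$normalize_value(score) FROM scores",
}


def bump_code_specs(
    template: Dict[str, Any], new_version: str, new_spec_ids: List[str]
) -> Dict[str, Any]:
    """Return a new template version that points at different code specs."""
    updated = copy.deepcopy(template)
    updated["version"] = new_version
    updated["code_specs"] = list(new_spec_ids)
    # The template body is left untouched: calls use the spec name, not the ID.
    return updated


template_v2 = bump_code_specs(template_v1, "2", ["<ID of custom_udf version 2>"])
```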
Example specs
Inline UDF with a code body
A simple UDF with inline Python code:
UDTF (user-defined table function)
This example YAML defines a UDTF that returns multiple rows:
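On the Python side, a Snowflake UDTF handler is a class whose process method yields one tuple per output row. The class below is a hypothetical handler, not the one from the original spec; it assumes a single string input and a two-column output.

```python
from typing import Iterator, Tuple


class SplitWords:
    """Hypothetical UDTF handler: emits one (word, position) row per word."""

    def process(self, text: str) -> Iterator[Tuple[str, int]]:
        # Each yielded tuple becomes one row of the table function's output.
        for position, word in enumerate(text.split()):
            yield (word, position)
```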
Staged artifact with a wheel package
Be sure to read the stage_path documentation requirements for linking to staged code in your code spec.
This example YAML uses a staged Python wheel package:
Stored procedure
This example YAML defines a stored procedure for data processing:
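A Python stored procedure handler in Snowflake receives a Snowpark session as its first argument. The handler below is a hypothetical sketch that counts the rows of a table; the function name and its return format are assumptions, and the session is left untyped so the snippet runs without the Snowpark package installed.

```python
from typing import Any


def count_rows(session: Any, table_name: str) -> str:
    """Hypothetical procedure handler: report the row count of a table.

    `session` is expected to be a Snowpark Session; only its
    table(...).count() calls are used here.
    """
    row_count = session.table(table_name).count()
    return f"{table_name}: {row_count} rows"
```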
Multiple Python files as staged artifacts
Be sure to read the stage_path documentation requirements for linking to staged code in your code spec.
This example YAML uses multiple staged Python source files: