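Creating User-Defined Table Functions (UDTFs) for DataFrames in Python
The Snowpark API provides methods that you can use to create a user-defined table function with a handler written in Python. This topic explains how to create these types of functions.
Introduction
You can create a user-defined table function (UDTF) using the Snowpark API.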
You do this in a way similar to creating a scalar user-defined function (UDF) with the API, as described in Creating User-Defined Functions (UDFs) for DataFrames in Python. Key differences include the handler requirements and the parameter values required when registering the UDTF.
To create and register a UDTF in Snowpark, you must:
- Implement a UDTF handler.
  The handler contains the UDTF’s logic. A UDTF handler must implement functions that Snowflake invokes at runtime when the UDTF is called. For more information, see Implementing a UDTF Handler.
- Register the UDTF and its handler in the Snowflake database.
  You can use the Snowpark API to register the UDTF and its handler. Once you’ve registered the UDTF, you can call it from SQL or by using the Snowpark API. For more information about registering, see Registering a UDTF.
For information on calling a UDTF, see Calling User-Defined Table Functions (UDTFs).
Implementing a UDTF Handler
As described in detail in Writing a UDTF in Python, a UDTF handler class must implement methods that Snowflake invokes when the UDTF is called. You can use the class you write as a handler whether you’re registering the UDTF with the Snowpark API or creating it with SQL using the CREATE FUNCTION statement.
The methods of the handler class are designed to process the rows and partitions that the UDTF receives.
A UDTF handler class implements the following methods, which Snowflake invokes at runtime:
- An __init__ method. Optional. Invoked to initialize stateful processing of input partitions.
- A process method. Required. Invoked for each input row. The method returns a tabular value as tuples.
- An end_partition method. Optional. Invoked to finalize processing of input partitions.
  While Snowflake supports large partitions with timeouts tuned to process them successfully, especially large partitions can cause processing to time out (such as when end_partition takes too long to complete). Please contact Snowflake Support if you need the timeout threshold adjusted for specific usage scenarios.
For handler details and examples, see Writing a UDTF in Python.
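As a rough illustration of the three methods listed above, the following sketch defines a handler class in plain Python. The class name LineLengthHandler, its columns, and the local dry run at the end are assumptions made for this example only; they are not taken from the Snowflake reference.

```python
from typing import Iterable, Tuple


class LineLengthHandler:
    """Hypothetical UDTF handler used only for illustration."""

    def __init__(self) -> None:
        # Optional: initialize state that is kept while a partition is processed.
        self._rows_seen = 0

    def process(self, line: str) -> Iterable[Tuple[str, int]]:
        # Required: invoked for each input row; output rows are yielded as tuples.
        self._rows_seen += 1
        yield (line, len(line))

    def end_partition(self) -> Iterable[Tuple[str, int]]:
        # Optional: invoked after the last row of a partition has been processed.
        yield ("rows_in_partition", self._rows_seen)


# Local dry run that calls the methods in the order described above.
handler = LineLengthHandler()
rows = []
for line in ["alpha", "be"]:
    rows.extend(handler.process(line))
rows.extend(handler.end_partition())
```

Because the handler is an ordinary class, it can be exercised like this without a Snowflake session before it is registered.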
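Registering a UDTF
Once you've implemented a UDTF handler, you can use the Snowpark API to register the UDTF in the Snowflake database. Registering the UDTF creates it so that it can be called.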
You can register the UDTF as a named or anonymous function, as you can for a scalar UDF. For related information about registering a scalar UDF, see Creating an Anonymous UDF and Creating and Registering a Named UDF.
When you register a UDTF, you specify parameter values that Snowflake needs to create the UDTF. (Many of these parameters correspond functionally to clauses of the CREATE FUNCTION statement in SQL. For more information, see CREATE FUNCTION.)
Most of these parameters are the same as those you specify when you create a scalar UDF (for more information, see Creating User-Defined Functions (UDFs) for DataFrames in Python). The primary differences are due to the fact that a UDTF returns a tabular value and the fact that its handler is a class, rather than a function. For a complete list of parameters, see the documentation for the APIs linked below.
To register a UDTF with Snowpark, you use one of the following, specifying parameter values required to create the UDTF in the database. For information that differentiates these options, see UDFRegistration, which describes similar options for registering a scalar UDF.
- Use the register or udtf function, pointing to a runtime Python function. You can also use the udtf function as a decorator on the handler class.
  For reference information about these functions, see snowflake.snowpark.udtf.UDTFRegistration.register and snowflake.snowpark.functions.udtf.
- Use the register_from_file function, pointing to a Python file or zip file containing Python source code.
  For the function reference, see snowflake.snowpark.udtf.UDTFRegistration.register_from_file.
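Defining a UDTF’s Input Types and Output Schema
When you register a UDTF, you specify details about the function’s parameters and output value. You do this so that the types declared by the function itself correspond exactly to those of the function’s underlying handler.
For examples, see Examples in this topic and in the snowflake.snowpark.udtf.UDTFRegistration reference.
When registering a UDTF, you specify the following for it: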
- Types of its input parameters as a value of the registering function’s input_types parameter. The input_types parameter is optional if you provide type hints in the process method’s declaration.
  Specify this value as a list of types based on snowflake.snowpark.types.DataType. For example, you might specify input_types=[StringType(), IntegerType()].
- Schema of its tabular output as a value of the registering function’s output_schema parameter.
  The output_schema value can be one of the following:
  - A list of the names of the columns in the UDTF’s return value.
    The list will include column names only, so you must also provide type hints in the process method’s declaration.
  - A StructType that represents the output table’s column names and types.
    Code in the following example assigns a schema as a value to an output variable, then uses the variable when registering the UDTF.
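A minimal sketch of the StructType form, assuming a hypothetical RepeatHandler class and a Snowpark session passed in by the caller. The handler, the column names, and the helper function name are illustrative assumptions; only the output_schema and input_types parameters come from the text above.

```python
from typing import Iterable, Tuple


class RepeatHandler:
    """Hypothetical handler: emits (word, index) once per requested repetition."""

    def process(self, word: str, times: int) -> Iterable[Tuple[str, int]]:
        for index in range(times):
            yield (word, index)


def register_repeat_udtf(session):
    # Snowpark is imported here so the handler above can be tested locally
    # without the library installed. `session` is assumed to be an open
    # snowflake.snowpark.Session.
    from snowflake.snowpark.types import (
        IntegerType,
        StringType,
        StructField,
        StructType,
    )

    # Schema of the tabular output: column names and their types.
    output = StructType(
        [
            StructField("word", StringType()),
            StructField("index", IntegerType()),
        ]
    )
    return session.udtf.register(
        RepeatHandler,
        output_schema=output,
        input_types=[StringType(), IntegerType()],
    )
```

Passing only column names, such as output_schema=["word", "index"], would instead rely on the type hints in the process declaration to supply the column types.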
Examples
The following is a brief list of examples. For more examples, see snowflake.snowpark.udtf.UDTFRegistration.
Registering a UDTF with the udtf Function
Register the function.
Call the function.
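The two steps above might look like the following sketch. GeneratorHandler and register_and_call_generator are hypothetical names chosen for this example, and the session argument is assumed to be an open Snowpark session.

```python
class GeneratorHandler:
    """Hypothetical handler: emits the integers 0 .. n-1, one per output row."""

    def process(self, n):
        for i in range(n):
            yield (i,)


def register_and_call_generator(session):
    # Imported lazily so the handler can be tested without Snowpark installed.
    from snowflake.snowpark.functions import lit, udtf
    from snowflake.snowpark.types import IntegerType, StructField, StructType

    # Register the function (anonymous, for the given session).
    generator_udtf = udtf(
        GeneratorHandler,
        output_schema=StructType([StructField("number", IntegerType())]),
        input_types=[IntegerType()],
        session=session,
    )

    # Call the function through the object returned by udtf.
    return session.table_function(generator_udtf(lit(3))).collect()
```

The handler has no type hints here, so both input_types and a StructType output schema are supplied explicitly.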
Registering a UDTF with the register Function
Register the function.
Call the function.
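A sketch of the same two steps with session.udtf.register, this time registering a named function and passing only column names as the output schema. The handler class, the UDTF name word_count_udtf, and the helper function are assumptions for illustration; session is assumed to be an open Snowpark session.

```python
from collections import Counter
from typing import Iterable, Tuple


class WordCountHandler:
    """Hypothetical handler: counts words per row and totals them per partition."""

    def __init__(self) -> None:
        self._total_words = 0

    def process(self, text: str) -> Iterable[Tuple[str, int]]:
        # The return type hint supplies the column types, because only
        # column names are given at registration time.
        words = text.split()
        self._total_words += len(words)
        yield from Counter(words).items()

    def end_partition(self):
        yield ("partition_total", self._total_words)


def register_and_call_word_count(session):
    # Imported lazily so the handler can be tested without Snowpark installed.
    from snowflake.snowpark.functions import lit

    # Register the function under a name.
    session.udtf.register(
        WordCountHandler,
        ["word", "count"],
        name="word_count_udtf",
        is_permanent=False,
        replace=True,
    )

    # Call the function by its name.
    return session.table_function(
        "word_count_udtf", lit("w1 w2 w2 w3 w3 w3")
    ).collect()
```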
Registering a UDTF with the register_from_file Function
Register the function.
Call the function.
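For register_from_file, the handler lives in a separate Python file. The sketch below writes a hypothetical handler file locally and then registers it; the file name, the GeneratorHandler class, and both helper functions are assumptions for this example, and session is assumed to be an open Snowpark session.

```python
import os

# Source code of a hypothetical handler, kept as a string so it can be
# written to a file that register_from_file points to.
HANDLER_SOURCE = '''
class GeneratorHandler:
    def process(self, n):
        for i in range(n):
            yield (i,)
'''


def write_handler_file(directory: str) -> str:
    """Write the handler source into `directory` and return the file path."""
    path = os.path.join(directory, "generator_handler.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(HANDLER_SOURCE)
    return path


def register_and_call_from_file(session, file_path: str):
    # Imported lazily so the file helper can be tested without Snowpark installed.
    from snowflake.snowpark.functions import lit
    from snowflake.snowpark.types import IntegerType, StructField, StructType

    # Register the function from the file, naming the handler class inside it.
    generator_udtf = session.udtf.register_from_file(
        file_path=file_path,
        handler_name="GeneratorHandler",
        output_schema=StructType([StructField("number", IntegerType())]),
        input_types=[IntegerType()],
    )

    # Call the function.
    return session.table_function(generator_udtf(lit(3))).collect()
```

A caller would typically pass the path returned by write_handler_file, or the path of an existing handler file, as file_path.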