snowflake.snowpark.functions.pandas_udf
snowflake.snowpark.functions.pandas_udf(func: Optional[Callable] = None, *, return_type: Optional[DataType] = None, input_types: Optional[List[DataType]] = None, name: Optional[Union[str, Iterable[str]]] = None, is_permanent: bool = False, stage_location: Optional[str] = None, imports: Optional[List[Union[str, Tuple[str, str]]]] = None, packages: Optional[List[Union[str, module]]] = None, replace: bool = False, if_not_exists: bool = False, session: Optional[Session] = None, parallel: int = 4, max_batch_size: Optional[int] = None, statement_params: Optional[Dict[str, str]] = None, strict: bool = False, secure: bool = False, source_code_display: bool = True, external_access_integrations: Optional[List[str]] = None, secrets: Optional[Dict[str, str]] = None, immutable: bool = False, comment: Optional[str] = None, **kwargs) -> Union[UserDefinedFunction, partial]
Source: https://github.com/snowflakedb/snowpark-python/blob/v1.16.0/src/snowflake/snowpark/functions.py#L7738-L7874
Registers a Python function as a vectorized UDF and returns the UDF. The arguments, return value and usage of this function are exactly the same as udf(), but this function can only be used to register vectorized UDFs. See the examples in UDFRegistration.
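Because pandas_udf takes the same arguments as udf(), the practical difference lies in what the function body receives. The sketch below runs locally with pandas only (no Snowflake session is involved) and contrasts a scalar function, called once per row with plain values, with a vectorized one called once per batch with whole pandas Series. It assumes Series-typed inputs (one PandasSeriesType per argument); the function names are hypothetical.

```python
import pandas as pd

# Scalar style (as registered with udf()): called once per row with plain values.
def add_one_scalar(a: int, b: int) -> int:
    return a + b + 1

# Vectorized style (as registered with pandas_udf()): called once per batch
# with whole pandas Series, so the addition is applied column-wise.
def add_one_vectorized(a: pd.Series, b: pd.Series) -> pd.Series:
    return a + b + 1

# A hypothetical batch of two rows with columns "a" and "b".
batch = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# Row-by-row evaluation (tolist() yields plain Python ints).
row_results = [add_one_scalar(a, b) for a, b in zip(batch["a"].tolist(), batch["b"].tolist())]

# Single call on the whole batch.
batch_results = add_one_vectorized(batch["a"], batch["b"]).tolist()

print(row_results, batch_results)  # [4, 8] [4, 8]
```

Both styles compute the same values; the vectorized form simply amortizes the per-call overhead over a batch of rows.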
Example:
>>> from snowflake.snowpark.functions import pandas_udf
>>> from snowflake.snowpark.types import PandasSeriesType, PandasDataFrameType, IntegerType
>>> add_one_df_pandas_udf = pandas_udf(
...     lambda df: df[0] + df[1] + 1,
...     return_type=PandasSeriesType(IntegerType()),
...     input_types=[PandasDataFrameType([IntegerType(), IntegerType()])]
... )
>>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
>>> df.select(add_one_df_pandas_udf("a", "b").alias("result")).order_by("result").show()
------------
|"RESULT"  |
------------
|4         |
|8         |
------------
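The lambda in the example above addresses its input columns by position (df[0], df[1]), which implies that a PandasDataFrameType input arrives as a DataFrame whose columns are labeled 0, 1, and so on. The following local sketch is an assumption-laden illustration rather than Snowflake's execution engine: a hypothetical helper, run_in_batches, splits rows into chunks of at most max_batch_size (mirroring the parameter of that name), calls the same function body once per chunk, and concatenates the results. Whatever the batch size, the output must have one value per input row.

```python
import pandas as pd

# Same body as the lambda in the example: columns are addressed by position.
def add_one_df(df: pd.DataFrame) -> pd.Series:
    return df[0] + df[1] + 1

def run_in_batches(func, rows, max_batch_size):
    """Hypothetical local stand-in for batched execution.

    Splits rows into chunks of at most max_batch_size, calls func once per
    chunk, and concatenates the per-chunk results. Snowflake decides real
    batch boundaries itself; this only mimics the contract the UDF body
    must satisfy (one output value per input row, independent of batching).
    """
    results = []
    for start in range(0, len(rows), max_batch_size):
        # Building a DataFrame from a list of lists labels columns 0, 1, ...
        chunk = pd.DataFrame(rows[start:start + max_batch_size])
        out = func(chunk)
        if len(out) != len(chunk):
            raise ValueError("a vectorized UDF must return one value per input row")
        results.extend(out.tolist())
    return results

rows = [[1, 2], [3, 4]]
print(run_in_batches(add_one_df, rows, max_batch_size=1))  # [4, 8]
print(run_in_batches(add_one_df, rows, max_batch_size=2))  # [4, 8]
```

The results match the RESULT column shown above regardless of the batch size, which is why a vectorized UDF body should compute each output value from its own row only.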
A vectorized UDF can also be registered under a name (see the name argument) so that it is accessible elsewhere in the same session. Instead of calling pandas_udf as a function, it can also be used as a decorator:
Example:
>>> from snowflake.snowpark.functions import pandas_udf
>>> from snowflake.snowpark.types import PandasSeriesType, PandasDataFrameType, IntegerType
>>> @pandas_udf(
...     return_type=PandasSeriesType(IntegerType()),
...     input_types=[PandasDataFrameType([IntegerType(), IntegerType()])],
... )
... def add_one_df_pandas_udf(df):
...     return df[0] + df[1] + 1
>>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
>>> df.select(add_one_df_pandas_udf("a", "b").alias("result")).order_by("result").show()
------------
|"RESULT"  |
------------
|4         |
|8         |
------------
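Both calling styles work because of the shape of the signature: func is an optional positional argument defaulting to None, every other option is keyword-only, and the return annotation is Union[UserDefinedFunction, partial]. That annotation suggests that when func is omitted, the options are captured in a functools.partial that is later applied to the decorated function. The sketch below reproduces that general pattern with a hypothetical register_vectorized function that merely attaches the options to the function; it does not register anything with Snowflake and is not the library's implementation.

```python
from functools import partial
from typing import Callable, Optional

def register_vectorized(func: Optional[Callable] = None, *, return_type=None, input_types=None):
    """Hypothetical stand-in mirroring the shape of pandas_udf's signature."""
    if func is None:
        # Decorator-with-arguments form, e.g. @register_vectorized(return_type=...):
        # return a partial that still waits for the function to decorate.
        return partial(register_vectorized, return_type=return_type, input_types=input_types)
    # Direct-call form (or the partial being applied to the decorated function).
    # Instead of registering, this sketch only records the options on func.
    func.registered_options = {"return_type": return_type, "input_types": input_types}
    return func

# Direct call, like the first example.
add_direct = register_vectorized(lambda x: x + 1, return_type="int")

# Decorator form, like the second example.
@register_vectorized(return_type="int")
def add_decorated(x):
    return x + 1

print(add_direct.registered_options["return_type"], add_decorated(1))  # int 2
```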