snowflake.snowpark.functions.udaf
- snowflake.snowpark.functions.udaf(handler: Optional[Type] = None, *, return_type: Optional[DataType] = None, input_types: Optional[List[DataType]] = None, name: Optional[Union[str, Iterable[str]]] = None, is_permanent: bool = False, stage_location: Optional[str] = None, imports: Optional[List[Union[str, Tuple[str, str]]]] = None, packages: Optional[List[Union[str, module]]] = None, replace: bool = False, if_not_exists: bool = False, session: Optional[Session] = None, parallel: int = 4, statement_params: Optional[Dict[str, str]] = None, immutable: bool = False, external_access_integrations: Optional[List[str]] = None, secrets: Optional[Dict[str, str]] = None, comment: Optional[str] = None, **kwargs) -> Union[UserDefinedAggregateFunction, partial]
- [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.30.0/snowpark-python/src/snowflake/snowpark/functions.py#L9736-L9970)
- Registers a Python class as a Snowflake Python UDAF and returns the UDAF.
- It can be used as either a function call or a decorator. In most cases you work with a single session, and this function uses that session to register the UDAF. If you have multiple sessions, you need to explicitly specify the session parameter of this function. If you have a handler class and would like to register it to multiple databases, use session.udaf.register instead. See examples in UDAFRegistration.
- Parameters:
- handler – A Python class used for creating the UDAF. 
- return_type – A DataType representing the return data type of the UDAF. Optional if type hints are provided.
- input_types – A list of DataType representing the input data types of the UDAF. Optional if type hints are provided.
- name – A string or list of strings that specify the name or fully-qualified object identifier (database name, schema name, and function name) for the UDAF in Snowflake, which allows you to call this UDAF in a SQL command or via DataFrame.agg(). If it is not provided, a name will be automatically generated for the UDAF. A name must be specified when is_permanent is True.
- is_permanent – Whether to create a permanent UDAF. The default is False. If it is True, a valid stage_location must be provided.
- stage_location – The stage location where the Python file for the UDAF and its dependencies should be uploaded. The stage location must be specified when is_permanent is True, and it will be ignored when is_permanent is False. It can be any stage other than temporary stages and external stages.
- imports – A list of imports that only apply to this UDAF. You can use a string to represent a file path (similar to the path argument in add_import()) in this list, or a tuple of two strings to represent a file path and an import path (similar to the import_path argument in add_import()). These UDAF-level imports will override the session-level imports added by add_import(). Note that an empty list means no import for this UDAF, and None or not specifying this parameter means using session-level imports.
- packages – A list of packages that only apply to this UDAF. These UDAF-level packages will override the session-level packages added by add_packages() and add_requirements(). Note that an empty list means no package for this UDAF, and None or not specifying this parameter means using session-level packages. To use Python packages that are not available in Snowflake, refer to custom_package_usage_config().
- replace – Whether to replace a UDAF that was already registered. The default is False. If it is False, attempting to register a UDAF with a name that already exists raises a SnowparkSQLException. If it is True, an existing UDAF with the same name is overwritten.
- if_not_exists – Whether to skip creation of a UDAF when one with the same signature already exists. The default is False. If it is True and a UDAF with the same signature exists, the UDAF creation is skipped. if_not_exists and replace are mutually exclusive, and a ValueError is raised when both are set.
- session – The session used to register the UDAF. If it is not specified, the session that you created before calling this function is used. You need to specify this parameter if you have created multiple sessions before calling this function.
- parallel – The number of threads to use for uploading UDAF files with the PUT command. The default value is 4 and supported values are from 1 to 99. Increasing the number of threads can improve performance when uploading large UDAF files. 
- statement_params – Dictionary of statement level parameters to be set while executing this action. 
- immutable – Whether the UDAF result is deterministic or not for the same input. 
- external_access_integrations – The names of one or more external access integrations. Each integration you specify allows access to the external network locations and secrets the integration specifies. 
- secrets – Key-value pairs of strings identifying the secrets used to authenticate with the external network location. The secrets can be accessed from handler code. The secrets specified as values must also be specified in the external access integration, and the keys are the strings used to retrieve the secrets through the secret API.
- comment – Adds a comment for the created object. See COMMENT 
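The override rules for imports and packages, and the replace / if_not_exists check, can be summarized in a few lines of plain Python. This is an illustrative sketch only, not Snowpark's internal code: the helpers effective_dependencies and check_creation_flags are hypothetical names introduced here, and the package names are made-up sample values.

```python
from typing import List, Optional


def effective_dependencies(
    udaf_level: Optional[List[str]], session_level: List[str]
) -> List[str]:
    """Hypothetical helper mirroring the documented override rule."""
    if udaf_level is None:
        # None (or parameter omitted): fall back to the session-level list.
        return list(session_level)
    # Any list, including an empty one, overrides the session-level list.
    return list(udaf_level)


def check_creation_flags(replace: bool, if_not_exists: bool) -> None:
    """Hypothetical helper: the two flags are documented as mutually exclusive."""
    if replace and if_not_exists:
        raise ValueError("replace and if_not_exists are mutually exclusive")


session_packages = ["numpy", "pandas"]  # made-up session-level packages
print(effective_dependencies(None, session_packages))       # session-level list
print(effective_dependencies([], session_packages))         # no packages at all
print(effective_dependencies(["scipy"], session_packages))  # UDAF-level list only
```

The point to notice is that an empty list and None are not interchangeable: the first removes all dependencies for this UDAF, the second inherits them from the session.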
 
- Returns:
- A UDAF function that can be called with Column expressions.
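Putting several of the parameters together, a permanent registration might look like the sketch below. It assumes an open Session, an existing named internal stage, and an installed Snowpark package; the stage name @my_udaf_stage, the UDAF name sum_int_permanent, and the wrapper register_permanent_sum are hypothetical. The Snowpark imports sit inside the wrapper so that the handler class can be loaded and checked without a connection.

```python
class PythonSumUDAF:
    """Handler that sums its integer inputs."""

    def __init__(self) -> None:
        self._sum = 0

    @property
    def aggregate_state(self):
        return self._sum

    def accumulate(self, input_value):
        self._sum += input_value

    def merge(self, other_sum):
        self._sum += other_sum

    def finish(self):
        return self._sum


def register_permanent_sum(session, stage="@my_udaf_stage"):
    """Hypothetical wrapper: register PythonSumUDAF as a permanent UDAF."""
    # Imported lazily so this module loads even without Snowpark installed.
    from snowflake.snowpark.functions import udaf
    from snowflake.snowpark.types import IntegerType

    return udaf(
        PythonSumUDAF,
        name="sum_int_permanent",  # hypothetical; a name is required when is_permanent=True
        is_permanent=True,
        stage_location=stage,      # hypothetical stage; not a temporary or external stage
        replace=True,              # overwrite an existing UDAF with the same name
        return_type=IntegerType(),
        input_types=[IntegerType()],
        session=session,           # explicit session, needed when several sessions exist
    )
```

The object returned by register_permanent_sum(session) would then be applied to a column inside an aggregation, for example df.agg(sum_udaf("a")).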
- Note
- 1. When type hints are provided and are complete for a function, return_type and input_types are optional and will be ignored. See details of supported data types for UDAFs in UDAFRegistration.
- 2. A temporary UDAF (when is_permanent is False) is scoped to this session and all UDAF related files will be uploaded to a temporary session stage (session.get_session_stage()). For a permanent UDAF, these files will be uploaded to the stage that you provide.
- 3. By default, UDAF registration fails if a function with the same name is already registered. Invoking udaf() with replace set to True will overwrite the previously registered function.
- Example:
    >>> from snowflake.snowpark.functions import udaf
    >>> from snowflake.snowpark.types import IntegerType
    >>> class PythonSumUDAF:
    ...     def __init__(self) -> None:
    ...         self._sum = 0
    ...
    ...     @property
    ...     def aggregate_state(self):
    ...         return self._sum
    ...
    ...     def accumulate(self, input_value):
    ...         self._sum += input_value
    ...
    ...     def merge(self, other_sum):
    ...         self._sum += other_sum
    ...
    ...     def finish(self):
    ...         return self._sum
    >>> sum_udaf = udaf(
    ...     PythonSumUDAF,
    ...     name="sum_int",
    ...     replace=True,
    ...     return_type=IntegerType(),
    ...     input_types=[IntegerType()],
    ... )
    >>> df = session.create_dataframe([[1, 3], [1, 4], [2, 5], [2, 6]]).to_df("a", "b")
    >>> df.agg(sum_udaf("a")).collect()
    [Row(SUM_INT("A")=6)]

- Instead of calling udaf, it is also possible to use udaf as a decorator.
- Example:

    >>> @udaf(name="sum_int", replace=True, return_type=IntegerType(), input_types=[IntegerType()])
    ... class PythonSumUDAF:
    ...     def __init__(self) -> None:
    ...         self._sum = 0
    ...
    ...     @property
    ...     def aggregate_state(self):
    ...         return self._sum
    ...
    ...     def accumulate(self, input_value):
    ...         self._sum += input_value
    ...
    ...     def merge(self, other_sum):
    ...         self._sum += other_sum
    ...
    ...     def finish(self):
    ...         return self._sum
    >>> df = session.create_dataframe([[1, 3], [1, 4], [2, 5], [2, 6]]).to_df("a", "b")
    >>> df.agg(PythonSumUDAF("a")).collect()
    [Row(SUM_INT("A")=6)]
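The four handler members used in the examples (aggregate_state, accumulate, merge, finish) can be exercised locally, without a Snowflake session, to check that partial results combine correctly. The sketch below is a plain-Python simulation under stated assumptions: PythonAvgUDAF and the driver simulate_udaf are hypothetical, and the way the rows are split into partitions here is invented for illustration, since Snowflake decides the actual distribution of rows itself.

```python
from typing import Tuple


class PythonAvgUDAF:
    """Hypothetical handler: average of the inputs, tracked as (sum, count)."""

    def __init__(self) -> None:
        self._sum = 0.0
        self._count = 0

    @property
    def aggregate_state(self) -> Tuple[float, int]:
        # The state must carry everything merge() needs from another instance.
        return (self._sum, self._count)

    def accumulate(self, input_value: float) -> None:
        self._sum += input_value
        self._count += 1

    def merge(self, other_state: Tuple[float, int]) -> None:
        other_sum, other_count = other_state
        self._sum += other_sum
        self._count += other_count

    def finish(self):
        # Return None when no rows were seen, to avoid dividing by zero.
        return self._sum / self._count if self._count else None


def simulate_udaf(handler_cls, partitions):
    """Hypothetical local driver: one handler per partition, then merge."""
    states = []
    for rows in partitions:
        handler = handler_cls()
        for value in rows:
            handler.accumulate(value)
        states.append(handler.aggregate_state)

    final = handler_cls()
    for state in states:
        final.merge(state)
    return final.finish()


# The split into partitions should not change the result.
print(simulate_udaf(PythonAvgUDAF, [[1, 1], [2, 2]]))  # 1.5
print(simulate_udaf(PythonAvgUDAF, [[1], [1, 2, 2]]))  # 1.5
```

Keeping the count alongside the sum is the design choice that matters: merging two partial averages directly would give the wrong answer whenever the partitions differ in size.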