snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.transform
- DataFrameGroupBy.transform(func: Union[str, Callable], *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L336-L406)
- Call a function producing a same-indexed DataFrame on each group.
- Returns a DataFrame having the same indexes as the original object, filled with the transformed values.
- Parameters:
  - func (function, str) – Function to apply to each group. See the Notes section below for requirements. Accepted inputs are:
    - String (must be the name of a groupby method you want to use)
    - Python function
- *args (Any) – Positional arguments to pass to func. 
  - engine (str, default None) –
    - 'cython': Runs the function through C-extensions from cython.
    - 'numba': Runs the function through JIT compiled code from numba.
    - None: Defaults to 'cython' or the global setting compute.use_numba.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
  - engine_kwargs (dict, default None) –
    - For the 'cython' engine, there are no accepted engine_kwargs.
    - For the 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
  - **kwargs (Any) – Keyword arguments to be passed into func.
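To illustrate the two accepted forms of func, the sketch below (using plain pandas, whose transform call signature this method mirrors; the frame and the scale helper are hypothetical names for illustration) passes first the name of a groupby method and then a callable whose extra keyword argument is forwarded via **kwargs:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 30]})
g = df.groupby("key")

# String input: the name of a groupby method, e.g. "mean".
by_name = g.transform("mean")

# Callable input: extra positional/keyword arguments are forwarded to func.
def scale(group, factor=1):
    return group * factor

by_func = g.transform(scale, factor=10)
```

Here by_name broadcasts each group's mean back to the original rows, while by_func multiplies every value by the forwarded factor.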
 
- Notes
  - Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.
  - Returning a Series or scalar in func is not yet supported in Snowpark pandas.
- Examples

```python
>>> df = pd.DataFrame(
...     {
...         "col1": ["Z", None, "X", "Z", "Y", "X", "X", None, "X", "Y"],
...         "col2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...         "col3": [40, 50, 60, 10, 20, 30, 40, 80, 90, 10],
...         "col4": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
...     },
...     index=list("abcdefghij")
... )
>>> df
  col1  col2  col3  col4
a    Z     1    40    -1
b  None    2    50    -2
c    X     3    60    -3
d    Z     4    10    -4
e    Y     5    20    -5
f    X     6    30    -6
g    X     7    40    -7
h  None    8    80    -8
i    X     9    90    -9
j    Y    10    10   -10

>>> df.groupby("col1", dropna=True).transform(lambda df, n: df.head(n), n=2)
   col2  col3  col4
a   1.0  40.0  -1.0
b   NaN   NaN   NaN
c   3.0  60.0  -3.0
d   4.0  10.0  -4.0
e   5.0  20.0  -5.0
f   6.0  30.0  -6.0
g   NaN   NaN   NaN
h   NaN   NaN   NaN
i   NaN   NaN   NaN
j  10.0  10.0 -10.0

>>> df.groupby("col1", dropna=False).transform("mean")
   col2  col3  col4
a  2.50  25.0 -2.50
b  5.00  65.0 -5.00
c  6.25  55.0 -6.25
d  2.50  25.0 -2.50
e  7.50  15.0 -7.50
f  6.25  55.0 -6.25
g  6.25  55.0 -6.25
h  5.00  65.0 -5.00
i  6.25  55.0 -6.25
j  7.50  15.0 -7.50
```
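A common use of transform is within-group normalization: because the result keeps the same index as the original object, it can be assigned straight back onto the frame. A minimal sketch, using plain pandas and hypothetical column names; the lambda subtracts each group's mean and returns a result shaped like its input, which is the supported pattern described in the Notes above:

```python
import pandas as pd

df = pd.DataFrame({"grp": ["x", "x", "y", "y"], "v": [1.0, 3.0, 10.0, 30.0]})

# Each group is centered on its own mean; the output is aligned with df's
# index, so it can be stored as a new column of df.
centered = df.groupby("grp").transform(lambda g: g - g.mean())
```

Group "x" has mean 2.0 and group "y" has mean 20.0, so each value of v is shifted by its own group's mean rather than the global mean.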