snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.transform¶
- DataFrameGroupBy.transform(func: Union[str, Callable], *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L343-L413)¶
Call function producing a same-indexed DataFrame on each group.

Returns a DataFrame having the same indexes as the original object, filled with the transformed values.

- Parameters:
func (function or str) –
Function to apply to each group. See the Notes section below for requirements.
Accepted inputs are:
- a string (the name of the groupby method you want to use)
- a Python function
*args (Any) – Positional arguments to pass to func.
engine (str, default None) –
- 'cython': Runs the function through C-extensions from cython.
- 'numba': Runs the function through JIT compiled code from numba.
- None: Defaults to 'cython' or the global setting compute.use_numba.
This parameter is ignored in Snowpark pandas, as execution is always performed in Snowflake.
engine_kwargs (dict, default None) –
- For the 'cython' engine, there are no accepted engine_kwargs.
- For the 'numba' engine, the engine can accept nopython, nogil, and parallel dictionary keys. The values must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.
This parameter is ignored in Snowpark pandas, as execution is always performed in Snowflake.
**kwargs (Any) – Keyword arguments to be passed into func.
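As a minimal sketch of how `*args` and `**kwargs` flow through to `func` — each per-group call is effectively `func(group, *args, **kwargs)` — shown here with plain pandas, whose API Snowpark pandas mirrors (the names `scale` and `offset` are hypothetical, chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Extra positional and keyword arguments after `func` are forwarded
# to each per-group call: func(group, *args, **kwargs).
out = df.groupby("key")["val"].transform(
    lambda s, scale, offset=0: s * scale + offset,  # applied per group
    10,          # positional arg -> scale
    offset=1,    # keyword arg  -> offset
)
print(out.tolist())  # [11, 21, 31]
```

The result keeps the original row order and index, since `transform` always returns a same-indexed object.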
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.
Returning a Series or scalar in func is not yet supported in Snowpark pandas.

Examples
>>> df = pd.DataFrame(
...     {
...         "col1": ["Z", None, "X", "Z", "Y", "X", "X", None, "X", "Y"],
...         "col2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...         "col3": [40, 50, 60, 10, 20, 30, 40, 80, 90, 10],
...         "col4": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
...     },
...     index=list("abcdefghij")
... )
>>> df
   col1  col2  col3  col4
a     Z     1    40    -1
b  None     2    50    -2
c     X     3    60    -3
d     Z     4    10    -4
e     Y     5    20    -5
f     X     6    30    -6
g     X     7    40    -7
h  None     8    80    -8
i     X     9    90    -9
j     Y    10    10   -10
>>> df.groupby("col1", dropna=True).transform(lambda df, n: df.head(n), n=2)
   col2  col3  col4
a   1.0  40.0  -1.0
b   NaN   NaN   NaN
c   3.0  60.0  -3.0
d   4.0  10.0  -4.0
e   5.0  20.0  -5.0
f   6.0  30.0  -6.0
g   NaN   NaN   NaN
h   NaN   NaN   NaN
i   NaN   NaN   NaN
j  10.0  10.0 -10.0
>>> df.groupby("col1", dropna=False).transform("mean")
   col2  col3  col4
a  2.50  25.0 -2.50
b  5.00  65.0 -5.00
c  6.25  55.0 -6.25
d  2.50  25.0 -2.50
e  7.50  15.0 -7.50
f  6.25  55.0 -6.25
g  6.25  55.0 -6.25
h  5.00  65.0 -5.00
i  6.25  55.0 -6.25
j  7.50  15.0 -7.50
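To illustrate the same-indexed contract behind the "mean" example above: `transform("mean")` is semantically equivalent to computing the per-group aggregate and broadcasting it back onto the original rows. A plain-pandas sketch of those semantics (not of the Snowflake execution, which happens server-side in Snowpark pandas):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 3.0, 5.0]})

# transform("mean") keeps one value per original row...
via_transform = df.groupby("key")["val"].transform("mean")

# ...matching the per-group aggregate mapped back to each row's group key.
via_agg = df["key"].map(df.groupby("key")["val"].mean())

print(via_transform.tolist())  # [2.0, 2.0, 5.0]
print(via_agg.tolist())        # [2.0, 2.0, 5.0]
```

This is the key difference from `agg`, which returns one row per group rather than one value per original row.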