This documentation describes an older version (1.23.0).

snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.apply

DataFrameGroupBy.apply(func, *args, **kwargs)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L234-L253)

Apply function func group-wise and combine the results together.

The function passed to apply must take a dataframe or series as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. pandas offers a wide range of methods that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.
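As a rough illustration of that trade-off, the sketch below computes each row's share of its group total twice, once with apply and once with transform. It uses plain pandas as a stand-in (an assumption, so that it runs without a Snowflake session), and the variable names are illustrative only.

```python
import pandas as pd  # plain pandas, assumed here as a stand-in for Snowpark pandas

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

# General-purpose route: apply receives each group as a DataFrame
# and returns a DataFrame of the same shape.
share_via_apply = (
    df.groupby("A", group_keys=False)[["B", "C"]]
    .apply(lambda g: g / g.sum())
)

# More specific route: transform broadcasts each group's sum back to
# the original rows, and the division happens once on the whole frame.
group_sums = df.groupby("A")[["B", "C"]].transform("sum")
share_via_transform = df[["B", "C"]] / group_sums

print(share_via_apply)
print(share_via_transform)
```

Both results hold the same values; the transform version states the intent more directly and avoids calling a Python function once per group.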

Parameters:

func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar. In addition, the callable may take positional and keyword arguments.
args (tuple) – Optional positional arguments to pass to func.
kwargs (dict) – Optional keyword arguments to pass to func.
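The following sketch shows how extra positional and keyword arguments might be forwarded to func. The helper scale and its parameters factor and offset are hypothetical, and plain pandas is assumed as a stand-in so the snippet runs without a Snowflake session.

```python
import pandas as pd  # plain pandas, assumed here as a stand-in for Snowpark pandas


def scale(group, factor, offset=0):
    """Hypothetical helper: return a scaled copy of the group."""
    # Arithmetic on a DataFrame returns a new object, so the input
    # group is not modified.
    return group * factor + offset


df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

# 10 is forwarded positionally as `factor`; offset=1 is forwarded by keyword.
scaled = df.groupby("A", group_keys=False)[["B", "C"]].apply(scale, 10, offset=1)
print(scaled)
```

The group itself is always supplied as the first argument; everything after func in the apply call is handed through unchanged.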

Return type:

Series or DataFrame

See also

pipe: Apply function to the full GroupBy object instead of to each group.
aggregate: Apply aggregate function to the GroupBy object.
transform: Apply function column-by-column to the GroupBy object.
Series.apply: Apply a function to a Series.
DataFrame.apply: Apply a function to each row or column of a DataFrame.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.
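One way to stay clear of this restriction is to build the result on a copy of the group rather than editing the group in place. The sketch below is a minimal illustration under that assumption; the helper add_total is hypothetical, and plain pandas is used as a stand-in so the snippet runs without a Snowflake session.

```python
import pandas as pd  # plain pandas, assumed here as a stand-in for Snowpark pandas


def add_total(group):
    """Hypothetical helper: return the group with an extra 'total' column."""
    # Copy first so the object handed in by apply is left untouched.
    out = group.copy()
    out["total"] = out["B"] + out["C"]
    return out


df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

result = df.groupby("A", group_keys=False)[["B", "C"]].apply(add_total)
print(result)
```

The original df keeps its three columns, and only the returned frames carry the new total column.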

Returning a Series or scalar in func is not yet supported in Snowpark pandas.

Examples

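>>> import modin.pandas as pd  # assumes an active Snowpark session
>>> import snowflake.snowpark.modin.plugin
>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g1 = df.groupby('A', group_keys=False)
>>> g2 = df.groupby('A', group_keys=True)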


Notice that g1 and g2 both have two groups, a and b, and differ only in their group_keys argument. Calling apply in various ways, we can get different grouping results:

Example 1: Below, the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the results for each group into a new DataFrame:

>>> g1[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum()) 
            B    C
0.0  0.333333  0.4
1.0  0.666667  0.6
2.0  1.000000  1.0


In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:

>>> g2[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum()) 
            B    C
A
a 0.0  0.333333  0.4
  1.0  0.666667  0.6
b 2.0  1.000000  1.0
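To make the effect of group_keys easier to check programmatically, the sketch below inspects the index of each result. It assumes plain pandas (version 2.0 or later) as a stand-in so it runs without a Snowflake session; index dtypes and display may differ in Snowpark pandas, as the float index labels in the output above suggest.

```python
import pandas as pd  # plain pandas >= 2.0, assumed here as a stand-in for Snowpark pandas

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def normalize(group):
    # Divide each column by its per-group sum; returns a new DataFrame.
    return group / group.sum()


without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(normalize)
with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(normalize)

# group_keys=False keeps the original single-level row index.
print(without_keys.index.nlevels)
# group_keys=True adds the group label as an outer index level.
print(with_keys.index.nlevels)
print(with_keys.index.get_level_values(0).tolist())
```

The values in the two results are identical; only the row index differs.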
