snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.apply
- DataFrameGroupBy.apply(func, *args, **kwargs) [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L234-L253)
- Apply function func group-wise and combine the results together.

The function passed to apply must take a dataframe or series as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. pandas offers a wide range of methods that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.

- Parameters:
- func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments. 
- args (tuple) – Optional positional arguments to pass to func.
- kwargs (dict) – Optional keyword arguments to pass to func.
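The parameter descriptions above can be illustrated with a minimal sketch of how extra arguments reach func. It runs against plain pandas as a stand-in (an assumption: with Snowpark pandas the import would instead be modin.pandas with an active Snowflake session), and the helper scale and its parameters factor and offset are hypothetical names introduced only for this example.

```python
import pandas as pd  # stand-in; Snowpark pandas is assumed to accept the same call shape

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def scale(group, factor, offset=0):
    """Hypothetical helper: multiply each value in the group by `factor`, then add `offset`."""
    return group * factor + offset


# 10 is forwarded positionally (via *args) as `factor`;
# offset=1 is forwarded by keyword (via **kwargs).
result = df.groupby("A", group_keys=False)[["B", "C"]].apply(scale, 10, offset=1)
print(result)
```

The first argument of scale is always the per-group DataFrame; everything after func in the apply call is passed through unchanged.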
 
- Return type:
- See also
- pipe
- Apply function to the full GroupBy object instead of to each group. 
- aggregate
- Apply aggregate function to the GroupBy object. 
- transform
- Apply function column-by-column to the GroupBy object. 
- Series.apply
- Apply a function to a Series. 
- DataFrame.apply
- Apply a function to each row or column of a DataFrame. 
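To make the relationship between apply and the more specific methods listed above concrete, here is a small sketch that computes the same per-group normalization twice, once with apply and once with transform, and a per-group total with agg. It uses plain pandas as a stand-in (an assumption; it has not been run against Snowpark pandas), and all variable names are illustrative.

```python
import pandas as pd  # stand-in for Snowpark pandas (assumption)

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
grouped = df.groupby("A")[["B", "C"]]

# General-purpose route: apply receives each group as a DataFrame.
via_apply = df.groupby("A", group_keys=False)[["B", "C"]].apply(lambda g: g / g.sum())

# Specialized route: transform broadcasts each group's sum back to the original rows.
via_transform = df[["B", "C"]] / grouped.transform("sum")

# Specialized route for reductions: one row per group.
totals = grouped.agg("sum")

print(via_apply.equals(via_transform))
print(totals)
```

When a built-in reduction or transformation expresses the computation, it is usually the better first choice; apply remains useful when func needs the whole group at once.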
- Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.

Returning a Series or scalar in func is not yet supported in Snowpark pandas.

- Examples
>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g1 = df.groupby('A', group_keys=False)
>>> g2 = df.groupby('A', group_keys=True)

Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:

Example 1: below the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:

>>> g1[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum())
            B    C
0.0  0.333333  0.4
1.0  0.666667  0.6
2.0  1.000000  1.0

In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:

>>> g2[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum())
              B    C
A
a 0.0  0.333333  0.4
  1.0  0.666667  0.6
b 2.0  1.000000  1.0
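Because the Notes say that func may not yet return a Series or scalar in Snowpark pandas, one possible approach is to wrap a per-group summary in a one-row DataFrame. The sketch below shows that shape using plain pandas as a stand-in; whether Snowpark pandas handles it identically is an assumption not confirmed by this page, and value_range is a hypothetical helper.

```python
import pandas as pd  # stand-in; behavior on Snowpark pandas is an unverified assumption

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def value_range(group):
    """Hypothetical helper: per-column max minus min, returned as a one-row DataFrame."""
    spread = group.max() - group.min()  # a Series indexed by column name
    return spread.to_frame().T          # reshape to one row so func returns a DataFrame


result = df.groupby("A", group_keys=True)[["B", "C"]].apply(value_range)

# The inner index level comes from the one-row frames; drop it to index by group key only.
summary = result.droplevel(1)
print(summary)
```

For simple reductions like this one, agg is usually the more direct choice; the one-row DataFrame pattern is only worth considering when the summary cannot be expressed with a built-in aggregation.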