snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.sum

DataFrameGroupBy.sum(numeric_only: bool = False, min_count: int = 0, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L927-L944)

Compute sum of group values.

Parameters:
  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

  • engine (str, default None) –

    • 'cython' : Runs rolling apply through C-extensions from cython.

    • 'numba' : Runs rolling apply through JIT compiled code from numba. Only available when raw is set to True.

    • None : Defaults to 'cython' or the global setting compute.use_numba.

    This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.

  • engine_kwargs (dict, default None) –

    • For the 'cython' engine, there are no accepted engine_kwargs.

    • For the 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to both the func and the apply groupby aggregation.

    This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.

Returns:

Computed sum of values within each group.

Return type:

Series or DataFrame

Examples

For SeriesGroupBy:

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = pd.Series([1, 2, 3, 4], index=lst)
>>> ser
a    1
a    2
b    3
b    4
dtype: int64
>>> ser.groupby(level=0).sum()
a    3
b    7
dtype: int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["tiger", "leopard", "cheetah", "lion"])
>>> df
         a  b  c
tiger    1  8  2
leopard  1  2  5
cheetah  2  5  8
lion     2  6  9
>>> df.groupby("a").sum()  
    b   c
a
1  10   7
2  11  17
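
For numeric_only and min_count:

The following is a minimal sketch of how these two parameters behave. It uses plain pandas as a stand-in so that it runs without a Snowflake session, on the assumption that Snowpark pandas matches pandas semantics for these parameters. The frame and the column names ("a", "b", "label") are made up for illustration.

```python
import numpy as np
import pandas as pd  # plain pandas stand-in; Snowpark pandas is assumed to behave the same here

# Hypothetical data: group 2 has no non-NA values in column "b".
df = pd.DataFrame(
    {
        "a": [1, 1, 2, 2],
        "b": [8.0, np.nan, np.nan, np.nan],
        "label": ["w", "x", "y", "z"],
    }
)

# numeric_only=True drops the non-numeric "label" column from the result.
# With the default min_count=0, a group whose values are all NA sums to 0.
numeric_sum = df.groupby("a").sum(numeric_only=True)

# min_count=1 requires at least one non-NA value per group;
# groups with fewer valid values produce NA instead of 0.
strict_sum = df.groupby("a").sum(numeric_only=True, min_count=1)

print(numeric_sum)
print(strict_sum)
```

Under these assumptions, group 2 of column "b" sums to 0 with the default min_count and to NA with min_count=1, while group 1 is 8.0 in both cases.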