snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.std

DataFrameGroupBy.std(ddof: int = 1, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, numeric_only: bool = False)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L909-L925)

Compute standard deviation of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameters:
  • ddof (int, default 1) –

    Degrees of freedom.

    Snowpark pandas currently only supports ddof=0 and ddof=1.

  • engine (str, default None) –

    In pandas, engine can be set to 'cython' or 'numba'; None defaults to 'cython' or to the global setting compute.use_numba.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • engine_kwargs (dict, default None) –

    Configuration keywords for the configured execution engine.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • numeric_only (bool, default False) – Include only float, int or boolean data columns.

Returns:

Standard deviation of values within each group.

Return type:

Series or DataFrame

Examples

For SeriesGroupBy:

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
>>> ser = pd.Series([7, 2, 8, 4, 3, 3, 1], index=lst)
>>> ser
a    7
a    2
a    8
b    4
b    3
b    3
c    1
dtype: int64
>>> ser.groupby(level=0).std()
a    3.21455
b    0.57735
c        NaN
dtype: float64
>>> ser.groupby(level=0).std(ddof=0)
a    2.624669
b    0.471404
c    0.000000
dtype: float64

Note that if the number of elements in a group is less than or equal to ddof, the result for that group will be NaN/None. For example, group c has a single element, so its value is NaN when we call ser.groupby(level=0).std() with the default ddof of 1.
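
The effect of ddof can be sketched in plain Python. This is only an illustration of the formula and not how Snowpark pandas computes the result (execution happens in Snowflake); the helper name group_std is hypothetical, and the data is the same as in the SeriesGroupBy example.

```python
import math

def group_std(values, ddof=1):
    """Standard deviation of one group with the divisor n - ddof (hypothetical helper)."""
    n = len(values)
    # With n <= ddof the divisor is zero or negative, so the result is undefined.
    if n <= ddof:
        return math.nan
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - ddof))

# Same groups as the Series example: a, b have three elements, c has one.
groups = {"a": [7, 2, 8], "b": [4, 3, 3], "c": [1]}
sample = {key: group_std(vals) for key, vals in groups.items()}              # ddof=1
population = {key: group_std(vals, ddof=0) for key, vals in groups.items()}  # ddof=0
```

With ddof=1, group c has a divisor of 1 - 1 = 0 and yields NaN; with ddof=0 the divisor is 1 and the result is 0.0.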

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df      
       a  b
c
dog    1  1
dog    3  4
dog    5  8
mouse  7  4
mouse  7  4
mouse  8  2
mouse  3  1
>>> df.groupby('c').std()       
              a         b
c
dog    2.000000  3.511885
mouse  2.217356  1.500000
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': ['c', 'e', 'd', 'a', 'a', 'b', 'e']}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df      
       a  b
c
dog    1  c
dog    3  e
dog    5  d
mouse  7  a
mouse  7  a
mouse  8  b
mouse  3  e
>>> df.groupby('c').std(numeric_only=True)       
              a
c
dog    2.000000
mouse  2.217356
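
For multiple groupings, the result index is a MultiIndex. The following minimal sketch uses plain pandas to show the shape of the result; it assumes Snowpark pandas mirrors this behavior as stated above, and the column names and data are hypothetical.

```python
import math
import pandas as pd

# Hypothetical data: two grouping columns and one numeric column.
df = pd.DataFrame(
    {
        "species": ["dog", "dog", "dog", "mouse", "mouse", "mouse"],
        "sex": ["f", "f", "m", "f", "m", "m"],
        "weight": [1.0, 3.0, 5.0, 7.0, 7.0, 8.0],
    }
)

# Grouping by two keys yields one row per (species, sex) pair.
result = df.groupby(["species", "sex"]).std()

# Each index entry is a tuple, so rows are addressed by (species, sex).
dog_f = result.loc[("dog", "f"), "weight"]   # two elements: defined
dog_m = result.loc[("dog", "m"), "weight"]   # one element with ddof=1: NaN
```

Groups with a single element, such as ("dog", "m"), come out as NaN under the default ddof of 1.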