snowflake.snowpark.modin.plugin.extensions.groupby_overrides.SeriesGroupBy.var

SeriesGroupBy.var(ddof: int = 1, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, numeric_only: bool = False)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L946-L963)

Compute variance of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameters:
  • ddof (int, default 1) – Degrees of freedom. Only ddof values of 0 and 1 are supported; for these the operation is executed in Snowflake. Other values are not yet supported.

  • engine (str, default None) –

    In pandas, engine can be set to 'cython' or 'numba'; None defaults to 'cython', or to the value implied by the global setting compute.use_numba.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • engine_kwargs (dict, default None) –

    Configuration keywords for the configured execution engine.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • numeric_only (bool, default False) – Include only float, int or boolean data columns.

Returns:

Variance of values within each group.

Return type:

Series or DataFrame

Examples

For SeriesGroupBy:

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
>>> ser = pd.Series([7, 2, 8, 4, 3, 3, 1], index=lst)
>>> ser
a    7
a    2
a    8
b    4
b    3
b    3
c    1
dtype: int64
>>> ser.groupby(level=0).var()
a    10.333333
b     0.333333
c          NaN
dtype: float64
>>> ser.groupby(level=0).var(ddof=0)
a    6.888889
b    0.222222
c    0.000000
dtype: float64

Note that if the number of elements in a group is less than or equal to ddof, the result for that group will be NaN/None. For example, group c contains a single element, so its value is NaN when we call ser.groupby(level=0).var() with the default ddof of 1.
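The role of ddof in the note above can be illustrated with a minimal pure-Python sketch. The helper grouped_var is hypothetical and written only for this illustration; it is not part of Snowpark pandas or pandas, and it does not describe how Snowflake evaluates the aggregation. It assumes the usual definition of variance with a divisor of n - ddof, and reuses the keys and values from the SeriesGroupBy example.

```python
import math
from collections import defaultdict


def grouped_var(keys, values, ddof=1):
    """Hypothetical helper: per-group variance with divisor (n - ddof)."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        bucket = groups[key]  # register the group even if all its values are missing
        # Missing values (None or NaN) are excluded from the computation.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        bucket.append(value)

    result = {}
    for key, vals in groups.items():
        n = len(vals)
        if n <= ddof:
            # Too few elements for the requested degrees of freedom.
            result[key] = math.nan
            continue
        mean = sum(vals) / n
        result[key] = sum((v - mean) ** 2 for v in vals) / (n - ddof)
    return result


keys = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
values = [7, 2, 8, 4, 3, 3, 1]

sample = grouped_var(keys, values)              # ddof=1: group 'c' is NaN
population = grouped_var(keys, values, ddof=0)  # ddof=0: group 'c' is 0.0
```

With ddof=1, group c has one element and no remaining degrees of freedom, so the sketch returns NaN for it; with ddof=0 the divisor is 1 and the result is 0.0, matching the outputs shown above.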

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df      
       a  b
c
dog    1  1
dog    3  4
dog    5  8
mouse  7  4
mouse  7  4
mouse  8  2
mouse  3  1
>>> df.groupby('c').var()       
              a          b
c
dog    4.000000  12.333333
mouse  4.916667   2.250000
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': ['c', 'e', 'd', 'a', 'a', 'b', 'e']}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df      
       a  b
c
dog    1  c
dog    3  e
dog    5  d
mouse  7  a
mouse  7  a
mouse  8  b
mouse  3  e
>>> df.groupby('c').var(numeric_only=True)       
              a
c
dog    4.000000
mouse  4.916667