snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.value_counts

DataFrameGroupBy.value_counts(subset: Optional[list[str]] = None, normalize: bool = False, sort: bool = True, ascending: bool = False, dropna: bool = True)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/groupby_overrides.py#L991-L1014)

Return a Series or DataFrame containing counts of unique rows.

Parameters:
  • subset (list-like, optional) – Columns to use when counting unique combinations.

  • normalize (bool, default False) –

    Return proportions rather than frequencies.

    Note that when normalize=True, the groupby is called with sort=False, and value_counts is called with sort=True, Snowpark pandas orders results differently from native pandas. This occurs because native pandas sorts on frequencies before converting them to proportions, whereas Snowpark pandas computes proportions within groups before sorting.

    See the pandas issue for details: https://github.com/pandas-dev/pandas/issues/59307

  • sort (bool, default True) – Sort by frequencies.

  • ascending (bool, default False) – Sort in ascending order.

  • dropna (bool, default True) – Don’t include counts of rows that contain NA values.
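
The ordering difference noted under normalize can be illustrated without either library. The following is a minimal plain-Python sketch, not the pandas or Snowpark pandas implementation: the sample rows and the helper names (`sort_then_normalize`, `normalize_then_sort`) are hypothetical, and it assumes a single global, stable sort in descending order.

```python
from collections import Counter

# Hypothetical (group, value) rows: group "a" has 4 rows, group "b" has 1.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]

counts = Counter(rows)                      # frequency of each (group, value) pair
group_sizes = Counter(g for g, _ in rows)   # number of rows in each group


def sort_then_normalize():
    """Sort on frequencies first, then convert them to proportions."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(key, n / group_sizes[key[0]]) for key, n in ordered]


def normalize_then_sort():
    """Compute within-group proportions first, then sort on them."""
    props = [(key, n / group_sizes[key[0]]) for key, n in counts.items()]
    return sorted(props, key=lambda kv: kv[1], reverse=True)


by_frequency = sort_then_normalize()
by_proportion = normalize_then_sort()
print(by_frequency)
print(by_proportion)
```

In this sketch the pair ("b", "z") has the lowest frequency but the highest proportion, so it comes last when sorting on frequencies and first when sorting on proportions.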

Returns:

Series if the groupby as_index is True, otherwise DataFrame.

Return type:

Series or DataFrame

Notes

  • If the groupby as_index is True then the returned Series will have a MultiIndex with one level per input column.

  • If the groupby as_index is False then the returned DataFrame will have an additional column with the value_counts. The column is labelled ‘count’ or ‘proportion’, depending on the normalize parameter.

By default, rows that contain any NA values are omitted from the result.

By default, the result is in descending order, so the first element of each group is the most frequently occurring row.
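
As a rough illustration of the semantics described in these notes, the sketch below counts unique rows per group in plain Python. It is not how Snowpark pandas computes the result: the function `grouped_value_counts` is hypothetical, `None` stands in for NA, and breaking ties on the remaining column values is an assumption chosen so that the order matches the examples that follow.

```python
from collections import Counter

# Same rows as the example DataFrame: (gender, education, country).
rows = [
    ("male", "low", "US"),
    ("male", "medium", "FR"),
    ("female", "high", "US"),
    ("male", "low", "FR"),
    ("female", "high", "FR"),
    ("male", "low", "FR"),
]


def grouped_value_counts(rows, normalize=False, dropna=True):
    """Count unique rows per group; the group key is the first field."""
    if dropna:
        # Omit rows that contain a missing value (None stands in for NA).
        rows = [r for r in rows if None not in r]
    values = dict(Counter(rows))
    if normalize:
        # Divide each count by the number of rows in its group.
        sizes = Counter(r[0] for r in rows)
        values = {r: n / sizes[r[0]] for r, n in values.items()}
    # Order by group key, then descending value, then remaining fields (assumed tie-break).
    return sorted(values.items(), key=lambda kv: (kv[0][0], -kv[1], kv[0][1:]))


count_result = grouped_value_counts(rows)
proportion_result = grouped_value_counts(rows, normalize=True)
for row, value in count_result:
    print(row, value)
```

With normalize=True the same rows yield 0.5 for both female rows and 0.5, 0.25, 0.25 for the male rows, in line with the 'proportion' examples.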

Examples

>>> df = pd.DataFrame({
...     'gender': ['male', 'male', 'female', 'male', 'female', 'male'],
...     'education': ['low', 'medium', 'high', 'low', 'high', 'low'],
...     'country': ['US', 'FR', 'US', 'FR', 'FR', 'FR']
... })
>>> df  
        gender  education   country
0       male    low         US
1       male    medium      FR
2       female  high        US
3       male    low         FR
4       female  high        FR
5       male    low         FR
>>> df.groupby('gender').value_counts()  
gender  education  country
female  high       FR         1
                   US         1
male    low        FR         2
                   US         1
        medium     FR         1
Name: count, dtype: int64
>>> df.groupby('gender').value_counts(ascending=True)  
gender  education  country
female  high       FR         1
                   US         1
male    low        US         1
        medium     FR         1
        low        FR         2
Name: count, dtype: int64
>>> df.groupby('gender').value_counts(normalize=True)  
gender  education  country
female  high       FR         0.50
                   US         0.50
male    low        FR         0.50
                   US         0.25
        medium     FR         0.25
Name: proportion, dtype: float64
>>> df.groupby('gender', as_index=False).value_counts()  
   gender education country  count
0  female      high      FR      1
1  female      high      US      1
2    male       low      FR      2
3    male       low      US      1
4    male    medium      FR      1
>>> df.groupby('gender', as_index=False).value_counts(normalize=True)  
   gender education country  proportion
0  female      high      FR        0.50
1  female      high      US        0.50
2    male       low      FR        0.50
3    male       low      US        0.25
4    male    medium      FR        0.25