modin.pandas.DataFrame.describe

DataFrame.describe(percentiles=None, include=None, exclude=None) → Self [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.44.0/.tox/docs/lib/python3.9/site-packages/modin/pandas/base.py#L1394-L1430)

Generate descriptive statistics for columns in the dataset.

For non-numeric columns, computes count (number of non-null items), unique (number of distinct items), top (the mode; if several values tie for the mode, the one at the lowest position), and freq (number of times the mode appears) for each column.
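As a rough illustration of what these four statistics mean, the following pure-Python sketch computes them for a single column. It is not the library's implementation: `describe_non_numeric` is a hypothetical helper, nulls are modeled as `None`, and "lowest position" is assumed to mean the value that occurs first in the column.

```python
from collections import Counter


def describe_non_numeric(values):
    """Return count, unique, top, and freq for one column (None models a null)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    if not counts:
        return {"count": 0, "unique": 0, "top": None, "freq": None}
    # max() returns the first maximal key it meets, and Counter keeps
    # first-seen order, so a tie resolves to the earliest value (assumption).
    top = max(counts, key=counts.get)
    return {
        "count": len(non_null),
        "unique": len(counts),
        "top": top,
        "freq": counts[top],
    }


summary = describe_non_numeric(["a", "b", "c"])
print(summary)  # {'count': 3, 'unique': 3, 'top': 'a', 'freq': 1}
```

For the column `['a', 'b', 'c']` every value appears once, so the tie resolves to `'a'`, which agrees with the `top` row in the examples further down.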

For numeric columns, computes count (# of non-null items), mean, std, min, the specified percentiles, and max for each column.
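The numeric statistics can be sketched the same way. The helper names below (`percentile`, `describe_numeric`) are hypothetical, and two choices are assumptions rather than documented behavior: std is taken as the sample standard deviation (n - 1 denominator), and percentiles use linear interpolation between neighboring sorted values. Both choices agree with the values shown in the examples for `[1, 2, 3]`.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolate the quantile q (0 <= q <= 1) of sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def describe_numeric(values, percentiles=(0.25, 0.5, 0.75)):
    """Return count, mean, std, min, percentiles, and max for one column."""
    non_null = sorted(v for v in values if v is not None)
    # The median (0.5) is always reported, even if the caller omits it.
    quantiles = sorted(set(percentiles) | {0.5})
    result = {
        "count": len(non_null),
        "mean": statistics.fmean(non_null),
        "std": statistics.stdev(non_null),  # sample std; needs >= 2 values
        "min": non_null[0],
    }
    for q in quantiles:
        result[f"{q * 100:g}%"] = percentile(non_null, q)
    result["max"] = non_null[-1]
    return result


stats = describe_numeric([1, 2, 3])
for name, value in stats.items():
    print(name, value)
```

With the input `[1, 2, 3]` this yields a mean of 2.0, a std of 1.0, and 25%, 50%, and 75% values of 1.5, 2.0, and 2.5.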

If both non-numeric and numeric columns are specified, the rows for statistics of non-numeric columns appear first in the output.

Parameters:

percentiles (Optional[ListLike], default None) – The percentiles to compute for numeric columns. If unspecified, defaults to [0.25, 0.5, 0.75], which returns the 25th, 50th, and 75th percentiles. All values should fall between 0 and 1. The median (0.5) is always added to the displayed percentiles if not already included; the min and max are always displayed in addition to the percentiles.
include (Optional[List[str, ExtensionDtype | np.dtype]] | "all", default None) –
A list of dtypes to include in the result (ignored for Series).
- "all": Include all columns in the output.
- list-like: Include only columns of the listed dtypes. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])).
- None: If the dataframe has at least one numeric column, then include only numeric columns; otherwise include all columns in the output.
exclude (Optional[List[str, ExtensionDtype | np.dtype]], default None) –
A list of dtypes to omit from the result (ignored for Series).
- list-like: Exclude all columns of the listed dtypes. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
- None: Exclude nothing.
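The column-selection rules for include and exclude can be summarized in a small sketch. Everything here is a simplification for illustration: `select_columns` is a hypothetical helper, columns are tagged with the plain labels "number" and "object" instead of real dtypes, and the handling of exclude when include is None (start from all columns, then drop the excluded kinds) is inferred from the exclude example in this page, not from the implementation.

```python
def select_columns(kinds, include=None, exclude=None):
    """Pick the columns to describe.

    kinds maps column name -> "number" or "object" (hypothetical labels
    standing in for real dtypes).
    """
    if include is None and exclude is None:
        # Default: numeric columns only, unless there are none at all.
        numeric = [name for name, kind in kinds.items() if kind == "number"]
        return numeric or list(kinds)
    if include == "all":
        return list(kinds)
    # Accept a single label as well as a list of labels.
    if isinstance(include, str):
        include = [include]
    if isinstance(exclude, str):
        exclude = [exclude]
    return [
        name
        for name, kind in kinds.items()
        if (include is None or kind in include)
        and (exclude is None or kind not in exclude)
    ]


kinds = {"numeric": "number", "object": "object"}
print(select_columns(kinds))                    # ['numeric']
print(select_columns(kinds, include="all"))     # ['numeric', 'object']
print(select_columns(kinds, exclude="number"))  # ['object']
```

The three calls correspond to the default behavior, include='all', and exclude='number' on a frame with one numeric and one object column.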

Returns:

A Snowpark DataFrame if called on a DataFrame, or a Snowpark Series if called on a Series. Each column of the result contains the statistics for the corresponding column of the input dataset.

Return type:

BasePandasDataset

Examples

Describing a frame with both numeric and object columns:

>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin
>>> df = pd.DataFrame({'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe(include='all')
        numeric object
count       3.0      3
unique      NaN      3
top         NaN      a
freq        NaN      1
mean        2.0   None
std         1.0   None
min         1.0   None
25%         1.5   None
50%         2.0   None
75%         2.5   None
max         3.0   None


Describing only numeric columns:

>>> pd.DataFrame({'numeric': [1, 2, 3], 'object': ['a', 'b', 'c']}).describe(include='number') 
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0


Excluding numeric columns:

>>> pd.DataFrame({'numeric': [1, 2, 3], 'object': ['a', 'b', 'c']}).describe(exclude='number') 
       object
count       3
unique      3
top         a
freq        1
