snowflake.snowpark.DataFrame.crosstab
- DataFrame.crosstab(col1: Union[Column, str], col2: Union[Column, str], *, statement_params: Optional[Dict[str, str]] = None) -> DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/dataframe_stat_functions.py#L286-L370)
Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

In the returned contingency table:
- The first column of each row contains the distinct values of col1.
- The name of the first column is the name of col1.
- The rest of the column names are the distinct values of col2.
- For pairs that have no occurrences, the contingency table contains 0 as the count.
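The layout rules above can be illustrated without a Snowflake session. The following is a minimal pure-Python sketch of the same counting logic, not Snowpark's implementation; the function name `crosstab_rows` and the tuple-based input are assumptions made for illustration.

```python
from collections import Counter


def crosstab_rows(pairs, col1_name="KEY"):
    """Build a contingency table from (col1, col2) value pairs.

    Returns (header, rows). The header starts with col1's name followed by
    the distinct col2 values. Each row starts with a distinct col1 value
    followed by the pair counts (0 where a pair never occurs).
    """
    pairs = list(pairs)
    counts = Counter(pairs)  # (col1 value, col2 value) -> occurrences
    # Sorting is only for a deterministic result in this sketch.
    col1_values = sorted({a for a, _ in pairs})
    col2_values = sorted({b for _, b in pairs})
    header = [col1_name] + col2_values
    # Counter returns 0 for pairs that were never seen.
    rows = [[a] + [counts[(a, b)] for b in col2_values] for a in col1_values]
    return header, rows


data = [(1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)]
header, rows = crosstab_rows(data)
print(header)
for row in rows:
    print(row)
```

With the sample data, the rows are `[1, 1, 1, 0]`, `[2, 2, 0, 1]`, and `[3, 0, 1, 1]` under the header `['KEY', 1, 2, 3]`.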
Note
The number of distinct values in col2 should not exceed 1000.

Example:
>>> df = session.create_dataframe([(1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)], schema=["key", "value"])
>>> ct = df.stat.crosstab("key", "value").sort(df["key"])
>>> ct.show()
---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------
- Parameters:
col1 – The first column to use, given as a column name or a Column object.
col2 – The second column to use, given as a column name or a Column object.
statement_params – Dictionary of statement-level parameters to be set while executing this action.
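
Because the number of distinct values in col2 should not exceed 1000, it can be useful to check that count before building the table. The following is a hedged sketch rather than part of Snowpark: `checked_crosstab` and `max_distinct` are hypothetical names, and `df` is assumed to be a Snowpark DataFrame. It relies only on `DataFrame.select`, `DataFrame.distinct`, `DataFrame.count`, and `DataFrame.stat.crosstab`, and the check itself runs one extra query.

```python
def checked_crosstab(df, col1, col2, max_distinct=1000, statement_params=None):
    """Call df.stat.crosstab only if col2 has at most max_distinct distinct values.

    Hypothetical helper: df is assumed to be a Snowpark DataFrame.
    """
    # Runs one extra query to count the distinct values of col2.
    distinct_count = df.select(col2).distinct().count()
    if distinct_count > max_distinct:
        raise ValueError(
            f"col2 has {distinct_count} distinct values; the limit is {max_distinct}"
        )
    # Forward statement-level parameters unchanged to crosstab.
    return df.stat.crosstab(col1, col2, statement_params=statement_params)
```

A call might look like `checked_crosstab(df, "key", "value", statement_params={"QUERY_TAG": "crosstab-demo"})`, where the query tag value is only an illustration.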