
snowflake.snowpark.DataFrame.crosstab

DataFrame.crosstab(col1: Union[Column, str], col2: Union[Column, str], *, statement_params: Optional[Dict[str, str]] = None) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.16.0/src/snowflake/snowpark/dataframe_stat_functions.py#L167-L220)

Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

In the returned contingency table:
  • The first column of each row contains a distinct value of col1.

  • The name of the first column is the name of col1.

  • The rest of the column names are the distinct values of col2.

  • For pairs that have no occurrences, the contingency table contains 0 as the count.
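The rules above can be illustrated with a minimal plain-Python sketch. This is not how Snowpark computes the table; the helper name crosstab_rows is hypothetical, and sorting the distinct values is an assumption made only to keep the output deterministic.

```python
from collections import Counter

def crosstab_rows(pairs):
    """Build a contingency table from (col1_value, col2_value) pairs.

    Returns (header, rows): header lists the distinct col2 values, and
    each row is [col1_value, count_for_header[0], count_for_header[1], ...].
    """
    counts = Counter(pairs)  # occurrences of each (col1, col2) pair
    # Sorted only for a stable illustration (an assumption of this sketch).
    col1_values = sorted({a for a, _ in pairs})
    col2_values = sorted({b for _, b in pairs})
    # A Counter returns 0 for pairs that never occur, matching the last rule.
    rows = [[a] + [counts[(a, b)] for b in col2_values] for a in col1_values]
    return col2_values, rows

pairs = [(1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)]
header, rows = crosstab_rows(pairs)
for row in rows:
    print(row)
```

Each printed row starts with a distinct value from the first position of the pairs, followed by one count per distinct value from the second position.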

Note

The number of distinct values in col2 should not exceed 1000.

Example:

>>> df = session.create_dataframe([(1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)], schema=["key", "value"])
>>> ct = df.stat.crosstab("key", "value").sort(df["key"])
>>> ct.show()
---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------
Parameters:
  • col1 – The first column to use, given as a column name or a Column object.

  • col2 – The second column to use, given as a column name or a Column object.

  • statement_params – Dictionary of statement level parameters to be set while executing this action.
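As a hedged sketch of passing statement_params, the helper below forwards a dictionary to crosstab. The helper name tagged_crosstab is hypothetical, and using the Snowflake QUERY_TAG parameter as the key is an assumption chosen for illustration; df is expected to be an existing Snowpark DataFrame.

```python
def tagged_crosstab(df, col1, col2, query_tag):
    """Compute a contingency table and label the query that runs it.

    Hypothetical helper: `df` is assumed to be a Snowpark DataFrame, and
    QUERY_TAG is used here only as an example statement-level parameter.
    """
    return df.stat.crosstab(
        col1,
        col2,
        statement_params={"QUERY_TAG": query_tag},
    )
```

Because statement_params is keyword-only in the signature, it has to be passed by name.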
