snowflake.snowpark.testing.assert_dataframe_equal¶
- snowflake.snowpark.testing.assert_dataframe_equal(actual: DataFrame, expected: DataFrame, rtol: float = 1e-05, atol: float = 1e-08) None [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/testing.py#L79-L229)¶
Asserts that two Snowpark
DataFrame
objects are equal. This function compares both the schema and the data of the DataFrames. If there are differences, anAssertionError
is raised with a detailed message including differences. This function is useful for unit testing and validating data transformations and processing in Snowpark.- Parameters:
actual – The actual DataFrame to be compared.
expected – The expected DataFrame to compare against.
rtol – The relative tolerance for comparing float values. Default is 1e-5.
atol – The absolute tolerance for comparing float values. Default is 1e-8.
Examples:
>>> from snowflake.snowpark.testing import assert_dataframe_equal >>> from snowflake.snowpark.types import StructType, StructField, IntegerType, StringType, DoubleType >>> schema1 = StructType([ ... StructField("id", IntegerType()), ... StructField("name", StringType()), ... StructField("value", DoubleType()) ... ]) >>> data1 = [[1, "Rice", 1.0], [2, "Saka", 2.0], [3, "White", 3.0]] >>> df1 = session.create_dataframe(data1, schema1) >>> df2 = session.create_dataframe(data1, schema1) >>> assert_dataframe_equal(df2, df1) # pass, DataFrames are identical >>> data2 = [[2, "Saka", 2.0], [1, "Rice", 1.0], [3, "White", 3.0]] # change the order >>> df3 = session.create_dataframe(data2, schema1) >>> assert_dataframe_equal(df3, df1) # pass, DataFrames are identical >>> data3 = [[1, "Rice", 1.0], [2, "Saka", 2.0], [4, "Rowe", 4.0]] >>> df4 = session.create_dataframe(data3, schema1) >>> assert_dataframe_equal(df4, df1) Traceback (most recent call last): AssertionError: Value mismatch on row 2 at column 0: actual 4, expected 3 Different row: --- actual --- +++ expected +++ - Row(ID=4, NAME='Rowe', VALUE=4.0) ? ^ ^^^ ^ + Row(ID=3, NAME='White', VALUE=3.0) ? ^ ^^^^ ^ >>> data4 = [[1, "Rice", 1.0], [2, "Saka", 2.0], [3, "White", 3.0001]] >>> df5 = session.create_dataframe(data4, schema1) >>> assert_dataframe_equal(df5, df1, atol=1e-3) # pass, DataFrames are identical due to higher error tolerance >>> assert_dataframe_equal(df5, df1, atol=1e-5) Traceback (most recent call last): AssertionError: Value mismatch on row 2 at column 2: actual 3.0001, expected 3.0 Different row: --- actual --- +++ expected +++ - Row(ID=3, NAME='White', VALUE=3.0001) ? --- + Row(ID=3, NAME='White', VALUE=3.0) >>> schema2 = StructType([ ... StructField("id", IntegerType()), ... StructField("key", StringType()), ... StructField("value", DoubleType()) ... ]) >>> df6 = session.create_dataframe(data1, schema2) >>> assert_dataframe_equal(df6, df1) Traceback (most recent call last): AssertionError: Column name mismatch at column 1: actual KEY, expected NAME Different schema: --- actual --- +++ expected +++ - StructType([StructField('ID', LongType(), nullable=True), StructField('KEY', StringType(), nullable=True), StructField('VALUE', DoubleType(), nullable=True)]) ? ^ - + StructType([StructField('ID', LongType(), nullable=True), StructField('NAME', StringType(), nullable=True), StructField('VALUE', DoubleType(), nullable=True)]) ? >>> schema3 = StructType([ ... StructField("id", IntegerType()), ... StructField("name", StringType()), ... StructField("value", IntegerType()) ... ]) >>> df7 = session.create_dataframe(data1, schema3) >>> assert_dataframe_equal(df7, df1) Traceback (most recent call last): AssertionError: Column data type mismatch at column 2: actual LongType(), expected DoubleType() Different schema: --- actual --- +++ expected +++ - StructType([StructField('ID', LongType(), nullable=True), StructField('NAME', StringType(), nullable=True), StructField('VALUE', LongType(), nullable=True)]) ? ^ ^^ + StructType([StructField('ID', LongType(), nullable=True), StructField('NAME', StringType(), nullable=True), StructField('VALUE', DoubleType(), nullable=True)]) ?
Note
1. Data in a Snowpark DataFrame is unordered, so when comparing two DataFrames, this function sorts rows based on their values first.
2. When comparing schemas,
types.IntegerType
andtypes.DoubleType
are considered different, even if the underlying values are equal (e.g., 2 vs 2.0).This function or method is experimental since 1.21.0.