snowflake.snowpark.DataFrame.drop_duplicates
- DataFrame.drop_duplicates(*subset: Union[str, Iterable[str]]) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.16.0/src/snowflake/snowpark/dataframe.py#L1582-L1618)
Creates a new DataFrame by removing duplicated rows on a given subset of columns.
If no subset of columns is specified, this function is the same as the distinct() function. The result is non-deterministic when duplicates are removed based on a subset of the columns rather than all columns.
For example, if we have a DataFrame df, which has columns ("a", "b", "c") and contains three rows (1, 1, 1), (1, 1, 2), (1, 2, 3), the result of df.dropDuplicates("a", "b") can be either (1, 1, 1), (1, 2, 3) or (1, 1, 2), (1, 2, 3).
- Parameters:
subset – The column names on which duplicates are dropped.
dropDuplicates() is an alias of drop_duplicates().
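
The non-deterministic result described above can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not Snowpark's implementation: it needs no Snowflake session, and the helper name drop_duplicates_model, its arguments, and the "keep the first row seen" rule are all assumptions made for illustration. Unlike the real method, this sketch accepts column names only as separate string arguments. Feeding the same rows in a different order stands in for the engine being free to keep any one row per group.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def drop_duplicates_model(
    columns: Sequence[str], rows: Sequence[Row], *subset: str
) -> List[Row]:
    """Hypothetical model: keep one row per distinct value of the subset columns."""
    # With no subset, whole rows are compared, which matches distinct().
    key_columns = list(subset) if subset else list(columns)
    indexes = [list(columns).index(name) for name in key_columns]

    kept: Dict[Row, Row] = {}
    for row in rows:
        key = tuple(row[i] for i in indexes)
        # Assumption of this model only: the first row seen for a key survives.
        # Snowflake makes no such promise about which row is kept.
        kept.setdefault(key, row)
    return list(kept.values())


columns = ("a", "b", "c")
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 3)]

# Same data, two different processing orders.
first_order = drop_duplicates_model(columns, rows, "a", "b")
other_order = drop_duplicates_model(columns, list(reversed(rows)), "a", "b")

print(first_order)          # [(1, 1, 1), (1, 2, 3)]
print(sorted(other_order))  # [(1, 1, 2), (1, 2, 3)]
```

Both outputs are valid results for df.drop_duplicates("a", "b") on this data. Code that needs a specific surviving row should not rely on which one is returned.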