snowflake.snowpark.DataFrame.drop_duplicates
- DataFrame.drop_duplicates(*subset: Union[str, Iterable[str]]) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/dataframe.py#L1688-L1724)
- Creates a new DataFrame by removing duplicated rows on the given subset of columns.
  If no subset of columns is specified, this function is the same as the distinct() function. The result is non-deterministic when duplicated rows are removed based on a subset of the columns rather than all columns.
  For example, if we have a DataFrame df, which has columns ("a", "b", "c") and contains three rows (1, 1, 1), (1, 1, 2), (1, 2, 3), the result of df.dropDuplicates("a", "b") can be either (1, 1, 1), (1, 2, 3) or (1, 1, 2), (1, 2, 3).
- Parameters:
- subset – The column names on which duplicates are dropped. 
- dropDuplicates() is an alias of drop_duplicates().
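
The non-determinism described above can be illustrated with a small model in plain Python. This is only a sketch of the documented semantics, not Snowpark's implementation: the helper drop_duplicates_model is hypothetical, rows are plain tuples rather than a Snowpark DataFrame, and "keep the first row seen" is an assumption standing in for whichever row the engine happens to retain.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def drop_duplicates_model(
    rows: Sequence[Row], columns: Sequence[str], subset: Sequence[str]
) -> List[Row]:
    """Hypothetical model: keep one row per distinct combination of subset values.

    Assumption: the first row encountered for each key is kept. A real engine
    makes no such promise, which is why the result is non-deterministic.
    """
    # With no subset, every column takes part in the comparison (like distinct()).
    key_columns = list(subset) if subset else list(columns)
    key_positions = [columns.index(name) for name in key_columns]

    kept: Dict[Row, Row] = {}
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        kept.setdefault(key, row)  # later rows with the same key are dropped
    return list(kept.values())


columns = ["a", "b", "c"]
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 3)]

# Same data, two different scan orders, two different but equally valid results.
result_one = drop_duplicates_model(rows, columns, ["a", "b"])
result_two = drop_duplicates_model([rows[1], rows[0], rows[2]], columns, ["a", "b"])

# No subset: only rows identical in every column are collapsed.
result_all = drop_duplicates_model(rows + [rows[0]], columns, [])

print(result_one)  # [(1, 1, 1), (1, 2, 3)]
print(result_two)  # [(1, 1, 2), (1, 2, 3)]
print(result_all)  # [(1, 1, 1), (1, 1, 2), (1, 2, 3)]
```

Both result_one and result_two match the two outcomes listed in the example. Code that needs a specific surviving row should not rely on which one drop_duplicates() keeps.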