You are viewing documentation about an older version (1.3.0).

snowflake.snowpark.DataFrame.sampleBy

DataFrame.sampleBy(col: ColumnOrName, fractions: Dict[LiteralType, float]) → DataFrame

Source: https://github.com/snowflakedb/snowpark-python/blob/release-v1.3.0/src/snowflake/snowpark/dataframe_stat_functions.py#L221-L254

Returns a DataFrame containing a stratified sample without replacement, based on a dict that specifies the fraction for each stratum.
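The following minimal pure-Python sketch illustrates these semantics without a Snowflake session. It assumes that each row whose stratum appears in the dict is kept independently, with probability equal to that stratum's fraction (Bernoulli row sampling). This is an assumption made for illustration, not a description of how Snowpark implements the method. The function name `stratified_sample` and its `rng` parameter are hypothetical.

```python
import random
from collections.abc import Hashable, Iterable, Mapping
from typing import Any, Optional


def stratified_sample(
    rows: Iterable[Mapping[str, Any]],
    strata_col: str,
    fractions: Mapping[Hashable, float],
    rng: Optional[random.Random] = None,
) -> list[Mapping[str, Any]]:
    """Hypothetical sketch: keep each row with probability fractions[row[strata_col]]."""
    if rng is None:
        rng = random.Random()
    sample = []
    for row in rows:
        # Strata missing from the dict use a fraction of 0, so their rows are dropped.
        p = fractions.get(row[strata_col], 0.0)
        # One Bernoulli draw per row: each row is considered exactly once,
        # so it can appear at most once in the result (no replacement).
        if rng.random() < p:
            sample.append(row)
    return sample


rows = [
    {"name": "Bob", "age": 17},
    {"name": "Alice", "age": 10},
    {"name": "Nico", "age": 8},
    {"name": "Bob", "age": 12},
]
# Seeded generator so the sketch is reproducible; the real method is non-deterministic.
sample = stratified_sample(rows, "name", {"Bob": 0.5, "Nico": 1.0}, random.Random(42))
print(sample)
```

With these fractions, Nico's row is always kept (fraction 1.0), Alice's row is never kept (not in the dict), and each Bob row is kept about half the time, so the size of the result varies from run to run.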

Example:

>>> df = session.create_dataframe([("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)], schema=["name", "age"])
>>> fractions = {"Bob": 0.5, "Nico": 1.0}
>>> sample_df = df.stat.sample_by("name", fractions)  # non-deterministic result


Parameters:

col – The name of the column that defines the strata.
fractions – A dict that maps each stratum (a value of col) to the fraction of that stratum's rows to include in the sample. If a stratum is not specified in the dict, the method uses 0 as its fraction, so none of its rows are returned.
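Assuming each row is kept independently with probability equal to its stratum's fraction (an assumption, not confirmed by this page), the number of returned rows is random, but its expected value is the sum over strata of row count times fraction, with unlisted strata contributing 0. The hypothetical helper `expected_sample_size` below works this out for the example data: 2 × 0.5 for Bob, 1 × 1.0 for Nico, and 1 × 0 for Alice, giving 2.0 rows on average.

```python
from collections import Counter
from collections.abc import Hashable, Mapping, Sequence


def expected_sample_size(
    strata_values: Sequence[Hashable], fractions: Mapping[Hashable, float]
) -> float:
    """Hypothetical helper: expected row count under independent per-row sampling."""
    counts = Counter(strata_values)
    # A stratum absent from `fractions` uses a fraction of 0 and contributes nothing.
    return sum(n * fractions.get(value, 0.0) for value, n in counts.items())


names = ["Bob", "Alice", "Nico", "Bob"]  # the "name" column from the example
print(expected_sample_size(names, {"Bob": 0.5, "Nico": 1.0}))  # 2.0
```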