snowflake.snowpark.DataFrame.random_split
- DataFrame.random_split(weights: List[float], seed: Optional[int] = None, *, statement_params: Optional[Dict[str, str]] = None) → List[DataFrame] [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/dataframe.py#L5368-L5472)
Randomly splits the current DataFrame into separate DataFrames, using the specified weights.
- Parameters:
weights – Weights to use for splitting the DataFrame. If the weights don’t add up to 1, they are normalized. Every number in weights must be positive. If only one weight is specified, the returned list contains only the current DataFrame.
seed – The seed for sampling.
statement_params – Dictionary of statement level parameters to be set while executing this action.
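The normalization described for weights can be illustrated in plain Python. This is a minimal sketch of the arithmetic only, not the library's implementation; the helper name normalize_weights and the choice of ValueError for invalid input are assumptions made for illustration.

```python
from typing import List


def normalize_weights(weights: List[float]) -> List[float]:
    """Scale positive weights so that they sum to 1 (hypothetical helper)."""
    if not weights:
        # Assumption: empty input is rejected; the error type is illustrative.
        raise ValueError("weights must not be empty")
    if any(w <= 0 for w in weights):
        # The documentation requires every weight to be positive.
        raise ValueError("every weight must be positive")
    total = sum(weights)
    return [w / total for w in weights]


# Weights that sum to 0.6 are rescaled to roughly 1/6, 1/3 and 1/2.
normalized = normalize_weights([0.1, 0.2, 0.3])
print([round(w, 4) for w in normalized])
```

Under this reading, [0.1, 0.2, 0.3] and [1, 2, 3] describe the same split proportions.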
Example:
>>> df = session.range(10000)
>>> weights = [0.1, 0.2, 0.3]
>>> df_parts = df.random_split(weights)
>>> len(df_parts) == len(weights)
True
Note
1. When multiple weights are specified, the current DataFrame will be cached before being split.
2. When a weight or a normalized weight is less than 1e-6, the corresponding split DataFrame will be empty.
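To check ahead of time which splits the second note would apply to, the condition can be evaluated on the weights alone. The sketch below is an illustration in plain Python, not Snowpark code; likely_empty_splits and EMPTY_SPLIT_THRESHOLD are hypothetical names, and the check simply restates the note (raw or normalized weight below 1e-6).

```python
from typing import List

# Threshold quoted in the note above; the constant name is hypothetical.
EMPTY_SPLIT_THRESHOLD = 1e-6


def likely_empty_splits(weights: List[float]) -> List[bool]:
    """Flag each weight whose raw or normalized value is below the threshold."""
    total = sum(weights)
    return [
        w < EMPTY_SPLIT_THRESHOLD or (w / total) < EMPTY_SPLIT_THRESHOLD
        for w in weights
    ]


# 1 / 2_000_001 is about 5e-7, so the first split is flagged.
flags = likely_empty_splits([1.0, 2_000_000.0])
print(flags)  # [True, False]
```

Running a check like this before calling random_split can catch extreme weight ratios that would otherwise produce an unexpectedly empty DataFrame.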