snowflake.snowpark.DataFrame.randomSplit
- DataFrame.randomSplit(weights: List[float], seed: Optional[int] = None, *, statement_params: Optional[Dict[str, str]] = None) → List[DataFrame] [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/dataframe.py#L4078-L4144)
- Randomly splits the current DataFrame into separate DataFrames, using the specified weights.
- Parameters:
- weights – Weights to use for splitting the DataFrame. If the weights don’t add up to 1, they are normalized. Every number in weights has to be positive. If only one weight is specified, the returned list contains only the current DataFrame.
- seed – The seed for sampling. 
- statement_params – Dictionary of statement level parameters to be set while executing this action. 
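The normalization described for weights amounts to dividing each weight by the sum of all weights. A minimal sketch in plain Python, independent of Snowpark; `normalize_weights` is a hypothetical helper used only for illustration and is not part of the library:

```python
from typing import List


def normalize_weights(weights: List[float]) -> List[float]:
    """Scale positive weights so they sum to 1 (illustrative helper, not a Snowpark API)."""
    if not weights:
        raise ValueError("weights must not be empty")
    if any(w <= 0 for w in weights):
        raise ValueError("every weight has to be positive")
    total = sum(weights)
    return [w / total for w in weights]


# The weights [0.1, 0.2, 0.3] sum to 0.6, so the expected shares of rows
# are roughly 1/6, 1/3 and 1/2.
shares = normalize_weights([0.1, 0.2, 0.3])
```

Each resulting share is the approximate fraction of rows that the corresponding split should receive.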
 
- Example:
    >>> df = session.range(10000)
    >>> weights = [0.1, 0.2, 0.3]
    >>> df_parts = df.random_split(weights)
    >>> len(df_parts) == len(weights)
    True
- Note
  1. When multiple weights are specified, the current DataFrame is cached before being split.
  2. When a weight or a normalized weight is less than 1e-6, the corresponding split DataFrame will be empty.
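A common use of this method is a reproducible train/test split. The sketch below assumes `df` is a Snowpark DataFrame; `train_test_split`, `train_fraction`, and the default seed are hypothetical names and values chosen for illustration, and only `random_split(weights, seed=...)` comes from the signature documented here:

```python
from typing import Any, Tuple


def train_test_split(df: Any, train_fraction: float = 0.8, seed: int = 42) -> Tuple[Any, Any]:
    """Split df into (train, test) parts via DataFrame.random_split.

    Hypothetical helper: df is assumed to be a Snowpark DataFrame, but any
    object exposing random_split(weights, seed=...) will work.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Two weights are passed, so the DataFrame is cached before being split.
    # A fixed seed is supplied so that repeated runs use the same sampling seed.
    train_df, test_df = df.random_split(
        [train_fraction, 1.0 - train_fraction], seed=seed
    )
    return train_df, test_df
```

With a live session this might be called as `train_df, test_df = train_test_split(session.range(10000))`. Because rows are assigned randomly, the sizes of the two parts should be expected to match the weights only approximately.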