snowflake.snowpark.Table.sample
- Table.sample(frac: Optional[float] = None, n: Optional[int] = None, *, seed: Optional[int] = None, sampling_method: Optional[str] = None) → DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.16.0/src/snowflake/snowpark/table.py#L301-L369)
Samples rows based on either the number of rows to be returned or a percentage of rows to be returned.
Sampling with a seed is not supported on views or subqueries. This method works on tables, so it supports seed. This is the main difference between DataFrame.sample() and this method.
Parameters:
frac – The fraction of rows to be sampled, given as a value between 0.0 and 1.0 (for example, 0.1 samples roughly 10% of the rows).
n – The fixed number of rows to sample in the range of 0 to 1,000,000 (inclusive). Either frac or n should be provided.
seed – Specifies a seed value to make the sampling deterministic. Can be any integer between 0 and 2147483647 inclusive. Default value is None.
sampling_method – Specifies the sampling method to use:
- “BERNOULLI” (or “ROW”): Includes each row with a probability of p/100. Similar to flipping a weighted coin for each row.
- “SYSTEM” (or “BLOCK”): Includes each block of rows with a probability of p/100. Similar to flipping a weighted coin for each block of rows. This method does not support fixed-size sampling.
Default is None. Then the Snowflake database will use “ROW” by default.
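The weighted-coin description of the two sampling methods can be illustrated with a small pure-Python simulation. This is only a sketch of the idea, not how Snowflake implements sampling: the helper names `bernoulli_sample` and `block_sample`, the fixed `block_size`, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def bernoulli_sample(rows: Sequence[int], p: float, rng: random.Random) -> List[int]:
    """Row-level sampling: flip a weighted coin (probability p/100) for each row."""
    return [row for row in rows if rng.random() < p / 100]


def block_sample(
    rows: Sequence[int], p: float, block_size: int, rng: random.Random
) -> List[int]:
    """Block-level sampling: flip a weighted coin (probability p/100) for each block."""
    kept: List[int] = []
    for start in range(0, len(rows), block_size):
        if rng.random() < p / 100:
            # The whole block is included or excluded together.
            kept.extend(rows[start:start + block_size])
    return kept


rows = list(range(1000))  # stand-in for table rows

# A fixed seed makes each run of the simulation repeatable.
row_level = bernoulli_sample(rows, p=10, rng=random.Random(42))
block_level = block_sample(rows, p=10, block_size=100, rng=random.Random(42))

print(len(row_level), len(block_level))
```

In the simulation, the row-level result can have any size near 10% of the input, while the block-level result always contains whole blocks. Neither function can promise an exact row count, which is consistent with block sampling not supporting fixed-size sampling.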
Note
SYSTEM | BLOCK sampling is often faster than BERNOULLI | ROW sampling.
Sampling without a seed is often faster than sampling with a seed.
Fixed-size sampling can be slower than equivalent fraction-based sampling because fixed-size sampling prevents some query optimization.
Fixed-size sampling doesn’t work with SYSTEM | BLOCK sampling.
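The parameter constraints and notes above can be collected into a small pre-check before calling sample(). The following is a minimal sketch under stated assumptions: `checked_sample` and `FakeTable` are hypothetical names that are not part of Snowpark, the case-insensitive handling of `sampling_method` is an assumption, and the stand-in class exists only so the sketch runs without a Snowflake connection. The library's own validation may differ.

```python
from typing import Any, Optional

MAX_ROWS = 1_000_000       # documented upper bound for n
MAX_SEED = 2_147_483_647   # documented upper bound for seed
ROW_METHODS = {"BERNOULLI", "ROW"}
BLOCK_METHODS = {"SYSTEM", "BLOCK"}


def checked_sample(
    table: Any,
    frac: Optional[float] = None,
    n: Optional[int] = None,
    *,
    seed: Optional[int] = None,
    sampling_method: Optional[str] = None,
) -> Any:
    """Hypothetical wrapper: check the documented constraints, then call table.sample()."""
    if frac is None and n is None:
        raise ValueError("either frac or n must be provided")
    if n is not None and not 0 <= n <= MAX_ROWS:
        raise ValueError(f"n must be between 0 and {MAX_ROWS} inclusive")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED} inclusive")
    if sampling_method is not None:
        method = sampling_method.upper()  # assumption: compare case-insensitively
        if method not in ROW_METHODS | BLOCK_METHODS:
            raise ValueError(f"unknown sampling_method: {sampling_method!r}")
        if n is not None and method in BLOCK_METHODS:
            raise ValueError("fixed-size sampling does not work with SYSTEM | BLOCK")
    return table.sample(frac=frac, n=n, seed=seed, sampling_method=sampling_method)


class FakeTable:
    """Stand-in for a Snowpark Table; it only records the arguments it receives."""

    def sample(self, frac=None, n=None, *, seed=None, sampling_method=None):
        return {"frac": frac, "n": n, "seed": seed, "sampling_method": sampling_method}


# Seeded, fraction-based block sampling passes the checks.
call = checked_sample(FakeTable(), frac=0.1, seed=42, sampling_method="BLOCK")
print(call)
```

With a real connection, the object returned by `session.table("my_table")` would take the place of `FakeTable()`; the table name here is a placeholder.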