Data generation functions

Data Generation functions allow you to generate data. Snowflake supports two types of data generation functions:

  • Random, which can be useful for testing purposes.

    These functions produce a random value each time. Each value is independent of the other values generated by other calls to the function. The underlying algorithm produces pseudo-random values, and thus the values are not truly random or independent, but without knowing the algorithm, the values are essentially unpredictable, usually evenly distributed (if the sample size is large), and pseudo-independent of each other.

  • Controlled distribution, which can be useful for providing unique ID numbers for records that do not already have unique identifiers.

    These functions produce values that are not independent. For example, the NORMAL function returns values that have an approximately “normal” (bell-shaped) distribution based on a specified mean and standard deviation. Thus, each new value generated is at least indirectly influenced by previously generated values as the function tries to maintain the specified distribution. As another example, the SEQ family of functions return a sequence of values.

Note

The UNIFORM function is listed as a controlled-distribution function, but is intended to generate evenly-distributed values. In other words, it acts as though it’s a “random” function, but we refer to it as a controlled distribution function because the distribution is explicitly specified and because you can choose a data-generation function that produces non-uniform values over a large sample size.

List of functions

Function Name

Notes

Random

Returns a pseudo-random 64-bit integer.

Returns a random string of specified length.

Returns a random RFC 4122-compliant UUID as a formatted string.

Controlled Distribution

Returns a normal-distributed floating point number, with specified mean and standard deviation.

Returns a uniformly random number within the specified range.

Returns a Zipf-distributed integer.

Returns a sequence of monotonically increasing integers.

Usage notes

  • Random distribution functions are deterministic.

  • Each random distribution function takes a generator expression, gen, as its last argument. The generator expression, gen, can be constant or variable:

    • If constant, then the result of the random distribution function is constant (unless there are other, variable arguments, which is currently only supported for the RANDSTR function).

    • If variable, then the result of the random distribution function is variable.

  • Generator expressions must be a type 64-bit integer, although implicit conversions are allowed. Any expression that can be converted into a 64-bit integer can be used as a generator expression.

  • The randomness of any random distribution function is directly linked to the randomness of its generator expression. For most practical purposes, the RANDOM data generation function is the best choice for randomly-generated integer values.

  • Sequences generated by data generation functions are not guaranteed to be ordered and gap-free. This is because the numbers may be generated in parallel, in an unsynchronized fashion.

    For more details about sequences in Snowflake, see Using Sequences.

Language: English