Data generation functions
Data generation functions allow you to generate data. Snowflake supports two types of data generation functions:
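- Random, which can be useful for testing purposes.
  These functions produce a random value each time. Each value is independent of the values generated by other calls to the function. The underlying algorithm produces pseudo-random values, so the values are not truly random or independent. However, without knowledge of the algorithm, the values are essentially unpredictable, usually uniformly distributed (if the sample size is large), and pseudo-independent of each other.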
- Controlled distribution, which can be useful for providing unique ID numbers for records that do not already have unique identifiers.
  These functions produce values that are not independent. For example, the NORMAL function returns values that have an approximately “normal” (bell-shaped) distribution based on a specified mean and standard deviation. Thus, each new value generated is at least indirectly influenced by previously generated values as the function tries to maintain the specified distribution. As another example, the SEQ family of functions returns a sequence of values.
Note
The UNIFORM function is listed as a controlled-distribution function, but is intended to generate evenly-distributed values. In other words, it acts as though it’s a “random” function, but we refer to it as a controlled distribution function because the distribution is explicitly specified and because you can choose a data-generation function that produces non-uniform values over a large sample size.
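The difference between the two distributions can be checked with a rough sketch like the one below. It assumes a Snowflake session and uses the GENERATOR table function to produce rows; the sample size of 100,000 and the column aliases are arbitrary choices for illustration, and the exact figures differ on every run.

```sql
-- Summarize 100,000 generated values per function; results vary from run to run.
SELECT
    AVG(u)    AS uniform_avg,    -- should land near 50.5 for the inclusive range 1..100
    MIN(u)    AS uniform_min,
    MAX(u)    AS uniform_max,
    AVG(n)    AS normal_avg,     -- should land near the requested mean, 0
    STDDEV(n) AS normal_stddev   -- should land near the requested standard deviation, 1
FROM (
    SELECT
        UNIFORM(1, 100, RANDOM()) AS u,
        NORMAL(0, 1, RANDOM())    AS n
    FROM TABLE(GENERATOR(ROWCOUNT => 100000))
) AS samples;
```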
List of functions
| Function Name | Notes |
|---|---|
| Random | |
| RANDOM | Returns a pseudo-random 64-bit integer. |
| RANDSTR | Returns a random string of specified length. |
| UUID_STRING | Returns a random RFC 4122-compliant UUID as a formatted string. |
| Controlled Distribution | |
| NORMAL | Returns a normally distributed floating-point number with the specified mean and standard deviation. |
| UNIFORM | Returns a uniformly random number within the specified range. |
| ZIPF | Returns a Zipf-distributed integer. |
| SEQ1 / SEQ2 / SEQ4 / SEQ8 | Returns a sequence of monotonically increasing integers. |
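As a quick tour, the following sketch calls each listed function once per row. It assumes a Snowflake session; the argument values (string length 8, range 1 to 10, and so on) and the column aliases are illustrative only.

```sql
-- One call to each data generation function, evaluated for five generated rows.
SELECT
    RANDOM()                 AS random_int,     -- pseudo-random 64-bit integer
    RANDSTR(8, RANDOM())     AS random_string,  -- random string of length 8
    UUID_STRING()            AS uuid,           -- random UUID formatted as a string
    NORMAL(0, 1, RANDOM())   AS normal_value,   -- mean 0, standard deviation 1
    UNIFORM(1, 10, RANDOM()) AS uniform_value,  -- integer in the inclusive range 1..10
    ZIPF(1, 10, RANDOM())    AS zipf_value,     -- Zipf-distributed integer with s = 1, N = 10
    SEQ4()                   AS seq_value       -- increasing 4-byte integer
FROM TABLE(GENERATOR(ROWCOUNT => 5));
```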
Usage notes
- Random distribution functions are deterministic.
- Each random distribution function takes a generator expression, gen, as its last argument. The generator expression, gen, can be constant or variable:
  - If constant, then the result of the random distribution function is constant (unless there are other, variable arguments, which is currently only supported for the RANDSTR function).
  - If variable, then the result of the random distribution function is variable.
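- A generator expression must be of type 64-bit integer; however, implicit casts are allowed. Any expression that can be cast to a 64-bit integer can be used as a generator expression.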
- The randomness of any random distribution function is directly linked to the randomness of its generator expression. For most practical purposes, the RANDOM data generation function is the best choice for randomly-generated integer values.
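- There is no guarantee that the sequences generated by data generation functions are ordered and gap-free. This is because the numbers can be generated in parallel in a non-synchronized manner.
  For more details about sequences in Snowflake, see Using Sequences.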
- Decimal-float (DECFLOAT) values can’t be used as arguments for data generation functions.
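The notes on generator expressions and on sequence gaps can be illustrated with a minimal sketch, assuming a Snowflake session. The constant 1234 is an arbitrary generator value. The ROW_NUMBER workaround in the last query is a common pattern for gap-free numbering and is not something this section prescribes.

```sql
-- Constant generator expression: the same result is returned for every row.
SELECT UNIFORM(1, 100, 1234) AS constant_result
FROM TABLE(GENERATOR(ROWCOUNT => 3));

-- Variable generator expression: RANDOM() supplies a new value for each row,
-- so the result varies from row to row.
SELECT UNIFORM(1, 100, RANDOM()) AS variable_result
FROM TABLE(GENERATOR(ROWCOUNT => 3));

-- SEQ4() alone may produce gaps; numbering the rows with ROW_NUMBER(),
-- ordered by SEQ4(), yields consecutive values starting at 1.
SELECT ROW_NUMBER() OVER (ORDER BY SEQ4()) AS gap_free_id
FROM TABLE(GENERATOR(ROWCOUNT => 5));
```

Passing RANDOM() as the generator follows the recommendation above for randomly generated integer values; a constant generator is mainly useful when the same value is wanted on every row.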