- Categories:
Aggregate functions (Cardinality Estimation) , Window functions (Cardinality Estimation)
HLL_ ACCUMULATE¶
在聚合结束时返回 HyperLogLog 状态。
For more information about HyperLogLog, see Estimating the Number of Distinct Values.
HLL discards its intermediate state when the final cardinality estimate is returned. In advanced use cases, such as incremental cardinality estimation during bulk loading, one may want to keep the intermediate state. The intermediate state can later be combined (merged) with other intermediate states, or can be exported to external tools.
In contrast to HLL, HLL_ACCUMULATE does not return a cardinality estimate. Instead, it skips the final estimation step and returns the algorithm state itself. The state is a binary of at most 4096 Bytes. For more information, see Estimating the Number of Distinct Values.
- See also:
语法
聚合函数
窗口函数
For details about the OVER clause, see Window function syntax and usage.
实参
expr要估计基数(非重复值数)的表达式。这通常是列名,但也可以是更通用的表达式。
使用说明
- This function can be used as an aggregate function or a window function.
- DISTINCT 在语法上受支持,但不起作用。
示例
这显示了估计加拿大各省不同邮政编码数量的一步。在此步骤中,我们计算了马尼托巴省不同邮政编码的近似数量,并存储了计算“状态”的内部表示,稍后我们可以将其与其他省份的类似信息相结合:
Here is another example. This example shows how to use the three related functions HLL_ACCUMULATE, HLL_ESTIMATE, and HLL_COMBINE.
创建简单的表和数据:
Create a table that contains the “state” that represents the current
approximate cardinality information for the table named sequence_demo:
Now create a second table and add data. (In a more realistic situation, the user could have loaded more data into the first table and divided the data into non-overlapping sets based on the time that the data was loaded.)
仅获取新数据的“状态”信息。
合并两批行的“状态”信息:
获取行组合集的近似基数: