Categories:

Aggregate functions (Cardinality Estimation) , Window functions (Cardinality Estimation)

HLL_ACCUMULATE

在聚合结束时返回 HyperLogLog 状态。

For more information about HyperLogLog, see Estimating the Number of Distinct Values.

HLL discards its intermediate state when the final cardinality estimate is returned. In advanced use cases, such as incremental cardinality estimation during bulk loading, one may want to keep the intermediate state. The intermediate state can later be combined (merged) with other intermediate states, or can be exported to external tools.

In contrast to HLL, HLL_ACCUMULATE does not return a cardinality estimate. Instead, it skips the final estimation step and returns the algorithm state itself. The state is a binary of at most 4096 Bytes. For more information, see Estimating the Number of Distinct Values.

See also:

HLL_COMBINE , HLL_ESTIMATE

语法

聚合函数

HLL_ACCUMULATE( [ DISTINCT ] <expr> )

HLL_ACCUMULATE(*)

窗口函数

HLL_ACCUMULATE( [ DISTINCT ] <expr> ) OVER ( [ PARTITION BY <expr1> ] )

HLL_ACCUMULATE(*) OVER ( [ PARTITION BY <expr1> ] )

For details about the OVER clause, see Window function syntax and usage.

实参

expr

要估计基数(非重复值数)的表达式。这通常是列名,但也可以是更通用的表达式。

使用说明

示例

这显示了估计加拿大各省不同邮政编码数量的一步。在此步骤中,我们计算了马尼托巴省不同邮政编码的近似数量,并存储了计算“状态”的内部表示,稍后我们可以将其与其他省份的类似信息相结合:

CREATE TABLE temporary_hll_state_for_manitoba AS
  SELECT HLL_ACCUMULATE(postal_code) AS h_a_p_c
    FROM postal_data
    WHERE province = 'Manitoba';

Here is another example. This example shows how to use the three related functions HLL_ACCUMULATE, HLL_ESTIMATE, and HLL_COMBINE.

创建简单的表和数据:

CREATE OR REPLACE SEQUENCE seq92;
CREATE OR REPLACE TABLE sequence_demo (c1 INTEGER DEFAULT seq92.nextval, dummy SMALLINT);
INSERT INTO sequence_demo (dummy) VALUES (0);

INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;
INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;
INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;

Create a table that contains the “state” that represents the current approximate cardinality information for the table named sequence_demo:

CREATE OR REPLACE TABLE resultstate1 AS (
  SELECT HLL_ACCUMULATE(c1) AS rs1
    FROM sequence_demo);

Now create a second table and add data. (In a more realistic situation, the user could have loaded more data into the first table and divided the data into non-overlapping sets based on the time that the data was loaded.)

CREATE OR REPLACE TABLE test_table2 (c1 INTEGER);
INSERT INTO test_table2 (c1) SELECT c1 + 4 FROM sequence_demo;

仅获取新数据的“状态”信息。

CREATE OR REPLACE TABLE resultstate2 AS
  (SELECT HLL_ACCUMULATE(c1) AS rs1
     FROM test_table2);

合并两批行的“状态”信息:

CREATE OR REPLACE TABLE combined_resultstate (c1) AS
  SELECT HLL_COMBINE(rs1) AS apc1
    FROM (
      SELECT rs1 FROM resultstate1
      UNION ALL
      SELECT rs1 FROM resultstate2
    );

获取行组合集的近似基数:

SELECT HLL_ESTIMATE(c1)
  FROM combined_resultstate;
+------------------+
| HLL_ESTIMATE(C1) |
%------------------%
|               12 |
+------------------+