Categories:

Aggregate functions (Cardinality Estimation), Window function syntax and usage

DATASKETCHES_HLL_ESTIMATE

Returns the cardinality estimate for the given sketch.

This function is a version of the HLL_ESTIMATE HyperLogLog function that can read binary sketches in the format used by Apache DataSketches. For more information, see the Apache DataSketches documentation (https://datasketches.apache.org/docs/HLL/HllSketches.html).

A sketch produced by the DATASKETCHES_HLL_ACCUMULATE or DATASKETCHES_HLL_COMBINE function can be passed to the DATASKETCHES_HLL_ESTIMATE function to compute a cardinality estimate.

Syntax

DATASKETCHES_HLL_ESTIMATE( <binary_sketch> )

Arguments

binary_sketch

An expression that contains sketch information in binary format.

Returns

The function returns a value of type DOUBLE.

If the input is empty, the output is 0.0.

Note

This function returns a value of a different type than the HLL_ESTIMATE function, which returns an INTEGER value.

Examples

Create a table and insert values:

CREATE OR REPLACE TABLE datasketches_demo(v INT, g INT);

INSERT INTO datasketches_demo SELECT 1, 1;
INSERT INTO datasketches_demo SELECT 2, 1;
INSERT INTO datasketches_demo SELECT 2, 1;
INSERT INTO datasketches_demo SELECT 2, 1;
INSERT INTO datasketches_demo SELECT 1, 2;
INSERT INTO datasketches_demo SELECT 1, 2;
INSERT INTO datasketches_demo SELECT 4, 2;
INSERT INTO datasketches_demo SELECT 4, 2;
INSERT INTO datasketches_demo SELECT 5, 2;

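The following examples use the data in this table:

Return the cardinality estimate for accumulated binary sketches

The following example performs these steps:

  1. The DATASKETCHES_HLL_ACCUMULATE function creates two binary sketches for the data in column v, grouped by the values 1 and 2 in column g.
  2. The DATASKETCHES_HLL_ESTIMATE function returns the cardinality estimate for each accumulated sketch.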
WITH
  accumulated AS (
    SELECT g,
           DATASKETCHES_HLL_ACCUMULATE(v) AS accumulated_sketches
      FROM datasketches_demo
      GROUP BY g)
SELECT g, DATASKETCHES_HLL_ESTIMATE(accumulated_sketches) AS accumulated_estimate
  FROM accumulated;
+---+----------------------+
| G | ACCUMULATED_ESTIMATE |
|---+----------------------|
| 1 |          2.000000005 |
| 2 |          3.000000015 |
+---+----------------------+

You can see the values of the accumulated sketches in the example in DATASKETCHES_HLL_ACCUMULATE.
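
Because the function returns a DOUBLE, an integer count may be more convenient in some cases, for example when comparing against an exact COUNT(DISTINCT ...). The following is a minimal sketch of one way to do that by rounding the estimate. It assumes the datasketches_demo table created above; the aliases exact_count and rounded_estimate are illustrative names, not part of the function's interface.

```sql
-- Sketch only: compare exact distinct counts with rounded estimates per group.
WITH
  accumulated AS (
    SELECT g,
           COUNT(DISTINCT v) AS exact_count,
           DATASKETCHES_HLL_ACCUMULATE(v) AS accumulated_sketches
      FROM datasketches_demo
      GROUP BY g)
SELECT g,
       exact_count,
       -- ROUND returns the nearest whole number; the cast makes the type INTEGER.
       ROUND(DATASKETCHES_HLL_ESTIMATE(accumulated_sketches))::INTEGER AS rounded_estimate
  FROM accumulated
  ORDER BY g;
```

With the estimates shown above (2.000000005 and 3.000000015), rounding gives 2 and 3, which match the exact distinct counts for this small table. For large inputs the estimate is approximate and can differ from the exact count.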

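Return the cardinality estimate for combined binary sketches

The following example performs these steps:

  1. The DATASKETCHES_HLL_ACCUMULATE function creates two binary sketches for the data in column v, grouped by the values 1 and 2 in column g.
  2. The DATASKETCHES_HLL_COMBINE function combines these binary sketches into a single unified sketch.
  3. The DATASKETCHES_HLL_ESTIMATE function returns the cardinality estimate for the unified sketch.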
WITH
  accumulated AS (
    SELECT g,
           DATASKETCHES_HLL_ACCUMULATE(v) AS accumulated_sketches
      FROM datasketches_demo
      GROUP BY g),
  combined AS (
    SELECT DATASKETCHES_HLL_COMBINE(accumulated_sketches) AS unified
      FROM accumulated)
SELECT DATASKETCHES_HLL_ESTIMATE(unified) AS unified_estimate
  FROM combined;
+------------------+
| UNIFIED_ESTIMATE |
|------------------|
|       4.00000003 |
+------------------+

You can see the value of the combined sketch in the example in DATASKETCHES_HLL_COMBINE.
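
The accumulate, combine, and estimate steps do not have to run in a single query. The following sketch assumes that the per-group sketches can be saved in a table and merged later, which avoids rescanning the raw rows. The table name datasketches_demo_sketches and the column name sketch are hypothetical, chosen only for this illustration.

```sql
-- Hypothetical table: stores one binary sketch per value of g.
CREATE OR REPLACE TABLE datasketches_demo_sketches AS
  SELECT g,
         DATASKETCHES_HLL_ACCUMULATE(v) AS sketch
    FROM datasketches_demo
    GROUP BY g;

-- Later: merge the stored sketches and estimate the overall cardinality.
WITH
  combined AS (
    SELECT DATASKETCHES_HLL_COMBINE(sketch) AS unified
      FROM datasketches_demo_sketches)
SELECT DATASKETCHES_HLL_ESTIMATE(unified) AS unified_estimate
  FROM combined;
```

Under these assumptions, the final query should produce the same kind of result as the combined example above, because it merges the same two per-group sketches.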