类别:

聚合函数 (百分位估计)、窗口函数的语法和用法

APPROX_PERCENTILE_ACCUMULATE

在聚合结束时返回 T-Digest 状态(作为 JSON 对象)的内部表示。(有关 T-Digest 的更多信息,请参阅:估算百分位值

The function APPROX_PERCENTILE discards this internal, intermediate state when the final percentile estimate is returned. However, in certain advanced use cases, such as estimating incremental percentile during bulk loading, you may wish to keep the intermediate state, in which case you would use APPROX_PERCENTILE_ACCUMULATE instead of APPROX_PERCENTILE.

APPROX_PERCENTILE_ACCUMULATE does not return a percentile value. Instead, it returns the algorithm state itself. The intermediate state can later be:

  • 与来自独立但相关的数据批次的其他中间状态相结合(即合并)。

  • 由其他直接对中间状态进行操作的函数处理,例如,APPROX_PERCENTILE_ESTIMATE。(有关示例,请参阅下面的“示例”部分。)

  • 导出到外部工具。

另请参阅:

APPROX_PERCENTILE_COMBINEAPPROX_PERCENTILE_ESTIMATE

语法

APPROX_PERCENTILE_ACCUMULATE( <expr> )

实参

expr

计算结果为数值的有效表达式,例如列名。

使用说明

  • 百分位数仅适用于数值,因此 expr 应生成属于数字或可转换为数字的值。

  • 不支持小数浮点数 (DECFLOAT) 值。

示例

Store the t-Digest state of the testtable.c1 column in a table and then use the state to compute percentiles:

CREATE OR REPLACE TABLE resultstate AS
  SELECT APPROX_PERCENTILE_ACCUMULATE(c1) AS s
    FROM testtable;

SELECT APPROX_PERCENTILE_ESTIMATE(s, 0.015)
  FROM resultstate;

SELECT APPROX_PERCENTILE_ESTIMATE(s, 0.2)
  FROM resultstate;

Here is a more extensive example that shows the usage of all three related functions: APPROX_PERCENTILE_ACCUMULATE, APPROX_PERCENTILE_ESTIMATE, and APPROX_PERCENTILE_COMBINE.

创建简单的表和数据:

CREATE OR REPLACE TABLE test_table1 (c1 INTEGER);
INSERT INTO test_table1 (c1) VALUES (1), (2), (3), (4);

Create a table that contains the "state" that represents the current approximate percentile information for the table named test_table1:

CREATE OR REPLACE TABLE resultstate1 AS (
  SELECT APPROX_PERCENTILE_ACCUMULATE(c1) AS rs1
    FROM test_table1);

使用该状态信息显示中值的当前估计值(0.5 表示我们希望该值位于第 50 百分位数):

SELECT APPROX_PERCENTILE_ESTIMATE(rs1, 0.5)
  FROM resultstate1;
+--------------------------------------+
| APPROX_PERCENTILE_ESTIMATE(RS1, 0.5) |
|--------------------------------------|
|                                  2.5 |
+--------------------------------------+

现在创建另一个表并添加数据。(在更实际的情况下,用户可以将更多数据加载到第一个表中,并根据加载数据的时间将数据划分为非重叠集)。

CREATE OR REPLACE TABLE test_table2 (c1 INTEGER);
INSERT INTO test_table2 (c1) VALUES (5), (6), (7), (8);

仅获取新数据的“状态”信息。

CREATE OR REPLACE TABLE resultstate2 AS
  (SELECT APPROX_PERCENTILE_ACCUMULATE(c1) AS rs1
     FROM test_table2);

合并两批行的“状态”信息:

CREATE OR REPLACE TABLE combined_resultstate (c1) AS
  SELECT APPROX_PERCENTILE_COMBINE(rs1) AS apc1
    FROM (
      SELECT rs1 FROM resultstate1
      UNION ALL
      SELECT rs1 FROM resultstate2
    );

获取合并后的行集的近似 Top K 值:

SELECT APPROX_PERCENTILE_ESTIMATE(c1, 0.5)
  FROM combined_resultstate;
+-------------------------------------+
| APPROX_PERCENTILE_ESTIMATE(C1, 0.5) |
|-------------------------------------|
|                                 4.5 |
+-------------------------------------+