计算非重复值的数量

若要计算具有非重复值的行数,可以使用以下方法之一:

  • Call the SQL COUNT function with the DISTINCT keyword.
  • If you just need an approximate count of distinct values, you can use the HyperLogLog functions (e.g. APPROX_COUNT_DISTINCT). For details, see Estimating the Number of Distinct Values.
  • If you are counting distinct values for hierarchical aggregations (e.g. multiple grouping sets, rollups, or cubes), you can improve performance by using one of the following approaches (rather than using COUNT(DISTINCT <expr>)):

通过这种方法,您可以使用位图函数生成位图,来识别列中的不同整数值。一个位图最多可以表示 32,768 个非重复值,因此,如果非重复值的数量超过 32,768,则此方法需要“分桶”(使用多个位图)。

For details, see Using Bitmaps to Compute Distinct Values for Hierarchical Aggregations.

后续主题: