计算非重复值的数量
若要计算具有非重复值的行数,可以使用以下方法之一:
- Call the SQL COUNT function with the
DISTINCTkeyword. - If you just need an approximate count of distinct values, you can use the HyperLogLog functions
(e.g.
APPROX_COUNT_DISTINCT). For details, see Estimating the Number of Distinct Values. - If you are counting distinct values for hierarchical aggregations (e.g. multiple grouping sets, rollups, or cubes), you can
improve performance by using one of the following approaches (rather than using
COUNT(DISTINCT <expr>)):
通过这种方法,您可以使用位图函数生成位图,来识别列中的不同整数值。一个位图最多可以表示 32,768 个非重复值,因此,如果非重复值的数量超过 32,768,则此方法需要“分桶”(使用多个位图)。
For details, see Using Bitmaps to Compute Distinct Values for Hierarchical Aggregations.
-
Produce arrays that contain the distinct values.
With this approach, you use the aggregate functions that produce arrays containing the unique values in a column. You can then call ARRAY_SIZE to get the count of values.
This approach works for values of any data type (e.g. VARIANT) and does not require “bucketizing”, unless the size of the data in the ARRAY exceeds the maximum size of an ARRAY.
For details, see Using Arrays to Compute Distinct Values for Hierarchical Aggregations.
后续主题: