- Categories:
Aggregate functions (Cardinality Estimation) , Window functions
DATASKETCHES_ HLL¶
Returns an approximation of the distinct cardinality of the input (that is, DATASKETCHES_HLL(col1)
returns an approximation of COUNT(DISTINCT col1)).
This function is a version of the HLL HyperLogLog function that can read binary sketches in the format used by Apache DataSketches. For more information, see the Apache DataSketches documentation (https://datasketches.apache.org/docs/HLL/HllSketches.html).
Syntax¶
Aggregate function
Window function
Required arguments¶
expr1The expression for which you want to know the number of distinct values.
Optional arguments¶
max_log_kThe maximum value, in log2, of K to initialize the datasketches HLL object. Specify an INTEGER value between 4 and 21, inclusive. For more information, see the Apache DataSketches documentation (https://datasketches.apache.org/docs/HLL/HllSketches.html).
Default: 12
expr2The optional expression used to group rows into partitions.
Returns¶
The function returns a value of type DOUBLE.
If the input is empty, the output is 0.0.
Usage notes¶
-
DISTINCT is supported syntactically, but has no effect.
-
The function supports arguments that are values of the following data types:
-
String & binary data types (for example, VARCHAR and BINARY).
For example, the following function calls are supported:
-
Data types for floating-point numbers (for example, FLOAT and DOUBLE)
The DataSketches library casts these values to DOUBLE values.
-
Data types for fixed-point numbers (for example, INTEGER and NUMERIC).
The function only supports numeric types with a scale of 0. However, you can cast numeric values with a scale other than 0 to a data types for a floating-point number.
The DataSketches library casts these values in the range of a 64-bit signed INTEGER to a 64-bit signed INTEGER value.
The DataSketches library doesn’t directly cast INTEGER values exceeding the 64-bit signed INTEGER range (such as 128-bit integer values). However, Snowflake still supports these values by automatically converting them to DOUBLE values, which DataSketches supports. This behavior is identical to the behavior of the
datasketches-pythonlibrary.
Values of other data types aren’t supported. For example, VARIANT and ARRAY values aren’t supported.
-
-
For information about NULL values and aggregate functions, see Aggregate functions and NULL values.
-
When this function is called as a window function, it doesn’t support:
- An ORDER BY clause within the OVER clause.
- Explicit window frames.
Examples¶
Create a table and insert values:
The following examples use the data in the table.
Return the estimated cardinality of grouped data in a column¶
Use the DATASKETCHES_HLL function to approximate the distinct cardinality of the data in column v
grouped by the values in column g.
The output shows that for value 1 in column g, there are about two distinct values in column v
(that is, 1 and 2). For value 2 in column g, there are about three distinct values in column v
(that is, 1, 4, and 5). The COUNT(DISTINCT v)) call returns exact number of distinct
values instead of an estimate.
If you use the DATASKETCHES_HLL_ACCUMULATE function to create binary sketches from the grouped data, the DATASKETCHES_HLL_ESTIMATE function returns the same results for the accumulated sketches. For an example, see Return the cardinality estimate for accumulated binary sketches.
Return the estimated cardinality of all data in a column¶
Use the DATASKETCHES_HLL function to approximate the distinct cardinality of all of the data in column v.
The output shows that there are about four distinct values in column v (that is, 1, 2, 4, and 5).
The COUNT(DISTINCT v)) call returns exact number of distinct values instead of an estimate.
If you use the DATASKETCHES_HLL_ACCUMULATE function to create binary sketches from the grouped data, and then use the DATASKETCHES_HLL_COMBINE function to combine the sketches into one unified sketch, the DATASKETCHES_HLL_ESTIMATE function returns the same results for the unified sketch. For an example, see Return the cardinality estimate for combined binary sketches.