Aggregate functions¶

Aggregate functions operate on values across rows to perform mathematical calculations such as sum, average, counting, minimum/maximum values, standard deviation, and estimation, as well as some non-mathematical operations.

An aggregate function takes multiple rows (actually, zero, one, or more rows) as input and produces a single output. In contrast, scalar functions take one row as input and produce one row (one value) as output.

An aggregate function always returns exactly one row, even when the input contains zero rows. Typically, if the input contains zero rows, the output is NULL. However, an aggregate function could return 0, an empty string, or some other value when passed zero rows.

List of functions (by sub-category)¶

Function Name	Notes
General Aggregation
ANY_VALUE
AVG
CORR
COUNT
COUNT_IF
COVAR_POP
COVAR_SAMP
LISTAGG
MAX
MAX_BY
MEDIAN
MIN
MIN_BY
MODE
PERCENTILE_CONT	Uses different syntax than the other aggregate functions.
PERCENTILE_DISC	Uses different syntax than the other aggregate functions.
STDDEV, STDDEV_SAMP	STDDEV and STDDEV_SAMP are aliases.
STDDEV_POP
SUM
VAR_POP
VAR_SAMP
VARIANCE_POP	Alias for VAR_POP.
VARIANCE , VARIANCE_SAMP	Alias for VAR_SAMP.
Bitwise Aggregation
BITAND_AGG
BITOR_AGG
BITXOR_AGG
Boolean Aggregation
BOOLAND_AGG
BOOLOR_AGG
BOOLXOR_AGG
Hash
HASH_AGG
Semi-structured Data Aggregation
ARRAY_AGG
OBJECT_AGG
Linear Regression
REGR_AVGX
REGR_AVGY
REGR_COUNT
REGR_INTERCEPT
REGR_R2
REGR_SLOPE
REGR_SXX
REGR_SXY
REGR_SYY
Statistics and Probability
KURTOSIS
SKEW
Counting Distinct Values
ARRAY_UNION_AGG
ARRAY_UNIQUE_AGG
BITMAP_ABSOLUTE_POSITION
BITMAP_AND
BITMAP_AND_AGG
BITMAP_BIT_POSITION
BITMAP_BUCKET_NUMBER
BITMAP_COUNT
BITMAP_CONSTRUCT_AGG
BITMAP_OR
BITMAP_OR_AGG
BITMAP_TO_ARRAY
Cardinality Estimation (using HyperLogLog)
APPROX_COUNT_DISTINCT	Alias for HLL.
DATASKETCHES_HLL
DATASKETCHES_HLL_ACCUMULATE
DATASKETCHES_HLL_COMBINE
DATASKETCHES_HLL_ESTIMATE	Not an aggregate function; uses scalar input from DATASKETCHES_HLL_ACCUMULATE or DATASKETCHES_HLL_COMBINE.
HLL
HLL_ACCUMULATE
HLL_COMBINE
HLL_ESTIMATE	Not an aggregate function; uses scalar input from HLL_ACCUMULATE or HLL_COMBINE.
HLL_EXPORT
HLL_IMPORT
Similarity Estimation (using MinHash)
APPROXIMATE_JACCARD_INDEX	Alias for APPROXIMATE_SIMILARITY.
APPROXIMATE_SIMILARITY
MINHASH
MINHASH_COMBINE
Frequency Estimation (using Space-Saving)
APPROX_TOP_K
APPROX_TOP_K_ACCUMULATE
APPROX_TOP_K_COMBINE
APPROX_TOP_K_ESTIMATE	Not an aggregate function; uses scalar input from APPROX_TOP_K_ACCUMULATE or APPROX_TOP_K_COMBINE.
Percentile Estimation (using t-Digest)
APPROX_PERCENTILE
APPROX_PERCENTILE_ACCUMULATE
APPROX_PERCENTILE_COMBINE
APPROX_PERCENTILE_ESTIMATE	Not an aggregate function; uses scalar input from APPROX_PERCENTILE_ACCUMULATE or APPROX_PERCENTILE_COMBINE.
Aggregation Utilities
GROUPING	Not an aggregate function, but can be used in conjunction with aggregate functions to determine the level of aggregation for a row produced by a GROUP BY query.
GROUPING_ID	Alias for GROUPING.
AI Functions
AI_AGG
AI_SUMMARIZE_AGG
Vector Aggregation
VECTOR_AVG
VECTOR_MAX
VECTOR_MIN
VECTOR_SUM
Semantic views
AGG

Introductory example¶

The following example illustrates the difference between an aggregate function (AVG) and a scalar function (COS). The scalar function returns one output row for each input row, while the aggregate function returns one output row for multiple input rows:

Create a table and populate it with values:

CREATE TABLE simple (x INTEGER, y INTEGER);
INSERT INTO simple (x, y) VALUES
    (10, 20),
    (20, 44),
    (30, 70);

Query the table:

SELECT x, y
    FROM simple
    ORDER BY x,y;

+----+----+
|  X |  Y |
|----+----|
| 10 | 20 |
| 20 | 44 |
| 30 | 70 |
+----+----+

The scalar function returns one output row for each input row.

SELECT COS(x)
    FROM simple
    ORDER BY x;

+---------------+
|        COS(X) |
|---------------|
| -0.8390715291 |
|  0.4080820618 |
|  0.1542514499 |
+---------------+

The aggregate function returns one output row for multiple input rows:

SELECT SUM(x)
    FROM simple;

+--------+
| SUM(X) |
|--------|
|     60 |
+--------+

Aggregate functions and NULL values¶

Some aggregate functions ignore NULL values. For example, AVG calculates the average of values 1, 5, and NULL to be 3, based on the following formula:

(1 + 5) / 2 = 3

In both the numerator and the denominator, only the two non-NULL values are used.

If all of the values passed to the aggregate function are NULL, then the aggregate function returns NULL.

Some aggregate functions can be passed more than one column. For example:

SELECT COUNT(col1, col2) FROM table1;

In these instances, the aggregate function ignores a row if any individual column is NULL.

For example, in the following query, COUNT returns 1, not 4, because three of the four rows contain at least one NULL value in the selected columns:

Create a table and populate it with values:

CREATE OR REPLACE TABLE test_null_aggregate_functions (x INT, y INT);
INSERT INTO test_null_aggregate_functions (x, y) VALUES
  (1, 2),         -- No NULLs.
  (3, NULL),      -- One but not all columns are NULL.
  (NULL, 6),      -- One but not all columns are NULL.
  (NULL, NULL);   -- All columns are NULL.

Query the table:

SELECT COUNT(x, y) FROM test_null_aggregate_functions;

+-------------+
| COUNT(X, Y) |
|-------------|
|           1 |
+-------------+

If SUM is called with an expression that references two or more columns, and if one or more of those columns is NULL, then the expression evaluates to NULL, and the row is ignored:

SELECT SUM(x + y) FROM test_null_aggregate_functions;

+------------+
| SUM(X + Y) |
|------------|
|          3 |
+------------+

This behavior differs from the behavior of GROUP BY, which does not discard rows when some columns are NULL:

SELECT x AS X_COL, y AS Y_COL
  FROM test_null_aggregate_functions
  GROUP BY x, y;

+-------+-------+
| X_COL | Y_COL |
|-------+-------|
|     1 |     2 |
|     3 |  NULL |
|  NULL |     6 |
|  NULL |  NULL |
+-------+-------+