Aggregate functions
Aggregate functions operate on values across rows to perform mathematical calculations such as sums, averages, counts, minimum and maximum values, standard deviation, and estimation, as well as some non-mathematical operations.
An aggregate function takes zero, one, or more rows as input and produces a single output. In contrast, a scalar function takes one row as input and produces one row (one value) as output.
An aggregate function always returns exactly one row, even when the input contains zero rows. Typically, if
the input contains zero rows, the output is NULL. However, some aggregate functions return 0, an empty string, or
some other value when passed zero rows (for example, COUNT returns 0).
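The zero-row behavior can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in engine rather than the SQL dialect this page documents, and `empty_demo` is a hypothetical table name; other engines may differ in details.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table that is created but never populated.
conn.execute("CREATE TABLE empty_demo (x REAL)")

# Aggregating over zero rows still yields exactly one output row.
rows = conn.execute(
    "SELECT AVG(x), SUM(x), COUNT(x) FROM empty_demo"
).fetchall()

print(len(rows))  # 1
print(rows[0])    # (None, None, 0): AVG and SUM give NULL, COUNT gives 0
```

In this sketch AVG and SUM return NULL (Python `None`) while COUNT returns 0, which matches the two cases described above.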
List of functions (by sub-category)
Introductory example
The following example illustrates the difference between an aggregate function (AVG) and a scalar function (COS). The scalar function returns one output row for each input row, while the aggregate function returns one output row for multiple input rows:
Create a table and populate it with values:
Query the table:
The scalar function returns one output row for each input row.
The aggregate function returns one output row for multiple input rows:
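As a rough illustration of the difference, the sketch below runs both kinds of function against the same three rows. It assumes Python's sqlite3 module as a stand-in engine; the table `demo_values` and its contents are hypothetical, and COS is registered from Python's `math.cos` because not every SQLite build ships math functions.

```python
import math
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Register COS explicitly so the sketch does not depend on how SQLite was built.
conn.create_function("COS", 1, math.cos)

# Hypothetical table and sample values.
conn.execute("CREATE TABLE demo_values (x REAL)")
conn.executemany(
    "INSERT INTO demo_values VALUES (?)", [(0.0,), (1.0,), (2.0,)]
)

# Scalar function: one output row for each input row.
scalar_rows = conn.execute(
    "SELECT x, COS(x) FROM demo_values ORDER BY x"
).fetchall()

# Aggregate function: one output row for all input rows.
aggregate_rows = conn.execute("SELECT AVG(x) FROM demo_values").fetchall()

print(len(scalar_rows))     # 3
print(len(aggregate_rows))  # 1
print(aggregate_rows[0])    # (1.0,)
```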
Aggregate functions and NULL values
Some aggregate functions ignore NULL values. For example, AVG calculates the average of values 1, 5, and NULL to be 3,
based on the following formula:
(1 + 5) / 2 = 3
In both the numerator and the denominator, only the two non-NULL values are used.
If all of the values passed to such an aggregate function are NULL, then the function typically returns NULL. COUNT is an exception: it returns 0.
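The arithmetic above can be reproduced with a short sketch. It assumes Python's sqlite3 module as a stand-in engine, and `nullable_demo` is a hypothetical table holding the values 1, 5, and NULL.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table containing the values 1, 5, and NULL.
conn.execute("CREATE TABLE nullable_demo (v INTEGER)")
conn.executemany(
    "INSERT INTO nullable_demo VALUES (?)", [(1,), (5,), (None,)]
)

# AVG uses only the two non-NULL values: (1 + 5) / 2 = 3.
mixed = conn.execute(
    "SELECT AVG(v), COUNT(v), COUNT(*) FROM nullable_demo"
).fetchone()

# Restrict the input to NULL values only: AVG has nothing to average.
all_null = conn.execute(
    "SELECT AVG(v), COUNT(v) FROM nullable_demo WHERE v IS NULL"
).fetchone()

print(mixed)     # (3.0, 2, 3)
print(all_null)  # (None, 0)
```

COUNT(v) counts only the non-NULL values, while COUNT(*) counts every row, which is why the two differ in the first result.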
Some aggregate functions can be passed more than one column. For example:
In these instances, the aggregate function ignores a row if any individual column is NULL.
For example, in the following query, COUNT returns 1, not 4, because three of the four rows contain at least one NULL
value in the selected columns:
Create a table and populate it with values:
Query the table:
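A sketch of this rule follows. It assumes Python's sqlite3 module as a stand-in engine. SQLite's COUNT accepts at most one argument, so the multi-column form is emulated here with a CASE expression that is non-NULL only when both columns are non-NULL. The table `count_demo` and its four rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table: only the first row has no NULL in either column.
conn.execute("CREATE TABLE count_demo (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO count_demo VALUES (?, ?)",
    [(1, 1), (1, None), (None, 1), (None, None)],
)

# Emulation of a two-column COUNT: a row contributes only when
# every listed column is non-NULL.
both_non_null, total_rows = conn.execute(
    """
    SELECT COUNT(CASE WHEN col1 IS NOT NULL AND col2 IS NOT NULL THEN 1 END),
           COUNT(*)
    FROM count_demo
    """
).fetchone()

print(both_non_null)  # 1
print(total_rows)     # 4
```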
If SUM is called with an expression that references two or more columns, and one or more of those columns is NULL in a given row, then the expression evaluates to NULL for that row, and the row is ignored:
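The effect is easiest to see by comparing the sum of an expression with the sum of each column taken separately. The sketch assumes Python's sqlite3 module as a stand-in engine; `sum_demo` and its rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the second and third rows each contain one NULL.
conn.execute("CREATE TABLE sum_demo (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO sum_demo VALUES (?, ?)",
    [(1, 10), (2, None), (None, 30), (4, 40)],
)

# col1 + col2 is NULL for any row where either column is NULL,
# so only rows (1, 10) and (4, 40) contribute: 11 + 44 = 55.
sum_of_expression = conn.execute(
    "SELECT SUM(col1 + col2) FROM sum_demo"
).fetchone()[0]

# Summing each column separately skips only the NULLs in that column:
# (1 + 2 + 4) + (10 + 30 + 40) = 87.
sum_of_columns = conn.execute(
    "SELECT SUM(col1) + SUM(col2) FROM sum_demo"
).fetchone()[0]

print(sum_of_expression)  # 55
print(sum_of_columns)     # 87
```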
This behavior differs from the behavior of GROUP BY, which does not discard rows when some columns are NULL:
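One way to observe this is to group by a column that contains NULLs and check that those rows still appear in the result. The sketch assumes Python's sqlite3 module as a stand-in engine; `group_demo` and its rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table: two rows have a NULL grouping key.
conn.execute("CREATE TABLE group_demo (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO group_demo VALUES (?, ?)",
    [("a", 1), ("a", 2), (None, 3), (None, 4)],
)

# GROUP BY keeps rows whose key is NULL and collects them into one group.
groups = dict(
    conn.execute(
        "SELECT col1, SUM(col2) FROM group_demo GROUP BY col1"
    ).fetchall()
)

print(len(groups))   # 2
print(groups["a"])   # 3
print(groups[None])  # 7
```

The rows with a NULL key are not dropped: they form their own group, and SUM is computed over them like any other group.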