- Categories:
Aggregate functions (General)
ACCUMULATE¶
Returns a custom aggregate value computed by four user-defined SQL lambda functions: initialize, accumulate, combine, and terminate. ACCUMULATE follows the map-reduce aggregation model and integrates with GROUP BY, HAVING, and subqueries the same way built-in aggregates do.
ACCUMULATE is especially useful for prototyping and one-off aggregations that no built-in aggregate covers. For performance-sensitive workloads, prefer built-in aggregates when possible, or consider whether a combination of built-in aggregates and joins can achieve the same result. ACCUMULATE with OBJECT, ARRAY, or VARIANT state types carries additional performance overhead.
Syntax¶
Arguments¶
input_exprExpression evaluated once per non-NULL input row. The resulting value is passed to each lambda as the input value.
initialize_lambdaLambda with signature
(value) -> <state_expr>. Called once per non-NULL input row to produce the initial partial state from that row’s value.accumulate_lambdaLambda with signature
(state, value) -> <state_expr>. Folds a new input value into an existing partial state and returns the updated state.combine_lambdaLambda with signature
(state1, state2) -> <state_expr>. Merges two partial states produced by parallel workers. Must be associative.terminate_lambdaLambda with signature
(state) -> <output_expr>. Converts the final merged state into the result value returned to the query.
Lambda argument names are arbitrary. Type annotations are optional; see Type inference for details.
Returns¶
Returns the value produced by terminate_lambda. The data type matches
the return type of terminate_lambda. Returns NULL if all input rows
are NULL or the input set is empty.
Usage notes¶
NULL input rows are silently skipped before any lambda is called, consistent with standard SQL aggregate behavior. If all input rows are NULL or the input set is empty, the result is NULL.
ACCUMULATE has no persistent form. There’s no CREATE AGGREGATE FUNCTION or equivalent DDL. To reuse an aggregation, wrap it in a view, CTE, or stored procedure.
Lambdas may only reference their declared parameters. References to columns from the outer query aren’t allowed. To include an outer-query value, project it into the input expression or precompute it in a CTE.
The following example causes a compilation error because
column2is not a lambda parameter:The following function classes aren’t allowed inside lambdas:
Function class
Examples
Aggregate functions
SUM, AVG, COUNT
Window functions
ROW_NUMBER() OVER (…)
Non-deterministic functions
RANDOM(), UUID_STRING()
Type inference¶
Lambda argument types are optional. When omitted, types are inferred from the input expression and propagated through the lambda chain.
Explicit types aren’t coerced across lambdas. If you annotate a type on any lambda argument, that annotation is authoritative for the state type at that position. If annotations across the four lambdas are inconsistent and can’t be reconciled (for example,
initializereturns ARRAY butaccumulatedeclares the state as INT), compilation fails.You can omit all type annotations, or annotate only some arguments; the compiler infers and widens types automatically. Mixing annotated and unannotated arguments is allowed as long as the explicit annotations are consistent.
The state type (the type flowing between initialize, accumulate, and combine) and the output type (the return type of terminate) are tracked independently and may differ.
The input expression is implicitly cast to the type expected by the initialize lambda’s
valueargument. For example, if the input column is INT andvalueis declared STRING, the cast is applied automatically.
Examples¶
Compute the sum of a column, simulating the equivalent behavior of the existing SUM(c1) aggregate function:
Compute the product of all the values in a group.
Compute the mean using an ARRAY to track the running sum and count:
Compute the mean using a structured OBJECT as state:
Find the shortest string in a column:
Use ACCUMULATE with GROUP BY:
Call a UDF inside a lambda: