Estimating Percentile Values

Snowflake uses an improved version of the t-Digest algorithm, a space- and time-efficient way of estimating approximate percentile values in data sets.

Overview

Snowflake provides an improved implementation of the t-Digest algorithm described in the papers (https://github.com/tdunning/t-digest/tree/master/docs/t-digest-paper) by Dunning and Ertl. The implementation is exposed through the APPROX_PERCENTILE family of functions.

As documented in the papers, the algorithm has a constant relative error. Note that this claim rests on substantial empirical support; there is no rigorous proof of any accuracy guarantee.
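
Because the accuracy is supported empirically rather than by proof, it can be useful to check the estimate against an exact result on your own data. The following is a minimal sketch of such a check. The synthetic data set (one million uniformly distributed integers) and the column aliases are assumptions made for illustration; substitute a query over your own table.

```sql
-- Synthetic input: 1,000,000 uniformly distributed integers (illustrative only).
WITH sample_data AS (
  SELECT UNIFORM(1, 1000000, RANDOM()) AS v
  FROM TABLE(GENERATOR(ROWCOUNT => 1000000))
)
SELECT
  -- Approximate 99th percentile, estimated with t-Digest.
  APPROX_PERCENTILE(v, 0.99) AS approx_p99,
  -- Exact 99th percentile, for comparison.
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY v) AS exact_p99
FROM sample_data;
```

The two columns should be close but are not expected to match exactly. The data is random, so the values differ from run to run.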

SQL Functions

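The following aggregate functions are provided for using t-Digest to approximate percentile values:

  • APPROX_PERCENTILE: Returns an approximated value for the desired percentile.

  • APPROX_PERCENTILE_ACCUMULATE: Skips the final estimation step and returns the t-Digest state at the end of an aggregation.

  • APPROX_PERCENTILE_COMBINE: Combines (merges) input states into a single output state.

  • APPROX_PERCENTILE_ESTIMATE: Computes the desired percentile value from a t-Digest state produced by APPROX_PERCENTILE_ACCUMULATE or APPROX_PERCENTILE_COMBINE.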

Implementation Details

  • The estimation uses a constant amount of space regardless of the size of the input.

  • The t-Digest state is independent of the percentile value. This makes it possible to calculate the t-Digest state once and then query that state for multiple percentile values.
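
The "calculate once, query many times" property described above can be sketched as follows. The table response_times(day, latency_ms) and the table daily_latency_state are hypothetical names chosen for illustration; only the APPROX_PERCENTILE functions themselves come from Snowflake.

```sql
-- Hypothetical source table: response_times(day DATE, latency_ms FLOAT).

-- Build one t-Digest state per day and store it.
CREATE OR REPLACE TABLE daily_latency_state AS
SELECT day,
       APPROX_PERCENTILE_ACCUMULATE(latency_ms) AS state
FROM response_times
GROUP BY day;

-- Merge the stored daily states into a single state, then query that
-- state for several percentiles without rescanning the raw data.
WITH combined AS (
  SELECT APPROX_PERCENTILE_COMBINE(state) AS state
  FROM daily_latency_state
)
SELECT APPROX_PERCENTILE_ESTIMATE(state, 0.5)  AS p50,
       APPROX_PERCENTILE_ESTIMATE(state, 0.9)  AS p90,
       APPROX_PERCENTILE_ESTIMATE(state, 0.99) AS p99
FROM combined;
```

Storing the per-day states means that new days can be added incrementally and that any percentile can be requested later without rescanning the raw data.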
