How operators incrementally refresh¶
The following table outlines how each operator is incrementalized (that is, how it’s transformed into a new query fragment that generates changes instead of full results) and its performance and other important factors to consider.
Operator |
Incrementalization |
Considerations |
---|---|---|
SELECT <scalar expressions> |
Incrementalized by applying expressions to changed rows. |
Performs well, no special considerations. |
WHERE <scalar expressions> |
Incrementalized by evaluating the predicate on each changed row, and including only those for which the predicate is true. |
Generally performs well. Cost scales linearly with size of changes. Refreshing a dynamic table with a highly selective WHERE expression may require warehouse uptime, even if the resulting dynamic table doesn’t change. This is because a warehouse may be required to determine which changes in the sources satisfy the predicate. |
FROM <base table> |
Incrementalized by scanning micro-partitions that were added to or removed from the table since the last refresh. |
Cost scales linearly with the volume of data in the added or removed micro-partitions. Recommendations:
|
<query> UNION ALL <query> |
Incrementalized by taking union-all of changes on each side. |
Performs well, no special considerations. |
WITH <CTE list> <query> |
Incrementalized by computing the changes of each common table expression. |
WITH makes complex queries easier to read but be cautious of making the definition of a single dynamic table too complex. For more information, see Chain together pipelines of dynamic tables and Optimizing incremental refresh mode performance for complex dynamic tables. |
Scalar Aggregates |
Scalar aggregates are currently not incrementalized efficiently. When their input changes, they’re fully recomputed. |
|
GROUP BY <keys> |
Incrementalized by recomputing aggregates for every grouping key that changed. |
Ensure the source data is clustered by the grouping keys and the changes comprise a small fraction (roughly <5%) of the grouping keys. If the grouping key contains a compound expression rather than a base column, incremental refreshes might have to scan a large amount of data. To reduce the size of these scans, materialize the expression in one dynamic table, and then apply the grouping operation on the materialized column in another dynamic table. For instance, take the following compound statement: CREATE DYNAMIC TABLE sums
AS
SELECT date_trunc(minute, ts), sum(c1) FROM table
GROUP BY 1;
The above statement can be optimized as follows: CREATE DYNAMIC TABLE intermediate
AS
SELECT date_trunc(minute, ts) ts_min, c1 FROM table;
CREATE DYNAMIC TABLE sums
AS
SELECT ts_min, sum(c1) FROM intermediate
GROUP BY 1;
|
DISTINCT |
Equivalent to GROUP BY ALL with no aggregate functions. |
Often represents a substantial optimization opportunity. It’s a common practice to apply DISTINCT liberally throughout queries to avoid accidentally introducing duplicates. In incremental refresh, DISTINCT operations consume resources on a recurring basis because duplicates have to be checked during every refresh. When optimizing performance, finding and removing redundant DISTINCTs can be an easy win. You can do this by eliminating duplicates further upstream and considering join cardinalities carefully. |
<fn> OVER <window> |
Incrementalized by recomputing the window function for every partition key that changed. |
Ensure there’s a PARTITION BY clause in your query and the source data is clustered by partition keys. Also ensure the changes comprise a small fraction (roughly <5%) of the partitions. |
<left> INNER JOIN <right> |
Incrementalized by joining the changes on the left side with the right, then joining the changes on the right side with the left. |
If one of the sides of the join is small, performance is likely good. If one of the sides of the join changes frequently, clustering the other side by the join key might improve performance. |
<left> [{LEFT | RIGHT | FULL }] OUTER JOIN <right> |
Incrementalized by factoring into an inner-join union-all-ed with one or two NOT EXISTS to compute NULLs for non-matches. This factored query is then incrementalized. The inner join is incrementalized as shown. The not-exists are incrementalized by checking if the changed keys on one side already existed on the other side. |
Recommendations:
|
LATERAL FLATTEN |
Incrementalized by applying the flatten operator to changed rows. |
Generally performs well. Cost scales linearly with size of changes. Same general considerations as the FROM <base table> operator. |