Optimize queries for incremental refresh

Use this page when you design a new dynamic table query or want to optimize an existing one for incremental refresh. This guide shows which operators perform well, which need careful handling, and how to restructure queries for better performance.

For a complete list of which query constructs are supported for incremental refresh, see Supported queries for dynamic tables.

Performance expectations by operator

Before you optimize a dynamic table query, understand which operators benefit from incremental refresh and which can cause problems.

Note

Short queries (less than 10 seconds) might see smaller performance gains because of fixed overheads like query optimization and warehouse scheduling.

Operators that perform consistently well

These operators work efficiently with incremental refresh:

  • SELECT
  • WHERE
  • FROM <base table>
  • UNION ALL
  • QUALIFY [ RANK | ROW_NUMBER | DENSE_RANK ] … = 1

For details on how Snowflake processes each operator, see the operator reference table.

Operators affected by data locality

For these operators, performance depends on data locality, which is how you organize your data and where changes occur relative to your keys:

  • INNER JOIN
  • OUTER JOIN
  • GROUP BY
  • DISTINCT
  • OVER (window functions)

When changes affect only a small portion of grouping or partition keys, these operators perform well. Poor data locality or changes spread across many keys can make incremental refresh slower than full refresh.

For details on how Snowflake processes each operator, see the operator reference table.

Common optimization patterns

The following sections show common patterns to optimize queries that use locality-sensitive operators.

Optimize aggregations

When you use GROUP BY, Snowflake recomputes aggregates for every grouping key that contains changes. Performance depends on the following factors:

  • Data clustering: Source data clustered by grouping keys performs best.
  • Change distribution: Aim for changes that affect fewer than five percent of grouping keys.
  • Key complexity: Simple column references outperform compound expressions.

Problem: Compound expressions in grouping keys

This query performs poorly because the grouping key is an expression:

CREATE DYNAMIC TABLE hourly_sums
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT DATE_TRUNC('minute', ts), SUM(amount)
  FROM transactions
  GROUP BY 1;

Solution: Materialize the expression

Split into two dynamic tables to expose a simple grouping key:

CREATE DYNAMIC TABLE transactions_with_minute
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = my_warehouse
AS
  SELECT DATE_TRUNC('minute', ts) AS ts_minute, amount
  FROM transactions;

CREATE DYNAMIC TABLE hourly_sums
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT ts_minute, SUM(amount)
  FROM transactions_with_minute
  GROUP BY 1;

Now GROUP BY operates on a simple column, and the intermediate table benefits from better data locality.

Optimize joins

Join performance depends on which side changes and how you cluster data.

INNER JOIN: Snowflake joins changes from the left side with the right table, then joins changes from the right side with the left table. Joins perform well when one side is small or changes infrequently.

OUTER JOIN: Snowflake must also compute NULL values for non-matching rows. The side on which changes occur significantly affects performance.

Problem: Large table on both sides with poor clustering

Neither source table is clustered by join key:

CREATE DYNAMIC TABLE order_details
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT o.order_id, o.customer_id, p.product_name, o.quantity
  FROM orders o
  JOIN products p ON o.product_id = p.product_id;

Solution: Cluster the table that changes less often

Cluster the dimension table by the join key. Then, the join benefits from better locality:

ALTER TABLE products CLUSTER BY (product_id);

CREATE DYNAMIC TABLE order_details
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT o.order_id, o.customer_id, p.product_name, o.quantity
  FROM orders o
  JOIN products p ON o.product_id = p.product_id;

For OUTER JOINs:

  • Put the table that changes more often on the LEFT side.
  • Minimize changes on the side opposite the OUTER keyword.
  • For FULL OUTER JOINs, good locality is critical on both sides.
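These guidelines can be sketched as follows. The table and column names are hypothetical: orders stands in for a frequently changing fact table and customers for a stable dimension.

```sql
-- Keep the frequently changing table on the LEFT so the stable
-- dimension sits opposite the OUTER keyword (hypothetical tables).
CREATE DYNAMIC TABLE orders_with_customers
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT o.order_id, o.amount, c.customer_name
  FROM orders o                    -- changes frequently
  LEFT OUTER JOIN customers c      -- changes rarely
    ON o.customer_id = c.customer_id;
```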

Optimize window functions

Snowflake recomputes window functions for every partition key that contains changes. Optimize them similarly to GROUP BY.

Key requirements:

  • Always include a PARTITION BY clause. Window functions without PARTITION BY result in a full recomputation.
  • Cluster source data by partition keys.
  • Keep changes to fewer than five percent of partitions.

Problem: Window function without partition clustering

The source table isn’t clustered by the partition key:

CREATE DYNAMIC TABLE ranked_sales
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT
    region,
    salesperson,
    amount,
    RANK() OVER (PARTITION BY region ORDER BY amount DESC) as sales_rank
  FROM daily_sales;

Solution: Cluster by the partition key

Cluster the source table by the partition key so that the window function benefits from locality:

ALTER TABLE daily_sales CLUSTER BY (region);

CREATE DYNAMIC TABLE ranked_sales
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT
    region,
    salesperson,
    amount,
    RANK() OVER (PARTITION BY region ORDER BY amount DESC) as sales_rank
  FROM daily_sales;

Remove duplicates efficiently (DISTINCT vs QUALIFY)

Both DISTINCT and QUALIFY can remove duplicates, but they perform differently.

DISTINCT: Equivalent to GROUP BY ALL. Locality directly affects performance; poor locality causes slow refreshes.

QUALIFY with ROW_NUMBER = 1: Snowflake optimizes the pattern QUALIFY ROW_NUMBER() ... = 1 when it’s in the top-level projection of the dynamic table. This pattern consistently performs faster than full refresh.

The optimization works best when all PARTITION BY and ORDER BY columns in the OVER() clause are included in the top-level SELECT projection, so that they are persisted as queryable columns in the dynamic table.

Recommendation: Use QUALIFY instead of DISTINCT when possible

The following example uses DISTINCT:

CREATE DYNAMIC TABLE unique_customers
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT DISTINCT customer_id, customer_name, email
  FROM customer_events;

The following example uses QUALIFY:

CREATE DYNAMIC TABLE unique_customers
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT customer_id, customer_name, email, event_time
  FROM customer_events
  QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_time DESC) = 1;

The QUALIFY version is more explicit about which duplicate to keep (the most recent) and performs consistently well.

Remove redundant DISTINCT operations

Each DISTINCT consumes resources on every refresh. When your data is already unique or you eliminate duplicates upstream, remove unnecessary DISTINCT clauses.
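For example, if an upstream dynamic table already removes duplicates, a downstream DISTINCT adds refresh cost without changing results. The table names here are hypothetical:

```sql
-- unique_customers already deduplicates upstream, so the downstream
-- table can drop DISTINCT and produce the same output with less work.
CREATE DYNAMIC TABLE customer_emails
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT customer_id, email        -- no DISTINCT needed here
  FROM unique_customers;
```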

Operator reference

The following table explains how Snowflake processes each SQL operator during incremental refresh:

Operator | How Snowflake processes it | Performance notes
SELECT | Applies expressions to changed rows only. | Performs well. No special considerations.
WHERE | Evaluates the predicate on changed rows only. | Performs well. Cost scales linearly with changes. Note: A highly selective WHERE might require warehouse uptime even when the output doesn’t change.
FROM <table> | Scans micro-partitions that Snowflake added or removed since the last refresh. | Cost scales with the volume of changed partitions. Limit changes to about five percent of the source table.
UNION ALL | Takes the union of changes from each side. | Performs well. No special considerations.
WITH (CTEs) | Computes changes for each common table expression. | Performs well, but avoid overly complex single-table definitions. Consider splitting into multiple dynamic tables.
Scalar aggregates | Fully recomputes the aggregate when input changes. | Avoid in performance-critical tables. Consider grouping by a constant instead.
GROUP BY | Recomputes aggregates for changed grouping keys. | Cluster source by grouping keys. Avoid compound expressions in keys. See Optimize aggregations.
DISTINCT | Equivalent to GROUP BY ALL. | Locality-sensitive. Consider using QUALIFY instead. See Remove duplicates efficiently (DISTINCT vs QUALIFY).
Window functions | Recomputes for changed partition keys. | Always include PARTITION BY. Cluster source by partition keys. See Optimize window functions.
INNER JOIN | Joins changes from each side with the other table. | Performs well when one side is small. Cluster the less frequently changing side. See Optimize joins.
OUTER JOIN | Combines an inner join with NOT EXISTS queries for NULL computation. | The most locality-sensitive operator. See Optimize joins.
LATERAL FLATTEN | Applies the flatten to changed rows only. | Performs well. Cost scales linearly with changes.
QUALIFY with ranking | Uses an optimized path for ROW_NUMBER/RANK/DENSE_RANK … = 1. | Highly efficient. Place QUALIFY at the top-level projection of the dynamic table.
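The scalar-aggregate workaround mentioned in the table can be sketched as follows (the table name is hypothetical):

```sql
-- A scalar aggregate forces a full recomputation when input changes:
--   SELECT SUM(amount) FROM transactions;
-- Grouping by a constant returns the same single row but gives the
-- incremental engine a grouping key to track changes against.
SELECT 1 AS grp, SUM(amount) AS total_amount
FROM transactions
GROUP BY 1;
```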

Dynamic table performance and optimization

Learn how to optimize and monitor dynamic tables for speed and cost-efficiency. This section provides foundational concepts and links to more detailed topics.

Dynamic table performance refers to how quickly and efficiently a dynamic table refresh completes. A well-performing dynamic table refreshes fast enough to meet its target lag without consuming excessive compute resources.

Why performance matters

Data freshness

Dynamic tables refresh based on a target lag that you specify, which is the maximum allowed delay between updates to source tables and the dynamic table’s content. When refreshes take too long, your pipeline might not meet your freshness requirements.

For example, setting a target lag of five minutes when your refresh takes eight minutes means your pipeline can’t maintain the required freshness.

Cost efficiency

Dynamic tables require virtual warehouses for refreshes, which consume credits. Poorly optimized dynamic tables might scan more data than necessary, trigger full refreshes when incremental would suffice, or require larger warehouses to complete within target lag windows.

For more information about costs, see Understanding costs for dynamic tables.

Performance decisions

Changes that affect dynamic table performance fall into two categories based on when you can make them:

 | Design changes | Adjustments
When | Before you create a pipeline. | After your pipeline is running.
Impact | High | Medium
Flexibility | Hard to change; requires recreating tables. | Easy to change; no need to recreate tables.
Examples | Query structure, refresh mode, pipeline design. | Warehouse size, clustering keys, target lag.

For detailed guidance on both categories, see Design changes and Adjustments.

Get started

To get started with dynamic table performance optimization, try the hands-on tutorial:

Tutorial: Optimize dynamic table performance for SCD Type 1 workloads

Learn how to identify and resolve performance bottlenecks in a dynamic table pipeline. This tutorial shows how different SQL patterns affect incremental refresh and how to use the QUALIFY clause to efficiently remove duplicate rows.

Topics in this section

Monitor dynamic tables

How to monitor refresh performance, analyze query profiles, and track key metrics.

Understanding immutability constraints

How to use immutability constraints to mark historical data as unchanging and reduce refresh scope.

Optimize dynamic table performance

This topic covers techniques for optimizing dynamic table performance, organized into design changes and adjustments.

Before you optimize a dynamic table, you might want to diagnose the cause of slow refreshes. See Diagnose slow refreshes for a step-by-step workflow.

For background on performance categories, see Performance decisions.

Design changes

Design changes require you to recreate a dynamic table, but have greater impact on performance.

Note

Group changes and recreate tables together instead of making incremental modifications.

Choose a refresh mode

The refresh mode you choose has a significant impact on performance because it determines how much data Snowflake processes during each refresh. For information about how each mode works, see Dynamic table refresh modes.

Important

Dynamic tables with incremental refresh can’t be downstream from dynamic tables that use full refresh.

Use the following decision process to select a refresh mode:

  1. Review your query against the list of supported query constructs. Not all query operators support incremental refresh. For operators that are supported, see Operator reference to understand how they affect performance.

  2. Estimate your change volume, which is the percentage of your data that changes between refreshes. For example, incremental refresh works best when less than five percent of data changes.

  3. Evaluate your data locality. Check whether your source tables are clustered by the keys that you plan to use in joins, GROUP BY, or PARTITION BY clauses in your dynamic table query. Poor locality reduces incremental refresh efficiency. To improve locality, see Improve data locality.

  4. Choose a mode based on the following table:

    Mode | When to use
    Incremental | Your query uses supported operators, less than five percent of data changes between refreshes, and your source tables have good data locality. Note: Incremental refresh can still scan source tables, not just the rows that changed. For example, a new row on one side of a join must match against all rows in the other table. Even a small number of changes can trigger significant work.
    Full | A large percentage of data changes, your query uses unsupported operators, or your data lacks locality.
    Auto | You’re prototyping or testing. Avoid AUTO in production because its behavior might change between Snowflake releases.
  5. When you create a dynamic table, specify the mode with REFRESH_MODE = INCREMENTAL or REFRESH_MODE = FULL in your CREATE DYNAMIC TABLE statement.
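One way to estimate change volume for step 2, assuming change tracking is enabled on the source table, is the CHANGES clause. The table name and time window here are illustrative:

```sql
-- Change tracking must be enabled before changes can be queried.
ALTER TABLE my_source_table SET CHANGE_TRACKING = TRUE;

-- Rows changed in the last hour versus total rows. If the ratio
-- regularly exceeds about five percent, incremental refresh might
-- not be the right choice.
SELECT
  (SELECT COUNT(*)
     FROM my_source_table
     CHANGES(INFORMATION => DEFAULT)
     AT(OFFSET => -3600)) AS changed_rows,
  (SELECT COUNT(*) FROM my_source_table) AS total_rows;
```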

To check which refresh mode a dynamic table uses, see Refresh mode.
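For example, a quick check from SQL (column names in the SHOW output might vary slightly across versions):

```sql
SHOW DYNAMIC TABLES LIKE 'my_dynamic_table';
-- Inspect the refresh_mode column in the output; refresh_mode_reason
-- explains why FULL was chosen when you requested AUTO or INCREMENTAL.
```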

Optimize your queries and pipeline

The structure of your dynamic table queries and pipeline directly affects refresh performance. Use the following guidelines to reduce the work during each refresh.

Simplify individual queries

  • Use inner joins instead of outer joins. Inner joins perform better with incremental refresh. Verify referential integrity in your source data so that you can avoid outer joins.
  • Avoid unnecessary operations. Remove redundant DISTINCT clauses and unused columns. Exclude wide columns (like large JSON blobs) that aren’t frequently queried.
  • Remove duplicates efficiently. Use ranking functions instead of DISTINCT where possible.

For detailed guidance on how specific SQL operators affect incremental refresh performance, see Operator reference.

Split transformations across dynamic tables

Breaking complex transformations into multiple dynamic tables makes it easier to identify bottlenecks and improves debugging. With immutability constraints, you can also use different refresh modes for different stages.

  • Add filters early. Apply WHERE clauses in the dynamic tables closest to your source data so that downstream tables process fewer rows.
  • To avoid repeated DISTINCT operations in downstream tables, remove duplicate rows earlier in your pipeline.
  • Reduce the number of operations per table. Move joins, aggregations, or window functions into intermediate dynamic tables instead of combining them all in one query.
  • Materialize compound expressions (like DATE_TRUNC('minute', ts)) in an intermediate table before grouping by them. For details, see Optimize aggregations.

Note

Finding optimal split points requires trial and error.

Consider splitting between operations that shuffle data on different keys, such as GROUP BY, DISTINCT, window functions with PARTITION BY, and joins. This lets each dynamic table maintain better data locality for its key operation. For operator-specific guidance, see Operator reference.

The following example shows how to split a complex query into intermediate dynamic tables.

Initial complex query:

CREATE DYNAMIC TABLE final_result
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT ...
  FROM large_table a
  JOIN dimension_table b ON ...
  JOIN another_table c ON ...
  GROUP BY ...;

Split the complex pipeline by adding an intermediate dynamic table:

CREATE DYNAMIC TABLE intermediate_joined
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = my_warehouse
AS
  SELECT ...
  FROM large_table a
  JOIN dimension_table b ON ...;

CREATE DYNAMIC TABLE final_result
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
AS
  SELECT ...
  FROM intermediate_joined
  JOIN another_table c ON ...
  GROUP BY ...;

For detailed information and examples of how operators affect performance, see Operator reference.

Mark historical data immutable

Use the IMMUTABLE WHERE clause to tell Snowflake that certain rows won’t change. This reduces the scope of work during each refresh.

For syntax, examples, and detailed guidance, see Understanding immutability constraints.
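As a rough sketch only (the table, predicate, and exact clause placement here are assumptions; follow the linked topic for the authoritative syntax):

```sql
-- Rows matching the predicate are treated as unchanging, so refreshes
-- can skip them. The cutoff date and table names are illustrative.
CREATE DYNAMIC TABLE sales_summary
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_warehouse
  IMMUTABLE WHERE (sale_date < '2024-01-01')
AS
  SELECT sale_date, SUM(amount) AS total_amount
  FROM sales
  GROUP BY sale_date;
```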

Adjustments

Adjustments don’t require you to recreate dynamic tables. You can make adjustments while your pipeline is running.

Adjust your warehouse configuration

The warehouse that you specify in your CREATE DYNAMIC TABLE statement runs all refreshes for that table. Warehouse size and configuration directly affect refresh duration and cost.

For more information about warehouses and dynamic tables, see Understand warehouse usage for dynamic tables. For general warehouse performance optimization strategies, see Optimizing warehouses for performance.

Use a separate warehouse for initialization

Initial refreshes often process significantly more data than incremental refreshes. Use INITIALIZATION_WAREHOUSE to run initializations on a larger warehouse. Reserve a smaller, more cost-effective warehouse for regular refreshes:

CREATE DYNAMIC TABLE my_dynamic_table
  TARGET_LAG = 'DOWNSTREAM'
  WAREHOUSE = 'XS_WAREHOUSE'
  INITIALIZATION_WAREHOUSE = '4XL_WAREHOUSE'
  AS <query>;

To add or change the initialization warehouse for an existing dynamic table:

ALTER DYNAMIC TABLE my_dynamic_table SET INITIALIZATION_WAREHOUSE = '4XL_WAREHOUSE';

To remove the initialization warehouse and use the primary warehouse for all refreshes:

ALTER DYNAMIC TABLE my_dynamic_table UNSET INITIALIZATION_WAREHOUSE;

To view the warehouse configuration, use SHOW DYNAMIC TABLES or check the DYNAMIC_TABLE_REFRESH_HISTORY table function.

Resize when needed

To balance cost and performance, choose a warehouse size that prevents bytes from being spilled but doesn’t exceed what your workload can use in parallel. When faster refreshes are critical, increase the size slightly beyond the cost-optimal point.

Considerations for dynamic table refreshes:

  • Bytes spilled: When query history shows bytes spilled to local or remote storage, the warehouse ran out of memory during refresh. A larger warehouse provides more memory to prevent spilling. For details, see Queries too large to fit in memory.
  • Slow initial refresh: When the initial refresh is slow, consider setting INITIALIZATION_WAREHOUSE for the initial creation, or temporarily resize the warehouse and then resize it down after the table is created.
  • Saturated parallelism: Beyond a certain point, additional parallelism provides diminishing returns. Doubling warehouse size might double cost without halving runtime. To check how your refresh uses parallelism, review the query profile.
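To find refreshes that spilled, you can query the ACCOUNT_USAGE query history (adjust the time window and filters to match your refresh queries):

```sql
-- Recent queries that spilled to local or remote storage.
SELECT query_id,
       warehouse_name,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND (bytes_spilled_to_local_storage > 0
    OR bytes_spilled_to_remote_storage > 0)
ORDER BY bytes_spilled_to_remote_storage DESC;
```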

To resize a warehouse, see Increasing warehouse size.

For cost considerations, see Virtual warehouse credit usage and Working with warehouses.

Handle concurrent refreshes with multi-cluster warehouses

If multiple dynamic tables share a warehouse and refreshes frequently queue, consider using a multi-cluster warehouse. Multi-cluster warehouses automatically add clusters when queries queue and remove them when demand drops, which improves refresh latency during peak periods without incurring cost for unused capacity during quiet periods.
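For example, to convert a warehouse to auto-scale between one and three clusters (the warehouse name and cluster bounds are illustrative):

```sql
ALTER WAREHOUSE my_warehouse SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';
```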

For guidance on identifying and reducing queues, see Reducing queues.

Multi-cluster warehouses require Enterprise Edition or higher. For cost considerations, see Setting the scaling policy for a multi-cluster warehouse.

Identify the right target lag

Target lag controls how often your dynamic table refreshes. Shorter target lag means fresher data but more frequent refreshes and higher compute cost. For more information about how target lag works, see Understanding dynamic table target lag.

Use the following recommendations to optimize target lag for your workload:

  • Use DOWNSTREAM for intermediate tables that don’t need independent freshness guarantees. These tables refresh only when downstream tables need them.
  • Check the refresh history to find the right lag: Use DYNAMIC_TABLE_REFRESH_HISTORY or Snowsight to analyze refresh durations and skipped refreshes. Set the target lag slightly higher than your typical refresh duration.
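For example, the following query summarizes recent refresh durations for one table. The fully qualified table name is a placeholder, and column names might vary slightly across versions:

```sql
SELECT name,
       state,
       refresh_start_time,
       refresh_end_time,
       DATEDIFF('second', refresh_start_time, refresh_end_time)
         AS duration_seconds
FROM TABLE(
  INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
    NAME => 'my_db.my_schema.my_dynamic_table'))
ORDER BY refresh_start_time DESC
LIMIT 20;
```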

Change target lag

ALTER DYNAMIC TABLE my_dynamic_table SET TARGET_LAG = '1 hour';

To set a dynamic table to refresh based on downstream demand:

ALTER DYNAMIC TABLE my_dynamic_table SET TARGET_LAG = DOWNSTREAM;

Improve data locality

Locality describes how closely Snowflake stores rows that share the same key values. When rows with matching keys span fewer micro-partitions (good locality), incremental refreshes scan less data. When matching keys span many micro-partitions (poor locality), incremental refresh can take longer than full refresh.

For more information about how Snowflake stores data, see Micro-partitions & Data Clustering.
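To check how well a source table is clustered on the keys your dynamic table query uses, you can run SYSTEM$CLUSTERING_INFORMATION (table and column names are illustrative):

```sql
-- Returns a JSON summary including average clustering depth; lower
-- depth generally means better locality for those keys.
SELECT SYSTEM$CLUSTERING_INFORMATION('my_source_table', '(join_key_column)');
```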

Cluster source tables

The most effective way to improve locality is to cluster your source tables by the keys used in your dynamic table query (JOIN, GROUP BY, or PARTITION BY keys):

ALTER TABLE my_source_table CLUSTER BY (join_key_column);

When you join on multiple columns and can’t cluster by all of them:

  • Prioritize clustering larger tables by the most selective keys.
  • Consider creating separate copies of the same data clustered by different keys for use in different dynamic tables.

For more information, see Clustering Keys & Clustered Tables. To enable automatic reclustering, see Automatic Clustering.

Factors that affect locality

Beyond source table clustering, two other factors affect locality. These depend on your data patterns and are harder to change directly:

  • How new data aligns with partition keys: Incremental refresh is faster when new rows affect only a small portion of the table. This depends on your data ingestion patterns, not your query structure.

    For example, time-series data grouped by hour has good locality because new rows share recent timestamps. Data grouped by a column with values spread across the entire table has poor locality.

  • How changes align with dynamic table clustering: When Snowflake applies updates or deletions to a dynamic table, it must locate the affected rows. This is faster when the changed rows are stored close together.

    For example, updates to recent rows perform well when the dynamic table is naturally ordered by time. Updates scattered across the entire table are slower. This factor depends on your workload patterns, including which rows change and how often.

When you experience poor locality because of these factors, consider whether you can adjust your data model or ingestion patterns upstream.

Tutorial: Optimize dynamic table performance for SCD Type 1 workloads

Introduction

This tutorial shows you how to identify and resolve performance bottlenecks in a dynamic table pipeline for slowly changing dimension (SCD) Type 1 workloads. Dynamic tables automatically materialize query results and handle scheduling and orchestration for your data pipelines. Optimizing dynamic table performance helps you maintain data freshness and control costs.

About SCD Type 1 tables

Slowly changing dimension (SCD) tables store data that changes occasionally and unpredictably over time. Common examples include tables that track changes to customer addresses or product prices.

This tutorial implements an SCD Type 1 table, often called an “SCD-1 live” table. This type overwrites old data with new data and doesn’t keep a history of past values. SCD Type 1 tables are useful when you only care about the latest state of each record, such as a customer’s current phone number or a product’s current category.

In real-world data pipelines, you typically build a Type 1 SCD table by consuming a changelog table.

What you’ll learn

In this tutorial, you’ll learn how to complete the following tasks:

  • Create a sample source table with product change data.
  • Build two SCD Type 1 dynamic tables: one with a suboptimal SQL pattern and one with an optimized pattern with the QUALIFY clause.
  • Understand how the QUALIFY clause enables efficient incremental processing and significantly reduces refresh time.
  • Monitor key performance metrics like refresh duration and partition scans to identify optimization opportunities.
  • Compare the incremental refresh performance of both dynamic tables on the same data.

Prerequisites

You need access to a Snowflake environment with the following resources:

  • A warehouse for compute resources. Snowflake recommends using an x-small warehouse.
  • The privileges required to create databases, schemas, and dynamic tables. For more information, see Access control privileges.

If you don’t have a user with the necessary permissions, ask someone who does to create one for you. Users with the ACCOUNTADMIN role can create new users and grant them the required privileges.

Note

For the best experience, complete this tutorial in Snowsight so that you can quickly view the query history and monitor your dynamic table performance.

Step 1: Create the source data

Start by setting up a source table with sample data that simulates streaming product changes.

Create a database and schema for the tutorial, then create a source table:

CREATE DATABASE IF NOT EXISTS dt_perf_demo_db;
CREATE SCHEMA IF NOT EXISTS dt_perf_demo_db.tutorial;

USE SCHEMA dt_perf_demo_db.tutorial;

CREATE OR REPLACE TABLE product_changes (
    product_code VARCHAR(50),
    product_name VARCHAR(200),
    price NUMBER(10, 2),
    price_start_date TIMESTAMP_NTZ(9)
);

Next, insert sample data into the product_changes source table. The following command generates 100 million rows of sample product data by repeating 10,000 unique product codes and names. It assigns each product a price that changes slightly with each row, and sets a timestamp that increases by a few minutes for each new entry.

INSERT INTO product_changes (product_code, product_name, price, price_start_date)
  SELECT
      'PC-' || LPAD(TO_VARCHAR(MOD(SEQ4(), 10000) + 1), 5, '0') AS product_code,
      'Product ' || LPAD(TO_VARCHAR(MOD(SEQ4(), 10000) + 1), 5, '0') AS product_name,
      ROUND(10.00 + (MOD(SEQ4(), 10000) * 5) + (SEQ4() * 0.01), 2) AS price,
      DATEADD(MINUTE, SEQ4() * 5, '2025-01-01 00:00:00') AS price_start_date
  FROM
      TABLE(GENERATOR(ROWCOUNT => 100000000));

Step 2: Create dynamic tables for comparison

In this step, you create two SCD Type 1 dynamic tables that consume from the source table. The first dynamic table uses a suboptimal SQL pattern to find the most recent price change for every product, while the second uses an optimized pattern. Creating both tables simultaneously lets you directly compare their refresh performance on the same data.

Create a suboptimal dynamic table

Create a dynamic table by using an INNER JOIN with a subquery that gets the latest timestamp for each product code. This is a common but inefficient pattern that triggers costly re-computation on every update.

Note

Replace my_warehouse with the name of your warehouse.

CREATE DYNAMIC TABLE product_current_price_v1
    TARGET_LAG = DOWNSTREAM
    WAREHOUSE = <my_warehouse>
    INITIALIZE = ON_SCHEDULE
    REFRESH_MODE = INCREMENTAL
  AS
  SELECT
      h.product_code,
      h.product_name,
      h.price,
      h.price_start_date
  FROM product_changes h
  INNER JOIN (
      SELECT product_code, MAX(price_start_date) max_price_start_date
      FROM product_changes
      GROUP BY product_code
  ) m ON h.price_start_date = m.max_price_start_date AND h.product_code = m.product_code;

Key details about this dynamic table configuration:

  • This dynamic table uses TARGET_LAG = DOWNSTREAM, which means it refreshes only when downstream tables or queries need fresh data. This setting works well for intermediate tables in a pipeline.
  • The REFRESH_MODE = INCREMENTAL setting tells Snowflake to process only changed data instead of recomputing the entire table.

Create an optimized dynamic table

Now create a second dynamic table named product_current_price_v2 with an optimized SQL pattern. This table uses the QUALIFY clause to efficiently filter to the latest price for each product:

CREATE DYNAMIC TABLE product_current_price_v2
    TARGET_LAG = DOWNSTREAM
    WAREHOUSE = <my_warehouse>
    REFRESH_MODE = INCREMENTAL
    INITIALIZE = ON_SCHEDULE
  AS
  SELECT
      product_code,
      product_name,
      price,
      price_start_date
  FROM product_changes
  QUALIFY RANK() OVER (PARTITION BY product_code ORDER BY price_start_date DESC) = 1;

Using the QUALIFY clause with a ranking window function like RANK() lets Snowflake efficiently detect which product partitions changed. Instead of rescanning all historical data, the engine finds affected partitions and recalculates rankings only for those specific products. This results in more efficient incremental refreshes.

This optimization works because of the following factors:

  • Ranking functions like RANK, ROW_NUMBER, or DENSE_RANK used with PARTITION BY let the engine isolate changes by product.
  • Filtering to RANK() ... = 1 keeps only the latest record for each product, which is what SCD Type 1 tables require.
  • Placing the QUALIFY RANK() ... = 1 clause at the top level of the dynamic table query, not within a subquery, ensures that the optimization applies.
  • Persisting the product_code and price_start_date keys as columns in the dynamic table lets the engine track partition changes between refreshes and avoids full table scans.

This pattern also demonstrates good data locality, which describes how closely Snowflake stores rows with matching keys together. The pattern isolates changes to specific partition keys, which avoids full table scans.

Refresh both dynamic tables

To fill in the initial data for both tables, manually refresh them. This establishes a baseline for comparing their incremental refresh performance in the next step:

ALTER DYNAMIC TABLE product_current_price_v1 REFRESH;

ALTER DYNAMIC TABLE product_current_price_v2 REFRESH;

Step 3: Compare incremental refresh performance

Now compare how each table handles incremental refreshes.

Add new data to the source table

This step simulates new data arriving in the source table, as would happen in a real-world streaming scenario. Insert 1,000 new rows into the product_changes source table that update the price for five of the existing products:

INSERT INTO product_changes (product_code, product_name, price, price_start_date)
  SELECT
      'PC-' || LPAD(TO_VARCHAR(MOD(SEQ4(), 5) + 1), 5, '0') AS product_code,
      'Product ' || LPAD(TO_VARCHAR(MOD(SEQ4(), 5) + 1), 5, '0') AS product_name,
      ROUND(50.00 + (MOD(SEQ4(), 10) * 5) + ((SEQ4() + 100000000) * 0.01), 2) AS price,
      DATEADD(MINUTE, (SEQ4() + 100000000) * 5, '2025-01-01 00:00:00') AS price_start_date
  FROM
      TABLE(GENERATOR(ROWCOUNT => 1000));

Monitor refresh performance

Dynamic table performance depends on several factors: how you write queries, how you organize data, and the resources you allocate. The key metrics to monitor are refresh duration, partition scans, and bytes spilled. In this step, you’ll compare these metrics between the two dynamic table implementations.

To pick up the changes, start by refreshing the suboptimal dynamic table:

ALTER DYNAMIC TABLE product_current_price_v1 REFRESH;

Check the execution time and scan metrics:

  1. Navigate to Transformation » Dynamic Tables.

    Dynamic Table selection in the Snowsight UI
  2. Filter the list by selecting the dt_perf_demo_db database, then select product_current_price_v1.

    Dynamic table list filtered by database name in the Snowsight UI
  3. Select the Refresh History tab and notice the REFRESH DURATION value for the most recent refresh.

    Refresh history showing refreshes for the suboptimal dynamic table
  4. Select Show query profile for the latest refresh entry.

    Button to access the query profile for the suboptimal dynamic table
  5. Find the Statistics section and notice the Partitions scanned value.

    Example number of partitions scanned in the query profile statistics for the suboptimal dynamic table.

    The product_current_price_v1 table is inefficient because the subquery recalculates the maximum timestamp for all 10,000 products, even though only five products received new price changes. This forces the dynamic table engine to scan many more partitions than necessary, driving up both time and cost as the source table grows. This pattern demonstrates poor data locality because changes don’t align well with how the data is organized for incremental processing.

  6. Now refresh the optimized product_current_price_v2 dynamic table:

    ALTER DYNAMIC TABLE product_current_price_v2 REFRESH;
  7. Repeat the previous steps to check the Refresh History for the optimized table:

    Refresh history showing refreshes for the optimized dynamic table

    Compare the two refresh operations. The optimized product_current_price_v2 dynamic table should complete significantly faster than the suboptimal product_current_price_v1 dynamic table. In the example results, the suboptimal table took 2.8 seconds while the optimized table took only 804 milliseconds.

    Open the Query Profile and compare the Statistics section:

    Example number of partitions scanned in the query profile statistics for the optimized dynamic table

    The product_current_price_v2 table uses the QUALIFY clause with a ranking window function, which lets the engine efficiently identify and process only the five products that changed, resulting in a much faster incremental refresh. This query pattern has good data locality because Snowflake can isolate which partition keys (product codes) contain changes.

Tip

Even at the small scale used in this tutorial, this optimization leads to noticeable performance improvements. In production, with millions of products and billions of records, this optimization can cut refresh times from hours to seconds. Because refresh cost depends on the percentage of changed products rather than on total table size, efficiency remains high as your data grows.

Faster refreshes translate directly to fresher data. If you need data fresh within minutes, optimizing query patterns like this helps you meet aggressive target lag requirements without oversizing warehouses.

Clean up

To delete all objects created for this tutorial, run the following DROP statement:

DROP DATABASE dt_perf_demo_db;

Summary and additional resources

In this tutorial, you optimized a dynamic table pipeline by replacing a suboptimal subquery pattern with the highly efficient QUALIFY RANK() = 1 pattern for an SCD Type 1 table. This lets the dynamic table engine apply performance optimizations for incremental refresh and leads to faster and cheaper pipeline runs. Faster refreshes mean you can maintain data freshness with tighter target lag requirements without increasing cost.

Along the way, you completed the following tasks:

  • Created a source table with sample product data simulating a changelog.
  • Created a suboptimal SCD Type 1 dynamic table that demonstrated the common pitfall of using a nested query with MAX() to find the latest records.
  • Applied the QUALIFY optimization to significantly improve dynamic table refresh performance with efficient incremental processing. This pattern improves data locality by letting the engine isolate changes to specific partition keys.
  • Monitored refresh performance by comparing partition scans and execution times between different implementations using the query profile. These metrics help you identify whether your queries work efficiently with incremental refresh.

Key performance concepts demonstrated:

  • Incremental refresh efficiency: The optimized query processes only changed data, while the suboptimal query rescans the entire dataset.
  • Data locality: When changes align with partition keys (product codes), incremental refresh performs well. When changes scatter across many keys or require full rescans, performance suffers. See Improve data locality for more details.
  • Target lag and freshness: Optimizing query patterns lets you meet tighter data freshness requirements without oversizing warehouses.

For more information about dynamic tables and optimization techniques, explore the following resources:

Query and pipeline optimization:

  • Query optimization for incremental refresh: Learn which operators perform well with incremental refresh and how to restructure queries for better performance. See Operator reference.
  • Data locality: Understand how data organization affects incremental refresh performance and how to cluster source tables. See Improve data locality.
  • Immutability constraints: To avoid reprocessing unchanged historical data, use the IMMUTABLE WHERE option. This can greatly reduce refresh costs and time.
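As a rough sketch, an IMMUTABLE WHERE clause declares a predicate identifying rows the engine can treat as unchanging, so refreshes skip reprocessing them. The syntax below is illustrative and the warehouse name is hypothetical; consult the Snowflake documentation on immutability constraints for the exact form:

```sql
-- Sketch only: rows matching the IMMUTABLE WHERE predicate are
-- treated as historical and skipped during incremental refresh.
CREATE OR REPLACE DYNAMIC TABLE product_price_history
  TARGET_LAG = '1 minute'
  WAREHOUSE = my_wh                -- illustrative warehouse name
  IMMUTABLE WHERE (price_start_date < '2025-01-01')
  AS
    SELECT product_code, price, price_start_date
    FROM product_changes;
```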

Infrastructure and monitoring: