Use primary keys to optimize dynamic table pipelines

Snowflake can use primary keys to track row-level changes in dynamic tables without relying on change-tracking columns. This enables incremental refresh for pipelines that run insert overwrite workloads including full refresh dynamic tables, which normally block downstream incremental processing.

Primary keys are especially effective when an INSERT OVERWRITE is performed on a base table where only a small fraction of the data is actually changed. In these cases, primary key-based change tracking processes only the changed rows instead of recomputing the entire table. A primary key provides a stable row identifier that persists across overwrites.

For conceptual background, see Understanding primary keys in dynamic tables.

Improve performance for INSERT OVERWRITE workloads

When a base table is periodically rewritten through INSERT OVERWRITE, standard change-tracking columns are reset and a dynamic table consuming the base table will see a set of inserts and deletes for all rows in the base table.

In the following example, an external process rewrites the dimension_table periodically, but most rows remain the same:

CREATE TABLE dimension_table (
  dim_id INT PRIMARY KEY RELY,
  dim_name VARCHAR,
  category VARCHAR
);

CREATE TABLE fact_table (
  fact_id INT,
  dim_id INT,
  measure FLOAT,
  ts TIMESTAMP
);

CREATE DYNAMIC TABLE enriched_facts
  TARGET_LAG = '30 minutes'
  WAREHOUSE = mywh
  REFRESH_MODE = INCREMENTAL
AS
  SELECT f.fact_id, f.measure, d.dim_name, d.category, f.ts
  FROM fact_table f
  INNER JOIN dimension_table d ON f.dim_id = d.dim_id;

When the dimension table is rewritten through INSERT OVERWRITE, Snowflake uses the primary key to identify which dimension rows actually changed and refreshes only the affected facts, rather than recomputing the entire join.

Enable incremental refresh downstream of a full refresh dynamic table

Normally, a dynamic table with REFRESH_MODE = INCREMENTAL can’t read from a dynamic table with REFRESH_MODE = FULL. When the full refresh dynamic table has a system-derived unique key, you can explicitly set the refresh mode to INCREMENTAL.

Example: Use a base table primary key

Create a base table with a primary key and set the RELY property so Snowflake uses it for row-level change tracking:

CREATE TABLE raw_events (
  event_id INT PRIMARY KEY RELY,
  event_type VARCHAR,
  payload VARIANT,
  created_at TIMESTAMP
);

Create a full refresh dynamic table that reads from the base table. Because the base table has a reliable primary key, Snowflake can derive an unique key from the base table and register it as an unique constraint for the dynamic table:

CREATE DYNAMIC TABLE transformed_events
  TARGET_LAG = '10 minutes'
  WAREHOUSE = mywh
  REFRESH_MODE = FULL
AS
  SELECT event_id, event_type, payload:user_id::STRING AS user_id, created_at
  FROM raw_events;

Create an incremental dynamic table downstream. This works because the upstream table has a system-derived reliable unique key:

CREATE DYNAMIC TABLE event_summary
  TARGET_LAG = '10 minutes'
  WAREHOUSE = mywh
  REFRESH_MODE = INCREMENTAL
AS
  SELECT user_id, COUNT(*) AS event_count, MAX(created_at) AS last_event
  FROM transformed_events
  GROUP BY user_id;

Example: Use a query-derived primary key

When a dynamic table’s query includes a GROUP BY clause, Snowflake automatically derives an unique key from the grouping columns. Downstream tables can use this derived key for primary key-based change tracking and enable incremental refreshes.

CREATE DYNAMIC TABLE daily_sales
  TARGET_LAG = '1 hour'
  WAREHOUSE = mywh
  REFRESH_MODE = FULL
AS
  SELECT DATE_TRUNC('day', sale_ts) AS sale_day, product_id, SUM(amount) AS total_sales
  FROM sales
  GROUP BY sale_day, product_id;

The daily_sales table has a derived unique key on (sale_day, product_id) because the GROUP BY guarantees one row per combination. A downstream table can refresh incrementally:

CREATE DYNAMIC TABLE product_trends
  TARGET_LAG = '1 hour'
  WAREHOUSE = mywh
  REFRESH_MODE = INCREMENTAL
AS
  SELECT product_id, AVG(total_sales) AS avg_daily_sales, COUNT(*) AS days_with_sales
  FROM daily_sales
  GROUP BY product_id;

Check system-derived unique keys on a dynamic table

To see whether a dynamic table has a derived unique key, use the SHOW UNIQUE KEYS command:

SHOW UNIQUE KEYS IN daily_sales;

If the output contains a unique key, the dynamic table supports primary key-based change tracking. Downstream dynamic tables can use REFRESH_MODE = INCREMENTAL to read from it, even if it uses full refresh mode.

You can also verify support by creating a downstream dynamic table with REFRESH_MODE = INCREMENTAL. If the upstream table doesn’t have a reliable unqiue key, the creation fails with an error.