Manage Apache Iceberg™ tables

You can also convert an Iceberg table that uses an external catalog into a table that uses Snowflake as the Iceberg catalog. To learn more, see Convert an Apache Iceberg™ table to use Snowflake as the catalog.

Query tables

To query an Iceberg table, a user must be granted or inherit the following privileges:

  • The USAGE privilege on the database and schema that contain the table

  • The SELECT privilege on the table

You can query an Iceberg table by using a SELECT statement. For example:
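As an illustrative sketch, the grants above might look like the following (the role and object names are placeholders, not names from this page):

```sql
-- Hypothetical role and object names; grant the minimum privileges
-- needed to query the Iceberg table.
GRANT USAGE ON DATABASE my_db TO ROLE analyst_role;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE analyst_role;
GRANT SELECT ON TABLE my_db.my_schema.my_iceberg_table TO ROLE analyst_role;
```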

SELECT col1, col2 FROM my_iceberg_table;

Note

Along with Snowflake, you can also use an external query engine to query Iceberg tables. For more information, see Use an external query engine with Apache Iceberg™ tables.

Use DML commands

Iceberg tables that use Snowflake as the catalog support the full set of Data Manipulation Language (DML) commands.

Snowflake-managed tables also support efficient bulk loading with features such as COPY INTO <table> and Snowpipe. For more information, see Load data into Apache Iceberg™ tables.

Note

  • Snowflake also supports writing to externally managed Iceberg tables. For more information, see Write support for externally managed Apache Iceberg™ tables and Write to externally managed Iceberg tables.

  • For Snowflake-managed Iceberg tables, if a DML operation fails unexpectedly and rolls back, some Parquet files might get written to your external cloud storage but won't be tracked or referenced by your Iceberg table metadata. These Parquet files are orphan files.

    If you see a mismatch between storage usage for your external cloud storage and Snowflake, you might have orphan files in your external cloud storage. To see your storage usage for Snowflake, you can use the TABLE_STORAGE_METRICS view (Account Usage) or the TABLE_STORAGE_METRICS view (Information Schema). If you see a mismatch, contact Snowflake Support for assistance with determining whether you have orphan files and removing them.

Example: Update a table

You can use INSERT and UPDATE statements to modify a Snowflake-managed Iceberg table.

The following example inserts a new value into an Iceberg table named store_sales, and then updates the cola column to 1 where its value is currently -99.

INSERT INTO store_sales VALUES (-99);

UPDATE store_sales
  SET cola = 1
  WHERE cola = -99;

Generate snapshots of DML changes

For tables that use Snowflake as the catalog, Snowflake automatically generates Iceberg metadata and writes it to a folder named metadata on the external volume. To find the metadata folder, see Data and metadata directories.

Alternatively, you can call the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate Iceberg metadata for new changes.

For tables that aren't managed by Snowflake, the function returns information about the most recently refreshed snapshot.

For example:

SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('db1.schema1.it1');

Output:

+-----------------------------------------------------------------------------------------------------------+
| SYSTEM$GET_ICEBERG_TABLE_INFORMATION('DB1.SCHEMA1.IT1')                                                   |
|-----------------------------------------------------------------------------------------------------------|
| {"metadataLocation":"s3://mybucket/metadata/v1.metadata.json","status":"success"}                         |
+-----------------------------------------------------------------------------------------------------------+

Use row-level deletes

Snowflake supports querying tables with row-level deletes and writing to tables by using row-level deletes.

Query tables

Snowflake supports querying externally managed Iceberg tables that are configured with row-level deletes (https://iceberg.apache.org/spec/#row-level-deletes) for update, delete, and merge operations.

To configure row-level deletes, see Write properties (https://iceberg.apache.org/docs/latest/configuration/#write-properties) in the Apache Iceberg documentation.

Write to tables by using positional delete files

Note

  • Only externally managed Iceberg tables are supported.

  • To use position row-level deletes, ensure that the Iceberg version for Iceberg tables is set to v2, which is the default. For more information, see ICEBERG_VERSION_DEFAULT. If the Iceberg version is set to v3, the merge-on-read behavior in Snowflake is to use deletion vectors.

Snowflake supports position row-level deletes for writing to externally managed Iceberg tables stored on Amazon S3, Azure, or Google Cloud. To turn off position deletes and run DML operations in copy-on-write mode instead, set the ENABLE_ICEBERG_MERGE_ON_READ parameter to FALSE at the table, schema, or database level.
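As a sketch of the setting described above (the table name is a placeholder; the parameter name comes from this page):

```sql
-- Run DML on this externally managed table in copy-on-write mode by
-- turning off position deletes; schema and database levels also work.
ALTER ICEBERG TABLE my_iceberg_table SET ENABLE_ICEBERG_MERGE_ON_READ = FALSE;
```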

Write to tables by using deletion vectors

To optimize row-level data modifications, Snowflake supports deletion vectors for writing to externally managed and Snowflake-managed Iceberg tables stored on Amazon S3, Azure, or Google Cloud. With deletion vectors, Snowflake can perform "merge-on-read" (MOR) operations, which improve write performance for the following DML statements:

  • DELETE

  • UPDATE

  • MERGE

Snowflake achieves this performance by writing small vector files instead of rewriting large data files. For more information, see Deletion vectors (https://iceberg.apache.org/spec/#deletion-vectors) in the Apache Iceberg specification.

Enable deletion vectors

To enable deletion vectors, complete the following steps:

  1. Set the default Iceberg version for Iceberg tables to v3 by following the instructions in Configure the default Iceberg version.

    Note

    If the default Iceberg version for Iceberg tables is v2, Snowflake performs "merge-on-read" (MOR) operations by using positional delete files.

  2. Set the ENABLE_ICEBERG_MERGE_ON_READ parameter to TRUE, which is the default, by following the instructions in ENABLE_ICEBERG_MERGE_ON_READ.

  3. To run DML operations in copy-on-write mode, set the ENABLE_ICEBERG_MERGE_ON_READ parameter to FALSE.
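The steps above can be sketched at the schema level as follows. The parameter names come from this page, but the exact value format for the default Iceberg version is an assumption; check the linked instructions before relying on it.

```sql
-- Step 1 (assumed value format): default new Iceberg tables to v3.
ALTER SCHEMA my_schema SET ICEBERG_VERSION_DEFAULT = 3;

-- Step 2: keep merge-on-read (deletion vectors on v3) enabled.
ALTER SCHEMA my_schema SET ENABLE_ICEBERG_MERGE_ON_READ = TRUE;

-- Step 3 (optional): force copy-on-write instead.
-- ALTER SCHEMA my_schema SET ENABLE_ICEBERG_MERGE_ON_READ = FALSE;
```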

Usage notes for deletion vectors

  • Default behavior

    • The system default for ENABLE_ICEBERG_MERGE_ON_READ is TRUE.

  • Write method heuristics

    • When ENABLE_ICEBERG_MERGE_ON_READ is set to TRUE, Snowflake uses heuristics to decide per-file whether to use merge-on-read or copy-on-write:

      • Row count: Snowflake only writes a deletion vector if fewer than ~5% of rows in a data file are deleted. If ≥5% are deleted, Snowflake rewrites the file by using copy-on-write.

      • File size: For Snowflake to write deletion vectors, the data file must be larger than approximately 1.6 MB.

  • Compatibility

    • If you use compute engines that don't yet support Iceberg v3 deletion vectors, set ENABLE_ICEBERG_MERGE_ON_READ to FALSE to enforce copy-on-write for all writes.

  • Parameter precedence

    • Snowflake only checks the ENABLE_ICEBERG_MERGE_ON_READ parameter to determine the write method. It doesn't recognize the following Iceberg table properties:

      • write.delete.mode

      • write.update.mode

      • write.merge.mode

Copy-on-write vs. merge-on-read

Iceberg provides two modes that configure how compute engines handle row-level operations on externally managed tables. Snowflake supports both modes.

The following descriptions explain when you might want to use each mode:

Copy-on-write (default)

This mode prioritizes read time at the expense of write speed.

When an update, delete, or merge operation runs, the compute engine rewrites the entire affected Parquet data file. This can slow writes (especially for large data files), but doesn't affect read time.

This is the default mode.

Merge-on-read

This mode prioritizes write speed at a slight cost to read time.

When an update, delete, or merge operation runs, the compute engine creates a delete file that contains only information about the changed rows.

When reading from the table, the query engine merges the delete files with the data files. Merging can increase read time. However, you can optimize read performance by scheduling regular compaction and table maintenance.

To learn more about Iceberg row-level changes, see Row-level deletes (https://iceberg.apache.org/spec/#row-level-deletes) in the Apache Iceberg documentation.

Considerations and limitations

Consider the following information when you use row-level deletes with Iceberg tables:

  • Snowflake supports position deletes (https://iceberg.apache.org/spec/#position-delete-files) only for v2 Iceberg tables, and deletion vectors (https://iceberg.apache.org/spec/#deletion-vectors) for v3 Iceberg tables.

  • Snowflake only supports position deletes with externally managed Iceberg tables.

  • For the best read performance when you use row-level deletes, perform regular compaction and table maintenance to remove old delete files. For information, see Maintain tables that use an external catalog.

  • Excessive position deletes (especially dangling position deletes) can block table creation and refresh operations. To avoid this issue, perform table maintenance to remove superfluous position deletes.

    The table maintenance method that you use depends on your external Iceberg engine. For example, you can use the Spark rewrite_data_files procedure with the delete-file-threshold or rewrite-all option. For more information, see rewrite_data_files (https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files) in the Apache Iceberg™ documentation.
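For illustration, the Spark procedure mentioned above might be invoked like this (run in Spark SQL, not Snowflake; the catalog and table names are placeholders):

```sql
-- Rewrite data files and drop excess position deletes; a low
-- delete-file-threshold forces rewrites of files with deletes attached.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.my_iceberg_table',
  options => map('delete-file-threshold', '1')
);
```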

Set a target file size

To improve query performance for external Iceberg engines such as Spark or Trino, you can configure a target file size for both Snowflake-managed and externally managed Iceberg tables with write support. You can either set a specific size (16MB, 32MB, 64MB, or 128MB), or use the AUTO option. AUTO works differently, depending on the table type:

  • Snowflake-managed tables: AUTO specifies that Snowflake should choose the file size for the table based on table characteristics such as size, DML patterns, ingestion workload, and clustering configuration. Snowflake automatically adjusts the file size, starting at 16 MB, for better read and write performance in Snowflake.

  • Externally managed tables: AUTO specifies that Snowflake should aggressively scale to a larger file size.

You can set the target file size when you create an Iceberg table, or run the ALTER ICEBERG TABLE command to change the target file size for an existing Iceberg table. Snowflake attempts to maintain file sizes close to the target size when writing Parquet files for a table.

After you set a target file size, Snowflake immediately starts to create larger files for new Data Manipulation Language (DML) operations. Snowflake's table maintenance operations asynchronously change the existing table files according to the target file size.

The following example uses TARGET_FILE_SIZE to set a target file size of 128 MB for a Snowflake-managed table:

CREATE ICEBERG TABLE my_iceberg_table (col1 INT)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'my_iceberg_table'
  TARGET_FILE_SIZE = '128MB';

Alternatively, use ALTER ICEBERG TABLE to set the TARGET_FILE_SIZE property for an existing table:

ALTER ICEBERG TABLE my_iceberg_table
  SET TARGET_FILE_SIZE = '32MB';

To check the value of the TARGET_FILE_SIZE property for a table, use the SHOW PARAMETERS command:

SHOW PARAMETERS LIKE 'target_file_size' FOR my_iceberg_table;

Table optimization for Snowflake-managed Iceberg tables

Table optimization automatically performs maintenance to improve the performance and reduce the storage costs of your Snowflake-managed Iceberg tables.

Note

  • Snowflake doesn't support orphan file deletion for Snowflake-managed Iceberg tables. If you see a mismatch between storage usage for your external cloud storage and Snowflake, you might have orphan files in your external cloud storage. To see your storage usage for Snowflake, you can use the TABLE_STORAGE_METRICS view (Account Usage) or the TABLE_STORAGE_METRICS view (Information Schema). If you see a mismatch, contact Snowflake Support for assistance with determining whether you have orphan files and removing them.

  • To improve query performance, you can also set a target file size. For more information, see Set a target file size.

Snowflake supports the Iceberg table optimization features summarized in the following list; each feature improves query performance, reduces storage costs, or both:

  • Automatic Clustering [1]

    • Billed.

    • Disabled by default.

  • Data compaction

    • Billed.

    • Enabled by default.

  • Manifest compaction

    • No cost.

    • Enabled automatically; you can't disable it.

  • Snapshot expiry

    • No cost.

    • Enabled automatically; you can't disable it.

[1] Unlike the other table optimization features, Automatic Clustering is billed separately as a standalone feature.

Automatic Clustering

Automatic Clustering reorganizes data within files or partitions based on frequently queried columns. The file size for Iceberg tables is based on your clustering configuration, unless you set a target file size. If you do, the file size is the specific size you set. For more information, see Set a target file size.

To set Automatic Clustering, specify the CLUSTER BY parameter when you create a Snowflake-managed Iceberg table or modify an existing table.

For more information about Automatic Clustering, see Automatic Clustering.
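For illustration, a sketch of setting CLUSTER BY at creation and changing it later (object names are placeholders; the syntax follows the CREATE ICEBERG TABLE examples elsewhere on this page):

```sql
-- Cluster a Snowflake-managed Iceberg table on a frequently queried column.
CREATE ICEBERG TABLE my_clustered_table (col1 INT, event_date DATE)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'my_clustered_table'
  CLUSTER BY (event_date);

-- Change the clustering key on an existing table.
ALTER ICEBERG TABLE my_clustered_table CLUSTER BY (event_date, col1);
```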

Data compaction

Data compaction combines small files into larger, more efficient files to manage storage, maintain an optimal file size, and improve query performance.

In most cases, data compaction doesn't have a significant effect on compute costs, but if these costs are a concern, you can disable compaction. For example, you might want to disable compaction on a table if you rarely query it. To disable or enable data compaction, see Set data compaction.

Note

  • To query data compaction jobs for Iceberg tables, see the ICEBERG_STORAGE_OPTIMIZATION_HISTORY view. This view includes the number of credits that are billed for data compaction.

  • If you have Automatic Clustering enabled, clustering performs data compaction on the table. This is true regardless of whether data compaction is enabled or disabled on the table.

  • You also have the option to set a target file size. For more information, see Set a target file size.

Manifest compaction

Manifest compaction optimizes the metadata layer by reorganizing and combining smaller manifest files. This compaction reduces metadata overhead and improves query performance.

This feature is enabled automatically and you can't disable it.

Snapshot expiry

Snapshot expiry systematically deletes old snapshots and their unique data and metadata files from the table's history. This deletion is based on predefined retention policies.

This feature is enabled automatically and you can't disable it.

Maintain tables that use an external catalog

You can perform maintenance operations on Iceberg tables that use an external catalog.

Maintenance operations include the following:

  • Expiring snapshots

  • Removing old metadata files

  • Compacting data files

Important

To keep your Iceberg table in sync with external changes, be sure to align your Snowflake refresh schedule with table maintenance. Refresh the table each time you perform a maintenance operation.

For maintenance information for Iceberg tables that aren't managed by Snowflake, see Maintenance (https://iceberg.apache.org/docs/latest/maintenance/) in the Apache Iceberg documentation.

Refresh table metadata

When you use an external Iceberg catalog, you can refresh table metadata by using the ALTER ICEBERG TABLE ... REFRESH command. Refreshing table metadata synchronizes the metadata with the most recent table changes.

Note

We recommend configuring automatic refresh for supported externally managed tables.

Refresh the metadata for a table

The following example manually refreshes the metadata for a table that uses an external catalog (for example, AWS Glue or Delta). Refreshing the table keeps it in sync with all changes in the remote catalog.

With this type of Iceberg table, you don't need to specify a metadata file path in the command.

ALTER ICEBERG TABLE my_iceberg_table REFRESH;

To keep the table updated automatically, you can set up automatic refresh by using the ALTER ICEBERG TABLE command.

For example:

ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = TRUE;

Refresh the metadata for a table created from Iceberg files

The following example manually refreshes a table that was created from Iceberg metadata files in an external cloud storage location. The command specifies the relative path of the metadata file, without a leading forward slash (/). The metadata file defines the data in the table after the refresh.

ALTER ICEBERG TABLE my_iceberg_table REFRESH 'metadata/v1.metadata.json';

Retrieve storage metrics

Snowflake doesn't bill your account for Iceberg table storage. However, you can track how much storage your Iceberg tables occupy by querying the TABLE_STORAGE_METRICS and TABLES views in the Snowflake Information Schema or Account Usage schemas.

The following example query joins the ACCOUNT_USAGE.TABLE_STORAGE_METRICS view with the ACCOUNT_USAGE.TABLES view and filters on the TABLES.IS_ICEBERG column.

SELECT metrics.* FROM
  snowflake.account_usage.table_storage_metrics metrics
  INNER JOIN snowflake.account_usage.tables tables
  ON (
    metrics.id = tables.table_id
    AND metrics.table_schema_id = tables.table_schema_id
    AND metrics.table_catalog_id = tables.table_catalog_id
  )
  WHERE tables.is_iceberg='YES';

Set data compaction

You can set data compaction on Snowflake-managed Iceberg tables when you create a database, schema, or table, or run the ALTER command to change the setting for an existing database, schema, or table. You can also set data compaction at the account level by using the ALTER ACCOUNT command. For more information about data compaction, see Data compaction.

The following example uses ENABLE_DATA_COMPACTION to disable data compaction for a Snowflake-managed table:

CREATE ICEBERG TABLE my_iceberg_table (col1 INT)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'my_iceberg_table'
  ENABLE_DATA_COMPACTION = FALSE;

Alternatively, use ALTER ICEBERG TABLE to disable it for an existing table.

ALTER ICEBERG TABLE my_iceberg_table
  SET ENABLE_DATA_COMPACTION = FALSE;
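As a sketch of the schema- and account-level settings mentioned above (assuming the same ENABLE_DATA_COMPACTION parameter applies at each level, per the opening paragraph of this section):

```sql
-- Disable data compaction account-wide, then re-enable it for one schema.
ALTER ACCOUNT SET ENABLE_DATA_COMPACTION = FALSE;
ALTER SCHEMA my_schema SET ENABLE_DATA_COMPACTION = TRUE;
```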


Use default values with Iceberg tables

Note

For the other Iceberg v3 features that are supported in this preview, see Apache Iceberg™ tables: Support for Apache Iceberg™ v3 (Preview).

This preview introduces support for the default values feature for Apache Iceberg™ tables in accordance with the Iceberg v3 specification.

Important

To use default values with Iceberg tables, the tables must conform to v3 of the Apache Iceberg™ table specification. For instructions on how to configure the Iceberg version for tables, see Configure the default Iceberg version.

This feature lets you set default values for existing and new records without having to rewrite existing data files. You can set the following default values for table columns:

  • An initial default, which provides a default value for existing records when a field is added.

  • A write default, which provides a default value for new records if the field with the default value isn't specified during writes.

With this feature, you can evolve schemas while presenting values for historical data and provide a fallback value for future writes. For more information, see Default values (https://iceberg.apache.org/spec/#default-values).

You can specify a default value when you create or modify a table:

  • To create a table with a default value for a column, use the DEFAULT keyword with your column definition. The value you specify is set as both the initial default and write default for the column. You can't change the initial default for the column.

  • To add a column with a default value to a table, use the DEFAULT keyword with the column definition in your ALTER ICEBERG TABLE command. The value you specify is set as both the initial default and write default for the column. You can't change the initial default for the column.

  • To change the write default for a column, use the WRITE DEFAULT keywords with the ALTER ICEBERG TABLE command.

Important

When you specify a default value for a column, you must specify a static value; you can't specify an expression or function for the value. This requirement is in accordance with the Iceberg v3 specification and applies to both the initial default and write default.

The following sections include examples of how to specify default values and change the default write value.

Example: Create a table with a default value

To create an Iceberg table with default values, use the CREATE ICEBERG TABLE command.

In the following example, you first set a default value for a column when you create a Snowflake-managed Iceberg table. Next, you insert a record into the table without specifying a value for the column with the default value.

  1. Create a user_events table, which includes an event_version column with a default value of 2:

    CREATE ICEBERG TABLE user_events (
        event_id INT,
        user_id INT,
        event_type STRING,
        event_time TIMESTAMP,
        event_version INT DEFAULT 2
      )
      CATALOG = 'SNOWFLAKE'
      EXTERNAL_VOLUME = 'my_external_volume'
      BASE_LOCATION = 'database/schema/user_event'
      ICEBERG_VERSION = 3;
    

    Setting a default value in the table definition sets an initial default and a write default. Because the column has a write default, the value 2 will be used for new records if the event_version isn't specified during writes.

  2. Add a login event with event_version specified:

    INSERT INTO user_events VALUES
      (1, 101, 'login', '2025-11-01 10:00:00', 1);
    
  3. Add a purchase event, but don't specify an event_version:

    INSERT INTO user_events VALUES
    (1, 101, 'purchase', '2025-11-01 10:01:00');
    

    As a result, Snowflake inputs the value for event_version into the table as 2.

  4. Query the table:

    SELECT * FROM user_events;
    

    Output:

    +-----------+----------+-------------+---------------------+----------------+
    | event_id  | user_id  | event_type  | event_time          | event_version  |
    +-----------+----------+-------------+---------------------+----------------+
    | 1         | 101      | login       | 2025-11-01 10:00:00 | 1              |
    | 1         | 101      | purchase    | 2025-11-01 10:01:00 | 2              |
    +-----------+----------+-------------+---------------------+----------------+
    

Example: Add a column with a default value to an existing table

To add a new column with a default value to an Iceberg table, use the ALTER ICEBERG TABLE command.

In the following example, you modify the user_events table by adding an event_version column, which has a default value of 2:

ALTER ICEBERG TABLE user_events ADD COLUMN event_version INT DEFAULT 2;

In addition to setting a write default, adding a column with a default value also sets an initial default for the column. As a result, the default value for existing records for the event_version column is 2.

Example: Change the write default for a column

The following example changes the write default for the event_version column of the user_events table to 3:

ALTER ICEBERG TABLE user_events ALTER COLUMN event_version SET WRITE DEFAULT 3;

View the default values defined for a table

To view the default value for a table column in a Snowflake-managed or externally managed Iceberg table, run the DESCRIBE ICEBERG TABLE command, and then view the DEFAULT column and WRITE DEFAULT column in the output:

  • The DEFAULT column maps to the initial-default value in the Apache Iceberg specification.

  • The WRITE DEFAULT column maps to the write-default value in the Apache Iceberg specification.

These columns appear in the output regardless of whether the table is a v2 Iceberg table or a v3 Iceberg table.

The following example describes the columns for the user_events table. This table has an initial default and write default specified for the event_version column:

DESC ICEBERG TABLE user_events
  ->> SELECT
    "name",
    "kind",
    "default",
    "write default"
      FROM $1;

Output:

+-----------------+---------+---------+---------------+
| name            | kind    | default | write default |
+-----------------+---------+---------+---------------+
| EVENT_ID        | COLUMN  |         |               |
| USER_ID         | COLUMN  |         |               |
| EVENT_TYPE      | COLUMN  |         |               |
| EVENT_TIME      | COLUMN  |         |               |
| EVENT_VERSION   | COLUMN  | 2       | 3             |
+-----------------+---------+---------+---------------+

Drop the write default

To drop the write default for a column, use the DROP WRITE DEFAULT keywords with the ALTER ICEBERG TABLE command.

The following example drops the default write value for the event_version column:

ALTER ICEBERG TABLE user_events ALTER COLUMN event_version DROP WRITE DEFAULT;

Considerations and limitations for default values

Consider the following items when you use default values with Snowflake-managed and externally managed Iceberg tables:

Snowflake-managed and externally managed Iceberg tables

  • You can't add or change the initial default for a column after the column is created. To change it, you need to drop and re-add the column by using the ALTER TABLE ... DROP COLUMN and ALTER TABLE ... ADD COLUMN commands.

  • The maximum size for a default value is 128 MB.

  • Default values can't use data types that can't be represented as constants, so you can't use the following data types with a default value:

    • map

    • list

    • struct

    • variant

Snowflake-managed Iceberg tables

  • The write-default value is always initialized to the initial-default value. To see the default for both of these values, run the DESCRIBE ICEBERG TABLE command, and then view the WRITE DEFAULT and DEFAULT columns in the output.

  • You can't specify a default value that uses the TIMESTAMP_NTZ(9) or TIMESTAMP_LTZ(9) data type.

  • You can only set a default value to an expression, such as DEFAULT pi(), when you create a table; you can't set a default value to an expression when you modify a table by using the ALTER ICEBERG TABLE command.

  • Sequences aren't supported.

    For example, the following CREATE ICEBERG TABLE command fails because it includes LOG_ID NUMBER(38,0) NOT NULL autoincrement order:

    CREATE OR REPLACE ICEBERG TABLE CDC_RUN_LOG (
        LOG_ID NUMBER(38,0) NOT NULL autoincrement order,
        ENTITY_NAME VARCHAR(100),
        LAST_RUN TIMESTAMP_NTZ(9),
        DAG_NAME VARCHAR(100)
        )
        CATALOG = 'SNOWFLAKE'
        EXTERNAL_VOLUME = 'my_external_volume'
        BASE_LOCATION = 'my_iceberg_table'
        COMMENT = 'CDC table to manage log of runs'
        ICEBERG_VERSION = 3;
    

Externally managed Iceberg tables

  • You can't specify a default value that uses the TIMESTAMP_NTZ(9) or TIMESTAMP_LTZ(9) data type.

These considerations and limitations apply to default values, an Iceberg v3 feature. For a list of considerations and limitations that apply to all Iceberg v3 tables, see Considerations and limitations for Iceberg v3 features.

Use row lineage with Iceberg tables

Note

For the other Iceberg v3 features that are supported in this preview, see Apache Iceberg™ tables: Support for Apache Iceberg™ v3 (Preview).

This preview introduces support for the row lineage feature for Apache Iceberg™ tables. With this feature, the following columns are automatically written by Snowflake to an Iceberg table:

  • _row_id

  • _last_updated_sequence_number

This feature lets query engines reliably match the same row across snapshots and detect row-level changes. For more information, see Row lineage (https://iceberg.apache.org/spec/#row-lineage).

This feature is supported with both Snowflake-managed and externally managed Iceberg tables.

Important

To use row lineage with Iceberg tables, the tables must conform to v3 of the Apache Iceberg™ table specification. For instructions on how to configure the Iceberg version for tables, see Configure the default Iceberg version.

Considerations and limitations for row lineage

Row lineage is supported in streams with the following considerations:

  • Append-only streams and standard streams are supported on Snowflake-managed Iceberg v3 tables.

  • Insert-only streams and standard streams are supported on externally managed Iceberg v3 tables.

    • To have standard streams produce correct results, the external engine must write to Iceberg v3 tables in accordance with the Iceberg v3 specification. Specifically, newly inserted rows should have _row_id=NULL, and rows that are copied during copy-on-write should retain their _row_id.

    • MAX_DATA_EXTENSION_TIME_IN_DAYS doesn't work on externally managed Iceberg v3 tables.

  • When DMLs are committed over multi-statement transactions, append-only streams on Iceberg v3 tables have different semantics compared to Iceberg v2 tables:

    • On Iceberg v2, for append-only streams, if a row is added and then deleted in a multi-statement transaction, this row is considered an insertion.

    • On Iceberg v3, for append-only streams, this row isn't treated as an insertion.
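For illustration, an append-only stream on a Snowflake-managed Iceberg v3 table might be created as follows (the stream and table names are placeholders):

```sql
-- Hypothetical names; an APPEND_ONLY stream tracks inserts only, and
-- row lineage lets the stream match rows across snapshots.
CREATE STREAM my_append_stream
  ON TABLE my_iceberg_v3_table
  APPEND_ONLY = TRUE;
```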

These considerations and limitations apply to row lineage, an Iceberg v3 feature. For a list of considerations and limitations that apply to all Iceberg v3 tables, see Considerations and limitations for Iceberg v3 features.