自动刷新 Apache Iceberg™ 表¶
为新的或现有的外部管理 Apache Iceberg™ 表 配置自动元数据刷新。通过自动刷新,Snowflake 以连续和无服务器形式轮询您的外部 Iceberg 目录,以将元数据与最新的远程变更同步。
Iceberg 表的自动刷新与目录表或外部表的自动刷新工作方式不同,因为它不依赖云服务提供商的通知。相反,您可以按照以下步骤配置该功能:
设置目录集成的刷新间隔。Snowflake 支持以下外部 Iceberg 目录选项的自动刷新:
符合 Apache Iceberg REST OpenAPI 规范的 REST 目录
Snowflake Open Catalog
对象存储(仅限 Delta Lake)
AWS Glue
这种方法允许您通过目录集成集中管理刷新设置,同时根据需要对各个表进行单独控制。
设置目录集成的刷新间隔¶
运行 CREATE CATALOG INTEGRATION 命令时,可以为 REFRESH_INTERVAL_SECONDS 参数指定值。否则,默认刷新间隔为 30 秒。Snowflake 只在目录集成定义了 Iceberg 表的情况下轮询外部目录。
以下示例为 AWS Glue 创建目录集成,并指定 60 秒的刷新间隔:
CREATE CATALOG INTEGRATION auto_refresh_catalog_integration
CATALOG_SOURCE = GLUE
CATALOG_NAMESPACE = 'my_catalog_namespace'
TABLE_FORMAT = ICEBERG
GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789123:role/my-catalog-role'
GLUE_CATALOG_ID = '123456789123'
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = 60;
要更新目录集成的刷新间隔,请使用 ALTER CATALOG INTEGRATION 命令。
例如:
ALTER CATALOG INTEGRATION auto_refresh_catalog_integration SET REFRESH_INTERVAL_SECONDS = 120;
创建一个自动刷新的 Iceberg 表¶
使用 CREATE ICEBERG TABLE 命令创建 Iceberg 表。要指定表应使用自动元数据刷新,请设置 AUTO_REFRESH = TRUE。
下面的示例创建一个使用 AWS Glue 作为目录的 Iceberg 表,指定前面创建的目录集成 (auto_refresh_catalog_integration) 和来自 AWS Glue 的 CATALOG_TABLE_NAME。
CREATE OR REPLACE ICEBERG TABLE auto_refresh_iceberg_table
CATALOG_TABLE_NAME = 'myGlueTable'
CATALOG = 'auto_refresh_catalog_integration'
AUTO_REFRESH = TRUE;
启用或关闭自动刷新¶
备注
如果表使用 Snowflake 8.22 版之前创建的目录集成,则必须先使用 ALTER CATALOG INTEGRATION 命令设置
REFRESH_INTERVAL_SECONDS参数,然后再对表启用自动刷新。频繁地打开和关闭 Iceberg 表的自动刷新会降低表的元数据刷新速度。
使用 ALTER ICEBERG TABLE 命令启用或关闭现有 Iceberg 表的自动刷新。
例如:
ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = FALSE;
监控自动刷新状态¶
SHOW ICEBERG TABLES¶
要获取多个表的自动刷新状态,请使用 SHOW ICEBERG TABLES 命令。
SHOW ICEBERG TABLES;
命令输出包括一个名为 auto_refresh_status 的列,该列显示的信息与您具有访问权限的每个表的 SYSTEM$AUTO_REFRESH_STATUS 函数相同。
SYSTEM$AUTO_REFRESH_STATUS¶
要检索特定表的自动刷新状态,请调用 SYSTEM$AUTO_REFRESH_STATUS 函数。
SELECT SYSTEM$AUTO_REFRESH_STATUS('my_iceberg_table');
该函数返回 Snowflake 用于自动刷新表的管道的详细信息,例如快照队列的执行状态和大小。RUNNING 的执行状态指示自动刷新正在按预期运行。有关更多信息,请参阅 SYSTEM$AUTO_REFRESH_STATUS。
ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY¶
要检索有关特定表的最新刷新历史记录的元数据和快照信息,请使用 ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY 函数。
SELECT *
FROM TABLE(INFORMATION_SCHEMA.ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY(
TABLE_NAME => 'my_iceberg_table'
));
Monitor automated refresh events¶
You can configure Snowflake to record an event that provides information about the status of automated refresh for an Iceberg table. Snowflake records the event in the event table for your account. You can't set it at the database or schema level. When Snowflake automatically refreshes an Iceberg table in your account, Snowflake records an event to your event table.
Monitoring automated refresh events can help you gain insight into the following areas:
Automated refresh progress: Track how snapshots move through the automated refresh process.
Aggregated statistics: Review summarized statistics for automated refresh operations.
You can also configure alerts for the following critical conditions:
Refresh errors
High refresh latencies
备注
Logging events for automated refresh incurs costs. For more information, see 遥测数据收集成本.
Snowflake records an event when automated refresh starts, completes, or results in error.
Set the severity level to capture events¶
To capture automated refresh events, you must set the LOG_LEVEL parameter at the Iceberg table level or account level. The LOG_LEVEL determines which events to capture based on the following values:
ERROR: Events that signal a change requiring human intervention to resolve.
WARN: Events that signal an issue that can be resolved without human intervention.
DEBUG: High-volume events.
备注
There is no default severity level. To capture events, you must set the severity level at either the account level or Iceberg table level.
For example, to capture DEBUG-level automated refresh events for a specific Iceberg table, use the following command:
ALTER ICEBERG TABLE <my_table_name> SET LOG_LEVEL = DEBUG;
有关更多信息,请参阅 为日志、指标和跟踪设置级别。
Query your event table for automated refresh events¶
Before you can query for automated refresh events, you must set up an event table and set the severity level for event capture.
The following example shows how to retrieve Iceberg automated refresh events that are generated during snapshot processing:
SELECT record_type,
record:"name" event_name,
record:"severity_text" log_level,
resource_attributes:"snow.database.name" database_name,
resource_attributes:"snow.schema.name" schema_name,
resource_attributes:"snow.table.name" table_name,
resource_attributes:"snow.catalog.integration.name" catalog_integration_name,
record_attributes:"snow.snapshot.id" snapshot_id,
parse_json(value):metadata_file_location metadata_file_location,
parse_json(value):snapshot_state snapshot_state
parse_json(value):created created
parse_json(value):detected detected
FROM my_active_event_table
WHERE record_type='EVENT' AND event_name='iceberg_auto_refresh_snapshot_lifecycle';
输出:
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+--------------------------+--------------------------+
| RECORD_TYPE | EVENT_NAME | LOG_LEVEL | DATABASE_NAME | SCHEMA_NAME | TABLE_NAME | CATALOG_INTEGRATION_NAME | SNAPSHOT_ID | METADATA_FILE_LOCATION | SNAPSHOT_STATE | CREATED | DETECTED |
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+--------------------------+--------------------------+
| EVENT | iceberg_auto_refresh_snapshot_lifecycle | DEBUG | TESTDB | TESTSH | TESTTABLE | glue_integration | 4281775564368 | metadata.json | detected | 2025-05-21T20:38:06.822Z | 2025-10-17T21:01:49.695Z |
| EVENT | iceberg_auto_refresh_snapshot_lifecycle | DEBUG | TESTDB | TESTSH | TESTTABLE | glue_integration | 4281775564368 | metadata.json | started | 2025-05-21T20:38:06.822Z | 2025-10-17T21:01:49.695Z |
| EVENT | iceberg_auto_refresh_snapshot_lifecycle | DEBUG | TESTDB | TESTSH | TESTTABLE | glue_integration | 4281775564368 | metadata.json | completed | 2025-05-21T20:38:06.822Z | 2025-10-17T21:01:49.695Z |
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+--------------------------+--------------------------+
Query your event table for stale automated refresh events¶
You can query your event table for tables whose last successful refresh is older than a threshold you define.
The following example defines a threshold of 20 minutes:
SET STALENESS_THRESHOLD_MINUTES = 20;
Query for tables whose last successful refresh is older than the threshold you defined:
WITH last_successful_refresh AS ( -- Find the most recent 'completed' event for each table SELECT resource_attributes:"snow.table.name"::STRING AS table_name, MAX(timestamp) AS last_success_timestamp FROM <my_active_event_table> WHERE record:"name" = 'iceberg_auto_refresh_snapshot_lifecycle' AND parse_json(value):snapshot_state::STRING = 'completed' GROUP BY table_name ) -- Select tables whose the last successful refresh was longer ago than our threshold SELECT table_name, last_success_timestamp FROM last_successful_refresh WHERE last_success_timestamp < DATEADD(minute, -$STALENESS_THRESHOLD_MINUTES, CURRENT_TIMESTAMP()) ORDER BY last_success_timestamp ASC;
Where
my_active_event_tableis your active event table.输出:
+------------+-------------------------+ | TABLE_NAME | LAST_SUCCESS_TIMESTAMP | +------------+-------------------------+ | my_table | 2025-10-10 07:24:30.854 | +------------+-------------------------+
错误恢复¶
当自动刷新过程中发生错误时,Snowflake会将执行状态更新为以下值之一:
STALLED表示Snowflake正在尝试从错误中恢复。如果恢复成功,自动刷新进程将继续按预期运行,执行状态将转换回正常的RUNNING状态。STOPPED表示自动刷新进程遇到了不可恢复的错误,并且表的自动刷新已停止。例如,当 Snowflake 无法在目标快照和当前快照之间建立直接沿袭时,可能会发生不可恢复的错误。
要从
STOPPED状态恢复,请执行以下操作:执行手动元数据刷新。有关说明,请参阅 刷新表元数据。
使用 ALTER ICEBERGTABLE 重新启用自动刷新...SET AUTO_REFRESH 语句重命名数据库角色。
通过调用
RUNNING函数验证自动刷新是否处于 SYSTEM$AUTO_REFRESH_STATUS 状态。您还可以多次调用该函数,以确认排队快照的数量 (pendingSnapshotCount) 逐渐减少。
计费¶
Snowflake uses Snowpipe to automate refreshes for Iceberg tables, so charges for automated refresh appear in the same line item on your bill as Snowpipe charges. Using events to monitor automated refresh also incurs cost. For more information, see 遥测数据收集成本.
There are no Snowpipe file charges for this feature.
您可以通过检查 Account Usage PIPE_USAGE_HISTORY 视图 来估算产生的费用,pipe_name 列中显示了 Iceberg 表名。
对于基于 Delta 的 Iceberg 表,自动刷新管道会显示 NULL 管道名称。
有关 Iceberg 表收费的更多信息,请参阅 Iceberg 表计费。
注意事项和限制¶
当您使用利用自动刷新的 Iceberg 表时,请考虑以下事项:
对于在 Snowflake 8.22 版之前创建的目录集成(基于 Delta 表的目录集成对应版本为 9.2 之前),必须先手动设置
REFRESH_INTERVAL_SECONDS参数,然后才能对依赖于该目录集成的表启用自动刷新。有关说明,请参阅 ALTER CATALOG INTEGRATION ... SET AUTO_REFRESH。对于 对象存储的目录集成,只有与
TABLE_FORMAT = DELTA的集成才支持自动刷新。对于频繁更新的表,使用较短的轮询间隔 (
REFRESH_INTERVAL_SECONDS) 可能会导致性能下降。