Use a catalog-linked database for Apache Iceberg™ tables¶
With a catalog-linked database, you can access multiple remote Iceberg tables from Snowflake without creating individual externally managed tables.
目录链接的数据库是一种连接到外部 Iceberg REST 目录的 Snowflake 数据库。Snowflake 会自动与外部目录同步以检测命名空间和 Iceberg 表,并将远程表注册到目录链接的数据库。目录链接的数据库还支持创建和删除架构或 Iceberg 表。
Billing for catalog-linked databases¶
Snowflake bills your account for the following usage:
- Automatic table discovery, create schema, drop schema, and drop table. Snowflake will bill your account for this usage under the CREDITS_USED_CLOUD_SERVICES usage type. Usage for cloud services is charged only if the daily consumption of cloud services exceeds 10% of the daily usage of virtual warehouses. For more information, see Understanding billing for cloud services usage.
- Create table. Snowflake will bill your account for this usage under the CREDITS_USED_COMPUTE usage type through auto refresh. The cost for this usage is described in Table 5 of the Snowflake service consumption table on the Snowflake website. Refer to the Snowflake-managed compute column for the Automated Refresh and Data Registration row.
Snowflake won’t bill you for any cloud services that you use during table creation.
Note
To view the credit usage for your catalog-linked databases, use the CATALOG_LINKED_DATABASE_USAGE_HISTORY view.
在创建目录链接的数据库之前,请从以下选项中选择一种来配置外部目录及表存储的访问权限。
以下步骤介绍如何创建目录链接的数据库、检查 Snowflake 与目录之间的同步状态,以及在数据库中创建或查询表。
- Configure access to your external catalog and table storage
- 创建目录链接数据库
- 检查目录同步状态
- 对于目录链接的数据库中的 Iceberg 表: or 写入远程目录
Note
- If your external data is in Unity Catalog, see Tutorial: Set up bidirectional access to Apache Iceberg™ tables in Databricks Unity Catalog to get started with catalog-linked databases.
- If your external data is in AWS Glue, see Build Data Lakes using Apache Iceberg with Snowflake and AWS Glue
Configure access to your external catalog and table storage¶
Before you create a catalog-linked database, you need to configure access to your external catalog and table storage. To configure this access, you configure a catalog integration with vended credentials. With this option, your remote Iceberg catalog must support credential vending.
For instructions, see Use catalog-vended credentials for Apache Iceberg™ tables.
Note
If your remote Iceberg catalog doesn’t support credential vending, you must configure an external volume and a catalog integration to configure access to your external catalog and table storage. First, configure an external volume for your cloud storage provider. Then, configure an Apache Iceberg™ REST catalog integration for your remote Iceberg catalog.
创建目录链接数据库
Create a catalog-linked database with the CREATE DATABASE (catalog-linked) command:
The following example creates a catalog-linked database that uses vended credentials. The sync interval is 30 seconds, which is the default. The sync interval tells Snowflake how often to poll your remote catalog.
Note
To create a catalog-linked database that uses an external volume, see CREATE DATABASE (catalog-linked), including the example.
Your catalog-linked database includes a link icon.
Check the configuration of a catalog-linked database¶
After you create a catalog-linked database, use the SYSTEM$GET_CATALOG_LINKED_DATABASE_CONFIG function to check the configuration for the database.
检查目录同步状态
To check whether Snowflake has successfully linked your remote catalog to your database, use the SYSTEM$CATALOG_LINK_STATUS function.
该函数还可提供相关信息,帮助您识别远程目录中同步失败的表。
Identify tables that were created but couldn’t be initialized¶
To identify tables in the remote catalog that synced successfully but fail to refresh automatically, run the SHOW ICEBERG TABLES
command, and then refer to the auto_refresh_status column in the output. These tables
have an executionState of ICEBERG_TABLE_NOT_INITIALIZED in the output.
For example, Snowflake might successfully discover and create a table in your remote catalog to your catalog-linked database, but this table has a corrupted data file in your remote catalog. As a result, Snowflake can’t automatically refresh the table until you resolve the error.
Automated refresh is turned off for these kinds of tables, so querying the table in Snowflake returns an error that says the table was never initialized. To query the table, you must fix the error, and then turn on automated refresh for the table.
对于目录链接的数据库中的 Iceberg 表:¶
创建目录链接的数据库后,Snowflake 会启动表发现过程,并自动使用 SYNC_INTERVAL_SECONDS 参数值(默认间隔为 30 秒)轮询已链接的目录以检查变更。
在数据库中,远程目录的允许命名空间将显示为架构,而 Iceberg 表将出现在其各自的架构之下。
您可以使用 SELECT 语句查询远程表。
Note
For the requirements for identifying objects in a catalog-linked database, see Requirements for identifier resolution in a catalog-linked database.
For more information about object identifiers, see Identifier requirements.
例如:
写入远程目录
您可以使用 Snowflake 在已链接目录中创建命名空间和 Iceberg 表。有关详细信息,请参阅以下主题:
- Write support for externally managed Apache Iceberg™ tables
- Use CREATE SCHEMA to create namespaces in your external catalog
- Create an Iceberg table in a catalog-linked database
Requirements for identifier resolution in a catalog-linked database¶
The requirement for resolving an identifier depends on the following:
- The value that you specified for the CATALOG_CASE_SENSITIVITY parameter when you created your catalog-linked database
- Whether your external Iceberg catalog uses case-sensitive or case-insensitive identifiers.
Note
- These identifier resolution rules apply only to identifiers within the catalog-linked database. Identifiers outside the catalog-linked database follow standard Snowflake identifier resolution rules.
- These requirements apply to all SQL statements, including queries and DDL commands such as CREATE and ALTER.
- If your catalog uses CASE_INSENSITIVE mode (the default for catalogs like AWS Glue and Unity Catalog), you don’t need to double-quote identifiers for any commands. For DDL, Snowflake normalizes object names to lowercase on the remote catalog except when you use double-quoted mixed-case identifiers with QUOTED_IDENTIFIERS_IGNORE_CASE set to FALSE (the default). See the CASE_INSENSITIVE row in the table for details.
- If your catalog uses CASE_SENSITIVE mode and normalizes to lowercase, you must use lowercase letters and surround the schema, table, and column names in double quotes when creating or altering objects.
The following table shows the requirement for each scenario:
| CATALOG_CASE_SENSITIVITY value | External Iceberg catalog uses | Requirement |
|---|---|---|
| CASE_SENSITIVE | Case sensitive identifiers | Snowflake matches identifiers exactly as they appear, including case. Snowflake automatically converts unquoted identifiers to uppercase, but quoted identifiers must match exactly the case in your external catalog. The following example shows a valid query for creating a table: Snowflake creates the table in the external catalog as The following example shows a valid query for selecting the In the previous example, the double quotes are required for matching the capitalization exactly. The following example shows an invalid query, unless a In the previous example, the query is invalid if The following example shows an invalid query for the case when an all uppercase |
| CASE_SENSITIVE | Case insensitive identifiers | If the external Iceberg catalog is actually case insensitive, and normalizes to lowercase, you must surround identifiers in double quotes. The following example shows valid queries: |
| CASE_INSENSITIVE | Case insensitive identifiers | If your case-insensitive catalog has a lowercase You don’t need to use double quotes when creating or altering objects. The DDL examples that follow show how Snowflake normalizes object names to lowercase on the remote catalog; the Quoted identifiers and QUOTED_IDENTIFIERS_IGNORE_CASE parameter subsections describe when double quotes preserve mixed case. For example: Snowflake creates the schema as Snowflake creates the table as Snowflake adds the column as Snowflake renames the column to Quoted identifiers: When QUOTED_IDENTIFIERS_IGNORE_CASE
is FALSE (the default), double-quoted identifiers preserve the exact casing you specify. For example,
QUOTED_IDENTIFIERS_IGNORE_CASE parameter: When QUOTED_IDENTIFIERS_IGNORE_CASE is TRUE in a catalog-linked database with CASE_INSENSITIVE, Snowflake resolves identifiers in a case-insensitive manner whether they are double-quoted or unquoted. For DDL, Snowflake still normalizes object names to lowercase on the remote catalog, consistent with the unquoted DDL examples in this section. |
| CASE_INSENSITIVE | Case sensitive identifiers | If the external Iceberg catalog is actually case sensitive, Snowflake treats unquoted identifiers as case-insensitive and automatically converts unquoted identifiers to uppercase. When you create or query objects, Snowflake matches identifiers regardless of case, as long as they are unquoted. Using this pattern is discouraged because Snowflake can’t resolve two different identifiers that differ in casing. This pattern only works when no two identifiers are different in casing only. Consider the case where the remote catalog has a Quoted identifiers preserve case and match exactly. However, in CASE_INSENSITIVE mode, unquoted and quoted forms are both supported. |
为 |iceberg-tm| 表使用目录链接的数据库¶
Consider the following items when you use a catalog-linked database:
-
Supported only when you use a catalog integration for Iceberg REST (for example, Snowflake Open Catalog).
-
To limit automatic table discovery to a specific set of namespaces, use the ALLOWED_NAMESPACES parameter. You can also use the BLOCKED_NAMESPACES parameter to block a set of namespaces.
-
Snowflake doesn’t sync remote catalog access control for users or roles.
-
You can create schemas, externally managed Iceberg tables, or database roles in a catalog-linked database. Creating other Snowflake objects isn’t currently supported.
-
When you create a catalog-linked database, you can’t specify the default Iceberg version or merge-on-read behavior to use for Iceberg tables.
However, you can modify these properties for an existing database by using the ALTER DATABASE (catalog-linked) command to set the following parameters:
- ICEBERG_VERSION_DEFAULT
- ENABLE_ICEBERG_MERGE_ON_READ
-
For Iceberg tables in a catalog-linked database:
-
Snowflake bidirectionally syncs table and column descriptions between the remote catalog and Snowflake. Sync can update a description to a new value, but never replaces a non-empty description with an empty one. Other remote catalog table properties, such as retention policies or buffers, aren’t copied, and altering table properties isn’t currently supported.
-
Automated refresh is enabled by default. If the
table-uuidof an external table and the catalog-linked database table don’t match, refresh fails and Snowflake drops the table from the catalog-linked database; Snowflake doesn’t change the remote table. -
If you drop a table from the remote catalog, Snowflake drops the table from the catalog-linked database. This action is asynchronous, so you might not see the change in the remote catalog right away.
-
If you rename a table in the remote catalog, Snowflake drops the existing table from the catalog-linked database and creates a table with the new name.
-
Masking policies and tags are supported. Other Snowflake-specific features, including replication and cloning, aren’t supported.
-
The character that you choose for the NAMESPACE_FLATTEN_DELIMITER parameter can’t appear in your remote namespaces. During the auto discovery process, Snowflake skips any namespace that contains the delimiter, and doesn’t create a corresponding schema in your catalog-linked database.
-
If you specify anything other than
_,$, or numbers for the NAMESPACE_FLATTEN_DELIMITER parameter, you must put the schema name in quotes when you query the table. -
For databases linked to AWS Glue, you must use lowercase letters and surround the schema, table, and column names in double quotes. This is also required for other Iceberg REST catalogs that only support lowercase identifiers.
The following example shows a valid query:
The following statements aren’t valid, because they use uppercase letters or omit the double quotes:
-
Using UNDROP ICEBERG TABLE isn’t supported.
-
Sharing:
- Sharing with a listing isn’t currently supported
- Direct sharing is supported
-
-
For writing to tables in a catalog-linked database:
- Creating tables in nested namespaces isn’t currently supported.
- Writing to tables in nested namespaces isn’t currently supported.
- Position row-level deletes (https://iceberg.apache.org/spec/#row-level-deletes) are supported for tables stored
on Amazon S3, Azure, or Google Cloud. Row-level deletes with equality delete files aren’t supported. For more information about row-level deletes,
see Use row-level deletes. To turn off position deletes, which enable
running the Data Manipulation Language (DML) operations in copy-on-write mode, set the
ENABLE_ICEBERG_MERGE_ON_READparameter to FALSE at the table, schema, or database level.