CREATE ICEBERG TABLE(使用 AWS Glue 作为 Iceberg 目录)

Important

To integrate with AWS Glue, we recommend that you instead use AWS Glue Iceberg REST (https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html), which supports additional Iceberg table features such as catalog-vended credentials.

For instructions, see CREATE CATALOG INTEGRATION (Apache Iceberg™ REST) and CREATE ICEBERG TABLE (Iceberg REST catalog).

Creates or replaces an Apache Iceberg™ table in the current/specified schema using an Iceberg table that is registered in the AWS Glue Data Catalog. This type of Iceberg table requires a catalog integration to connect Snowflake to AWS Glue.

本主题将 Iceberg 表简称为“表”(指定 Iceberg 表 的位置除外)以避免混淆。

Note

Before creating a table, you must create the external volume where the Iceberg metadata and data files are stored. For instructions, see Configure an external volume.

You also need a catalog integration for the table. To learn more, see Configure a catalog integration for AWS Glue.

See also:

ALTER ICEBERG TABLE , DROP ICEBERG TABLE , SHOW ICEBERG TABLES , DESCRIBE ICEBERG TABLE , UNDROP ICEBERG TABLE

语法

CREATE [ OR REPLACE ] ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  [ EXTERNAL_VOLUME = '<external_volume_name>' ]
  [ CATALOG = '<catalog_integration_name>' ]
  CATALOG_TABLE_NAME = '<catalog_table_name>'
  [ CATALOG_NAMESPACE = '<catalog_namespace>' ]
  [ REPLACE_INVALID_CHARACTERS = { TRUE | FALSE } ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ COMMENT = '<string_literal>' ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
  [ WITH CONTACT ( <purpose> = <contact_name> [ , <purpose> = <contact_name> ... ] ) ]

必填参数

table_name

指定表的标识符(名称);在创建表的架构中必须是唯一的。

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more information, see Identifier requirements.

CATALOG_TABLE_NAME = 'catalog_table_name'

Specifies the table name as recognized by the AWS Glue Data Catalog. For an example of using CATALOG_TABLE_NAME when you create an Iceberg table, see Examples (in this topic).

创建表后,该参数无法更改。

可选参数

EXTERNAL_VOLUME = 'external_volume_name'

Specifies the identifier (name) for the external volume where the Iceberg table stores its metadata files and data in Parquet format. Iceberg metadata and manifest files store the table schema, partitions, snapshots, and other metadata.

If you don’t specify this parameter, the Iceberg table defaults to the external volume for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

CATALOG = 'catalog_integration_name'

Specifies the identifier (name) of the catalog integration for this table. You must specify a catalog integration that you have configured for AWS Glue. For information, see Configure a catalog integration for AWS Glue.

如果未指定,则默认将 Iceberg 表视为架构、数据库或账户的目录集成。架构优先于数据库,数据库优先于账户。

CATALOG_NAMESPACE = 'catalog_namespace'

Optionally specifies the namespace (for example, my_glue_database) for the AWS Glue Data Catalog source. By specifying a namespace with the catalog integration and then at the table level, you can use a single catalog integration for AWS Glue to create Iceberg tables across different databases. If you don’t specify a namespace with the table, the table uses the default catalog namespace associated with the catalog integration

如果在目录集成中未指定默认命名空间,则必须为 AWS Glue Data Catalog 指定命名空间,以设置表的目录命名空间。

REPLACE_INVALID_CHARACTERS = { TRUE | FALSE }

Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�) in query results. You can only set this parameter for tables that use an external Iceberg catalog.

  • TRUE replaces invalid UTF-8 characters with the Unicode replacement character.
  • FALSE leaves invalid UTF-8 characters unchanged. Snowflake returns a user error message when it encounters invalid UTF-8 characters in a Parquet data file.

If not specified, the Iceberg table defaults to the parameter value for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

Default: FALSE

AUTO_REFRESH = { TRUE | FALSE }

Specifies whether Snowflake should automatically poll the external Iceberg catalog that is associated with the table for metadata updates.

If no value is specified for the REFRESH_INTERVAL_SECONDS parameter on the catalog integration, Snowflake uses a default refresh interval of 30 seconds.

For more information, see automated refresh.

Default: FALSE

Note

Using AUTO_REFRESH with INFER_SCHEMA isn’t supported.

COMMENT = 'string_literal'

指定表的注释。

默认:无值

TAG ( tag_name = 'tag_value' [ , tag_name = 'tag_value' , ... ] )

Specifies the tag name and the tag string value.

The tag value is always a string, and the maximum number of characters for the tag value is 256.

For information about specifying tags in a statement, see Tag quotas.

WITH CONTACT ( purpose = contact [ , purpose = contact ...] )

Associate the new object with one or more contacts.

Specify the WITH CONTACT clause after all other clauses except the AS clause (if that clause is supported by this command).

访问控制要求

A role used to execute this operation must have the following privileges at a minimum:

权限对象备注
CREATE ICEBERG TABLE架构
CREATE EXTERNAL VOLUME账户需要创建新的外部卷。
USAGE外部卷需要引用现有的外部卷。
CREATE INTEGRATION账户需要创建新的目录集成。
USAGE目录集成需要引用现有目录集成。

Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.

For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.

使用说明

  • 运行此命令的注意事项:

    • If you created your external volume or catalog integration using a double-quoted identifier, you must specify the identifier exactly as created (including the double quotes) in your CREATE ICEBERG TABLE statement. Failure to include the quotes might result in an Object does not exist error (or similar type of error).

      To view an example, see the Examples (in this topic) section.

  • Considerations for creating tables:

    • A schema cannot contain tables and/or views with the same name. When creating a table:
    • If a view with the same name already exists in the schema, an error is returned and the table is not created.
    • If a table with the same name already exists in the schema, an error is returned and the table is not created, unless the optional OR REPLACE keyword is included in the command.
    • CREATE OR REPLACE <object> statements are atomic. That is, when an object is replaced, the old object is deleted and the new object is created in a single transaction.

      This means that any queries concurrent with the CREATE OR REPLACE ICEBERG TABLE operation use either the old or new table version.
    • The OR REPLACE and IF NOT EXISTS clauses are mutually exclusive. They can’t both be used in the same statement.
    • Similar to reserved keywords, ANSI-reserved function names (CURRENT_DATE, CURRENT_TIMESTAMP, etc.) cannot be used as column names.
    • Recreating a table (using the optional OR REPLACE keyword) drops its history, which makes any stream on the table stale. A stale stream is unreadable.
  • 关于元数据:

    Attention

    Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata fields in Snowflake.

示例

创建以 AWS Glue 作为目录的 Iceberg 表

This example creates an Iceberg table that uses the AWS Glue Data Catalog. To override the default catalog namespace and set a catalog namespace for the table, the statement uses the optional CATALOG_NAMESPACE parameter.

CREATE ICEBERG TABLE glue_iceberg_table
  EXTERNAL_VOLUME='glue_catalog_volume'
  CATALOG='glue_catalog_integration'
  CATALOG_TABLE_NAME='my_glue_catalog_table'
  CATALOG_NAMESPACE='icebergcatalogdb2'
  AUTO_REFRESH = TRUE;

使用带双引号的标识符指定外部卷或目录集成

此示例创建一个具有外部卷和目录集成的 Iceberg 表,表的标识符中包含双引号。放在双引号内的标识符区分大小写,并且通常包含特殊字符。

The identifiers "glue_volume_1" and "glue_catalog_integration_1" are specified exactly as created (including the double quotes). Failure to include the quotes might result in an Object does not exist error (or similar type of error).

To learn more, see Double-quoted identifiers.

CREATE OR REPLACE ICEBERG TABLE itable_with_quoted_catalog
  EXTERNAL_VOLUME = '"glue_volume_1"'
  CATALOG = '"glue_catalog_integration_1"'
  CATALOG_TABLE_NAME='my_glue_catalog_table'
  CATALOG_NAMESPACE='icebergcatalogdb2'
  AUTO_REFRESH = TRUE;