Configure a catalog integration for AWS Glue Iceberg REST

Follow the steps in this topic to create a catalog integration for the AWS Glue Iceberg REST endpoint (https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html) with Signature Version 4 (SigV4) (https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-signing.html) authentication.
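Snowflake signs every request to the Glue endpoint with SigV4 for you, so you don't write signing code yourself. The following minimal Python sketch shows the background: how SigV4 derives its signing key, which is where a setting like SIGV4_SIGNING_REGION ends up. The key is built by chaining HMAC-SHA256 over the date, the Region, the service name, and the literal aws4_request. The sketch assumes the signing service name glue and uses placeholder credentials. It omits building the canonical request and the string to sign, and the function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key by chaining HMAC-SHA256 over date, Region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)  # date_stamp is YYYYMMDD
    k_region = _hmac_sha256(k_date, region)       # e.g. the SIGV4_SIGNING_REGION value
    k_service = _hmac_sha256(k_region, service)   # assumed to be "glue" for the Glue endpoint
    return _hmac_sha256(k_service, "aws4_request")


def sign_string_to_sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex-encoded signature for an already-built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Usage with placeholder values (never hard-code real credentials):
# key = derive_sigv4_signing_key("EXAMPLE-SECRET", "20240101", "us-west-2", "glue")
```

Because the Region is mixed into the key, a signature computed for one Region does not validate against an endpoint in another. This is why the signing Region should match the Region in the endpoint URL.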

Note

To configure a catalog integration for connecting to AWS Glue Data Catalog through a private IP address instead of over the public internet, see Configure an Apache Iceberg™ REST catalog integration with outbound private connectivity.

Step 1: Configure access to the AWS Glue Data Catalog

Create an IAM policy for Snowflake to access the AWS Glue Data Catalog. Attach the policy to an IAM role, which you specify when you create a catalog integration. For instructions, see Creating IAM policies (https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html) and Modifying a role permissions policy (https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy) in the AWS Identity and Access Management User Guide.

Read-only example policy

At a minimum, Snowflake requires the following permissions on the AWS Glue Data Catalog to access information through the Glue Iceberg REST catalog:

  • glue:GetCatalog
  • glue:GetDatabase
  • glue:GetDatabases
  • glue:GetTable
  • glue:GetTables

The following example policy (in JSON format) provides the permissions required to access all tables in a specified database.

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "AllowGlueCatalogTableAccess",
         "Effect": "Allow",
         "Action": [
           "glue:GetCatalog",
           "glue:GetDatabase",
           "glue:GetDatabases",
           "glue:GetTable",
           "glue:GetTables"
         ],
         "Resource": [
            "arn:aws:glue:*:<accountid>:table/*/*",
            "arn:aws:glue:*:<accountid>:catalog",
            "arn:aws:glue:*:<accountid>:database/<database-name>"
         ]
      }
   ]
}
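You can generate the read-only policy for your own account and database instead of editing the placeholders by hand. The following Python sketch does that and mirrors the JSON above. It assumes a 12-digit AWS account ID and a single Glue database. build_read_only_policy and READ_ONLY_GLUE_ACTIONS are hypothetical names.

```python
import json
from typing import Any, Dict

# The minimum Glue actions listed above for read-only access.
READ_ONLY_GLUE_ACTIONS = [
    "glue:GetCatalog",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
]


def build_read_only_policy(account_id: str, database_name: str) -> Dict[str, Any]:
    """Return a read-only IAM policy document scoped to one Glue database."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("an AWS account ID is exactly 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueCatalogTableAccess",
                "Effect": "Allow",
                "Action": list(READ_ONLY_GLUE_ACTIONS),
                "Resource": [
                    f"arn:aws:glue:*:{account_id}:table/*/*",
                    f"arn:aws:glue:*:{account_id}:catalog",
                    f"arn:aws:glue:*:{account_id}:database/{database_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and database name.
    print(json.dumps(build_read_only_policy("123456789012", "my_database"), indent=3))
```

The printed JSON can be pasted into the IAM console when you create the policy.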


Read/write example policy

The following example policy (in JSON format) provides the required permissions for read and write access to all of the tables in all databases. To configure write access for externally managed tables, use this policy as an example.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGlueCatalogTableAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateDatabase",
        "glue:DeleteDatabase",
        "glue:GetTable",
        "glue:GetTables",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable"
      ],
      "Resource": [
        "arn:aws:glue:*:<accountid>:table/*/*",
        "arn:aws:glue:*:<accountid>:catalog",
        "arn:aws:glue:*:<accountid>:database/*",
        "arn:aws:s3:::<external_volume_path>/*"
      ]
    }
  ]
}
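For the S3 resource in the read/write policy, an S3 object ARN has the form arn:aws:s3:::<bucket>/<key>, with the Region and account fields left empty. The following Python sketch assumes that your external volume's storage location is an s3:// URL (its STORAGE_BASE_URL). From that URL it derives a resource ARN that matches every object under the prefix. Both helper names are hypothetical.

```python
from typing import List


def s3_url_to_object_arn(storage_base_url: str) -> str:
    """Turn s3://bucket/prefix/ into an ARN that matches every object under the prefix."""
    scheme = "s3://"
    if not storage_base_url.startswith(scheme):
        raise ValueError("expected an s3:// URL")
    bucket_and_prefix = storage_base_url[len(scheme):].strip("/")
    if not bucket_and_prefix:
        raise ValueError("the URL is missing a bucket name")
    # S3 ARNs leave the Region and account fields empty, hence ":::".
    return f"arn:aws:s3:::{bucket_and_prefix}/*"


def build_read_write_resources(account_id: str, storage_base_url: str) -> List[str]:
    """Resource list for the read/write policy: Glue objects plus the S3 objects."""
    return [
        f"arn:aws:glue:*:{account_id}:table/*/*",
        f"arn:aws:glue:*:{account_id}:catalog",
        f"arn:aws:glue:*:{account_id}:database/*",
        s3_url_to_object_arn(storage_base_url),
    ]
```

The trailing /* matters. s3:GetObject and s3:PutObject act on objects, so a resource that names only the bucket or prefix would not match them.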


(Optional) Configure Lake Formation access control

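If you use AWS Lake Formation for fine-grained access control, make sure that your Lake Formation configuration allows Snowflake to access the catalog objects and their underlying data.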

The IAM role that you created in the previous step (the role that you specify in Snowflake when you create a catalog integration) must have the lakeformation:GetDataAccess IAM permission. This permission grants read and write access to the underlying data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*"
        }
    ]
}
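Before you rely on Lake Formation, you might want to confirm that the role's permissions policy actually includes lakeformation:GetDataAccess. The following Python sketch is a rough, hypothetical check, not an IAM policy evaluator. It only looks for an Allow statement whose Action matches, comparing case-insensitively and honoring * and ? wildcards. It ignores Deny statements, NotAction, Resource, and Condition.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def policy_allows_action(policy: Dict[str, Any], action: str) -> bool:
    """Rough check: does any Allow statement's Action match `action`?

    Deny statements, NotAction, Resource, and Condition are ignored, so a
    True result is not proof that IAM will allow the call.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):  # Action may be a string or a list
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns):
            return True
    return False
```

For example, you could load the policy JSON from a file and call policy_allows_action(policy, "lakeformation:GetDataAccess").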

For more information, see Underlying data access control (https://docs.aws.amazon.com/lake-formation/latest/dg/access-control-underlying-data.html) in the Lake Formation documentation.

You must also grant data permissions to the IAM role. The method that you use to grant data permissions depends on your Lake Formation setup. For example, you might use the named resources method to grant permissions to AWS Glue objects, or you might use tag-based access control. For more information and instructions, see the AWS Lake Formation documentation (https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html).

Step 2: Create a catalog integration in Snowflake

Create a catalog integration for the AWS Glue Iceberg REST endpoint (https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html) using the CREATE CATALOG INTEGRATION (Apache Iceberg™ REST) command. Specify the IAM role that you configured. For CATALOG_NAME, use your AWS account ID.

CREATE CATALOG INTEGRATION glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'rest_catalog_integration'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '123456789012'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/my-role'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;

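Where:

  • CATALOG_URI is the service endpoint for the AWS Glue Iceberg REST catalog.
  • CATALOG_NAME is the ID of your AWS account.
  • SIGV4_IAM_ROLE is the ARN of the IAM role that you configured in Step 1.
  • SIGV4_SIGNING_REGION is the AWS Region of the Glue endpoint in CATALOG_URI (us-west-2 in this example).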

For more information, see CREATE CATALOG INTEGRATION (Apache Iceberg™ REST), which includes instructions for configuring a catalog integration for AWS Glue.
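In the example statement, the endpoint URL and the signing Region both come from one AWS Region, and CATALOG_NAME is the account ID. The following Python sketch assumes that pattern, taking the URL format from the example above, and renders the same statement from a few inputs. The function names are hypothetical, and you still run the generated SQL in Snowflake yourself.

```python
import re

_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def glue_rest_endpoint(region: str) -> str:
    """Glue Iceberg REST endpoint for a Region, following the URL in the example above."""
    return f"https://glue.{region}.amazonaws.com/iceberg"


def sql_literal(value: str) -> str:
    """Render value as a single-quoted SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_create_catalog_integration(name: str, namespace: str, region: str,
                                     account_id: str, role_arn: str) -> str:
    """Render a CREATE CATALOG INTEGRATION statement shaped like the example above."""
    if not name.isidentifier():
        raise ValueError("use a simple unquoted identifier for the integration name")
    if not _ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("CATALOG_NAME must be a 12-digit AWS account ID")
    lines = [
        f"CREATE CATALOG INTEGRATION {name}",
        "  CATALOG_SOURCE = ICEBERG_REST",
        "  TABLE_FORMAT = ICEBERG",
        f"  CATALOG_NAMESPACE = {sql_literal(namespace)}",
        "  REST_CONFIG = (",
        f"    CATALOG_URI = {sql_literal(glue_rest_endpoint(region))}",
        "    CATALOG_API_TYPE = AWS_GLUE",
        f"    CATALOG_NAME = {sql_literal(account_id)}",
        "  )",
        "  REST_AUTHENTICATION = (",
        "    TYPE = SIGV4",
        f"    SIGV4_IAM_ROLE = {sql_literal(role_arn)}",
        f"    SIGV4_SIGNING_REGION = {sql_literal(region)}",  # same Region as the endpoint
        "  )",
        "  ENABLED = TRUE;",
    ]
    return "\n".join(lines)
```

Deriving both the URL and the signing Region from a single input keeps them from drifting apart.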

Step 3: Retrieve the AWS IAM user and external ID for your Snowflake account

To retrieve information about the AWS IAM user and the external ID for your Snowflake account, run the DESCRIBE CATALOG INTEGRATION command. You provide this information to AWS in the next step to establish a trust relationship.

DESCRIBE CATALOG INTEGRATION glue_rest_catalog_int;

Record the following values:

  • GLUE_AWS_IAM_USER_ARN: The AWS IAM user created for your Snowflake account, for example, arn:aws:iam::123456789001:user/abc1-b-self1234. Snowflake provisions a single IAM user for your entire Snowflake account, and all Glue catalog integrations in your account use that IAM user.
  • GLUE_AWS_EXTERNAL_ID: An external ID for establishing a trust relationship.

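If you script this step, you can pull the two values out of the DESCRIBE output programmatically. The following Python sketch assumes you have already fetched the result rows as (property, value) pairs with a Snowflake client of your choice. extract_trust_values and that row shape are hypothetical, so adapt them to how your client actually returns rows.

```python
from typing import Dict, Iterable, Tuple

# Property names recorded in this step.
REQUIRED_PROPERTIES = ("GLUE_AWS_IAM_USER_ARN", "GLUE_AWS_EXTERNAL_ID")


def extract_trust_values(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Pick the IAM user ARN and external ID out of DESCRIBE (property, value) rows."""
    found: Dict[str, str] = {}
    for prop, value in rows:
        if prop in REQUIRED_PROPERTIES:
            found[prop] = value
    missing = [prop for prop in REQUIRED_PROPERTIES if prop not in found]
    if missing:
        # Failing loudly beats writing an empty value into the trust policy.
        raise KeyError("DESCRIBE output is missing: " + ", ".join(missing))
    return found
```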
Step 4: Grant the IAM user access to the AWS Glue Data Catalog

Update the trust policy for the same IAM role whose ARN you specified for SIGV4_IAM_ROLE when you created the catalog integration. Add the values that you recorded in the previous step to the trust policy.

For instructions, see Modifying a trust policy (https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-managingrole_edit-trust-policy).

The following example policy shows where to specify the GLUE_AWS_IAM_USER_ARN and GLUE_AWS_EXTERNAL_ID values:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "",
         "Effect": "Allow",
         "Principal": {
            "AWS": "<glue_iam_user_arn>"
         },
         "Action": "sts:AssumeRole",
         "Condition": {
            "StringEquals": {
               "sts:ExternalId": "<glue_aws_external_id>"
            }
         }
      }
   ]
}

Where:

  • glue_iam_user_arn is the GLUE_AWS_IAM_USER_ARN value that you recorded.
  • glue_aws_external_id is the GLUE_AWS_EXTERNAL_ID value that you recorded.
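The trust policy above can also be produced from the two recorded values. The following Python sketch mirrors that JSON. build_trust_policy is a hypothetical helper, and the ARN and external ID in the usage block are placeholders.

```python
import json
from typing import Any, Dict


def build_trust_policy(iam_user_arn: str, external_id: str) -> Dict[str, Any]:
    """Return a trust policy that lets the Snowflake IAM user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                # GLUE_AWS_IAM_USER_ARN from DESCRIBE CATALOG INTEGRATION
                "Principal": {"AWS": iam_user_arn},
                "Action": "sts:AssumeRole",
                # GLUE_AWS_EXTERNAL_ID from DESCRIBE CATALOG INTEGRATION
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        "arn:aws:iam::123456789001:user/abc1-b-self1234",  # placeholder
        "<glue_aws_external_id>",  # placeholder
    )
    print(json.dumps(policy, indent=3))
```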

Note

  • For security reasons, if you create a new catalog integration (or recreate an existing catalog integration by using the CREATE OR REPLACE CATALOG INTEGRATION syntax), the new catalog integration has a different external ID and can’t resolve the trust relationship unless you modify the trust policy with the new external ID.
  • To verify that your permissions are configured correctly, create an Iceberg table that uses this catalog integration. Snowflake doesn’t verify that your permissions are set correctly until you create an Iceberg table that references this catalog integration.
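Recreating the integration changes its external ID, so a quick check that the trust policy still names the current IAM user ARN and external ID can catch a stale policy. The following Python sketch is a hypothetical, simplified check. It looks only for an Allow statement with sts:AssumeRole, a matching AWS principal, and a matching sts:ExternalId under StringEquals. It does not replace creating an Iceberg table to verify access end to end.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_policy_accepts(policy: Dict[str, Any], iam_user_arn: str, external_id: str) -> bool:
    """Return True if some Allow/sts:AssumeRole statement names this principal and external ID."""
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal")
        # Skip statements whose principal is not an {"AWS": ...} mapping (e.g. "*").
        if not isinstance(principal, dict) or iam_user_arn not in _as_list(principal.get("AWS")):
            continue
        string_equals = statement.get("Condition", {}).get("StringEquals", {})
        if external_id in _as_list(string_equals.get("sts:ExternalId")):
            return True
    return False
```

After a CREATE OR REPLACE, running DESCRIBE again and passing the new external ID to this check would return False until you update the trust policy.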

Next steps

After you configure a catalog integration for AWS Glue Iceberg REST, you can create a catalog-linked database. Specify the name of your catalog integration as the catalog when you create your catalog-linked database.

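A catalog-linked database brings external data into Snowflake by automatically discovering the namespaces and tables in your remote Iceberg REST catalog and keeping them in sync.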