Unload into Google Cloud Storage

如果您已经拥有 Google Cloud Storage 账户并使用 Cloud Storage 桶来存储和管理数据文件,您可以在从 Snowflake 表卸载数据时使用现有桶和文件夹路径。本主题介绍如何使用 COPY 命令来将数据从表卸载到 Cloud Storage 桶的命令。然后,您可以将卸载的数据文件下载到本地文件系统。

如下图所示,要将数据卸载到 Cloud Storage 桶,分两步执行:

Step 1:

Use the COPY INTO <location> command to copy the data from the Snowflake database table into one or more files in a Cloud Storage bucket. In the command, you specify a named external stage object that references the Cloud Storage bucket (recommended) or you can choose to unload directly to the bucket by specifying the URI and storage integration (if required) for the bucket.

无论您使用哪种方法,如果您手动或在脚本中执行命令,此步骤都需要运行一个当前虚拟仓库的会话。仓库提供计算资源来从表中写入行。

Step 2:

使用 Google 提供的接口/工具从 Cloud Storage 桶下载文件。

Unloading data to Cloud Storage

Tip

The instructions in this set of topics assume you have read File formats to unload data and have created a named file format, if desired.

Before you begin, you may also want to read Data unloading considerations for best practices, tips, and other guidance.

Configure Cloud Storage for unloading data

要让 Snowflake 写入 Cloud Storage 桶,您必须配置存储集成对象,将外部云存储的身份验证责任委托给 Snowflake Identity and Access Management (IAM) 实体。

For configuration instructions, see Configure an integration for Google Cloud Storage.

Unload data into an external stage

外部暂存区是指定的数据库对象,为数据卸载提供最大程度的灵活性。因为它们是数据库对象,所以可以将指定暂存区的权限授予任何角色。

您可以使用 Web 界面或 SQL 来创建指定的外部暂存区:

Snowsight:

In the navigation menu, select Catalog » Database Explorer. Then select the <db_name> » Stages.

SQL:

CREATE STAGE

Create a named stage

Snowflake uses multipart uploads when uploading to Amazon S3 and Google Cloud Storage. This process might leave incomplete uploads in the storage location for your external stage.

To prevent incomplete uploads from accumulating, we recommend that you set a lifecycle rule. For instructions, see the Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) or Google Cloud Storage (https://cloud.google.com/storage/docs/lifecycle#abort-mpu) documentation.

The following example creates an external stage named my_ext_unload_stage with a folder path named unload. The stage references the following objects:

  • A named storage integration called gcs_int. For instructions, see Configure an integration for Google Cloud Storage.
  • A named file format called my_csv_unload_format. For instructions, see File formats to unload data.
    CREATE OR REPLACE STAGE my_ext_unload_stage
      URL='gcs://mybucket/unload'
      STORAGE_INTEGRATION = gcs_int
      FILE_FORMAT = my_csv_unload_format;

Unload data to the named stage

  1. Use the COPY INTO <location> command to unload data from a table into a Cloud Storage bucket using the external stage.

    The following example uses the my_ext_unload_stage stage to unload all the rows in the mytable table into one or more files into the Cloud Storage bucket. A d1 filename prefix is applied to the files:

    COPY INTO @my_ext_unload_stage/d1
    FROM mytable;
  2. 使用 Cloud Storage 提供的工具从桶中检索对象(即命令生成的文件)。

Unload data directly into a Cloud Storage bucket

  1. Use the COPY INTO <location> command to unload data from a table directly into a specified Cloud Storage bucket. This option works well for ad hoc unloading, when you aren’t planning regular data unloading with the same table and bucket parameters.

您必须指定 Cloud Storage 桶的 URI,以及访问桶所需的存储集成。

The following example unloads all the rows in the mytable table into one or more files with the folder path prefix unload/ in a Cloud Storage bucket:

COPY INTO 'gcs://mybucket/unload/'
  FROM mytable
  STORAGE_INTEGRATION = gcs_int;
  1. 使用 Cloud Storage 控制台(或同等客户端应用程序)从桶中检索对象(即命令生成的文件)。