COPY FILES¶

将文件从源位置复制到输出暂存区。您可以使用暂存区或查询作为要复制的文件的来源。

使用暂存区作为源，将文件从一个暂存区复制到另一个暂存区，而无需重命名。
将查询用作以下任务的源：
- Copy from or to a set of files defined by a query (SELECT statement).
- Copy from files written by a UDF (for example, Writing files from Snowpark Python UDFs and UDTFs).
- 从作用域或暂存区 URLs 复制。

You can copy from and to existing named stages, as the following table illustrates:

源位置	目标位置
内部命名暂存区	内部命名暂存区
外部暂存区	内部命名暂存区
内部命名暂存区	外部暂存区
外部暂存区	外部暂存区
Snowflake Git repository clone	内部命名暂存区
Snowflake Git repository clone	外部暂存区

目标或源外部暂存区可以引用以下任何云存储服务或本地位置中的文件：

Amazon S3
Google Cloud Storage
Microsoft Azure Blob 存储
Microsoft Data Lake Storage Gen2
Microsoft Azure General-purpose v2
Amazon S3-compatible storage

See also:: External stages , Internal stages, Git repository clone

语法

从暂存区复制

COPY FILES INTO @[<namespace>.]<stage_name>[/<path>/]
  FROM @[<namespace>.]<stage_name>[/<path>/]
  [ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
  [ PATTERN = '<regex_pattern>' ]
  [ DETAILED_OUTPUT = { TRUE | FALSE } ]

从查询复制

COPY FILES INTO @[<namespace>.]<stage_name>[/<path>/]
  FROM ( SELECT <existing_url> [ , <new_filename> ] FROM ... )
  [ DETAILED_OUTPUT = { TRUE | FALSE } ]

必填参数

INTO @[namespace.]stage_name[/path/]

指定复制文件的目标位置。

namespace is the database or schema in which the internal or external stage resides, in the form of database_name.schema_name or schema_name. The namespace is optional if a database and schema are currently in use within the user session; otherwise, it is required.
path is an optional, case-sensitive path in the cloud storage location that specifies a set of files to copy from the source stage or a specific location on the target stage. Your cloud storage service might call the path a prefix or a folder.

Note

If a target or source path name includes special characters or spaces, you must enclose the INTO ... value in single quotes.

The values for INTO ... must be literal constants. The values cannot be SQL variables.

使用暂存区作为源

FROM @[namespace.]stage_name[/path/]: Specifies the source location where the files to copy are staged. The values provided to FROM ... follow the same specification and constraints as INTO... values.

使用查询作为源

FROM (SELECT existing_url [ , new_filename ] FROM ... )

Specifies the source location and optional relative output location for the copied files. Each row that the SELECT query returns represents a file to copy.

existing_url is a scoped URL, stage name, or stage URL.
new_filename is an optional relative path from the output stage specified for the INTO clause.

Snowflake 将文件复制到以下位置：

@[<namespace>.]<stage_name>[/<path>]<new_filename>

If you don’t specify a value for new_filename, Snowflake uses the relative path of the existing_url.

可选参数

FILES = ( 'file_name' [ , 'file_name' ... ] )

指定要复制的一个或多个用逗号分隔的文件名的列表。这些文件必须已暂存到您在命令中指定的源位置。Snowflake 会跳过任何无法找到的指定文件。

您最多可以指定 1000 个文件名。

从查询复制文件不支持此选项。请使用查询来提供文件名列表。

Note

要设置外部暂存区的文件路径，Snowflake 会将暂存区定义中的 URL 添加到列表中的每个文件名前面。

However, Snowflake does not insert a separator between the path and file name. You must explicitly include a separator (/) at the end of the URL in the stage definition or at the beginning of each file name in the FILES list.

PATTERN = 'regex_pattern'

Specifies a regular expression pattern for filtering the list of files to copy. This command applies the regular expression to the entire storage location in the FROM clause.

从查询复制文件不支持此选项。相反，使用查询来匹配模式。

Tip

为了获得最佳性能，请避免对大量文件进行筛选的模式。

DETAILED_OUTPUT = { TRUE | FALSE }

指定命令输出是应汇总复制操作的结果，还是列出复制的每个文件。

Values:

If TRUE, the output includes a row for each file copied to the target location. A single column named file contains the target path (if applicable) and file name for each copied file.
If FALSE, the output is a single row with the number of files that were copied.

Default:

TRUE

访问控制要求

A role used to execute this SQL command must have the following privileges (depending on the source and target locations) at a minimum:

权限	对象	备注
USAGE	外部暂存区	在源或目标外部暂存区是必填的。
READ	内部命名暂存区	在源内部暂存区是必填的。
WRITE	内部命名暂存区	在目标内部暂存区是必填的。

Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.

For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.

使用说明

此命令不支持以下操作：
- 将文件复制到表暂存区或从中复制文件。
- 使用暂存区作为源时，将文件复制到用户暂存区或从中复制文件。
- Copying data in archival cloud storage classes that requires restoration before it can be retrieved. Archival storage classes include Amazon S3 Glacier Flexible Retrieval, Glacier Deep Archive, or Microsoft Azure Archive Storage.
- 复制大于 5GB 的文件。

运行此命令的注意事项：

COPY FILES statements overwrite any existing files with matching names in the target location. The command does not remove any existing files that don’t match the names of the copied files.
如果文件复制操作失败，Snowflake 不会执行任何自动清理。

Copying files from Google Cloud Storage: A COPY FILES statement might fail if the object list for an external stage includes one or more directory blobs. A directory blob is a path that ends in a forward slash character (/). In the following example output for LIST @<stage>, my_gcs_stage/load/ is a directory blob.

+---------------------------------------+------+----------------------------------+-------------------------------+
| name                                  | size | md5                              | last_modified                 |
|---------------------------------------+------+----------------------------------+-------------------------------|
| my_gcs_stage/load/                    |  12  | 12348f18bcb35e7b6b628ca12345678c | Mon, 11 Aug 2022 16:57:43 GMT |
| my_gcs_stage/load/data_0_0_0.csv.gz   |  147 | 9765daba007a643bdff4eae10d43218y | Mon, 11 Aug 2022 18:13:07 GMT |
+---------------------------------------+------+----------------------------------+-------------------------------+

使用 Google Cloud Console 创建目录时，Google 会创建目录 Blob。

To avoid this issue and specify which files to copy, use the PATTERN option (for copy from stage) or FROM (for copy from query).

For an example, see 使用模式匹配复制文件.

Snowflake uses multipart uploads when uploading to Amazon S3 and Google Cloud Storage. To prevent incomplete uploads from accumulating, we recommend that you set a lifecycle rule. For instructions, see the Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) or Google Cloud Storage (https://cloud.google.com/storage/docs/lifecycle#abort-mpu) documentation.

COPY FILES 命令会产生数据传输和计算成本：
- Data transfer: Cloud providers might charge for data transferred out of their own network. To recover these expenses, Snowflake charges a per-byte fee when you copy files from an internal Snowflake stage into an external stage in a different region or with a different cloud provider. Snowflake does not charge for data ingress (for example, when copying files from an external stage into an internal stage).
  
  For more information about data transfer billing, see Understanding data transfer cost.
- Compute: COPY FILES is a serverless feature and doesn’t require a virtual warehouse. The line item for the COPY FILES command on your Snowflake bill does not include any cloud services charges.
  
  For more information about compute resource billing, see Understanding compute cost.
Note
Some Snowflake features, such as Native Apps and worksheets, incur COPY FILES charges. As a result, you might see COPY FILES charges even if you haven’t executed the COPY FILES command. For more information about these charges, contact Snowflake Support.
Snowflake 不维护此命令的文件复制历史记录。

示例

复制文件

Copy all of the files from an existing source stage (src_stage) to an existing target stage (trg_stage):

COPY FILES
  INTO @trg_stage
  FROM @src_stage;

Note

要从具有受保护存储位置的外部暂存区复制文件或将文件复制到此外部暂存区，请确保暂存区定义包含访问云存储位置的凭据。

Specify the names of files to copy from an existing source stage (src_stage) to an existing target stage (trg_stage):

COPY FILES
  INTO @trg_stage
  FROM @src_stage
  FILES = ('file1.csv', 'file2.csv');

Copy files from a specific path on an existing stage (src_stage/src_path/) to a specific path on an existing target stage (trg_stage/trg_path/):

COPY FILES
  INTO @trg_stage/trg_path/
  FROM @src_stage/src_path/;

使用模式匹配复制文件

Use pattern matching to load only compressed CSV files in any path on an existing source stage (src_stage) to an existing target stage (trg_stage):

COPY FILES
  INTO @trg_stage
  FROM @src_stage
  PATTERN='.*/.*/.*[.]csv[.]gz';

The .* component represents zero or more occurrences of any character. The square brackets escape the period character (.) that precedes a file extension.

Copy only uncompressed CSV files whose names include the string sales:

COPY FILES
  INTO @trg_stage
  FROM @src_stage
  PATTERN='.*sales.*[.]csv';

使用查询复制文件

复制单个文件

文件名与源暂存区中的文件名相同。

COPY FILES
  INTO @trg_stage
  FROM (SELECT '@src_stage/file.txt');

复制并重命名单个文件

COPY FILES
  INTO @trg_stage
  FROM (SELECT '@src_stage/file.txt', 'new_filename.txt');

复制表中的所有文件

要使用查询复制多个文件，可以使用通用查询。

-- Create a table with URLs
CREATE TABLE urls(src_file STRING, trg_file STRING);
INSERT INTO urls VALUES ('@src_stage/file.txt', 'new_filename.txt');

-- Insert additional URLs here
COPY FILES
  INTO @trg_stage
  FROM (SELECT src_file, trg_file FROM urls);

只复制部分文件

此示例使用过滤器来复制与模式匹配的文件。

COPY FILES
  INTO @trg_stage
  FROM (SELECT src_file, trg_file FROM urls WHERE src_file LIKE '%file%');

从目录表复制文件

COPY FILES
  INTO @trg_stage
  FROM (SELECT relative_path FROM directory(@src_stage) WHERE relative_path LIKE '%.txt');

Copy files with detailed output¶

To produce command output with a list of files that are copied to the target location, use DETAILED_OUTPUT = TRUE.

The output has a single column named file that contains the target path, if applicable, and the file name for each copied file.
```
COPY FILES
  INTO @trg_stage
  FROM @src_stage
  DETAILED_OUTPUT = TRUE;
```
An example output:
```
+--------------------+
| file               |
%--------------------%
| employees01.csv.gz |
| employees02.csv.gz |
| employees03.csv.gz |
| employees04.csv.gz |
| employees05.csv.gz |
+--------------------+
```
To produce command output that summarizes the results of the copy operation, use DETAILED_OUTPUT = FALSE.

The output is a single row with the number of files that were copied.
```
COPY FILES
  INTO @trg_stage
  FROM @src_stage
  DETAILED_OUTPUT = FALSE;
```
An example output:
```
+-------------------+
| numOfFilesCopied  |
| 5                 |
+-------------------+
```