COPY FILES¶
将文件从源位置复制到输出暂存区。您可以使用暂存区或查询作为要复制的文件的来源。
- 使用暂存区作为源,将文件从一个暂存区复制到另一个暂存区,而无需重命名。
- 将查询用作以下任务的源:
- Copy from or to a set of files defined by a query (SELECT statement).
- Copy from files written by a UDF (for example, Writing files from Snowpark Python UDFs and UDTFs).
- 从作用域或暂存区 URLs 复制。
You can copy from and to existing named stages, as the following table illustrates:
| 源位置 | 目标位置 |
|---|---|
| 内部命名暂存区 | 内部命名暂存区 |
| 外部暂存区 | 内部命名暂存区 |
| 内部命名暂存区 | 外部暂存区 |
| 外部暂存区 | 外部暂存区 |
| Snowflake Git repository clone | 内部命名暂存区 |
| Snowflake Git repository clone | 外部暂存区 |
目标或源外部暂存区可以引用以下任何云存储服务或本地位置中的文件:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Blob 存储
- Microsoft Data Lake Storage Gen2
- Microsoft Azure General-purpose v2
- Amazon S3-compatible storage
- See also:
语法
从暂存区复制
从查询复制
必填参数
INTO @[namespace.]stage_name[/path/]指定复制文件的目标位置。
namespaceis the database or schema in which the internal or external stage resides, in the form ofdatabase_name.schema_nameorschema_name. The namespace is optional if a database and schema are currently in use within the user session; otherwise, it is required.pathis an optional, case-sensitive path in the cloud storage location that specifies a set of files to copy from the source stage or a specific location on the target stage. Your cloud storage service might call the path a prefix or a folder.
Note
- If a target or source path name includes special characters or spaces, you must enclose the
INTO ...value in single quotes. - The values for
INTO ...must be literal constants. The values cannot be SQL variables.
使用暂存区作为源
FROM @[namespace.]stage_name[/path/]Specifies the source location where the files to copy are staged. The values provided to
FROM ...follow the same specification and constraints asINTO...values.
使用查询作为源
FROM (SELECT existing_url [ , new_filename ] FROM ... )Specifies the source location and optional relative output location for the copied files. Each row that the SELECT query returns represents a file to copy.
existing_urlis a scoped URL, stage name, or stage URL.new_filenameis an optional relative path from the output stage specified for theINTOclause.
Snowflake 将文件复制到以下位置:
@[<namespace>.]<stage_name>[/<path>]<new_filename>If you don’t specify a value for
new_filename, Snowflake uses the relative path of theexisting_url.
可选参数
FILES = ( 'file_name' [ , 'file_name' ... ] )指定要复制的一个或多个用逗号分隔的文件名的列表。这些文件必须已暂存到您在命令中指定的源位置。Snowflake 会跳过任何无法找到的指定文件。
您最多可以指定 1000 个文件名。
从查询复制文件不支持此选项。请使用查询来提供文件名列表。
Note
要设置外部暂存区的文件路径,Snowflake 会将暂存区定义中的 URL 添加到列表中的每个文件名前面。
However, Snowflake does not insert a separator between the path and file name. You must explicitly include a separator (
/) at the end of the URL in the stage definition or at the beginning of each file name in theFILESlist.PATTERN = 'regex_pattern'Specifies a regular expression pattern for filtering the list of files to copy. This command applies the regular expression to the entire storage location in the
FROMclause.从查询复制文件不支持此选项。相反,使用查询来匹配模式。
Tip
为了获得最佳性能,请避免对大量文件进行筛选的模式。
DETAILED_OUTPUT = { TRUE | FALSE }指定命令输出是应汇总复制操作的结果,还是列出复制的每个文件。
- Values:
- If
TRUE, the output includes a row for each file copied to the target location. A single column namedfilecontains the target path (if applicable) and file name for each copied file. - If
FALSE, the output is a single row with the number of files that were copied.
- If
- Default:
TRUE
访问控制要求
A role used to execute this SQL command must have the following privileges (depending on the source and target locations) at a minimum:
| 权限 | 对象 | 备注 |
|---|---|---|
| USAGE | 外部暂存区 | 在源或目标外部暂存区是必填的。 |
| READ | 内部命名暂存区 | 在源内部暂存区是必填的。 |
| WRITE | 内部命名暂存区 | 在目标内部暂存区是必填的。 |
Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.
For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
使用说明
-
此命令不支持以下操作:
- 将文件复制到 表 暂存区或从中复制文件。
- 使用暂存区作为源时,将文件复制到 用户 暂存区或从中复制文件。
- Copying data in archival cloud storage classes that requires restoration before it can be retrieved. Archival storage classes include Amazon S3 Glacier Flexible Retrieval, Glacier Deep Archive, or Microsoft Azure Archive Storage.
- 复制大于 5GB 的文件。
-
运行此命令的注意事项:
-
COPY FILES statements overwrite any existing files with matching names in the target location. The command does not remove any existing files that don’t match the names of the copied files.
-
如果文件复制操作失败,Snowflake 不会执行任何自动清理。
-
Copying files from Google Cloud Storage: A COPY FILES statement might fail if the object list for an external stage includes one or more directory blobs. A directory blob is a path that ends in a forward slash character (
/). In the following example output forLIST @<stage>,my_gcs_stage/load/is a directory blob.
-
使用 Google Cloud Console 创建目录时,Google 会创建目录 Blob。
To avoid this issue and specify which files to copy, use the PATTERN option (for copy from stage) or FROM (for copy from query).
For an example, see 使用模式匹配复制文件.
- Snowflake uses multipart uploads when uploading to Amazon S3 and Google Cloud Storage. To prevent incomplete uploads from accumulating, we recommend that you set a lifecycle rule. For instructions, see the Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) or Google Cloud Storage (https://cloud.google.com/storage/docs/lifecycle#abort-mpu) documentation.
-
COPY FILES 命令会产生数据传输和计算成本:
-
Data transfer: Cloud providers might charge for data transferred out of their own network. To recover these expenses, Snowflake charges a per-byte fee when you copy files from an internal Snowflake stage into an external stage in a different region or with a different cloud provider. Snowflake does not charge for data ingress (for example, when copying files from an external stage into an internal stage).
For more information about data transfer billing, see Understanding data transfer cost.
-
Compute: COPY FILES is a serverless feature and doesn’t require a virtual warehouse. The line item for the COPY FILES command on your Snowflake bill does not include any cloud services charges.
For more information about compute resource billing, see Understanding compute cost.
Note
Some Snowflake features, such as Native Apps and worksheets, incur COPY FILES charges. As a result, you might see COPY FILES charges even if you haven’t executed the COPY FILES command. For more information about these charges, contact Snowflake Support.
-
-
Snowflake 不维护此命令的文件复制历史记录。
示例
复制文件
Copy all of the files from an existing source stage (src_stage) to an existing target stage (trg_stage):
Note
要从具有受保护存储位置的外部暂存区复制文件或将文件复制到此外部暂存区,请确保暂存区定义包含访问云存储位置的凭据。
Specify the names of files to copy from an existing source stage (src_stage) to an existing target stage (trg_stage):
Copy files from a specific path on an existing stage (src_stage/src_path/)
to a specific path on an existing target stage (trg_stage/trg_path/):
使用模式匹配复制文件
Use pattern matching to load only compressed CSV files in any path on an existing source stage (src_stage) to an existing target stage (trg_stage):
The .* component represents zero or more occurrences of any character.
The square brackets escape the period character (.) that precedes a file extension.
Copy only uncompressed CSV files whose names include the string sales:
使用查询复制文件
复制单个文件
文件名与源暂存区中的文件名相同。
复制并重命名单个文件
复制表中的所有文件
要使用查询复制多个文件,可以使用通用查询。
只复制部分文件
此示例使用过滤器来复制与模式匹配的文件。
从目录表复制文件
Copy files with detailed output¶
-
To produce command output with a list of files that are copied to the target location, use
DETAILED_OUTPUT = TRUE.The output has a single column named
filethat contains the target path, if applicable, and the file name for each copied file.An example output:
-
To produce command output that summarizes the results of the copy operation, use
DETAILED_OUTPUT = FALSE.The output is a single row with the number of files that were copied.
An example output: