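Automated directory table metadata refreshes
You can automatically refresh the metadata for a directory table on an internal or external stage.
A refresh synchronizes the metadata with the latest set of associated files in storage and responds to the following types of changes:
- New files in the path are added to the table metadata.
- Changes to files in the path are updated in the table metadata.
- Files no longer in the path are removed from the table metadata.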
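Taken together, the three change types amount to a diff between the current table metadata and the current file listing. The Python sketch below illustrates that reconciliation. It makes several assumptions: `FileInfo`, `refresh_metadata`, and the use of size plus an MD5 fingerprint as the change signal are all hypothetical. It is not how Snowflake implements refreshes internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record for one staged file."""
    size: int
    md5: str  # content fingerprint used here to detect changed files


def refresh_metadata(metadata: dict[str, FileInfo],
                     storage: dict[str, FileInfo]) -> dict[str, list[str]]:
    """Synchronize `metadata` in place with the current `storage` listing.

    Both dicts map a relative file path to its FileInfo. Returns the paths
    that were added, updated, and removed.
    """
    added = sorted(p for p in storage if p not in metadata)
    updated = sorted(p for p in storage
                     if p in metadata and metadata[p] != storage[p])
    removed = sorted(p for p in metadata if p not in storage)

    for p in added + updated:
        metadata[p] = storage[p]  # new file, or existing file whose contents changed
    for p in removed:
        del metadata[p]           # file no longer present in the path
    return {"added": added, "updated": updated, "removed": removed}


meta = {"a.csv": FileInfo(10, "x"), "b.csv": FileInfo(20, "y")}
store = {"b.csv": FileInfo(25, "z"), "c.csv": FileInfo(5, "w")}
print(refresh_metadata(meta, store))
# {'added': ['c.csv'], 'updated': ['b.csv'], 'removed': ['a.csv']}
```

A full-listing diff like this is essentially what a manual refresh has to do. Its cost grows with the number of files in the path, which is why event-based refreshes (see the usage notes) scale better for large stages.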
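Internal stages
Automated refreshes of a directory table on an internal stage synchronize the metadata with the latest set of associated files in the internal named stage and path, as follows:
- New files in the path are added to the table metadata.
- Changes to files in the path are updated in the table metadata.
- Files no longer in the path are removed from the table metadata.
Create an internal named stage with a directory table enabled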
Create an internal named stage with a directory table enabled by using the CREATE STAGE command. Snowflake reads your staged data files into the directory table metadata.
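As a rough illustration of the statement involved, the Python helper below assembles a CREATE STAGE statement for an internal named stage with a directory table enabled. The `DIRECTORY = (ENABLE = TRUE AUTO_REFRESH = TRUE)` option names reflect the CREATE STAGE syntax as we understand it; verify them against the CREATE STAGE reference before use. The helper name `create_internal_stage_sql` and the stage name `my_int_stage` are hypothetical.

```python
import re

# Accept only simple unquoted identifiers so the name can be inlined safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def create_internal_stage_sql(stage_name: str, auto_refresh: bool = True) -> str:
    """Build a CREATE STAGE statement for an internal named stage with a
    directory table enabled (hypothetical helper; verify option names)."""
    if not _IDENTIFIER.match(stage_name):
        raise ValueError(f"unsupported stage name: {stage_name!r}")
    options = ["ENABLE = TRUE"]  # turn on the directory table
    if auto_refresh:
        options.append("AUTO_REFRESH = TRUE")  # request event-based refreshes
    return (f"CREATE STAGE {stage_name}\n"
            f"  DIRECTORY = ({' '.join(options)});")


print(create_internal_stage_sql("my_int_stage"))
# CREATE STAGE my_int_stage
#   DIRECTORY = (ENABLE = TRUE AUTO_REFRESH = TRUE);
```

If you already have an open connection from `snowflake-connector-python`, you could pass the resulting string to `cursor.execute(...)`. Per the usage notes, automated refreshes for internal stages apply only to accounts hosted on AWS.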
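External stages
You can automatically refresh the metadata for a directory table by using the following event notification services: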
- Amazon S3: Amazon SQS (Simple Queue Service) (https://aws.amazon.com/sqs/)
- Google Cloud Storage: Google Cloud Pub/Sub (https://cloud.google.com/storage/docs/reporting-changes)
- Microsoft Azure: Microsoft Azure Event Grid (https://azure.microsoft.com/en-us/services/event-grid/)
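Event-based refreshes react to per-object notifications instead of relisting the whole path. As an illustration only, the Python sketch below applies Amazon S3 event notification records (the JSON that S3 can deliver through SQS) to an in-memory metadata map. The fields `Records`, `eventName`, `s3.object.key`, and `s3.object.size` come from the S3 event message structure. The function `apply_s3_events` and the path-to-size map are hypothetical and say nothing about how Snowflake processes these events.

```python
from __future__ import annotations

from urllib.parse import unquote_plus


def apply_s3_events(metadata: dict[str, int], notification: dict) -> None:
    """Apply S3 event notification records to a hypothetical path -> size map."""
    for record in notification.get("Records", []):
        event = record["eventName"]
        obj = record["s3"]["object"]
        key = unquote_plus(obj["key"])  # S3 URL-encodes object keys in event messages
        if event.startswith("ObjectCreated:"):
            # Covers both new files and overwrites of existing files.
            metadata[key] = obj.get("size", 0)
        elif event.startswith("ObjectRemoved:"):
            metadata.pop(key, None)  # deletion events carry no size


meta = {"data/old.csv": 1}
notification = {"Records": [
    {"eventName": "ObjectCreated:Put",
     "s3": {"object": {"key": "data/new+file.csv", "size": 42}}},
    {"eventName": "ObjectRemoved:Delete",
     "s3": {"object": {"key": "data/old.csv"}}},
]}
apply_s3_events(meta, notification)
print(meta)  # {'data/new file.csv': 42}
```

Each notification touches only the objects that changed. That is the reason event-based refreshes avoid the full-listing cost of a manual refresh.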
To set up automated refreshes, see the topic for the cloud storage service that hosts your files:
- Refresh directory tables automatically for Amazon S3
- Refresh directory tables automatically for Google Cloud Storage
- Refresh directory tables automatically for Azure Blob Storage
Cross-cloud support
Snowflake supports cross-cloud and cross-region automated directory table refreshes for external stages.
The following table shows the cross-cloud options that Snowflake supports for automated directory table refreshes, based on the cloud platform that hosts your Snowflake account.
| | Amazon S3 | Google Cloud Storage | Microsoft Azure Blob Storage | Microsoft Data Lake Storage Gen2 | Microsoft Azure General-purpose v2 |
|---|---|---|---|---|---|
| Accounts hosted on AWS | ✔ | ✔ | ✔ | ✔ | ✔ |
| Accounts hosted on GCP | ✔ | ✔ | ✔ | ✔ | ✔ |
| Accounts hosted on Azure | ✔ | ✔ | ✔ | ✔ | ✔ |
Usage notes
- Automated refreshes are event-based and provide better performance than manual refreshes for large or fast-growing stages.
- Automated refreshes for internal stages are currently available only for accounts hosted on AWS. Snowflake doesn't support automated refreshes of directory table metadata on an internal stage when your account is hosted on Google Cloud or Azure.