Snowpipe

Snowpipe enables loading data from files as soon as they’re available in a stage. This means you can load data from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches.

How does Snowpipe work?

Snowpipe loads data from files as soon as they are available in a stage. The data is loaded according to the COPY statement defined in a referenced pipe.

A pipe is a named, first-class Snowflake object that contains a COPY statement used by Snowpipe. The COPY statement identifies the source location of the data files (i.e., a stage) and a target table. All data types are supported, including semi-structured data types such as JSON and Avro.

Different mechanisms for detecting the staged files are available:

  • Automating Snowpipe using cloud messaging

    Automated data loads leverage event notifications for cloud storage to inform Snowpipe of the arrival of new data files to load. Snowpipe polls the event notifications from a queue. By using the metadata in the queue, Snowpipe loads the new data files into the target table in a continuous, serverless fashion based on the parameters defined in a specified pipe object.

  • Calling Snowpipe REST endpoints

    Your client application calls a public REST endpoint with the name of a pipe object and a list of data filenames. If new data files matching the list are discovered in the stage referenced by the pipe object, they are queued for loading. Snowflake-provided compute resources load data from the queue into a Snowflake table based on parameters defined in the pipe.

Supported Cloud Storage services

The following table indicates the cloud storage service support for automated Snowpipe and Snowpipe REST API calls from Snowflake accounts hosted on each cloud platform:

Snowflake Account Host

Amazon S3

Google Cloud Storage

Microsoft Azure Blob storage

Microsoft Data Lake Storage Gen2

Microsoft Azure General-purpose v2

Amazon Web Services

Google Cloud

Microsoft Azure

For more information, see Automating continuous data loading using cloud messaging and Overview of the Snowpipe REST endpoints to load data.

Note that the government regions of the cloud providers do not allow event notifications to be sent to or from other commercial regions. For more information, see AWS GovCloud (US) (https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/govcloud-s3.html) and Azure Government (https://learn.microsoft.com/en-us/azure/azure-government/).

Important

Snowflake recommends that you enable cloud event filtering for Snowpipe to reduce costs, event noise, and latency. For more information about configuring event filtering for each cloud provider, see the following pages:

  • Configuring event notifications using object key name filtering - Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-filtering.html)

  • Understand event filtering for Event Grid subscriptions - Azure (https://docs.microsoft.com/en-us/azure/event-grid/event-filtering)

  • Filtering messages - Google Pub/Sub (https://cloud.google.com/pubsub/docs/filtering)

How is Snowpipe different from bulk data loading?

This section briefly describes the primary differences between Snowpipe and a bulk data load workflow using the COPY command. Additional details are provided throughout the Snowpipe documentation.

Authentication

Bulk data load:

Relies on the security options supported by the client for authenticating and initiating a user session.

Snowpipe:

When calling the REST endpoints: Requires key pair authentication with JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption.

Load history

Bulk data load:

Stored in the metadata of the target table for 64 days. Available upon completion of the COPY statement as the statement output.

Snowpipe:

Stored in the metadata of the pipe for 14 days. Must be requested from Snowflake via a REST endpoint, SQL table function, or ACCOUNT_USAGE view.

Important

To avoid reloading files (and duplicating data), we recommend loading data from a specific set of files using either bulk data loading or Snowpipe but not both.

Transactions

Bulk data load:

Loads are always performed in a single transaction. Data is inserted into table alongside any other SQL statements submitted manually by users.

Snowpipe:

Loads are combined or split into a single or multiple transactions based on the number and size of the rows in each data file. Rows of partially loaded files (based on the ON_ERROR copy option setting) can also be combined or split into one or more transactions.

Compute resources

Bulk data load:

Requires a user-specified warehouse to execute COPY statements.

Snowpipe:

Uses Snowflake-supplied compute resources.

Cost

Bulk data load:

Billed for the amount of time each virtual warehouse is active.

Snowpipe:

Billed according to the compute resources used in the Snowpipe warehouse while loading the files.

Load order of data files

For each pipe object, Snowflake establishes a single queue to sequence data files awaiting loading. As new data files are discovered in a stage, Snowpipe appends them to the queue. However, multiple processes pull files from the queue; and so, while Snowpipe generally loads older files first, there is no guarantee that files are loaded in the same order they are staged.

Data duplication

Snowpipe uses file loading metadata associated with each pipe object to prevent reloading the same files (and duplicating data) in a table. This metadata stores the path (i.e. prefix) and name of each loaded file, and prevents loading files with the same name even if they were later modified (i.e. have a different eTag).

Estimating Snowpipe latency

Given the number of factors that can differentiate Snowpipe loads, it is very difficult for Snowflake to estimate latency. File formats and sizes, and the complexity of COPY statements (including SELECT statement used for transformations), all impact the amount of time required for a Snowpipe load.

We suggest that you experiment by performing a typical set of loads to estimate average latency.

Pipe security

Access control privileges

Creating pipes

Creating and managing pipes requires a role with a minimum of the following privileges:

Object

Privilege

Notes

Database

USAGE

Schema

USAGE, CREATE PIPE

Stage in the pipe definition

USAGE

External stages only.

Stage in the pipe definition

READ

Internal stages only.

Table in the pipe definition

SELECT, INSERT

Owning pipes

After a pipe is created, the pipe owner (i.e. the role that has the OWNERSHIP privilege on the pipe) must have the following privileges:

Object

Privilege

Notes

Database

USAGE

Schema

USAGE

Pipe

OWNERSHIP

Stage in the pipe definition

USAGE

External stages only.

Stage in the pipe definition

READ

Internal stages only.

Table in the pipe definition

SELECT, INSERT

Pausing or resuming pipes

In addition to the pipe owner, a role that has the following minimum permissions can pause or resume the pipe:

Object

Privilege

Notes

Database

USAGE

Schema

USAGE

Pipe

OPERATE

Stage in the pipe definition

USAGE

External stages only.

Stage in the pipe definition

READ

Internal stages only.

Table in the pipe definition

SELECT, INSERT

Snowpipe DDL

To support creating and managing pipes, Snowflake provides the following set of special DDL commands:

In addition, providers can view, grant, or revoke access to the necessary database objects for Snowpipe using the following standard access control DDL:

Language: English