About Openflow Connector for Kinesis

Note

The connector is subject to the Connector Terms.

This topic describes the basic concepts of Openflow Connector for Kinesis, its workflow, and limitations.

You can use Amazon Kinesis Data Streams (https://docs.aws.amazon.com/streams/latest/dev/introduction.html) to collect and process large streams of data records in real time. Producers continually push data to Kinesis Data Streams, and consumers process the data in real time.

A Kinesis data stream is a set of shards (https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard). Each shard has a sequence of data records. A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
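
For example, a producer writes a record by supplying the stream name, a partition key (which routes the record to a shard), and the data blob. The following is a minimal sketch using boto3; the region, stream name, and payload are placeholders:

    import json

    import boto3

    # Placeholders: adjust the region and stream name for your account.
    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Each record carries a partition key, which routes the record to a
    # shard, and a data blob, an immutable sequence of bytes.
    response = kinesis.put_record(
        StreamName="my-data-stream",
        PartitionKey="customer-42",
        Data=json.dumps({"event": "page_view", "url": "/home"}).encode("utf-8"),
    )

    # Kinesis assigns each record a sequence number within its shard.
    print(response["ShardId"], response["SequenceNumber"])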

The Openflow Connector for Kinesis reads data from a Kinesis data stream and writes it to a Snowflake table using Snowpipe Streaming.
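
The connector consumes the stream in the manner of a Kinesis Client Library (KCL) application, checkpointing its progress in DynamoDB (see the workflow below). For illustration only, a minimal read with the low-level Kinesis API looks like the following sketch; the stream name and region are placeholders:

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Fetch the first shard of the stream (placeholder name).
    shard_id = kinesis.list_shards(StreamName="my-data-stream")["Shards"][0]["ShardId"]

    # TRIM_HORIZON starts reading at the oldest record still in the shard.
    iterator = kinesis.get_shard_iterator(
        StreamName="my-data-stream",
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    for record in kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]:
        print(record["SequenceNumber"], record["Data"])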

Workflow

  1. An AWS administrator performs the following tasks:

    1. Creates credentials that the connector uses to connect to the Kinesis data stream and the associated DynamoDB tables.

    2. Sets up IAM policies that have the permissions listed in IAM permissions required for KCL consumer applications (https://docs.aws.amazon.com/streams/latest/dev/kcl-iam-permissions.html). A sketch of such a policy follows this workflow.

    3. Records the stream name and application name and gives them to the Snowflake account administrator. These are required when setting up the connector in Runtime.

  2. A Snowflake account administrator performs the following tasks:

    1. Downloads and imports the connector definition file into the Snowflake Openflow canvas.

    2. Configures the connector as follows:

      1. Provides the AWS and Snowflake credentials and settings.

      2. Provides the Kinesis stream name.

      3. Sets the database and schema names in the Snowflake account.

      4. Customizes other parameters.

    3. Runs the connector in the Openflow canvas. Upon execution, the connector performs the following actions:

      1. Creates DynamoDB tables for storing Kinesis stream checkpoints (a verification sketch follows this workflow).

      2. Extracts stream data.

      3. Creates the configured destination table in the Snowflake database once at least one record has been received from the stream.

      4. Loads the processed data into the specified Snowflake table.

  3. Business users can then perform operations on the data that the connector has loaded from Kinesis into the destination table.
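
The IAM policy from step 1 of the workflow might look like the following sketch. The permission list here is representative rather than authoritative (the AWS page linked above is the reference), and the role name, account ID, stream name, and application name are placeholders:

    import json

    import boto3

    # Representative KCL consumer permissions; consult the linked AWS page
    # for the authoritative list. ARNs and names below are placeholders.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:GetRecords",
                    "kinesis:GetShardIterator",
                    "kinesis:ListShards",
                ],
                "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-data-stream",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:CreateTable",
                    "dynamodb:DescribeTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Scan",
                ],
                "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/my-application-name*",
            },
            {
                "Effect": "Allow",
                "Action": ["cloudwatch:PutMetricData"],
                "Resource": "*",
            },
        ],
    }

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName="openflow-kinesis-connector-role",
        PolicyName="kcl-consumer-policy",
        PolicyDocument=json.dumps(policy),
    )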

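After the connector has run (step 3 of the workflow), you can confirm that checkpointing is in place: the KCL names its DynamoDB lease table after the consumer application name. A minimal check, with the application name as a placeholder:

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # The KCL lease (checkpoint) table is named after the application name
    # recorded earlier; "my-application-name" is a placeholder.
    table = dynamodb.describe_table(TableName="my-application-name")
    print(table["Table"]["TableStatus"])  # e.g. ACTIVE
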
Limitations

  • Only a single stream is supported.

  • Enhanced fan-out mode is not supported.

  • If the Stream To Table Map parameter is not set, then:

    • Table names must exactly match the names of the streams whose data they hold.

    • Table names must be uppercase. For example, records from a stream named orders must be loaded into a table named ORDERS.

  • If the Stream To Table Map parameter is set, then the destination table names must match the names specified in the mapping, and each table name must be a valid Snowflake unquoted identifier (see the validation sketch after this list). For information about valid table names, see Identifier requirements.

  • The destination tables must reflect the message model. If a message contains a field that has no corresponding column in the table, the connector fails.

  • Only JSON and AVRO message formats are supported.

  • Only Confluent Schema Registry is supported.

  • Only Amazon IAM authentication is supported.

  • If data insertion into a table fails, the connector retries three times before routing the data to the failure output.
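
To illustrate the unquoted-identifier rule above, the following hypothetical check mirrors the documented pattern: an unquoted identifier starts with a letter or underscore and contains only letters, digits, underscores, and dollar signs:

    import re

    # Pattern for Snowflake unquoted identifiers.
    UNQUOTED_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")

    def is_valid_unquoted_identifier(name: str) -> bool:
        return bool(UNQUOTED_IDENTIFIER.match(name))

    assert is_valid_unquoted_identifier("ORDERS")
    assert not is_valid_unquoted_identifier("2024_ORDERS")  # starts with a digit
    assert not is_valid_unquoted_identifier("MY-TABLE")     # hyphen not allowed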

Next steps

Set up the Openflow Connector for Kinesis
