About Openflow Connector for MySQL

Note

This connector is subject to the Snowflake Connector Terms.

This topic describes the basic concepts of Openflow Connector for MySQL, its workflow, and limitations.

The Openflow Connector for MySQL connects a MySQL database instance to Snowflake and replicates data from selected tables in near real-time or on a specified schedule. The connector also creates a log of all data changes, which is available along with the current state of the replicated tables.

Use this connector if you’re looking to do the following:

  • Change data capture (CDC) replication of MySQL tables into Snowflake for comprehensive, centralized reporting

How tables are replicated

The tables are replicated in the following stages:

  1. Schema introspection: The connector discovers the columns in the source table, including the column names and types, and validates them against Snowflake’s constraints and the connector’s limitations. If validation fails, this stage fails and the replication cycle for the table ends. After this stage completes successfully, the connector creates an empty destination table.

  2. Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, then no more data is replicated. After successful completion, the data from the source table is available in the destination table.

  3. Incremental load: The connector tracks changes in the source table and applies them to the destination table. This process continues until the table is removed from replication. Failure at this stage stops replication of the source table until the issue is resolved.

    Note

    This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables.

    For details on bypassing the snapshot load and using the incremental load process, see Incremental replication.
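The three stages above can be sketched as a simple per-table pipeline. This is a conceptual illustration only; the callables (`introspect`, `snapshot`, `stream_changes`, `apply_change`) are hypothetical stand-ins, not part of the connector’s actual API.

```python
def replicate_table(introspect, snapshot, stream_changes, apply_change):
    """Conceptual sketch of the per-table replication stages."""
    # Stage 1: schema introspection -- discover and validate the columns.
    columns = introspect()

    # Stage 2: snapshot load -- copy all existing rows into the destination.
    snapshot(columns)

    # Stage 3: incremental load -- apply changes until the table is removed
    # from replication (modeled here as the change stream ending).
    for change in stream_changes():
        apply_change(change)
```

With the optional snapshot bypass described in the note above, stage 2 would simply be skipped for newly added tables.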

Important

Interim failures, such as connection errors, do not prevent tables from being replicated. Permanent failures, such as unsupported data types, do prevent tables from being replicated. If a permanent failure prevents a table from being replicated, remove the table from the list of replicated tables. After you address the problem that caused the failure, you can add the table back to the list of replicated tables.
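The distinction between interim and permanent failures can be illustrated with a retry loop. The exception type and the `replicate_with_retry` helper below are hypothetical names for this sketch; the connector’s actual error handling is internal to Openflow.

```python
class PermanentReplicationError(Exception):
    """A failure that needs manual intervention, e.g. an unsupported data type."""

def replicate_with_retry(step, max_attempts=3):
    """Retry interim failures (e.g. connection errors); surface permanent ones."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except PermanentReplicationError:
            # Permanent failure: remove the table from replication, fix the
            # underlying problem, then add the table back.
            raise
        except ConnectionError:
            # Interim failure: safe to retry.
            if attempt == max_attempts:
                raise
```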

Workflow

  1. A MySQL database administrator performs the following tasks:

    • Configures MySQL replication settings

    • Creates credentials for the connector

    • Optionally, provides the SSL certificate

  2. A Snowflake account administrator performs the following tasks:

    1. Creates a service user for the connector, a warehouse for the connector, and a destination database for the replicated data.

    2. Installs the connector.

    3. Specifies the required parameters for the flow template.

    4. Runs the flow. The connector performs the following tasks when run in Openflow:

      1. Creates a schema for journal tables.

      2. Creates the schemas and destination tables matching the source tables configured for replication.

      3. Starts replicating the tables. For details on the replication process, see How tables are replicated.

Check table replication status

Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication.

To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:

  1. In the Openflow runtime canvas, right-click a processor group and choose Controller Services. A table listing controller services displays.

  2. Locate the row labeled Table State Store, click the More options button (three vertical dots) on the right side of the row, and then choose View State.

A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:

  • NEW: The table is scheduled for replication but replication hasn’t started.

  • SNAPSHOT_REPLICATION: The connector is copying existing data. This status displays until all records are stored in the destination table.

  • INCREMENTAL_REPLICATION: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.

  • FAILED: Replication has permanently stopped due to an error.
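Based on the lifecycle described above, the possible state transitions can be modeled as a small lookup table. The transitions are inferred from this page’s description (including the optional snapshot bypass), not taken from the connector’s source code.

```python
# Replication states and the transitions implied by the documented lifecycle.
# NEW -> INCREMENTAL_REPLICATION covers the optional snapshot-bypass setup.
TRANSITIONS = {
    "NEW": {"SNAPSHOT_REPLICATION", "INCREMENTAL_REPLICATION", "FAILED"},
    "SNAPSHOT_REPLICATION": {"INCREMENTAL_REPLICATION", "FAILED"},
    "INCREMENTAL_REPLICATION": {"FAILED"},
    "FAILED": set(),  # stays stopped until the table is removed and re-added
}

def is_valid_transition(old_state, new_state):
    """Check whether a state change is consistent with the lifecycle."""
    return new_state in TRANSITIONS.get(old_state, set())
```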

Note

The Openflow runtime canvas doesn’t display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:

Replication state for table <database_name>.<schema_name>.<table_name> changed from <old_state> to <new_state>
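When scanning logs for these messages, a small parser can extract the table identifier and the two states. The message format is the one shown above; the function name and field names are illustrative.

```python
import re

# Matches the documented log message:
# "Replication state for table <db>.<schema>.<table> changed from <old> to <new>"
STATE_CHANGE = re.compile(
    r"Replication state for table "
    r"([^\s.]+)\.([^\s.]+)\.([^\s.]+) "
    r"changed from (\w+) to (\w+)"
)

def parse_state_change(line):
    """Return the parsed fields as a dict, or None if the line doesn't match."""
    match = STATE_CHANGE.search(line)
    if match is None:
        return None
    database, schema, table, old_state, new_state = match.groups()
    return {
        "database": database,
        "schema": schema,
        "table": table,
        "old_state": old_state,
        "new_state": new_state,
    }
```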

If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see Restart table replication.

Supported MySQL versions

The following list shows the tested and officially supported MySQL versions for each distribution:

  • Standard (https://www.mysql.com/): 8.0, 8.4

  • AWS RDS (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_MySQL.html): 8.0

  • Amazon Aurora (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraMySQLReleaseNotes/Welcome.html): 8.0 (as Aurora MySQL version 3)

  • GCP Cloud SQL (https://cloud.google.com/sql/mysql?hl=en): 8.0, 8.4

  • Azure Database (https://azure.microsoft.com/en-us/products/mysql/): 8.0, 8.4

Openflow requirements

  • The runtime size must be at least Medium. Use a bigger runtime when replicating large data volumes, especially when row sizes are large.

  • The connector does not support multi-node Openflow runtimes. Configure the runtime for this connector with Min nodes and Max nodes set to 1.

Limitations

  • The connector supports MySQL version 8 or later.

  • The connector supports only username and password authentication with MySQL.

  • Only database tables containing primary keys can be replicated.

  • The connector does not replicate tables with data that exceeds Snowflake’s type limitations.

  • The connector does not replicate columns of types GEOMETRY, GEOMETRYCOLLECTION, LINESTRING, MULTILINESTRING, MULTIPOINT, MULTIPOLYGON, POINT, and POLYGON.

  • The connector is subject to the group replication limitations of MySQL (https://dev.mysql.com/doc/refman/8.4/en/group-replication-limitations.html#group-replication-limitations-transaction-size). This means that a single transaction must fit into a binary log message no larger than 4 GB.

  • The connector does not support replicating tables from a reader instance in Amazon Aurora because Aurora reader instances do not maintain their own binary logs.

  • The connector supports source table schema changes with the exception of changing primary key definitions and changing the precision or the scale of a numeric column.

  • The connector does not support re-adding a column after it is dropped.

  • For DATE and DATETIME types, any value that contains a zero month or day is mapped to the Unix epoch (‘1970-01-01’ or ‘1970-01-01T00:00’). The zero date (‘0000-00-00’) is also mapped to the Unix epoch. Values with a zero year are converted to year one; for example, ‘0000-05-30 7:59:59’ becomes ‘0001-05-30T7:59:59’. The remaining date and time components are unchanged.

  • For the TIMESTAMP type, the value ‘0000-00-00 00:00:00’ is mapped to the Unix epoch (‘1970-01-01T00:00Z’).

Note

You can bypass limitations that affect specific table columns by excluding those columns from replication.
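The DATE and DATETIME mapping rules above can be expressed as a small normalization function. This is an illustrative re-implementation of the documented rules, not the connector’s code; it accepts ‘YYYY-MM-DD’ and ‘YYYY-MM-DD HH:MM:SS’-style strings (per the TIMESTAMP rule above, a TIMESTAMP result would additionally carry a trailing ‘Z’).

```python
import re

EPOCH_DATE = "1970-01-01"
EPOCH_DATETIME = "1970-01-01T00:00"

def normalize_zero_date(value):
    """Apply the documented zero-date mapping to a MySQL DATE/DATETIME string."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})(?:[ T](.+))?", value)
    if match is None:
        raise ValueError(f"unrecognized date format: {value!r}")
    year, month, day = int(match[1]), int(match[2]), int(match[3])
    time_part = match[4]
    # A zero month or day (including the zero date '0000-00-00') maps to
    # the Unix epoch.
    if month == 0 or day == 0:
        return EPOCH_DATETIME if time_part else EPOCH_DATE
    # A zero year is converted to year one; other components are unchanged.
    if year == 0:
        year = 1
    date = f"{year:04d}-{match[2]}-{match[3]}"
    return f"{date}T{time_part}" if time_part else date
```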

Next steps

Set up the Openflow Connector for MySQL
