About Openflow Connector for PostgreSQL

Note

This connector is subject to the Snowflake Connector Terms.

This topic describes the basic concepts of Openflow Connector for PostgreSQL, its workflow, and limitations.

About the Openflow Connector for PostgreSQL

The Openflow Connector for PostgreSQL connects a PostgreSQL database instance to Snowflake and replicates data from selected tables in near real time or on a schedule. The connector also creates a log of all data changes, which is available alongside the current state of the replicated tables.

Use cases

Use this connector if you’re looking to do the following:

  • Perform CDC replication of PostgreSQL data into Snowflake for comprehensive, centralized reporting.

Supported PostgreSQL versions

The connector supports the following PostgreSQL versions on each platform:

  • Standard (https://www.postgresql.org/): 11, 12, 13, 14, 15, 16, 17, 18

  • AWS RDS (https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/Welcome.html): 11, 12, 13, 14, 15, 16, 17, 18

  • Amazon Aurora (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/Welcome.html): 11, 12, 13, 14, 15, 16, 17

  • GCP Cloud SQL (https://cloud.google.com/sql/docs/postgres/): 11, 12, 13, 14, 15, 16, 17

  • Azure Database (https://learn.microsoft.com/en-us/azure/postgresql/): 11, 12, 13, 14, 15, 16, 17

Openflow requirements

  • The runtime size must be at least Medium. Use a bigger runtime when replicating large data volumes, especially when row sizes are large.

  • The connector does not support multi-node Openflow runtimes. Configure the runtime for this connector with Min nodes and Max nodes set to 1.

Limitations

  • The connector supports PostgreSQL version 11 or later.

  • The connector supports only username and password authentication with PostgreSQL.

  • The connector does not replicate tables with data that exceeds Snowflake’s type limitations. An exception is date and time columns that contain out-of-range values. For more information, see Out of range value support.

  • The connector requires every replicated table to have a primary key and requires the table’s replica identity to match the primary key (see the example below).

  • The connector supports source table schema changes, except for changes to primary key definitions and changes to the precision or scale of numeric columns.

  • The connector does not support re-adding a column after it is dropped.

Note

Limitations affecting certain table columns can be bypassed by excluding these specific columns from replication.
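
As a hedged example of the primary key and replica identity requirement above, the following PostgreSQL statements check and set a table’s replica identity; the table name public.orders is hypothetical.

-- Check the current replica identity ('d' means default, which uses the primary key).
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';

-- Use the primary key as the replica identity.
ALTER TABLE public.orders REPLICA IDENTITY DEFAULT;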

Workflow

  1. A database administrator configures PostgreSQL replication settings, creates a publication, and creates credentials for the connector. Optionally, they provide the SSL certificate. A sketch of the PostgreSQL-side commands follows this workflow.

  2. A Snowflake account administrator performs the following tasks:

    1. Creates a service user for the connector, a warehouse for the connector, and a destination database to replicate into.

    2. Installs the connector.

    3. Specifies the required parameters for the flow template.

    4. Runs the flow. The connector performs the following tasks when run in Openflow:

      1. Creates a schema for journal tables.

      2. Creates the schemas and destination tables matching the source tables configured for replication.

      3. Starts replication following the table replication lifecycle.
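
The following is a minimal sketch of the PostgreSQL-side setup from step 1. The publication name openflow_publication, the user openflow_user, and the tables public.orders and public.customers are assumptions for illustration; logical replication also requires wal_level = logical on the source database.

-- Create a publication containing the tables to replicate.
CREATE PUBLICATION openflow_publication FOR TABLE public.orders, public.customers;

-- Create a user with replication privileges for the connector.
CREATE USER openflow_user WITH REPLICATION PASSWORD '<password>';

-- Allow the connector to read the replicated tables.
GRANT SELECT ON public.orders, public.customers TO openflow_user;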

How the connector works

The following sections describe how the connector works in various scenarios, including replication, changes in schema, and data retention.

How tables are replicated

  1. Schema introspection: The connector discovers the columns in the source table, including their names and types, and then validates them against Snowflake’s and the connector’s limitations. If validation fails, this stage fails and the replication cycle ends. After schema introspection completes successfully, the connector creates an empty destination table.

  2. Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, the cycle ends and no more data is replicated. After successful completion, the full set of data from the source table is available in the destination table.

  3. Incremental load: The connector continuously tracks changes in the source table and copies them into the destination table. This continues until the table is removed from replication. A failure at this stage permanently stops replication of the source table until the issue is resolved.

    Note

    This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables.

    For details on bypassing the snapshot load and using the incremental load process, see Incremental replication.

Important

Interim failures, such as connection errors, do not prevent tables from being replicated. Permanent failures, such as unsupported data types, do prevent tables from being replicated. If a permanent failure prevents a table from being replicated, remove the table from the list of replicated tables. After you address the problem that caused the failure, you can add the table back to the list of replicated tables.

TOASTed value support

The connector supports replicating tables with TOAST values (https://www.postgresql.org/docs/current/storage-toast.html) for columns of types: array, bytea, json, jsonb, text, varchar, xml.

Whenever the connector encounters a TOASTed value in the CDC stream, it substitutes a default placeholder of __previous_value_unchanged, formatted for the given column type, and stores it in the journal table. The MERGE query then accounts for placeholder values, so that the destination table always contains the last non-TOASTed value.
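
As an illustration only, and not the connector’s actual query, a simplified MERGE over hypothetical ORDERS and ORDERS_JOURNAL_1705320000_1 tables could preserve the previous value whenever a journal row carries the TOAST placeholder:

-- Keep the existing destination value when the journal row carries the
-- placeholder; otherwise apply the new value from the journal.
MERGE INTO ORDERS AS dst
USING ORDERS_JOURNAL_1705320000_1 AS src
  ON dst.ID = src.ID
WHEN MATCHED THEN UPDATE SET
  DESCRIPTION = IFF(src.DESCRIPTION = '__previous_value_unchanged',
                    dst.DESCRIPTION, src.DESCRIPTION)
WHEN NOT MATCHED THEN INSERT (ID, DESCRIPTION) VALUES (src.ID, src.DESCRIPTION);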

Out of range value support

The connector supports replicating tables with columns of types date, timestamp, and timestamptz that contain out-of-range values. If the connector encounters an out-of-range value in the CDC stream, it substitutes a default placeholder based on the type of the column.

Placeholder values for out-of-range values:

  • date: -9999-01-01 through 9999-12-31.

  • timestamp: 0001-01-01 00:00:00 through 9999-12-31 23:59:59.999999999.

  • timestamptz: 0001-01-01 00:00:00+00 through 9999-12-31 23:59:59.999999999+00.

Note

-Infinity and Infinity values are also replaced with the respective placeholders for all three types.
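
For example, to find rows whose value was clamped to a placeholder, you can filter on the boundary values listed above. This sketch assumes a hypothetical my_table with a created_at column that maps to TIMESTAMP_NTZ and that the placeholders are stored verbatim:

-- Rows whose timestamp was replaced by one of the placeholder boundaries.
SELECT *
FROM my_table
WHERE created_at IN ('0001-01-01 00:00:00'::TIMESTAMP_NTZ,
                     '9999-12-31 23:59:59.999999999'::TIMESTAMP_NTZ);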

Table replication status

Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication.

To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:

  1. In the Openflow runtime canvas, right-click a processor group and choose Controller Services. A table listing controller services displays.

  2. Locate the row labeled Table State Store, click the More button (three vertical dots) on the right side of the row, and then choose View State.

A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:

  • NEW: The table is scheduled for replication but replication hasn’t started.

  • SNAPSHOT_REPLICATION: The connector is copying existing data. This status displays until all records are stored in the destination table.

  • INCREMENTAL_REPLICATION: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.

  • FAILED: Replication has permanently stopped due to an error.

Note

The Openflow runtime canvas doesn’t display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:

Replication state for table <database_name>.<schema_name>.<table_name> changed from <old_state> to <new_state>

If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see Restart table replication.

Understanding data retention

The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it.

This approach has the following implications:

  • Rows deleted from the source table are soft-deleted in the destination table rather than physically removed.

  • Columns dropped from the source table are renamed in the destination table rather than dropped.

  • Journal tables are retained indefinitely and are not automatically cleaned up.

Destination table metadata columns

Each destination table includes the following metadata columns that track replication information:

  • _SNOWFLAKE_INSERTED_AT (TIMESTAMP_NTZ): The timestamp when the row was originally inserted into the destination table.

  • _SNOWFLAKE_UPDATED_AT (TIMESTAMP_NTZ): The timestamp when the row was last updated in the destination table.

  • _SNOWFLAKE_DELETED (BOOLEAN): Indicates whether the row was deleted from the source table. When true, the row has been soft-deleted and no longer exists in the source.
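
For example, the following query (the table my_table and its ID column are hypothetical) shows the most recently changed rows together with their replication metadata:

-- Inspect the most recently updated rows and their replication metadata.
SELECT ID, _SNOWFLAKE_INSERTED_AT, _SNOWFLAKE_UPDATED_AT, _SNOWFLAKE_DELETED
FROM my_table
ORDER BY _SNOWFLAKE_UPDATED_AT DESC
LIMIT 10;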

Soft-deleted rows

When a row is deleted from the source table, the connector does not physically remove it from the destination table. Instead, the row is marked as deleted by setting the _SNOWFLAKE_DELETED metadata column to true.

This approach allows you to:

  • Retain historical data for auditing or compliance purposes.

  • Query deleted records when needed.

  • Decide when and how to permanently remove data based on your requirements.

To query only active (non-deleted) rows, filter on the _SNOWFLAKE_DELETED column:

SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = FALSE;

To query deleted rows:

SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = TRUE;

Dropped columns

When a column is dropped from the source table, the connector does not drop the corresponding column from the destination table. Instead, the column is renamed by appending the __SNOWFLAKE_DELETED suffix to preserve historical values.

For example, if a column named EMAIL is dropped from the source table, it is renamed to EMAIL__SNOWFLAKE_DELETED in the destination table. Rows that existed before the column was dropped retain their original values, while rows added after the drop have NULL in this column.

You can still query historical values from the renamed column:

SELECT EMAIL__SNOWFLAKE_DELETED FROM my_table;

Renamed columns

Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name.

For example, if you rename a column from A to B in the source table, the destination table will contain:

  • A__SNOWFLAKE_DELETED: Contains values from before the rename. Rows added after the rename have NULL in this column.

  • B: Contains values from after the rename. Rows that existed before the rename have NULL in this column.

Querying renamed columns

To retrieve data from both the original and renamed columns as a single unified column, use a COALESCE or CASE expression:

SELECT
    COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;

Alternatively, using a CASE expression:

SELECT
    CASE
        WHEN B IS NOT NULL THEN B
        ELSE A__SNOWFLAKE_DELETED
    END AS A_RENAMED_TO_B
FROM my_table;

Creating a view for renamed columns

Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication.

CREATE VIEW my_table_unified AS
SELECT
    *,
    COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;

Important

Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies.

Journal tables

During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes.

Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention:

<TABLE_NAME>_JOURNAL_<timestamp>_<number>

Where:

  • <TABLE_NAME> is the name of the destination table.

  • <timestamp> is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness.

  • <number> starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters.

For example, if your destination table is SALES.ORDERS, the journal table might be named SALES.ORDERS_JOURNAL_1705320000_1.
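
To see which journal tables currently exist for a destination table, you can list them by name pattern; the SALES schema and ORDERS table are the same hypothetical names as above:

-- List journal tables for the ORDERS destination table.
SHOW TABLES LIKE 'ORDERS_JOURNAL_%' IN SCHEMA SALES;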

Important

Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication.

Managing journal table storage

If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated.

Before implementing journal cleanup, verify that:

  • The corresponding source tables have been fully removed from replication.

  • You no longer need the journal data for auditing or processing purposes.

For information on creating and managing tasks for automated cleanup, see Introduction to tasks.
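
A minimal sketch of such a cleanup task follows. The warehouse my_wh and the journal table SALES.ORDERS_JOURNAL_1705320000_1 are assumptions; the example presumes the corresponding source table is no longer replicated.

-- Hypothetical weekly task that drops a journal table that is no longer needed.
CREATE OR REPLACE TASK drop_unused_journal
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 3 * * 0 UTC'  -- 03:00 UTC every Sunday
AS
  DROP TABLE IF EXISTS SALES.ORDERS_JOURNAL_1705320000_1;

-- Tasks are created suspended; resume the task to start the schedule.
ALTER TASK drop_unused_journal RESUME;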

Next steps

Review Openflow Connector for PostgreSQL: Data mapping to understand how the connector maps data types to Snowflake data types. Review Set up the Openflow Connector for PostgreSQL to set up the connector.