Set up the Openflow Connector for Google Sheets¶
Note
The connector is subject to the Connector Terms.
This topic describes the steps to set up the Openflow Connector for Google Sheets.
Prerequisites¶
Ensure that you have reviewed About Openflow Connector for Google Sheets.
Ensure that you have set up Openflow.
Get the credentials¶
As a Google Cloud administrator, perform the following tasks:
Ensure that you have the following:
A Google user with Super Admin permissions (https://support.google.com/a/answer/2405986?hl)
A Google Cloud Project (https://developers.google.com/workspace/guides/create-project) with the following roles:
Organization Policy Administrator (https://cloud.google.com/iam/docs/understanding-roles#orgpolicy.policyAdmin)
Organization Administrator (https://cloud.google.com/iam/docs/understanding-roles#resourcemanager.organizationAdmin)
Enable service account key creation. Google disables service account key creation by default. The Disable service account key creation policy must be turned off so that Snowflake Openflow can use the service account JSON key. To enable service account key creation, perform the following tasks:
Log in to the Google Cloud Console (https://console.cloud.google.com/) with a super admin account that has the Organization Policy Administrator role.
Ensure that you are working at the organization level, not within a project in your organization.
Select Organization Policies.
Select the Disable service account key creation policy.
Select Manage Policy and turn off enforcement.
Select Set Policy.
Create a service account and key (https://developers.google.com/workspace/guides/create-credentials#service-account).
Set up Snowflake account¶
As a Snowflake account administrator, perform the following tasks:
Create a new role or use an existing role and grant it the required database privileges: USAGE on the destination database and schema, and CREATE TABLE on the destination schema. A SQL sketch covering these account setup steps appears after this list.
Create a new Snowflake service user of type SERVICE.
Grant the Snowflake service user the role you created in the previous steps.
Configure key-pair authentication for the Snowflake SERVICE user from step 2.
Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault, and store the public and private keys in the secret store.
Note
If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it is recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
In Openflow, configure a Parameter Provider associated with this secrets manager: from the hamburger menu in the upper right, navigate to Controller Settings » Parameter Provider and then fetch your parameter values.
At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1.
Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with multi-cluster warehouses than with larger warehouse sizes.
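The exact statements depend on your environment, but a minimal SQL sketch of steps 1 through 4 and step 6 might look like the following. The role, user, and warehouse names are placeholders, not names required by the connector, and the public key value comes from the key pair you generate for key-pair authentication. The database and schema privileges for the role are granted after the destination database and schema are created in the next section.

```sql
-- Minimal sketch only; all object names are placeholders.
USE ROLE ACCOUNTADMIN;

-- Step 1: role for the connector.
CREATE ROLE IF NOT EXISTS OPENFLOW_GSHEETS_ROLE;

-- Step 2: service user for the connector.
CREATE USER IF NOT EXISTS OPENFLOW_GSHEETS_USER TYPE = SERVICE;

-- Step 3: grant the role to the service user.
GRANT ROLE OPENFLOW_GSHEETS_ROLE TO USER OPENFLOW_GSHEETS_USER;

-- Step 4: register the RSA public key for key-pair authentication.
-- Generate the key pair outside Snowflake (for example, with OpenSSL) and
-- paste the public key body (without the PEM header and footer) here.
ALTER USER OPENFLOW_GSHEETS_USER SET RSA_PUBLIC_KEY = '<public-key-body>';

-- Step 6: a small warehouse for the connector to use.
CREATE WAREHOUSE IF NOT EXISTS OPENFLOW_GSHEETS_WH
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
GRANT USAGE ON WAREHOUSE OPENFLOW_GSHEETS_WH TO ROLE OPENFLOW_GSHEETS_ROLE;
```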
Configure the connector¶
As a data engineer, perform the following tasks to configure a connector:
Create a database and schema in Snowflake for the connector to store ingested data (a SQL sketch follows these steps).
Download the connector definition file.
Import the connector definition into Openflow:
Open the Snowflake Openflow canvas.
Add a process group. To do this, drag and drop the Process Group icon from the tool palette at the top of the page onto the canvas. Once you release your pointer, a Create Process Group dialog appears.
On the Create Process Group dialog, select the connector definition file to import.
Right-click on the imported process group and select Parameters.
Populate the required parameter values as described in Flow parameters.
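For step 1, a minimal SQL sketch, assuming the placeholder names GSHEETS_DB and GSHEETS_SCHEMA and the OPENFLOW_GSHEETS_ROLE role from the account setup sketch:

```sql
-- Placeholder database and schema names; adjust to your environment.
CREATE DATABASE IF NOT EXISTS GSHEETS_DB;
CREATE SCHEMA IF NOT EXISTS GSHEETS_DB.GSHEETS_SCHEMA;

-- Allow the connector role to use the destination and create tables in it.
GRANT USAGE ON DATABASE GSHEETS_DB TO ROLE OPENFLOW_GSHEETS_ROLE;
GRANT USAGE ON SCHEMA GSHEETS_DB.GSHEETS_SCHEMA TO ROLE OPENFLOW_GSHEETS_ROLE;
GRANT CREATE TABLE ON SCHEMA GSHEETS_DB.GSHEETS_SCHEMA TO ROLE OPENFLOW_GSHEETS_ROLE;
```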
Flow parameters¶
The configuration of the connector definition is divided into three parameter contexts:
Config Snowflake connection: Used to connect to Snowflake
Config Google connection: Used to connect to Google Sheets
Flow Google Sheets to Snowflake: Contains all parameters from both configurations and additional parameters specific to a given process group
Note
The Flow Google Sheets to Snowflake parameter context contains spreadsheet-specific details, so you must create new parameter contexts for each new spreadsheet and process group.
To create a new parameter context, go to the Openflow Canvas menu, select Parameter Contexts, and add a new parameter context that inherits parameters from both the Config Snowflake connection and Config Google connection parameter contexts.
The following tables describe the flow parameters that you can configure, based on the parameter contexts:
Config Snowflake connection¶
| Parameter | Description |
| --- | --- |
| Snowflake Account | The Snowflake account, in the [organization-name]-[account-name] format, where data retrieved from the Google Sheets API is stored. |
| Snowflake User | The Snowflake user with the role specified in the Snowflake Role parameter. |
| Snowflake Private Key | The RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Either Snowflake Private Key or Snowflake Private Key File must be defined. |
| Snowflake Private Key File | The file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and with standard PEM headers and footers. The header line starts with -----BEGIN PRIVATE KEY-----. |
| Snowflake Private Key Password | The password associated with the Snowflake Private Key File. |
| Snowflake Role | The Snowflake role with USAGE and CREATE TABLE privileges on the destination database and schema. |
| Snowflake Warehouse | The Snowflake warehouse used to load data into tables. |
Config Google connection¶
| Parameter | Description |
| --- | --- |
| Service Account JSON | The contents of the file containing the service account credentials, such as client_id, client_email, and private_key. Copy the entire contents of the file. |
Flow Google Sheets to Snowflake¶
The following table lists only those parameters that are not inherited from other parameter contexts.
| Parameter | Description |
| --- | --- |
| Date Time Render Option | Determines how dates should be rendered in the output. You can select one of the Google Sheets API DateTimeRenderOption values: SERIAL_NUMBER or FORMATTED_STRING. |
| Destination Database | The destination database in which the destination table is created. |
| Destination Schema | The destination schema in which the destination table is created. |
| Destination Table Prefix | The prefix for the destination tables in which data pulled from Google Sheets is stored. The connector creates one destination table for each range. The first row in a sheet provides the column names of the destination table. |
| Ranges | The list of ranges to retrieve from the spreadsheet. If no range is specified, all sheets in the specified spreadsheet are downloaded. Provide each range in either A1 or R1C1 notation (https://developers.google.com/sheets/api/guides/concepts#cell), separated by commas. For example: Sheet1!A1:D20, Sheet2!R1C1:R10C4. |
| Run Schedule | The schedule on which data is retrieved from Google Sheets and saved in Snowflake. By default, the timer-driven scheduling strategy is used, where you specify an interval, for example, 60 min. |
| Spreadsheet ID | The unique identifier (https://developers.google.com/sheets/api/guides/concepts) for a spreadsheet. You can find it in the URL of the spreadsheet. |
| Value Render Option | Determines how values should be rendered in the output. You can select one of the Google Sheets API ValueRenderOption values: FORMATTED_VALUE, UNFORMATTED_VALUE, or FORMULA. |
Run the flow¶
Right-click on the canvas and select Enable all Controller Services.
Right-click on the imported process group and select Start. The connector starts the data ingestion.
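To confirm that data is arriving, you can inspect the destination schema from Snowflake. A minimal check, assuming the placeholder database and schema names used earlier; the actual destination table names depend on your Destination Table Prefix and ranges:

```sql
-- List the tables the connector has created in the destination schema.
SHOW TABLES IN SCHEMA GSHEETS_DB.GSHEETS_SCHEMA;

-- Sample rows from one ingested table (replace with an actual table name from the list above).
SELECT * FROM GSHEETS_DB.GSHEETS_SCHEMA.MY_PREFIX_SHEET1 LIMIT 10;
```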