Set up the Openflow Connector for Excel¶
Note
The connector is subject to the Connector Terms.
This topic describes the steps to set up the Openflow Connector for Excel.
Prerequisites¶
Ensure that you have reviewed About Openflow Connector for Excel.
Ensure that you have set up Openflow.
Get the credentials¶
As an AWS administrator, perform the following tasks:
Log in to your AWS IAM console.
Select Users, then select Create user.
Specify the user name, group, and additional permissions if needed. The user must have at least s3:GetObject access to the objects that the connector reads from the S3 bucket.
After the user is created, in the user's view, navigate to Security Credentials » Access Keys.
Select Create access key. The new access key must grant access only to specific resources. For better security and access control, Snowflake recommends allowing access only to specific S3 buckets.
Take note of the Access Key and Secret Access Key.
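The minimal permissions above can be expressed as an IAM policy attached to the user. The following is a sketch only; the bucket name my-excel-bucket is a placeholder for your own bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowConnectorReadExcel",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-excel-bucket/*"
    }
  ]
}
```

Scoping the Resource to a single bucket follows the recommendation above to restrict the access key to specific S3 buckets.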
Set up Snowflake account¶
As a Snowflake account administrator, perform the following tasks:
Create a new role or use an existing role, and grant it the required Database privileges.
Create a new Snowflake service user with the type as SERVICE.
Grant the Snowflake service user the role you created in the previous steps.
Configure key-pair authentication for the Snowflake SERVICE user from step 2.
Strongly recommended: Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store.
Note
If you do not want to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it’s recommended that you use the EC2 instance role associated with Openflow so that no other secrets have to be persisted.
In Openflow, configure a Parameter Provider associated with this secrets manager: from the hamburger menu in the upper right, navigate to Controller Settings » Parameter Provider and fetch your parameter values.
At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1.
Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with multi-cluster warehouses, rather than larger warehouse sizes.
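The Snowflake account setup steps above can be sketched in SQL. All object names (EXCEL_CONNECTOR_ROLE, EXCEL_CONNECTOR_USER, EXCEL_WH) are placeholder assumptions, and the RSA_PUBLIC_KEY value is the public key you generated for key-pair authentication:

```sql
-- Step 1: new role for the connector (placeholder name).
CREATE ROLE IF NOT EXISTS EXCEL_CONNECTOR_ROLE;

-- Step 2: service user with key-pair authentication.
CREATE USER IF NOT EXISTS EXCEL_CONNECTOR_USER
  TYPE = SERVICE
  RSA_PUBLIC_KEY = '<public key generated for key-pair authentication>';

-- Step 3: grant the role to the service user.
GRANT ROLE EXCEL_CONNECTOR_ROLE TO USER EXCEL_CONNECTOR_USER;

-- Designated warehouse: start with the smallest size, then resize based on load.
CREATE WAREHOUSE IF NOT EXISTS EXCEL_WH
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 60;
GRANT USAGE ON WAREHOUSE EXCEL_WH TO ROLE EXCEL_CONNECTOR_ROLE;
```

Grants for the database and schema created for ingestion, and for any other users who need access to the ingested tables, follow the same GRANT pattern.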
Configure the connector¶
As a data engineer, perform the following tasks to configure a connector:
Create a database and schema in Snowflake for the connector to store ingested data.
Download the connector definition file.
Import the connector definition into Openflow:
Open the Snowflake Openflow canvas.
Add a process group. To do this, drag and drop the Process Group icon from the tool palette at the top of the page onto the canvas. Once you release your pointer, a Create Process Group dialog appears.
On the Create Process Group dialog, select the connector definition file to import.
Configure the connector to fetch all required secrets (for example, the private key for key-pair authentication and certificates) from the supported secrets manager.
Right-click on the imported process group and select Parameters.
Populate the required parameter values as described in Flow parameters.
Flow parameters¶
This section describes the flow parameters that you can configure based on the following parameter contexts:
Excel data ingestion parameters¶
| Parameter | Description | Required |
|---|---|---|
| Ranges | The A1 notation of the comma-separated ranges to retrieve values from. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. The first row in the selected range must represent column names. If not specified, the whole workbook is ingested. | No |
| Schedule | Schedule for the connector ingestion. | Yes |
| Protection Type | Protection type on the Excel file. The value can be either | Yes |
| File Password | Password that protects the Excel file. Applicable only if the protection type is | No |
AWS S3 source parameters¶
| Parameter | Description | Required |
|---|---|---|
| AWS Access Key ID | Access key ID for the AWS user that is used to fetch the Excel file. | Yes |
| AWS Secret Access Key | Secret access key for the AWS user that is used to fetch the Excel file. | Yes |
| AWS Region | AWS region where the S3 bucket resides. | Yes |
| S3 Bucket | The S3 bucket from which the Excel file is fetched. | Yes |
| S3 Object Keys | List of comma-separated object keys within the S3 bucket that contain Excel files to fetch. Example: | Yes |
Snowflake destination parameters¶
| Parameter | Description | Required |
|---|---|---|
| Snowflake Account Identifier | Snowflake account name, formatted as [organization-name]-[account-name], where data retrieved from the Excel file will be persisted. | Yes |
| Snowflake User | Username for the Snowflake account. | Yes |
| Snowflake Role | Snowflake role used by the connector. | Yes |
| Snowflake Warehouse | Snowflake warehouse used to run queries when inserting data into the destination table. | Yes |
| Snowflake Key | The private key in PEM format, used for key-pair authentication. | Yes |
| Snowflake Key Password | Passphrase used to decrypt the private key. Leave empty if the key is not password protected. | No |
| Destination Database | Name of the Snowflake database where the data will be ingested. | Yes |
| Destination Schema | Name of the Snowflake schema where tables will be created. | Yes |
| Destination Table Prefix | Prefix for the tables in the destination schema where data retrieved from the Excel file will be persisted. The tables are created automatically by the connector. | Yes |
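The [organization-name]-[account-name] shape of the Snowflake Account Identifier can be sanity-checked before you fill in the parameters. The regex below is an illustrative assumption for a quick local check, not Snowflake's official validation:

```python
import re

# Assumed simplification: one hyphen separating two alphanumeric/underscore parts.
IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9_]+-[A-Za-z0-9_]+$")

def looks_like_account_identifier(value: str) -> bool:
    """Return True if value matches the [organization-name]-[account-name] shape."""
    return IDENTIFIER_RE.fullmatch(value) is not None

print(looks_like_account_identifier("myorg-myaccount"))  # True
print(looks_like_account_identifier("myorg.myaccount"))  # False
```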
Run the flow¶
Right-click on the canvas and select Enable all Controller Services.
Right-click on the imported process group and select Start. The connector starts the data ingestion.
(Optional) Reconfigure the currently running connector¶
You can reconfigure the connector parameters after the connector has already started ingesting data. If you need to change the ingested files or ranges, perform the following steps to make sure that the data is sent to Snowflake properly:
Stop the connector: Ensure that all Openflow processors are stopped.
Access configuration settings: Navigate to the connector’s configuration settings within Openflow.
Modify parameters: Adjust the parameters as required.
Start the connector: Start the connector and ensure that all controller services have started.