Create a data flow using Openflow
This topic describes the process of creating a dataflow in Openflow.
Prerequisites
Procedure
After your Runtime environment is set up, you can create a simple data pipeline. This example generates records based on a specified schema, filters those records with a SQL query, and then sends the data to Snowflake.
For a detailed description of how to build data flows, see the Apache NiFi documentation.
Open the Openflow application. The large grid area, which is likely empty, is called the canvas. It holds the components that you create to implement your dataflow.
Create a process group. Drag and drop the Process Group icon from the tool palette at the top of the page onto the canvas. Once you release your pointer, a Create Process Group pop-up appears.
Enter a name for your dataflow, for example, Flow Example, and click Add.
Optional: Right-click the process group that you just created and select Enter Group from the context menu. Alternatively, double-click the process group. Entering the group opens its own canvas, which separates the flow visually from the top level of the canvas.
Add a processor. Drag the Processor tool onto the canvas and release the pointer.
The Add Processor dialog appears.
Select the GenerateRecord processor from the list and click Add.
The canvas now shows the newly added processor.
Note
You can add multiple processors.
Add the following processors. They will be configured in later steps:
QueryRecord
PutDatabaseRecord
Configure the processors.
Double-click a processor. The Edit Processor dialog appears.
Modify the following properties:
Settings
Scheduling
Properties
Relationships
Comments
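As a rough sketch of how the QueryRecord filter in this example might be expressed: in Apache NiFi, QueryRecord reads its SQL from a user-added property and exposes the incoming records as a table named FLOWFILE. The query below assumes that the generated records carry name and country fields, matching the SAMPLE_DATA table used in this topic. The filter value is hypothetical.

```sql
-- Hypothetical value for a user-added property on QueryRecord.
-- FLOWFILE is the table name that QueryRecord gives to the incoming records.
-- The name and country fields and the 'Canada' value are assumptions for illustration.
SELECT name, country
FROM FLOWFILE
WHERE country = 'Canada'
```

Records that match the query are routed to a relationship named after the property that holds the query.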
Create connections between the processors.
Hover over the first processor. A circle with an arrow inside appears in the middle of the processor.
Click the circle and drag the pointer toward the second processor. A red dotted line appears, indicating that the connection cannot be made yet.
Move the pointer over the second processor.
The dotted line turns green and a green border appears around the target processor.
Release the mouse button. The Create Connection pop-up window appears.
Note the From Processor and To Processor names. In the Relationships section, select Success.
Click Add. The new connection is created.
The connection is backed by a queue that holds FlowFiles until the next processor is triggered and consumes them.
Add the SnowflakeConnectionService controller service to the flow.
Edit the controller service and fill in the required fields.
Log in to your Snowflake account and create a database.
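A minimal sketch of this step, run from a Snowflake worksheet. The database name OPENFLOW_EXAMPLE is a placeholder chosen for illustration, so substitute your own.

```sql
-- OPENFLOW_EXAMPLE is a hypothetical database name, not one defined by this topic.
CREATE DATABASE IF NOT EXISTS OPENFLOW_EXAMPLE;

-- Switch to the PUBLIC schema that Snowflake creates with every new database,
-- so that later statements run against it.
USE SCHEMA OPENFLOW_EXAMPLE.PUBLIC;
```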
In the PUBLIC schema of the database, create a standard table.
create table SAMPLE_DATA (
    name STRING,
    country STRING
);
Run the flow on Openflow.
Query the data.
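For example, the following queries are one way to check that records arrived. They assume the SAMPLE_DATA table defined above and that your session is using the database and schema that contain it.

```sql
-- Preview a few of the rows written by the flow.
SELECT name, country
FROM SAMPLE_DATA
LIMIT 10;

-- Count the loaded records for each country.
SELECT country, COUNT(*) AS record_count
FROM SAMPLE_DATA
GROUP BY country
ORDER BY record_count DESC;
```

If the first query returns no rows, check that the flow is running and that no FlowFiles are stuck in a connection queue on the canvas.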