ParseTableImage 2025.3.28.13-SNAPSHOT

BUNDLE

com.snowflake.openflow.runtime | runtime-document-layout-nar

DESCRIPTION

Extracts the text from an image of a table and writes it to the FlowFile content in CSV format.

TAGS

document, element, image, openflow, rag, retrieval augmented generation, table, unstructured

INPUT REQUIREMENT

REQUIRED

Supports Sensitive Dynamic Properties

false

PROPERTIES

Property

Description

Communication Timeout

The amount of time to wait for a response from the microservices before timing out.

Custom Table Structure Recognition Service URL

The custom URL of the Openflow Table Structure Recognition Service.

MIME Type

The MIME Type of the image file.

OCR Confidence Threshold

The minimum confidence level required for a text block to be included in the output. Text blocks with a confidence level below this value will be excluded.

OCR Service

The OCR Service used to read files and extract their text.

Service Location Strategy

Determines how Service Locations are configured within this processor for the Table Structure Recognition Service.
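
The effect of the OCR Confidence Threshold property can be illustrated with a minimal sketch. This is not the processor's implementation: the `TextBlock` type, the `filter_blocks` helper, and the 0.0 to 1.0 confidence scale are all assumptions made for illustration. The only behavior taken from the description above is that blocks with a confidence below the threshold are excluded.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical OCR result: recognized text plus a confidence score."""
    text: str
    confidence: float  # assumed scale 0.0-1.0; the real scale is not stated in this page


def filter_blocks(blocks, threshold):
    """Keep blocks at or above the threshold; blocks below it are excluded."""
    return [b for b in blocks if b.confidence >= threshold]


blocks = [
    TextBlock("Total", 0.98),
    TextBlock("1O0", 0.41),   # low-confidence misread
    TextBlock("42.50", 0.87),
]
kept = filter_blocks(blocks, 0.8)
print([b.text for b in kept])
```

A higher threshold trades recall for precision: fewer misread cells reach the CSV output, at the cost of possibly dropping legitimate but faint text.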

RELATIONSHIPS

NAME

DESCRIPTION

table.not.found

If the processor determines that an input FlowFile does not contain a table, the original FlowFile will be routed to this relationship.

failure

If a FlowFile cannot be converted into a CSV, the input FlowFile will be routed to this relationship.

success

When the table text has been successfully extracted, the CSV representation of the text will be routed to this relationship.

comms.failure

If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship.
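
The four relationships above can be read as a simple routing decision. The sketch below is a hedged illustration only, not the processor's code: the exception classes `NoTableFound` and `CommsError`, the `route` function, and the `extract_rows` callback are hypothetical names introduced here. It shows which content goes to which relationship: CSV on success, the unmodified input otherwise.

```python
import csv
import io


class NoTableFound(Exception):
    """Hypothetical: raised when the image contains no table."""


class CommsError(Exception):
    """Hypothetical: raised when a required service cannot be reached."""


def route(image_bytes, extract_rows):
    """Return (relationship name, content) for one input FlowFile.

    extract_rows is an assumed callback that turns image bytes into a
    list of rows, each row a list of cell strings.
    """
    try:
        rows = extract_rows(image_bytes)
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        return "success", buf.getvalue()
    except NoTableFound:
        return "table.not.found", image_bytes
    except CommsError:
        return "comms.failure", image_bytes
    except Exception:
        # Any other problem while producing the CSV
        return "failure", image_bytes


def fake_extract(_):
    # Stand-in for the OCR and table structure recognition services
    return [["item", "price"], ["tea", "4.50"]]


print(route(b"...", fake_extract))
```

Keeping `comms.failure` separate from `failure` lets a flow retry FlowFiles that hit a transient service outage without retrying inputs that can never be converted.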

WRITES ATTRIBUTES

NAME

DESCRIPTION

filename

The filename of the FlowFile.

mime.type

The MIME type of the FlowFile.

table.text.json

If the processor successfully extracts the table text, or if it is determined that the FlowFile does not contain a table, this attribute will be removed.

SEE ALSO
