PerformSnowflakeCortexOCR 2025.3.28.13-SNAPSHOT
BUNDLE
com.snowflake.openflow.runtime | runtime-snowflake-processors-nar
DESCRIPTION
Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions. Documents must be staged in a Snowflake internal stage with server-side encryption enabled. The processor extracts text content from PDFs and can output the results either as FlowFile content or as an attribute.
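The staging requirement and the OCR and LAYOUT modes listed under PROPERTIES resemble Snowflake's `SNOWFLAKE.CORTEX.PARSE_DOCUMENT` function, although this page does not name the function the processor calls. The sketch below, offered under that assumption, builds the SQL for an internal stage with server-side encryption and for a query against a staged file. The helper names and the sample database, schema, stage, and file names are hypothetical and are not part of the processor.

```python
import re

# Hypothetical helpers for illustration; none of these names come from the processor.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _qualified_stage(database: str, schema: str, stage: str) -> str:
    """Join database, schema and stage, accepting only plain unquoted identifiers."""
    for part in (database, schema, stage):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{database}.{schema}.{stage}"


def create_stage_sql(database: str, schema: str, stage: str) -> str:
    """DDL for an internal stage that uses Snowflake server-side encryption."""
    fq_stage = _qualified_stage(database, schema, stage)
    return (
        f"CREATE STAGE IF NOT EXISTS {fq_stage} "
        "ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')"
    )


def parse_document_sql(
    database: str, schema: str, stage: str, filename: str, mode: str = "OCR"
) -> str:
    """Query that runs PARSE_DOCUMENT on one staged file (assumed backing function)."""
    if mode not in ("OCR", "LAYOUT"):
        raise ValueError(f"mode must be 'OCR' or 'LAYOUT', got {mode!r}")
    fq_stage = _qualified_stage(database, schema, stage)
    escaped = filename.replace("'", "''")  # escape single quotes in the SQL literal
    return (
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        f"@{fq_stage}, '{escaped}', {{'mode': '{mode}'}}) AS result"
    )


if __name__ == "__main__":
    print(create_stage_sql("DOCS_DB", "PUBLIC", "PDF_STAGE"))
    print(parse_document_sql("DOCS_DB", "PUBLIC", "PDF_STAGE", "invoice.pdf", "LAYOUT"))
```

The strings can be run in a Snowflake worksheet to check that a stage and file are usable before wiring them into the processor.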
INPUT REQUIREMENT
REQUIRED
Supports Sensitive Dynamic Properties
false
PROPERTIES
Property | Description |
---|---|
Database | The Snowflake database containing the stage |
Filename | The name of the file to perform OCR on. The file must be uploaded to the stage before OCR is performed. FlowFile attributes may be referenced via Expression Language. |
Max Attribute Size | The maximum size of the OCR results that can be written to an attribute. If the OCR results are larger than this, the FlowFile is routed to ‘failure’. |
OCR Mode | Specifies how document text and structure should be extracted. In ‘OCR’ mode, only raw text content is extracted, ignoring formatting and table structures. In ‘LAYOUT’ mode, the output preserves table structures as markdown. |
Output Strategy | Determines where the OCR results are written: to the FlowFile content or to an attribute. |
Results Attribute | The name of the attribute to write the OCR results to. |
Schema | The Snowflake schema containing the stage |
Snowflake Connection Service | Database Connection Service for accessing Snowflake |
Stage | The Snowflake stage where PDFs will be temporarily stored. The stage must have server-side encryption enabled. FlowFile attributes may be referenced via Expression Language. |
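The interaction between Output Strategy, Results Attribute, and Max Attribute Size can be sketched as a small routing function. This is an illustration of the behavior described in the table, not the processor's implementation. Only the `FLOW_FILE` strategy name appears on this page; the `ATTRIBUTE` strategy name, the `ocr.results` default, the 1024-byte limit, and measuring size in UTF-8 bytes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingResult:
    relationship: str                 # "success" or "failure"
    content: bytes                    # FlowFile content after processing
    attributes: dict = field(default_factory=dict)


def route_ocr_result(
    ocr_text: str,
    original_content: bytes,
    output_strategy: str,
    results_attribute: str = "ocr.results",  # hypothetical default name
    max_attribute_size: int = 1024,          # hypothetical limit, in bytes
) -> RoutingResult:
    """Decide where the OCR text goes and which relationship receives the FlowFile."""
    encoded = ocr_text.encode("utf-8")

    if output_strategy == "FLOW_FILE":
        # Results replace the content; mime.type is set to text/plain.
        return RoutingResult("success", encoded, {"mime.type": "text/plain"})

    if output_strategy == "ATTRIBUTE":  # assumed name for the attribute strategy
        # Size is assumed to be measured in UTF-8 bytes.
        if len(encoded) > max_attribute_size:
            # Oversized results are not written; the FlowFile goes to failure.
            return RoutingResult("failure", original_content)
        return RoutingResult("success", original_content, {results_attribute: ocr_text})

    raise ValueError(f"unknown output strategy: {output_strategy!r}")


if __name__ == "__main__":
    print(route_ocr_result("Invoice 42", b"%PDF-1.7", "FLOW_FILE"))
    print(route_ocr_result("Invoice 42", b"%PDF-1.7", "ATTRIBUTE", max_attribute_size=4))
```

In a flow, the practical consequence is that writing results to an attribute suits short documents, while long documents are safer written to FlowFile content, where no size limit from Max Attribute Size applies.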
RELATIONSHIPS
NAME | DESCRIPTION |
---|---|
failure | FlowFiles that cannot be processed are routed to this relationship |
success | FlowFiles that are successfully processed are routed to this relationship |
WRITES ATTRIBUTES
NAME | DESCRIPTION |
---|---|
mime.type | The MIME type of the output content (text/plain when output strategy is FLOW_FILE) |