ChunkDocument 2025.3.28.13-SNAPSHOT

Bundle

com.snowflake.openflow.runtime | runtime-document-layout-nar

Description

Given an input Openflow Document, chunks the data into segments that are better suited to LLM synthesis or semantic embedding. The original document is routed to the ‘original’ relationship, and each chunk is routed as plain text to the ‘chunks’ relationship.

Tags

chunk, document, openflow, rag, retrieval augmented generation, segment, text, unstructured

Input Requirement

REQUIRED

Supports Sensitive Dynamic Properties

false

Properties

Property	Description
Chunk Overlap	The number of characters from the preceding and subsequent chunks to include in each chunk. When the Chunking Strategy is ‘Section’, the Chunk Overlap takes effect only if a section is split into multiple chunks because of the Max Chunk Size.
Chunking Strategy	Specifies how the document should be chunked.
Include Processing Elements	Specifies whether or not to include processing elements in the chunks.
Max Chunk Size	The maximum number of characters that should be included in each chunk.
Subsection Strategy	When a Section is found with one or more subsections, and the Section plus all subsections are small enough to fit within a single chunk, this property specifies how the subsections should be handled.
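
The interaction between Max Chunk Size and Chunk Overlap can be illustrated with a minimal character-based sketch. This is not the processor's implementation: the function name `chunk_with_overlap` is hypothetical, the sketch ignores sections, paragraphs, and subsections, and it assumes the overlap is added on top of the Max Chunk Size rather than counted within it.

```python
def chunk_with_overlap(text: str, max_chunk_size: int, overlap: int) -> list[str]:
    """Split text into base segments, then pad each with neighboring characters.

    Assumption: overlap characters are added in addition to max_chunk_size.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if overlap < 0:
        raise ValueError("overlap must not be negative")

    chunks = []
    # Base segments are consecutive slices of at most max_chunk_size characters.
    for start in range(0, len(text), max_chunk_size):
        end = min(start + max_chunk_size, len(text))
        # Extend the segment with characters from the preceding and
        # subsequent segments, clamped to the bounds of the text.
        lo = max(0, start - overlap)
        hi = min(len(text), end + overlap)
        chunks.append(text[lo:hi])
    return chunks
```

With `chunk_with_overlap("abcdefghij", 4, 1)`, the middle segment `efgh` picks up one character on each side and becomes `defghi`, while the first and last segments gain a character on one side only.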

Relationships

Name	Description
failure	If the text of a FlowFile cannot be extracted for any reason, the input FlowFile will be routed to this relationship.
chunks	The chunks of the document are routed to this Relationship upon successful chunking.
original	The original document is routed to this Relationship upon successful chunking.
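
The routing described in the table above can be summarized in a short sketch. The names `route`, `extract_text`, and `chunker` are hypothetical stand-ins for illustration only and do not correspond to any Openflow API; the sketch simply models each relationship as a key in a dictionary.

```python
from typing import Callable


def route(
    flowfile: bytes,
    extract_text: Callable[[bytes], str],
    chunker: Callable[[str], list[str]],
) -> dict[str, list]:
    """Model which relationship each output goes to (hypothetical helper)."""
    try:
        text = extract_text(flowfile)
    except Exception:
        # Text could not be extracted: only the input is routed, to 'failure'.
        return {"failure": [flowfile]}
    # Success: the input goes to 'original', each plain-text chunk to 'chunks'.
    return {"original": [flowfile], "chunks": chunker(text)}
```

On success the input appears exactly once under `original` and each chunk appears under `chunks`; on an extraction error nothing is emitted to either of those relationships.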

Writes attributes

Name	Description
container.id	The ID of the container that the chunk belongs to.
container.title	The title of the container that the chunk belongs to.
mime.type	The MIME type of the chunk, which is set to text/plain.
chunk.<key>	The metadata associated with the Document Container that was chunked.
fragment.index	The index of the chunk within the document.
fragment.count	The total number of chunks that were created from the document, plus 1. The extra 1 accounts for the Document itself, which is routed to the ‘original’ relationship. This makes the count convenient when merging the original Document back together with the results of processing its chunks.
document.id	The ID of the Document that was chunked. This is useful for merging the chunks back together with the original Document. The ID that is used is the UUID of the incoming FlowFile.
document.chunk.strategy	The strategy that was used to chunk the Document. One of ‘Section’ or ‘Paragraph’.
document.chunk.overlap	The number of characters from the previous chunk and subsequent chunk that are included in the given FlowFile’s text.
document.chunk.max.chars	The maximum number of characters that should be included in each chunk.
document.chunk.processing.elements.included	Specifies whether or not Processing Elements were included in the chunks. One of ‘true’ or ‘false’.
document.chunk.subsections.included	Specifies whether or not subsections were allowed to be included in the chunks. One of ‘true’ or ‘false’.
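
As an illustration of how `document.id`, `fragment.index`, and `fragment.count` could be used downstream, the sketch below groups chunk attributes by document and keeps only the documents whose chunks have all arrived. The function name `group_complete_documents` and the use of plain dictionaries for FlowFile attributes are assumptions made for this example; the only behavior taken from the table above is that `fragment.count` equals the number of chunks plus 1.

```python
from collections import defaultdict


def group_complete_documents(
    chunk_attrs: list[dict[str, str]],
) -> dict[str, list[dict[str, str]]]:
    """Group chunk attribute maps by document.id, keeping complete documents."""
    by_doc: dict[str, list[dict[str, str]]] = defaultdict(list)
    for attrs in chunk_attrs:
        by_doc[attrs["document.id"]].append(attrs)

    complete = {}
    for doc_id, group in by_doc.items():
        # fragment.count includes the original Document, so subtract 1
        # to get the number of chunks that should be present.
        expected_chunks = int(group[0]["fragment.count"]) - 1
        if len(group) == expected_chunks:
            # Order the chunks by their position within the document.
            complete[doc_id] = sorted(group, key=lambda a: int(a["fragment.index"]))
    return complete
```

Sorting by `fragment.index` does not depend on whether the index starts at 0 or 1, which the attribute description does not specify.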