ExtractText 2025.10.9.21

Bundle

org.apache.nifi | nifi-standard-nar

Description

Evaluates one or more Regular Expressions against the content of a FlowFile and assigns the results to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of each property is the name of the Attribute into which the result is placed. How the attributes are generated depends on whether named capture groups are enabled.

If named capture groups are not enabled: the first capture group, if found, is placed into the attribute with that name. In addition, every capture group, including the matching string sequence itself, is provided under that attribute name with an index appended. The exception is a capturing group that is optional and does not match. For example, given the attribute name "regex" and the expression "abc(def)?(g)", an attribute "regex.1" with a value of "def" is added if "def" matched. If "def" did not match, no attribute named "regex.1" is added, but an attribute named "regex.2" with a value of "g" is added regardless.

If named capture groups are enabled: each named capture group, if found, is placed into an attribute whose name combines the attribute name with the group name. If enabled, the matching string sequence itself is placed into the attribute name. If multiple matches are enabled, an index is applied after the first set of matches. The exception is a capturing group that is optional and does not match. For example, given the attribute name "regex" and the expression "abc(?<NAMED>def)?(?<NAMED-TWO>g)", an attribute "regex.NAMED" with the value "def" is added if "def" matched, and an attribute "regex.NAMED-TWO" with the value "g" is added if "g" matched, regardless.

The value of the property must be a valid Regular Expression with one or more capturing groups. If named capture groups are enabled, all capture groups must be named; otherwise the processor configuration fails validation.
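The indexed naming described above can be sketched in plain Java. This is an illustration only, not the processor's implementation: the class `IndexedGroupSketch` and the helper `extract` are hypothetical names, and the sketch assumes that the base attribute is simply left unset when the first capture group does not participate in the match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexedGroupSketch {

    // Hypothetical helper: mimics the indexed attribute naming for a single
    // user-defined property (attributeName -> regex). Not NiFi code.
    static Map<String, String> extract(String attributeName, String regex,
                                       String content, boolean includeGroupZero) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (!matcher.find()) {
            return attributes; // no match: no attributes are added
        }
        int start = includeGroupZero ? 0 : 1;
        for (int i = start; i <= matcher.groupCount(); i++) {
            String value = matcher.group(i); // null if an optional group did not match
            if (value == null) {
                continue; // skipped groups produce no attribute
            }
            attributes.put(attributeName + "." + i, value);
            if (i == 1) {
                // Assumption: the un-indexed attribute carries the first capture group.
                attributes.put(attributeName, value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(extract("regex", "abc(def)?(g)", "abcdefg", true));
        System.out.println(extract("regex", "abc(def)?(g)", "abcg", true));
    }
}
```

With the input "abcg", the optional group is absent, so the sketch adds "regex.2" but no "regex.1", which mirrors the example in the description.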
If the Regular Expression matches more than once, only the first match is used unless the property that enables repeating capture groups is set to true. If any provided Regular Expression matches, the FlowFile is routed to 'matched'. If no provided Regular Expression matches, the FlowFile is routed to 'unmatched' and no attributes are applied to it.
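The difference between first-match and repeating extraction, and the routing rule, can be sketched as follows. The class `RepeatingMatchSketch` and the helpers `capture` and `route` are hypothetical names chosen for this illustration, and the sketch assumes expressions are applied with `Matcher.find()`. It collects values into a list and does not reproduce how the processor indexes attribute names for repeated matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatingMatchSketch {

    // Hypothetical helper: collects capture group 1 from the first match only,
    // or from every match when 'repeating' is true.
    static List<String> capture(String regex, String content, boolean repeating) {
        List<String> values = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        while (matcher.find()) {
            values.add(matcher.group(1));
            if (!repeating) {
                break; // only the first match is used
            }
        }
        return values;
    }

    // Hypothetical helper: 'matched' if any expression matches, else 'unmatched'.
    static String route(List<String> regexes, String content) {
        for (String regex : regexes) {
            if (Pattern.compile(regex).matcher(content).find()) {
                return "matched";
            }
        }
        return "unmatched";
    }

    public static void main(String[] args) {
        String content = "id=1 id=22 id=333";
        System.out.println(capture("id=(\\d+)", content, false));
        System.out.println(capture("id=(\\d+)", content, true));
        System.out.println(route(List.of("x(y)", "id=(\\d+)"), content));
    }
}
```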

Tags

Regular Expression, Text, evaluate, extract, regex

Input Requirement

REQUIRED

Supports Sensitive Dynamic Properties

false

Properties

Property	Description
Character Set	The Character Set in which the file is encoded.
Enable Canonical Equivalence	Indicates that two characters match only when their full canonical decompositions match.
Enable Case-insensitive Matching	Indicates that two characters match even if they are in a different case. Can also be specified via the embedded flag (?i).
Enable DOTALL Mode	Indicates that the expression '.' should match any character, including a line terminator. Can also be specified via the embedded flag (?s).
Enable Literal Parsing of the Pattern	Indicates that metacharacters and escape characters should be given no special meaning.
Enable Multiline Mode	Indicates that '^' and '$' should match just after and just before a line terminator or end of sequence, instead of only the beginning or end of the entire input. Can also be specified via the embedded flag (?m).
Enable Unicode Predefined Character Classes	Specifies conformance with Unicode Technical Standard #18: Unicode Regular Expressions, Annex C: Compatibility Properties. Can also be specified via the embedded flag (?U).
Enable Unicode-aware Case Folding	When used with 'Enable Case-insensitive Matching', matches in a manner consistent with the Unicode Standard. Can also be specified via the embedded flag (?u).
Enable Unix Lines Mode	Indicates that only the '\n' line terminator is recognized in the behavior of '.', '^', and '$'. Can also be specified via the embedded flag (?d).
Enable named group support	If set to true, when named groups are present in the regular expression, the name of the group is used in the attribute name instead of the group index. All capturing groups must be named; if the number of groups (not including capture group 0) does not equal the number of named groups, validation fails.
Enable repeating capture group	If set to true, every string matching the capture groups is extracted. Otherwise, if the Regular Expression matches more than once, only the first match is extracted.
Include Capture Group 0	Indicates that Capture Group 0 should be included as an attribute. Capture Group 0 represents the entire regular expression match, is typically not used, and can be long.
Maximum Buffer Size	Specifies the maximum amount of data to buffer (per FlowFile) in order to apply the regular expressions. FlowFiles larger than the specified maximum will not be fully evaluated.
Maximum Capture Group Length	Specifies the maximum number of characters a given capture group value can have. Any characters beyond the maximum are truncated.
Permit Whitespace and Comments in Pattern	In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Can also be specified via the embedded flag (?x).
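The embedded flags listed in the table are the ones defined by `java.util.regex.Pattern`, so the effect of several of these properties can be tried out in plain Java. The sketch below assumes each property corresponds to the matching `Pattern` flag constant; the class `FlagSketch` and the helper `finds` are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

final class FlagSketch {

    // Hypothetical helper: true if 'regex', compiled with 'flags', is found in 'input'.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Case-insensitive matching: flag constant and embedded (?i) behave alike.
        System.out.println(finds("hello", Pattern.CASE_INSENSITIVE, "HELLO"));
        System.out.println(finds("(?i)hello", 0, "HELLO"));

        // DOTALL: '.' also matches a line terminator.
        System.out.println(finds("a.b", Pattern.DOTALL, "a\nb"));

        // Multiline: '^' and '$' match at line boundaries inside the input.
        System.out.println(finds("(?m)^b$", 0, "a\nb\nc"));

        // Unix lines: only '\n' counts as a line terminator, so '.' matches '\r'.
        System.out.println(finds("(?d)a.b", 0, "a\rb"));

        // Literal parsing: '.' loses its special meaning.
        System.out.println(finds("a.b", Pattern.LITERAL, "axb"));

        // Comments mode: whitespace and '#' comments in the pattern are ignored.
        System.out.println(finds("a b c  # trailing comment", Pattern.COMMENTS, "abc"));
    }
}
```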

Relationships

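Name	Description
matched	FlowFiles are routed to this relationship when the Regular Expression is successfully evaluated and the FlowFile is modified as a result.
unmatched	FlowFiles are routed to this relationship when no provided Regular Expression matches the content of the FlowFile.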