- 类别:
REGEXP_INSTR¶
返回字符串主体中正则表达式模式的指定匹配项的位置。如果未找到匹配项,则返回 0。
另请参阅 字符串函数(正则表达式)。
语法¶
REGEXP_INSTR( <subject> , <pattern> [ , <position> [ , <occurrence> [ , <option> [ , <regexp_parameters> [ , <group_num> ] ] ] ] ] )
实参¶
必填:
subject
用于搜索匹配项的字符串。
pattern
要匹配的模式。
可选:
position
函数开始搜索匹配项时,字符串开头的字符数。
默认值:
1
(搜索匹配项时,从左边的第一个字符开始)occurrence
指定要匹配的模式第几次出现。该函数会跳过第一个
occurrence - 1
匹配项。默认:
1
option
指定是返回匹配项 (
0
) 的第一个字符的偏移,还是返回匹配项 (1
) 结束后第一个字符的偏移。默认:
0
regexp_parameters
包含一个或多个字符的字符串,指定用于搜索匹配项的正则表达式参数。支持的值为:
c
:区分大小写。i
:不区分大小写。m
:多行模式。e
:提取子匹配项。s
:“.”通配符也与换行符匹配。
有关详细信息,请参阅 正则表达式参数 文档。
默认:
c
备注
默认情况下,REGEXP_INSTR 返回主题的整个匹配部分的开始或结束字符偏移。但是,如果指定了
e
(适用于“extract”)参数,则 REGEXP_INSTR 返回与模式中第一个子表达式匹配的主题部分的开始或结束字符偏移。如果指定了e
,但未同时指定group_num
,则group_num
的值将默认为 1(第一组)。 如果模式中没有子表达式,则 REGEXP_INSTR 表现得如同未设置e
。有关使用e
的示例,请参阅本主题中的 示例。group_num
group_num
参数指定要提取的组。通过在正则表达式中使用括号来指定组。如果指定了
group_num
,则 Snowflake 允许在未同时指定'e'
选项的情况下执行提取。隐含了'e'
。Snowflake 至多支持 1024 个组。
有关使用
group_num
的示例,请参阅本主题中的 示例。
使用说明¶
位置从 1 开始,而不是从 0 开始。例如,“MAN”中“M”的位置是 1,而不是 0。
有关其他使用说明,请参阅关于正则表达式函数的 一般使用说明。
排序规则详细信息¶
Arguments with collation specifications are currently not supported.
示例¶
基本示例¶
以下几个示例使用此数据:
CREATE TABLE demo1 (id INT, string1 VARCHAR); INSERT INTO demo1 (id, string1) VALUES (1, 'nevermore1, nevermore2, nevermore3.') ;
搜索匹配的字符串。在此案例中,字符串是“nevermore”,后跟一个十进制数字(例如“nevermore1”):
select id, string1, regexp_substr(string1, 'nevermore\\d') AS "SUBSTRING", regexp_instr( string1, 'nevermore\\d') AS "POSITION" from demo1 order by id; +----+-------------------------------------+------------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------+------------+----------| | 1 | nevermore1, nevermore2, nevermore3. | nevermore1 | 1 | +----+-------------------------------------+------------+----------+
搜索匹配的字符串,但从字符串中的第 5 个字符开始,而不是从字符串中的第 1 个字符开始:
select id, string1, regexp_substr(string1, 'nevermore\\d', 5) AS "SUBSTRING", regexp_instr( string1, 'nevermore\\d', 5) AS "POSITION" from demo1 order by id; +----+-------------------------------------+------------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------+------------+----------| | 1 | nevermore1, nevermore2, nevermore3. | nevermore2 | 13 | +----+-------------------------------------+------------+----------+
搜索匹配的字符串,但查找第 3 个匹配项,而不是第 1 个匹配项:
select id, string1, regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", regexp_instr( string1, 'nevermore\\d', 1, 3) AS "POSITION" from demo1 order by id; +----+-------------------------------------+------------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------+------------+----------| | 1 | nevermore1, nevermore2, nevermore3. | nevermore3 | 25 | +----+-------------------------------------+------------+----------+
此查询与上一个查询几乎相同,但此查询显示如何使用 option
参数来指明是需要匹配表达式的位置,还是需要匹配表达式后第一个字符的位置:
select id, string1, regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", regexp_instr( string1, 'nevermore\\d', 1, 3, 0) AS "START_POSITION", regexp_instr( string1, 'nevermore\\d', 1, 3, 1) AS "AFTER_POSITION" from demo1 order by id; +----+-------------------------------------+------------+----------------+----------------+ | ID | STRING1 | SUBSTRING | START_POSITION | AFTER_POSITION | |----+-------------------------------------+------------+----------------+----------------| | 1 | nevermore1, nevermore2, nevermore3. | nevermore3 | 25 | 35 | +----+-------------------------------------+------------+----------------+----------------+
此查询显示,如果您搜索上次实际出现之外的出现,则返回的位置为 0:
select id, string1, regexp_substr(string1, 'nevermore', 1, 4) AS "SUBSTRING", regexp_instr( string1, 'nevermore', 1, 4) AS "POSITION" from demo1 order by id; +----+-------------------------------------+-----------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------+-----------+----------| | 1 | nevermore1, nevermore2, nevermore3. | NULL | 0 | +----+-------------------------------------+-----------+----------+
捕获组示例¶
本部分介绍如何使用正则表达式的“组”功能。
本部分中的前几个示例不使用捕获组;本部分从一些简单的示例开始,然后继续使用捕获组的示例。
这些示例使用下面创建的字符串:
CREATE TABLE demo2 (id INT, string1 VARCHAR); INSERT INTO demo2 (id, string1) VALUES -- A string with multiple occurrences of the word "the". (2, 'It was the best of times, it was the worst of times.'), -- A string with multiple occurrences of the word "the" and with extra -- blanks between words. (3, 'In the string the extra spaces are redundant.'), -- A string with the character sequence "the" inside multiple words -- ("thespian" and "theater"), but without the word "the" by itself. (4, 'A thespian theater is nearby.') ;
下一个示例查找:
单词“the”
后跟一个或多个非单词字符(例如,用于分隔单词的空格)
后跟一个或多个单词字符。
“单词字符”不仅包括字母 a-z 和 A-Z,还包括下划线(“_”)和十进制数字 0-9,但不包括空格、标点符号等。
select id, string1, regexp_substr(string1, 'the\\W+\\w+') as "SUBSTRING", regexp_instr(string1, 'the\\W+\\w+') as "POSITION" from demo2 order by id; +----+-------------------------------------------------------------+--------------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------------------------------+--------------+----------| | 2 | It was the best of times, it was the worst of times. | the best | 8 | | 3 | In the string the extra spaces are redundant. | the string | 7 | | 4 | A thespian theater is nearby. | NULL | 0 | +----+-------------------------------------------------------------+--------------+----------+
从字符串的位置 1 开始,查找以下内容第 2 次出现的地方
单词“the”
后跟一个或多个非单词字符
后跟一个或多个单词字符。
select id, string1, regexp_substr(string1, 'the\\W+\\w+', 1, 2) as "SUBSTRING", regexp_instr(string1, 'the\\W+\\w+', 1, 2) as "POSITION" from demo2 order by id; +----+-------------------------------------------------------------+-------------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------------------------------+-------------+----------| | 2 | It was the best of times, it was the worst of times. | the worst | 34 | | 3 | In the string the extra spaces are redundant. | the extra | 22 | | 4 | A thespian theater is nearby. | NULL | 0 | +----+-------------------------------------------------------------+-------------+----------+
此示例与前面的示例类似,但添加了捕获组。此查询不返回整个匹配项的位置,而是仅返回“组”的位置(即,子字符串中与括号中正则表达式部分匹配的部分)。在此案例中,返回值应该是“the”之后的单词的位置。
select id, string1, regexp_substr(string1, 'the\\W+(\\w+)', 1, 2, 'e', 1) as "SUBSTRING", regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION" from demo2 order by id; +----+-------------------------------------------------------------+-----------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------------------------------+-----------+----------| | 2 | It was the best of times, it was the worst of times. | worst | 38 | | 3 | In the string the extra spaces are redundant. | extra | 28 | | 4 | A thespian theater is nearby. | NULL | 0 | +----+-------------------------------------------------------------+-----------+----------+
如果您指定 'e'
(“extract”)参数,但不指定 group_num
,则 group_num 默认为 1:
select id, string1, regexp_substr(string1, 'the\\W+(\\w+)', 1, 2, 'e') as "SUBSTRING", regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e') as "POSITION" from demo2 order by id; +----+-------------------------------------------------------------+-----------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------------------------------+-----------+----------| | 2 | It was the best of times, it was the worst of times. | worst | 38 | | 3 | In the string the extra spaces are redundant. | extra | 28 | | 4 | A thespian theater is nearby. | NULL | 0 | +----+-------------------------------------------------------------+-----------+----------+
如果您指定 group_num
,则即使您未指定 'e'
(“extract”)作为其中一个参数,Snowflake 也会假定您要提取:
select id, string1, regexp_substr(string1, 'the\\W+(\\w+)', 1, 2, '', 1) as "SUBSTRING", regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, '', 1) as "POSITION" from demo2 order by id; +----+-------------------------------------------------------------+-----------+----------+ | ID | STRING1 | SUBSTRING | POSITION | |----+-------------------------------------------------------------+-----------+----------| | 2 | It was the best of times, it was the worst of times. | worst | 38 | | 3 | In the string the extra spaces are redundant. | extra | 28 | | 4 | A thespian theater is nearby. | NULL | 0 | +----+-------------------------------------------------------------+-----------+----------+
此示例展示如何从双单词模式(第一个单词为 A
)的第一个、第二个和第三个匹配项中检索第二个单词的位置。这也表明,尝试超越最后一个模式会导致 Snowflake 返回 0。
CREATE TABLE demo3 (id INT, string1 VARCHAR); INSERT INTO demo3 (id, string1) VALUES (5, 'A MAN A PLAN A CANAL') ;select id, string1, regexp_substr(string1, 'A\\W+(\\w+)', 1, 1, 'e', 1) as "SUBSTRING1", regexp_instr( string1, 'A\\W+(\\w+)', 1, 1, 0, 'e', 1) as "POSITION1", regexp_substr(string1, 'A\\W+(\\w+)', 1, 2, 'e', 1) as "SUBSTRING2", regexp_instr( string1, 'A\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION2", regexp_substr(string1, 'A\\W+(\\w+)', 1, 3, 'e', 1) as "SUBSTRING3", regexp_instr( string1, 'A\\W+(\\w+)', 1, 3, 0, 'e', 1) as "POSITION3", regexp_substr(string1, 'A\\W+(\\w+)', 1, 4, 'e', 1) as "SUBSTRING4", regexp_instr( string1, 'A\\W+(\\w+)', 1, 4, 0, 'e', 1) as "POSITION4" from demo3; +----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+ | ID | STRING1 | SUBSTRING1 | POSITION1 | SUBSTRING2 | POSITION2 | SUBSTRING3 | POSITION3 | SUBSTRING4 | POSITION4 | |----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------| | 5 | A MAN A PLAN A CANAL | MAN | 3 | PLAN | 9 | CANAL | 16 | NULL | 0 | +----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
此示例展示如何检索模式第一次出现时第一个、第二个和第三个组的位置。在本案例中,返回值是单词 MAN
的各个字母的位置。
select id, string1, regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 1) as "SUBSTR1", regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 1) as "POS1", regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 2) as "SUBSTR2", regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 2) as "POS2", regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 3) as "SUBSTR3", regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 3) as "POS3" from demo3; +----+----------------------+---------+------+---------+------+---------+------+ | ID | STRING1 | SUBSTR1 | POS1 | SUBSTR2 | POS2 | SUBSTR3 | POS3 | |----+----------------------+---------+------+---------+------+---------+------| | 5 | A MAN A PLAN A CANAL | M | 3 | A | 4 | N | 5 | +----+----------------------+---------+------+---------+------+---------+------+
其他示例¶
以下示例匹配单词 was
的出现。匹配从字符串中的第 1 个字符开始,并返回第一次出现后的字符在字符串中的位置:
select regexp_instr('It was the best of times, it was the worst of times', '\\bwas\\b', 1, 1) as "result" from dual;
+--------+
| result |
|--------|
| 4 |
+--------+
以下示例返回与模式匹配的字符串部分的第一个字符的偏移。匹配从字符串中的第 1 个字符开始,并返回模式的第一次出现:
select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0) as "result" from dual;
+--------+
| result |
|--------|
| 8 |
+--------+
以下示例与上一个示例相同,但使用参数 e
返回与模式中第一个子表达式匹配的主题部分的字符偏移(即 the
之后的第一组单词字符):
select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0,'e') as "result" from dual;
+--------+
| result |
|--------|
| 12 |
+--------+
以下示例匹配以 st
结尾且前面有 2 个或多个字母字符的单词的出现(不区分大小写)。匹配从字符串中的第 15 个字符开始,并返回第一次出现后的字符在字符串中的位置:
select regexp_instr('It was the best of times, it was the worst of times', '[[:alpha:]]{2,}st', 15, 1) as "result" from dual;
+--------+
| result |
|--------|
| 38 |
+--------+
准备示例:
create or replace table message(body varchar(255));
insert into message values
('Hellooo World'),
('How are you doing today?'),
('the quick brown fox jumps over the lazy dog'),
('PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS');
返回第一个匹配项中第一个字符的偏移:
select body, regexp_instr(body, '\\b\\S*o\\S*\\b') as result from message;
---------------------------------------------+-----------------------------------+
body | result |
---------------------------------------------+-----------------------------------+
Hellooo World | 1 |
How are you doing today? | 1 |
the quick brown fox jumps over the lazy dog | 11 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
---------------------------------------------+-----------------------------------+
第一个匹配项中的第一个字符,从主题中的第三个字符开始:
select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3) as result from message;
---------------------------------------------+--------------------------------------+
body | result |
---------------------------------------------+--------------------------------------+
Hellooo World | 3 |
How are you doing today? | 9 |
the quick brown fox jumps over the lazy dog | 11 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
---------------------------------------------+--------------------------------------+
第三个匹配项中的第一个字符,从主题中的第三个字符开始:
select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3) as result from message; ---------------------------------------------+-----------------------------------------+ body | result | ---------------------------------------------+-----------------------------------------+ Hellooo World | 0 | How are you doing today? | 19 | the quick brown fox jumps over the lazy dog | 27 | PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 | ---------------------------------------------+-----------------------------------------+
第三个匹配项中的最后一个字符,从主题中的第三个字符开始:
select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1) as result from message;
---------------------------------------------+--------------------------------------------+
body | result |
---------------------------------------------+--------------------------------------------+
Hellooo World | 0 |
How are you doing today? | 24 |
the quick brown fox jumps over the lazy dog | 31 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
---------------------------------------------+--------------------------------------------+
第三个匹配项中的最后一个字符,匹配从主题中的第三个字符开始,不区分大小写:
select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1, 'i') as result from message;
---------------------------------------------+-------------------------------------------------+
body | result |
---------------------------------------------+-------------------------------------------------+
Hellooo World | 0 |
How are you doing today? | 24 |
the quick brown fox jumps over the lazy dog | 31 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 35 |
---------------------------------------------+-------------------------------------------------+
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'i') as result from message;
---------------------------------------------+-------------------------------------------------+
body | result |
---------------------------------------------+-------------------------------------------------+
Hellooo World | 1 |
How are you doing today? | 1 |
the quick brown fox jumps over the lazy dog | 11 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 9 |
---------------------------------------------+-------------------------------------------------+
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'ie') as result from message;
---------------------------------------------+--------------------------------------------------+
body | result |
---------------------------------------------+--------------------------------------------------+
Hellooo World | 7 |
How are you doing today? | 2 |
the quick brown fox jumps over the lazy dog | 13 |
PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 10 |
---------------------------------------------+--------------------------------------------------+