- 类别:
REGEXP_INSTR¶
返回字符串主体中正则表达式模式的指定匹配项的位置。
另请参阅 字符串函数(正则表达式)。
语法¶
REGEXP_INSTR( <subject> , <pattern> [ , <position> [ , <occurrence> [ , <option> [ , <regexp_parameters> [ , <group_num> ] ] ] ] ] )
实参¶
必填:
subject
用于搜索匹配项的字符串。
pattern
要匹配的模式。
有关指定模式的准则,请参阅 字符串函数(正则表达式)。
可选:
position
函数开始搜索匹配项时,字符串开头的字符数。
默认值:``1``(搜索匹配项时,从左边的第一个字符开始)
occurrence
指定要从中开始返回匹配项的模式的第一次出现。
该函数会跳过第一个
occurrence - 1
匹配项。例如,如果存在 5 个匹配项,并且您为occurrence
实参指定3
个匹配项,则该函数将忽略前两个匹配项,并返回第三个、第四个和第五个匹配项。默认:
1
option
指定是返回匹配项 (
0
) 的第一个字符的偏移,还是返回匹配项 (1
) 结束后第一个字符的偏移。默认:
0
regexp_parameters
包含一个或多个字符的字符串,指定用于搜索匹配项的参数。支持的值:
参数
描述
c
区分大小写的匹配
i
不区分大小写的匹配
m
多行模式
e
提取子匹配项
s
POSIX 通配符
.
与\n
匹配默认:
c
有关更多详细信息,请参阅 为正则表达式指定参数。
备注
默认情况下,REGEXP_INSTR 返回主题的整个匹配部分的开始或结束字符偏移。但是,如果指定了
e
(适用于“extract”)参数,则 REGEXP_INSTR 返回与模式中第一个子表达式匹配的主题部分的开始或结束字符偏移。如果指定了e
,但未同时指定group_num
,则group_num
的值将默认为 1(第一组)。 如果模式中没有子表达式,则 REGEXP_INSTR 表现得如同未设置e
。有关使用e
的示例,请参阅本主题中的 示例。group_num
group_num
参数指定要提取的组。通过在正则表达式中使用括号来指定组。如果指定了
group_num
,则 Snowflake 允许在未同时指定e
选项的情况下执行提取。e
选项是隐含的。Snowflake 至多支持 1024 个组。
有关使用
group_num
的示例,请参阅本主题中的 捕获组示例。
返回¶
返回类型 NUMBER 的值。
如果未找到匹配项,则返回 0
。
使用说明¶
位置从 1 开始,而不是从 0 开始。例如,“MAN”中“M”的位置是 1,而不是 0。
有关其他使用说明,请参阅关于正则表达式函数的 一般使用说明。
排序规则详细信息¶
Arguments with collation specifications currently aren't supported.
示例¶
以下示例使用 REGEXP_INSTR 函数。
基本示例¶
创建表并插入数据:
CREATE OR REPLACE TABLE demo1 (id INT, string1 VARCHAR);
INSERT INTO demo1 (id, string1) VALUES
(1, 'nevermore1, nevermore2, nevermore3.');
搜索匹配的字符串。在这种情况下,字符串是 nevermore
,后跟一个十进制数字(例如,nevermore1
)。示例使用 REGEXP_SUBSTR 函数显示匹配的子字符串:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'nevermore\\d') AS substring,
REGEXP_INSTR( string1, 'nevermore\\d') AS position
FROM demo1
ORDER BY id;
+----+-------------------------------------+------------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------+------------+----------|
| 1 | nevermore1, nevermore2, nevermore3. | nevermore1 | 1 |
+----+-------------------------------------+------------+----------+
搜索匹配的字符串,但从字符串中的第五个字符开始,而不是从字符串中的第一个字符开始:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'nevermore\\d', 5) AS substring,
REGEXP_INSTR( string1, 'nevermore\\d', 5) AS position
FROM demo1
ORDER BY id;
+----+-------------------------------------+------------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------+------------+----------|
| 1 | nevermore1, nevermore2, nevermore3. | nevermore2 | 13 |
+----+-------------------------------------+------------+----------+
搜索匹配的字符串,但查找第三个匹配项,而不是第一个匹配项:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'nevermore\\d', 1, 3) AS substring,
REGEXP_INSTR( string1, 'nevermore\\d', 1, 3) AS position
FROM demo1
ORDER BY id;
+----+-------------------------------------+------------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------+------------+----------|
| 1 | nevermore1, nevermore2, nevermore3. | nevermore3 | 25 |
+----+-------------------------------------+------------+----------+
此查询与上一个查询几乎相同,但此查询显示如何使用 option
实参来指明是需要匹配表达式的位置,还是需要匹配表达式后第一个字符的位置:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'nevermore\\d', 1, 3) AS substring,
REGEXP_INSTR( string1, 'nevermore\\d', 1, 3, 0) AS start_position,
REGEXP_INSTR( string1, 'nevermore\\d', 1, 3, 1) AS after_position
FROM demo1
ORDER BY id;
+----+-------------------------------------+------------+----------------+----------------+
| ID | STRING1 | SUBSTRING | START_POSITION | AFTER_POSITION |
|----+-------------------------------------+------------+----------------+----------------|
| 1 | nevermore1, nevermore2, nevermore3. | nevermore3 | 25 | 35 |
+----+-------------------------------------+------------+----------------+----------------+
此查询显示,如果您搜索上次实际出现之外的出现,则返回的位置为 0:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'nevermore', 1, 4) AS substring,
REGEXP_INSTR( string1, 'nevermore', 1, 4) AS position
FROM demo1
ORDER BY id;
+----+-------------------------------------+-----------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------+-----------+----------|
| 1 | nevermore1, nevermore2, nevermore3. | NULL | 0 |
+----+-------------------------------------+-----------+----------+
捕获组示例¶
本部分介绍如何使用正则表达式的“组”功能。
本部分中的前几个示例不使用捕获组。本部分从一些简单的示例开始,然后继续使用捕获组的示例。
这些示例使用下面创建的字符串:
CREATE OR REPLACE TABLE demo2 (id INT, string1 VARCHAR);
INSERT INTO demo2 (id, string1) VALUES
(2, 'It was the best of times, it was the worst of times.'),
(3, 'In the string the extra spaces are redundant.'),
(4, 'A thespian theater is nearby.');
SELECT * FROM demo2;
+----+-------------------------------------------------------------+
| ID | STRING1 |
|----+-------------------------------------------------------------|
| 2 | It was the best of times, it was the worst of times. |
| 3 | In the string the extra spaces are redundant. |
| 4 | A thespian theater is nearby. |
+----+-------------------------------------------------------------+
字符串具有以下特征:
id
为2
的字符串中多次出现单词“the”。id
为3
的字符串中多次出现单词“the”,单词之间有额外的空格。id
为4
字符的字符串中,字符序列“the”出现在多个单词中(如“thespian”和“theater”),但未单独出现“the”这个单词。
此示例查找单词 the
的第一次出现,后跟一个或多个非单词字符(例如,用于分隔单词的空格),后跟一个或多个单词字符。
“单词字符”不仅包括字母 a-z 和 A-Z,还包括下划线(“_”)和十进制数字 0-9,但不包括空格、标点符号等。
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'the\\W+\\w+') AS substring,
REGEXP_INSTR(string1, 'the\\W+\\w+') AS position
FROM demo2
ORDER BY id;
+----+-------------------------------------------------------------+--------------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+--------------+----------|
| 2 | It was the best of times, it was the worst of times. | the best | 8 |
| 3 | In the string the extra spaces are redundant. | the string | 7 |
| 4 | A thespian theater is nearby. | NULL | 0 |
+----+-------------------------------------------------------------+--------------+----------+
从字符串的位置 1 开始,查找单词 the
的第二次出现,后跟一个或多个非单词字符,后跟一个或多个单词字符。
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'the\\W+\\w+', 1, 2) AS substring,
REGEXP_INSTR(string1, 'the\\W+\\w+', 1, 2) AS position
FROM demo2
ORDER BY id;
+----+-------------------------------------------------------------+-------------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-------------+----------|
| 2 | It was the best of times, it was the worst of times. | the worst | 34 |
| 3 | In the string the extra spaces are redundant. | the extra | 22 |
| 4 | A thespian theater is nearby. | NULL | 0 |
+----+-------------------------------------------------------------+-------------+----------+
此示例与前面的示例类似,但添加了捕获组。此查询不返回整个匹配项的位置,而是仅返回 组 的位置(即,子字符串中与括号中正则表达式部分匹配的部分)。在这种情况下,返回的值是单词 the
在第二次出现后的位置。
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'the\\W+(\\w+)', 1, 2, 'e', 1) AS substring,
REGEXP_INSTR( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e', 1) AS position
FROM demo2
ORDER BY id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
| 2 | It was the best of times, it was the worst of times. | worst | 38 |
| 3 | In the string the extra spaces are redundant. | extra | 28 |
| 4 | A thespian theater is nearby. | NULL | 0 |
+----+-------------------------------------------------------------+-----------+----------+
如果您指定 'e'`(提取)参数,但不指定 :samp:`{group_num}
,则 group_num
默认为 1
:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'the\\W+(\\w+)', 1, 2, 'e') AS substring,
REGEXP_INSTR( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e') AS position
FROM demo2
ORDER BY id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
| 2 | It was the best of times, it was the worst of times. | worst | 38 |
| 3 | In the string the extra spaces are redundant. | extra | 28 |
| 4 | A thespian theater is nearby. | NULL | 0 |
+----+-------------------------------------------------------------+-----------+----------+
如果您指定 group_num
,则即使您未指定 :code:`'e'`(提取)作为其中一个参数,Snowflake 也会假定您要提取:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'the\\W+(\\w+)', 1, 2, '', 1) AS substring,
REGEXP_INSTR( string1, 'the\\W+(\\w+)', 1, 2, 0, '', 1) AS position
FROM demo2
ORDER BY id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1 | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
| 2 | It was the best of times, it was the worst of times. | worst | 38 |
| 3 | In the string the extra spaces are redundant. | extra | 28 |
| 4 | A thespian theater is nearby. | NULL | 0 |
+----+-------------------------------------------------------------+-----------+----------+
此示例展示如何从双单词模式(第一个单词为 A
)的第一个、第二个和第三个匹配项中检索第二个单词的位置。这也表明,尝试超越最后一个模式会导致 Snowflake 返回 0。
创建表并插入数据:
CREATE TABLE demo3 (id INT, string1 VARCHAR);
INSERT INTO demo3 (id, string1) VALUES
(5, 'A MAN A PLAN A CANAL');
运行查询:
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'A\\W+(\\w+)', 1, 1, 'e', 1) AS substring1,
REGEXP_INSTR( string1, 'A\\W+(\\w+)', 1, 1, 0, 'e', 1) AS position1,
REGEXP_SUBSTR(string1, 'A\\W+(\\w+)', 1, 2, 'e', 1) AS substring2,
REGEXP_INSTR( string1, 'A\\W+(\\w+)', 1, 2, 0, 'e', 1) AS position2,
REGEXP_SUBSTR(string1, 'A\\W+(\\w+)', 1, 3, 'e', 1) AS substring3,
REGEXP_INSTR( string1, 'A\\W+(\\w+)', 1, 3, 0, 'e', 1) AS position3,
REGEXP_SUBSTR(string1, 'A\\W+(\\w+)', 1, 4, 'e', 1) AS substring4,
REGEXP_INSTR( string1, 'A\\W+(\\w+)', 1, 4, 0, 'e', 1) AS position4
FROM demo3;
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
| ID | STRING1 | SUBSTRING1 | POSITION1 | SUBSTRING2 | POSITION2 | SUBSTRING3 | POSITION3 | SUBSTRING4 | POSITION4 |
|----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------|
| 5 | A MAN A PLAN A CANAL | MAN | 3 | PLAN | 9 | CANAL | 16 | NULL | 0 |
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
此示例展示如何检索模式第一次出现时第一个、第二个和第三个组的位置。在本案例中,返回值是单词 MAN
的各个字母的位置。
SELECT id,
string1,
REGEXP_SUBSTR(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 1) AS substring1,
REGEXP_INSTR( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 1) AS position1,
REGEXP_SUBSTR(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 2) AS substring2,
REGEXP_INSTR( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 2) AS position2,
REGEXP_SUBSTR(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 3) AS substring3,
REGEXP_INSTR( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 3) AS position3
FROM demo3;
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+
| ID | STRING1 | SUBSTRING1 | POSITION1 | SUBSTRING2 | POSITION2 | SUBSTRING3 | POSITION3 |
|----+----------------------+------------+-----------+------------+-----------+------------+-----------|
| 5 | A MAN A PLAN A CANAL | M | 3 | A | 4 | N | 5 |
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+
其他示例¶
以下示例匹配单词 was
的出现。匹配从字符串中的第一个字符开始,并返回第一次出现后的字符在字符串中的位置:
SELECT REGEXP_INSTR('It was the best of times, it was the worst of times',
'\\bwas\\b',
1,
1) AS result;
+--------+
| RESULT |
|--------|
| 4 |
+--------+
以下示例返回与模式匹配的字符串部分的第一个字符的偏移。匹配从字符串中的第一个字符开始,并返回模式的第一次出现:
SELECT REGEXP_INSTR('It was the best of times, it was the worst of times',
'the\\W+(\\w+)',
1,
1,
0) AS result;
+--------+
| RESULT |
|--------|
| 8 |
+--------+
以下示例与上一个示例相同,但使用参数 e
返回与模式中第一个子表达式匹配的主题部分的字符偏移(即 the
之后的第一组单词字符):
SELECT REGEXP_INSTR('It was the best of times, it was the worst of times',
'the\\W+(\\w+)',
1,
1,
0,
'e') AS result;
+--------+
| RESULT |
|--------|
| 12 |
+--------+
以下示例匹配以 st
结尾且前面有两个或多个字母字符的单词的出现(不区分大小写)。匹配从字符串中的第十五个字符开始,并返回第一次出现后的字符在字符串中的位置(以 worst
开始):
SELECT REGEXP_INSTR('It was the best of times, it was the worst of times',
'[[:alpha:]]{2,}st',
15,
1) AS result;
+--------+
| RESULT |
|--------|
| 38 |
+--------+
要运行下一组示例,请创建表并插入数据:
CREATE OR REPLACE TABLE message(body VARCHAR(255));
INSERT INTO message VALUES
('Hellooo World'),
('How are you doing today?'),
('the quick brown fox jumps over the lazy dog'),
('PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS');
返回包含小写 o
的第一个匹配中第一个字符的偏移:
SELECT body,
REGEXP_INSTR(body, '\\b\\S*o\\S*\\b') AS result
FROM message;
+---------------------------------------------+--------+
| BODY | RESULT |
|---------------------------------------------+--------|
| Hellooo World | 1 |
| How are you doing today? | 1 |
| the quick brown fox jumps over the lazy dog | 11 |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
+---------------------------------------------+--------+
返回包含小写 o
的第一个匹配中第一个字符的偏移,从主题中的第三个字符开始:
SELECT body,
REGEXP_INSTR(body, '\\b\\S*o\\S*\\b', 3) AS result
FROM message;
+---------------------------------------------+--------+
| BODY | RESULT |
|---------------------------------------------+--------|
| Hellooo World | 3 |
| How are you doing today? | 9 |
| the quick brown fox jumps over the lazy dog | 11 |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
+---------------------------------------------+--------+
返回包含小写 o
的第三个匹配中第一个字符的偏移,从主题中的第三个字符开始:
SELECT body, REGEXP_INSTR(body, '\\b\\S*o\\S*\\b', 3, 3) AS result
FROM message;
+---------------------------------------------+--------+
| BODY | RESULT |
|---------------------------------------------+--------|
| Hellooo World | 0 |
| How are you doing today? | 19 |
| the quick brown fox jumps over the lazy dog | 27 |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
+---------------------------------------------+--------+
返回包含小写 o
的第三个匹配中最后一个字符的偏移,从主题中的第三个字符开始:
SELECT body, REGEXP_INSTR(body, '\\b\\S*o\\S*\\b', 3, 3, 1) AS result
FROM message;
+---------------------------------------------+--------+
| BODY | RESULT |
|---------------------------------------------+--------|
| Hellooo World | 0 |
| How are you doing today? | 24 |
| the quick brown fox jumps over the lazy dog | 31 |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 0 |
+---------------------------------------------+--------+
返回包含小写 o
的第三个匹配中最后一个字符的偏移,从主题中的第三个字符开始,不区分大小写的匹配:
SELECT body, REGEXP_INSTR(body, '\\b\\S*o\\S*\\b', 3, 3, 1, 'i') AS result
FROM message;
+---------------------------------------------+--------+
| BODY | RESULT |
|---------------------------------------------+--------|
| Hellooo World | 0 |
| How are you doing today? | 24 |
| the quick brown fox jumps over the lazy dog | 31 |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | 35 |
+---------------------------------------------+--------+