类别:

字符串函数(正则表达式)

REGEXP_INSTR

返回字符串主体中正则表达式模式的指定匹配项的位置。如果未找到匹配项,则返回 0。

另请参阅 字符串函数(正则表达式)

语法

REGEXP_INSTR( <subject> , <pattern> [ , <position> [ , <occurrence> [ , <option> [ , <regexp_parameters> [ , <group_num> ] ] ] ] ] )
Copy

实参

必填:

subject

用于搜索匹配项的字符串。

pattern

要匹配的模式。

可选:

position

函数开始搜索匹配项时,字符串开头的字符数。

默认值:1 (搜索匹配项时,从左边的第一个字符开始)

occurrence

指定要匹配的模式第几次出现。该函数会跳过第一个 occurrence - 1 匹配项。

默认:1

option

指定是返回匹配项 (0) 的第一个字符的偏移,还是返回匹配项 (1) 结束后第一个字符的偏移。

默认:0

regexp_parameters

包含一个或多个字符的字符串,指定用于搜索匹配项的正则表达式参数。支持的值为:

  • c:区分大小写。

  • i:不区分大小写。

  • m:多行模式。

  • e:提取子匹配项。

  • s:“.”通配符也与换行符匹配。

有关详细信息,请参阅 正则表达式参数 文档。

默认:c

备注

默认情况下,REGEXP_INSTR 返回主题的整个匹配部分的开始或结束字符偏移。但是,如果指定了 e (适用于“extract”)参数,则 REGEXP_INSTR 返回与模式中第一个子表达式匹配的主题部分的开始或结束字符偏移。如果指定了 e,但未同时指定 group_num,则 group_num 的值将默认为 1(第一组)。 如果模式中没有子表达式,则 REGEXP_INSTR 表现得如同未设置 e。有关使用 e 的示例,请参阅本主题中的 示例

group_num

group_num 参数指定要提取的组。通过在正则表达式中使用括号来指定组。

如果指定了 group_num,则 Snowflake 允许在未同时指定 'e' 选项的情况下执行提取。隐含了 'e'

Snowflake 至多支持 1024 个组。

有关使用 group_num 的示例,请参阅本主题中的 示例

使用说明

  • 位置从 1 开始,而不是从 0 开始。例如,“MAN”中“M”的位置是 1,而不是 0。

  • 有关其他使用说明,请参阅关于正则表达式函数的 一般使用说明

排序规则详细信息

Arguments with collation specifications are currently not supported.

示例

基本示例

以下几个示例使用此数据:

CREATE TABLE demo1 (id INT, string1 VARCHAR);
INSERT INTO demo1 (id, string1) VALUES 
    (1, 'nevermore1, nevermore2, nevermore3.')
    ;
Copy

搜索匹配的字符串。在此案例中,字符串是“nevermore”,后跟一个十进制数字(例如“nevermore1”):

select id, string1,
      regexp_substr(string1, 'nevermore\\d') AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d') AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore1 |        1 |
+----+-------------------------------------+------------+----------+
Copy

搜索匹配的字符串,但从字符串中的第 5 个字符开始,而不是从字符串中的第 1 个字符开始:

select id, string1,
      regexp_substr(string1, 'nevermore\\d', 5) AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d', 5) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore2 |       13 |
+----+-------------------------------------+------------+----------+
Copy

搜索匹配的字符串,但查找第 3 个匹配项,而不是第 1 个匹配项:

select id, string1,
      regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d', 1, 3) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore3 |       25 |
+----+-------------------------------------+------------+----------+
Copy

此查询与上一个查询几乎相同,但此查询显示如何使用 option 参数来指明是需要匹配表达式的位置,还是需要匹配表达式后第一个字符的位置:

select id, string1,
       regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", 
       regexp_instr( string1, 'nevermore\\d', 1, 3, 0) AS "START_POSITION",
       regexp_instr( string1, 'nevermore\\d', 1, 3, 1) AS "AFTER_POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------------+----------------+
| ID | STRING1                             | SUBSTRING  | START_POSITION | AFTER_POSITION |
|----+-------------------------------------+------------+----------------+----------------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore3 |             25 |             35 |
+----+-------------------------------------+------------+----------------+----------------+
Copy

此查询显示,如果您搜索上次实际出现之外的出现,则返回的位置为 0:

select id, string1, 
       regexp_substr(string1, 'nevermore', 1, 4) AS "SUBSTRING",
       regexp_instr( string1, 'nevermore', 1, 4) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+-----------+----------+
| ID | STRING1                             | SUBSTRING | POSITION |
|----+-------------------------------------+-----------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | NULL      |        0 |
+----+-------------------------------------+-----------+----------+
Copy

捕获组示例

本部分介绍如何使用正则表达式的“组”功能。

本部分中的前几个示例不使用捕获组;本部分从一些简单的示例开始,然后继续使用捕获组的示例。

这些示例使用下面创建的字符串:

CREATE TABLE demo2 (id INT, string1 VARCHAR);
INSERT INTO demo2 (id, string1) VALUES 
    -- A string with multiple occurrences of the word "the".
    (2, 'It was the best of times, it was the worst of times.'),
    -- A string with multiple occurrences of the word "the" and with extra
    -- blanks between words.
    (3, 'In    the   string   the   extra   spaces  are   redundant.'),
    -- A string with the character sequence "the" inside multiple words 
    -- ("thespian" and "theater"), but without the word "the" by itself.
    (4, 'A thespian theater is nearby.')
    ;
Copy

下一个示例查找:

  • 单词“the”

  • 后跟一个或多个非单词字符(例如,用于分隔单词的空格)

  • 后跟一个或多个单词字符。

“单词字符”不仅包括字母 a-z 和 A-Z,还包括下划线(“_”)和十进制数字 0-9,但不包括空格、标点符号等。

select id, string1,
       regexp_substr(string1, 'the\\W+\\w+') as "SUBSTRING",
       regexp_instr(string1, 'the\\W+\\w+') as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+--------------+----------+
| ID | STRING1                                                     | SUBSTRING    | POSITION |
|----+-------------------------------------------------------------+--------------+----------|
|  2 | It was the best of times, it was the worst of times.        | the best     |        8 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | the   string |        7 |
|  4 | A thespian theater is nearby.                               | NULL         |        0 |
+----+-------------------------------------------------------------+--------------+----------+
Copy

从字符串的位置 1 开始,查找以下内容第 2 次出现的地方

  • 单词“the”

  • 后跟一个或多个非单词字符

  • 后跟一个或多个单词字符。

select id, string1, 
       regexp_substr(string1, 'the\\W+\\w+', 1, 2) as "SUBSTRING",
       regexp_instr(string1, 'the\\W+\\w+', 1, 2) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-------------+----------+
| ID | STRING1                                                     | SUBSTRING   | POSITION |
|----+-------------------------------------------------------------+-------------+----------|
|  2 | It was the best of times, it was the worst of times.        | the worst   |       34 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | the   extra |       22 |
|  4 | A thespian theater is nearby.                               | NULL        |        0 |
+----+-------------------------------------------------------------+-------------+----------+
Copy

此示例与前面的示例类似,但添加了捕获组。此查询不返回整个匹配项的位置,而是仅返回“组”的位置(即,子字符串中与括号中正则表达式部分匹配的部分)。在此案例中,返回值应该是“the”之后的单词的位置。

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    'e', 1) as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+
Copy

如果您指定 'e' (“extract”)参数,但不指定 group_num,则 group_num 默认为 1:

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    'e') as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e') as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+
Copy

如果您指定 group_num,则即使您未指定 'e' (“extract”)作为其中一个参数,Snowflake 也会假定您要提取:

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    '', 1) as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, '', 1) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+
Copy

此示例展示如何从双单词模式(第一个单词为 A)的第一个、第二个和第三个匹配项中检索第二个单词的位置。这也表明,尝试超越最后一个模式会导致 Snowflake 返回 0。

CREATE TABLE demo3 (id INT, string1 VARCHAR);
INSERT INTO demo3 (id, string1) VALUES
    (5, 'A MAN A PLAN A CANAL')
    ;
Copy
select id, string1,
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 1,    'e', 1) as "SUBSTRING1",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 1, 0, 'e', 1) as "POSITION1",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 2,    'e', 1) as "SUBSTRING2",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION2",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 3,    'e', 1) as "SUBSTRING3",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 3, 0, 'e', 1) as "POSITION3",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 4,    'e', 1) as "SUBSTRING4",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 4, 0, 'e', 1) as "POSITION4"
    from demo3;
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
| ID | STRING1              | SUBSTRING1 | POSITION1 | SUBSTRING2 | POSITION2 | SUBSTRING3 | POSITION3 | SUBSTRING4 | POSITION4 |
|----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------|
|  5 | A MAN A PLAN A CANAL | MAN        |         3 | PLAN       |         9 | CANAL      |        16 | NULL       |         0 |
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
Copy

此示例展示如何检索模式第一次出现时第一个、第二个和第三个组的位置。在本案例中,返回值是单词 MAN 的各个字母的位置。

select id, string1, 
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 1) as "SUBSTR1",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 1) as "POS1",
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 2) as "SUBSTR2",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 2) as "POS2",
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 3) as "SUBSTR3",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 3) as "POS3"
    from demo3;
+----+----------------------+---------+------+---------+------+---------+------+
| ID | STRING1              | SUBSTR1 | POS1 | SUBSTR2 | POS2 | SUBSTR3 | POS3 |
|----+----------------------+---------+------+---------+------+---------+------|
|  5 | A MAN A PLAN A CANAL | M       |    3 | A       |    4 | N       |    5 |
+----+----------------------+---------+------+---------+------+---------+------+
Copy

其他示例

以下示例匹配单词 was 的出现。匹配从字符串中的第 1 个字符开始,并返回第一次出现后的字符在字符串中的位置:

select regexp_instr('It was the best of times, it was the worst of times', '\\bwas\\b', 1, 1) as "result" from dual;

+--------+
| result |
|--------|
|      4 |
+--------+
Copy

以下示例返回与模式匹配的字符串部分的第一个字符的偏移。匹配从字符串中的第 1 个字符开始,并返回模式的第一次出现:

select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0) as "result" from dual;

+--------+
| result |
|--------|
|      8 |
+--------+
Copy

以下示例与上一个示例相同,但使用参数 e 返回与模式中第一个子表达式匹配的主题部分的字符偏移(即 the 之后的第一组单词字符):

select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0,'e') as "result" from dual;

+--------+
| result |
|--------|
|     12 |
+--------+
Copy

以下示例匹配以 st 结尾且前面有 2 个或多个字母字符的单词的出现(不区分大小写)。匹配从字符串中的第 15 个字符开始,并返回第一次出现后的字符在字符串中的位置:

select regexp_instr('It was the best of times, it was the worst of times', '[[:alpha:]]{2,}st', 15, 1) as "result" from dual;

+--------+
| result |
|--------|
|     38 |
+--------+
Copy

准备示例:

create or replace table message(body varchar(255));
insert into message values
('Hellooo World'),
('How are you doing today?'),
('the quick brown fox jumps over the lazy dog'),
('PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS');
Copy

返回第一个匹配项中第一个字符的偏移:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b') as result from message;

---------------------------------------------+-----------------------------------+
                    body                     | result                            |
---------------------------------------------+-----------------------------------+
 Hellooo World                               | 1                                 |
 How are you doing today?                    | 1                                 |
 the quick brown fox jumps over the lazy dog | 11                                |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                 |
---------------------------------------------+-----------------------------------+
Copy

第一个匹配项中的第一个字符,从主题中的第三个字符开始:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3) as result from message;

---------------------------------------------+--------------------------------------+
                    body                     | result                               |
---------------------------------------------+--------------------------------------+
 Hellooo World                               | 3                                    |
 How are you doing today?                    | 9                                    |
 the quick brown fox jumps over the lazy dog | 11                                   |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                    |
---------------------------------------------+--------------------------------------+
Copy

第三个匹配项中的第一个字符,从主题中的第三个字符开始:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3) as result from message;

---------------------------------------------+-----------------------------------------+
                    body                     | result                                  |
---------------------------------------------+-----------------------------------------+
 Hellooo World                               | 0                                       |
 How are you doing today?                    | 19                                      |
 the quick brown fox jumps over the lazy dog | 27                                      |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                       |
---------------------------------------------+-----------------------------------------+
Copy

第三个匹配项中的最后一个字符,从主题中的第三个字符开始:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1) as result from message;

---------------------------------------------+--------------------------------------------+
                    body                     | result                                     |
---------------------------------------------+--------------------------------------------+
 Hellooo World                               | 0                                          |
 How are you doing today?                    | 24                                         |
 the quick brown fox jumps over the lazy dog | 31                                         |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                          |
---------------------------------------------+--------------------------------------------+
Copy

第三个匹配项中的最后一个字符,匹配从主题中的第三个字符开始,不区分大小写:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1, 'i') as result from message;

---------------------------------------------+-------------------------------------------------+
                    body                     | result                                          |
---------------------------------------------+-------------------------------------------------+
 Hellooo World                               | 0                                               |
 How are you doing today?                    | 24                                              |
 the quick brown fox jumps over the lazy dog | 31                                              |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 35                                              |
---------------------------------------------+-------------------------------------------------+
Copy
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'i') as result from message;

---------------------------------------------+-------------------------------------------------+
                    body                     | result                                          |
---------------------------------------------+-------------------------------------------------+
 Hellooo World                               | 1                                               |
 How are you doing today?                    | 1                                               |
 the quick brown fox jumps over the lazy dog | 11                                              |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 9                                               |
---------------------------------------------+-------------------------------------------------+
Copy
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'ie') as result from message;

---------------------------------------------+--------------------------------------------------+
                    body                     | result                                           |
---------------------------------------------+--------------------------------------------------+
 Hellooo World                               | 7                                                |
 How are you doing today?                    | 2                                                |
 the quick brown fox jumps over the lazy dog | 13                                               |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 10                                               |
---------------------------------------------+--------------------------------------------------+
Copy
语言: 中文