- 类别:
REGEXP_SUBSTR¶
返回字符串中与正则表达式匹配的子字符串。如果未找到匹配项,则返回 NULL。
另请参阅:字符串函数(正则表达式)
语法¶
REGEXP_SUBSTR( <subject> , <pattern> [ , <position> [ , <occurrence> [ , <regex_parameters> [ , <group_num> ] ] ] ] )
实参¶
必填:
subject
用于搜索匹配项的字符串。
pattern
要匹配的模式。
有关指定模式的准则,请参阅 字符串函数(正则表达式)。
可选:
position
函数开始搜索匹配项时,字符串开头的字符数。
默认值:
1
(搜索匹配项时,从左边的第一个字符开始)occurrence
指定要匹配的模式第几次出现。该函数会跳过第一个
occurrence - 1
匹配项。默认:
1
regex_parameters
包含一个或多个字符的字符串,指定用于搜索匹配项的正则表达式参数。支持的值为:
c
:区分大小写。i
:不区分大小写。m
:多行模式。e
:提取子匹配项。s
:“.”通配符也与换行符匹配。
有关详细信息,请参阅 正则表达式参数 文档。
默认:
c
备注
默认情况下,REGEXP_SUBSTR 返回主题的整个匹配部分。但是,如果指定了
e
(用于“提取”)参数,则 REGEXP_SUBSTR 返回字符串中与模式中第一组匹配的部分。如果指定了e
,但未同时指定group_num
,则group_num
的值将默认为 1(第一组)。如果模式中没有子表达式,则 REGEXP_SUBSTR 表现得如同未设置e
。有关使用e
的示例,请参阅本主题中的 示例。group_num
group_num
参数指定要提取的组。通过在正则表达式中使用括号来指定组。如果指定了
group_num
,则 Snowflake 允许在未同时指定'e'
选项的情况下执行提取。隐含了'e'
。Snowflake 至多支持 1024 个组。
有关使用
group_num
的示例,请参阅本主题中的 示例。
返回¶
该函数返回一个匹配子字符串的 VARCHAR 类型值。
使用说明¶
有关使用正则表达式的其他信息,请参阅 字符串函数(正则表达式)。
排序规则详细信息¶
Arguments with collation specifications are currently not supported.
示例¶
此 REGEXP_INSTR 函数的文档包含许多同时使用 REGEXP_SUBSTR 和 REGEXP_INSTR 的示例。您可能也想看看这些例子。
接下来的几个示例使用下面创建的字符串表:
CREATE TABLE demo2 (id INT, string1 VARCHAR);; INSERT INTO demo2 (id, string1) VALUES -- A string with multiple occurrences of the word "the". (2, 'It was the best of times, it was the worst of times.'), -- A string with multiple occurrences of the word "the" and with extra -- blanks between words. (3, 'In the string the extra spaces are redundant.'), -- A string with the character sequence "the" inside multiple words -- ("thespian" and "theater"), but without the word "the" by itself. (4, 'A thespian theater is nearby.') ;
下一个示例查找:
单词“the”
后跟一个或多个非单词字符
后跟一个或多个单词字符。
“单词字符”不仅包括字母 a-z 和 A-Z,还包括下划线(“_”)和十进制数字 0-9,但不包括空格、标点符号等。
select id, regexp_substr(string1, 'the\\W+\\w+') as "RESULT" from demo2 order by id; +----+--------------+ | ID | RESULT | |----+--------------| | 2 | the best | | 3 | the string | | 4 | NULL | +----+--------------+
从字符串的位置 1 开始,查找以下内容第 2 次出现的地方
单词“the”
后跟一个或多个非单词字符
后跟一个或多个单词字符。
select id, regexp_substr(string1, 'the\\W+\\w+', 1, 2) as "RESULT" from demo2 order by id; +----+-------------+ | ID | RESULT | |----+-------------| | 2 | the worst | | 3 | the extra | | 4 | NULL | +----+-------------+
从字符串的位置 1 开始,查找以下内容第 2 次出现的地方
单词“the”
后跟一个或多个非单词字符
后跟一个或多个单词字符。
与其返回整个匹配项,不如只返回“组”(即子字符串中与括号中正则表达式部分匹配的部分)。在这种情况下,返回值应该是“the”之后的单词。
select id, regexp_substr(string1, 'the\\W+(\\w+)', 1, 2, 'e', 1) as "RESULT" from demo2 order by id; +----+--------+ | ID | RESULT | |----+--------| | 2 | worst | | 3 | extra | | 4 | NULL | +----+--------+
此示例说明如何从其中第一个单词为 A
的双字模式的第一、第二和第三个匹配项中检索第二个单词。这也表明,如果尝试获取超出最后一个模式的词,Snowflake 会返回 NULL。
CREATE TABLE demo3 (id INT, string1 VARCHAR);; INSERT INTO demo3 (id, string1) VALUES (5, 'A MAN A PLAN A CANAL') ;select id, regexp_substr(string1, 'A\\W+(\\w+)', 1, 1, 'e', 1) as "RESULT1", regexp_substr(string1, 'A\\W+(\\w+)', 1, 2, 'e', 1) as "RESULT2", regexp_substr(string1, 'A\\W+(\\w+)', 1, 3, 'e', 1) as "RESULT3", regexp_substr(string1, 'A\\W+(\\w+)', 1, 4, 'e', 1) as "RESULT4" from demo3; +----+---------+---------+---------+---------+ | ID | RESULT1 | RESULT2 | RESULT3 | RESULT4 | |----+---------+---------+---------+---------| | 5 | MAN | PLAN | CANAL | NULL | +----+---------+---------+---------+---------+
此示例说明如何在模式首次出现时检索第一、第二和第三组。在这种情况下,返回值是单词 MAN
的各个字母。
select id, regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 1) as "RESULT1", regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 2) as "RESULT2", regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 'e', 3) as "RESULT3" from demo3; +----+---------+---------+---------+ | ID | RESULT1 | RESULT2 | RESULT3 | |----+---------+---------+---------| | 5 | M | A | N | +----+---------+---------+---------+
以下是一些其他示例。
-- Prepare example
create or replace table message(body varchar(255));
insert into message values
('Hellooo World'),
('How are you doing today?'),
('the quick brown fox jumps over the lazy dog'),
('PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS');
SELECT body, REGEXP_SUBSTR(body, '\\b\\S*o\\S*\\b') AS result FROM message;
+---------------------------------------------+------------------------------------------+
| BODY | result |
|---------------------------------------------+------------------------------------------|
| Hellooo World | Hellooo |
| How are you doing today? | How |
| the quick brown fox jumps over the lazy dog | brown |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | NULL |
+---------------------------------------------+------------------------------------------+
SELECT body, REGEXP_SUBSTR(body, '\\b\\S*o\\S*\\b', 3) AS result FROM message;
+---------------------------------------------+-------------------------------------------+
| BODY | result |
|---------------------------------------------+-------------------------------------------|
| Hellooo World | llooo |
| How are you doing today? | you |
| the quick brown fox jumps over the lazy dog | brown |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | NULL |
+---------------------------------------------+-------------------------------------------+
SELECT body, REGEXP_SUBSTR(body, '\\b\\S*o\\S*\\b', 3, 3) AS result FROM message;
+---------------------------------------------+----------------------------------------------+
| BODY | result |
|---------------------------------------------+----------------------------------------------|
| Hellooo World | NULL |
| How are you doing today? | today |
| the quick brown fox jumps over the lazy dog | over |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | NULL |
+---------------------------------------------+----------------------------------------------+
SELECT body, REGEXP_SUBSTR(body, '\\b\\S*o\\S*\\b', 3, 3, 'i') AS result FROM message;
+---------------------------------------------+---------------------------------------------------+
| BODY | result |
|---------------------------------------------+---------------------------------------------------|
| Hellooo World | NULL |
| How are you doing today? | today |
| the quick brown fox jumps over the lazy dog | over |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | LIQUOR |
+---------------------------------------------+---------------------------------------------------+
此示例说明,您可以通过指定空字符串来显式省略任何正则表达式参数。
SELECT body, REGEXP_SUBSTR(body, '(H\\S*o\\S*\\b).*', 1, 1, '') AS result FROM message;
+---------------------------------------------+----------------------------------------------------+
| BODY | result |
|---------------------------------------------+----------------------------------------------------|
| Hellooo World | Hellooo World |
| How are you doing today? | How are you doing today? |
| the quick brown fox jumps over the lazy dog | NULL |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | NULL |
+---------------------------------------------+----------------------------------------------------+
SELECT body, REGEXP_SUBSTR(body, '(H\\S*o\\S*\\b) .*', 1, 1, 'e') AS result FROM message;
+---------------------------------------------+------------------------------------------------------+
| BODY | result |
|---------------------------------------------+------------------------------------------------------|
| Hellooo World | Hellooo |
| How are you doing today? | How |
| the quick brown fox jumps over the lazy dog | NULL |
| PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS | NULL |
+---------------------------------------------+------------------------------------------------------+
以下示例说明了重叠出现的情况:
-- Prepare example create or replace table overlap (id number, a string); insert into overlap values (1,',abc,def,ghi,jkl,'); insert into overlap values (2,',abc,,def,,ghi,,jkl,'); select * from overlap; select id, regexp_substr(a,'[[:punct:]][[:alnum:]]+[[:punct:]]', 1, 2) as result from overlap; +----+--------+ | ID | result | |----+--------| | 1 | ,ghi, | | 2 | ,def, | +----+--------+
以下示例说明如何使用模式匹配和连接从 Apache HTTP Server 访问日志中创建 JSON 对象:
-- Prepare example CREATE OR REPLACE TABLE log (logs varchar); INSERT INTO log (logs) VALUES ('127.0.0.1 - - [10/Jan/2018:16:55:36 -0800] "GET / HTTP/1.0" 200 2216'), ('192.168.2.20 - - [14/Feb/2018:10:27:10 -0800] "GET /cgi-bin/try/ HTTP/1.0" 200 3395');SELECT '{ "ip_addr":"' || REGEXP_SUBSTR (logs,'\\b\\d{1,3}\.\\d{1,3}\.\\d{1,3}\.\\d{1,3}\\b') || '", "date":"' || REGEXP_SUBSTR (logs,'([\\w:\/]+\\s[+\-]\\d{4})') || '", "request":"' || REGEXP_SUBSTR (logs,'\"((\\S+) (\\S+) (\\S+))\"', 1, 1, 'e') || '", "status":"' || REGEXP_SUBSTR (logs,'(\\d{3}) \\d+', 1, 1, 'e') || '", "size":"' || REGEXP_SUBSTR (logs,'\\d{3} (\\d+)', 1, 1, 'e') || '"}' as Apache_HTTP_Server_Access FROM log; +-----------------------------------------------------------------------------------------------------------------------------------------+ | APACHE_HTTP_SERVER_ACCESS | |-----------------------------------------------------------------------------------------------------------------------------------------| | { "ip_addr":"127.0.0.1", "date":"10/Jan/2018:16:55:36 -0800", "request":"GET / HTTP/1.0", "status":"200", "size":"2216"} | | { "ip_addr":"192.168.2.20", "date":"14/Feb/2018:10:27:10 -0800", "request":"GET /cgi-bin/try/ HTTP/1.0", "status":"200", "size":"3395"} | +-----------------------------------------------------------------------------------------------------------------------------------------+