Categories:

String & binary functions

SOUNDEX

返回包含输入字符串的语音表示形式的字符串。

You can use this function to determine whether two strings (e.g. the family names Levine and Lavine, the words to and too, etc.) have similar pronounciations in the English language.

This function uses the Soundex phonetic algorithm, which is described in Soundex System (https://www.archives.gov/research/census/soundex). Note, however, that Snowflake provides no special handling for surname prefixes (e.g. “Van”, “De”, “La”, etc.).

SOUNDEX('Pfister') returns P236. Because the first two letters (P and f) are adjacent and share the same Soundex code number (1), the function ignores the Soundex code number for the second letter.

Some database systems (e.g. Teradata) use a variant that retains the Soundex code number for the second letter when the first and second letters use the same number. For that variant, the string for Pfister is P123 (not P236). To use that variant, call the SOUNDEX_P123 function instead.

See also:

SOUNDEX_P123

语法

SOUNDEX( <varchar_expr> )

实参

varchar_expr

返回发音表示形式的字符串。字符串应使用拉丁文或 Unicode 字符集。

返回

返回的值为包含输入字符串的发音表示形式的 VARCHAR。换句话说,返回值是一个字符串(不是发音),表示输入字符串的发音(而不是拼写)。

请注意以下事项:

  • The returned value starts with a letter that represents the first letter in the string followed by 3 digits (e.g. s400, c130).

    For more information about how the return value is calculated, see the Soundex phonetic algorithm (in Wikipedia).

  • As mentioned earlier, if you want to use the variant that retains the Soundex code number for the second letter when the first and second letters use the same number, call the SOUNDEX_P123 function instead.

使用说明

  • Because the function returns only four characters (one letter and three digits), the output is primarily determined by the first few syllables of the input, rather than the entire string.

例如,以下语句比较三个字符串,并为每个字符串返回相同的 SOUNDEX 值,因为即使它们的拼写和含义完全不同,但它们以发音相似的音节开头:

SELECT SOUNDEX('I love rock and roll music.'),
    SOUNDEX('I love rocks and gemstones.'),
    SOUNDEX('I leave a rock wherever I go.');
+----------------------------------------+--------------------------+------------------------------------------+
| SOUNDEX('I LOVE ROCK AND ROLL MUSIC.') | SOUNDEX('I LOVE ROCKS.') | SOUNDEX('I LEAVE A ROCK WHEREVER I GO.') |
|----------------------------------------+--------------------------+------------------------------------------|
| I416                                   | I416                     | I416                                     |
+----------------------------------------+--------------------------+------------------------------------------+

示例

以下查询返回两个名称的 SOUNDEX 值,这两个名称拼写不同,但通常发音相似:

SELECT SOUNDEX('Marks'), SOUNDEX('Marx');
+------------------+-----------------+
| SOUNDEX('MARKS') | SOUNDEX('MARX') |
|------------------+-----------------|
| M620             | M620            |
+------------------+-----------------+

以下查询演示如何使用 SOUNDEX 在不同表中查找可能相关的行:

创建并加载表:

CREATE TABLE sounding_board (v VARCHAR);
CREATE TABLE sounding_bored (v VARCHAR);
INSERT INTO sounding_board (v) VALUES ('Marsha');
INSERT INTO sounding_bored (v) VALUES ('Marcia');

在不使用 SOUNDEX 的情况下查找相关记录:

SELECT * 
    FROM sounding_board AS board, sounding_bored AS bored 
    WHERE bored.v = board.v;
+---+---+
| V | V |
|---+---|
+---+---+

使用 SOUNDEX 查找相关记录:

SELECT * 
    FROM sounding_board AS board, sounding_bored AS bored 
    WHERE SOUNDEX(bored.v) = SOUNDEX(board.v);
+--------+--------+
| V      | V      |
|--------+--------|
| Marsha | Marcia |
+--------+--------+