集运算符¶

集运算符将多个查询块的中间结果组合成单个结果集。

通用语法¶

[ ( ] <query> [ ) ]
{
  INTERSECT |
  { MINUS | EXCEPT } |
  UNION [ { DISTINCT | ALL } ] [ BY NAME ]
}
[ ( ] <query> [ ) ]
[ ORDER BY ... ]
[ LIMIT ... ]

Copy

一般使用说明¶

每个查询本身可以包含查询运算符，因此您可以将多个查询表达式与集合运算符组合在一起。
可以将 ORDER BY 和 LIMIT / FETCH 子句应用于集合运算符的结果。
使用这些运算符时：
- 确保每个查询选择相同数量的列，包含 UNION BY NAME 或 UNION ALL BY NAME 的查询除外。
- 确保在来自不同源的行之间，每列的数据类型均保持一致。在使用 UNION 运算符和转换不匹配的数据类型部分中，一个示例说明了数据类型不匹配时的潜在问题和解决方案。
- 通常，请确保列的“含义”和数据类型相匹配。以下使用 UNION ALL 运算符的查询不会产生预期的结果：
  SELECT LastName, FirstName FROM employees UNION ALL SELECT FirstName, LastName FROM contractors;
  Copy
  使用星号指定表的所有列时，出错风险会增加，例如：
  SELECT * FROM table1 UNION ALL SELECT * FROM table2;
  Copy
  如果表的列数相同，但列的顺序不同，则使用这些运算符时，查询结果可能会不正确。
  
  在此场景中，UNION BY NAME 和 UNION ALL BY NAME 运算符是例外。例如，以下查询返回正确的结果：
  SELECT LastName, FirstName FROM employees UNION ALL BY NAME SELECT FirstName, LastName FROM contractors;
  Copy
- 输出列的名称基于第一个查询的列名称。例如，请考虑以下查询：
  SELECT LastName, FirstName FROM employees UNION ALL SELECT FirstName, LastName FROM contractors;
  Copy
  此查询的行为与以下查询类似：
  SELECT LastName, FirstName FROM employees UNION ALL SELECT FirstName AS LastName, LastName AS FirstName FROM contractors;
  Copy
集运算符的优先级与 ANSI 和 ISO SQL 标准相符：
- UNION [ALL] 和 MINUS (EXCEPT) 运算符具有同等优先级。
- INTERSECT 运算符的优先级高于 UNION [ALL] 和 MINUS (EXCEPT)。
Snowflake 从左到右处理优先级相同的运算符。

您可以使用括号来强制改变表达式的计算顺序。

并非所有数据库供应商均遵循 ANSI/ISO 标准来确定集运算符的优先级。Snowflake 建议使用括号来指定计算顺序，尤其是在将代码从其他供应商移植到 Snowflake 时，或者是编写可能在其他数据库和 Snowflake 上执行的代码时。

示例的示例表¶

本主题中的某些示例使用以下示例表。两个表都有邮政编码列。一个表记录每个销售办事处的邮政编码，另一个表记录每个客户的邮政编码。

CREATE OR REPLACE TABLE sales_office_postal_example(
  office_name VARCHAR,
  postal_code VARCHAR);

INSERT INTO sales_office_postal_example VALUES ('sales1', '94061');
INSERT INTO sales_office_postal_example VALUES ('sales2', '94070');
INSERT INTO sales_office_postal_example VALUES ('sales3', '98116');
INSERT INTO sales_office_postal_example VALUES ('sales4', '98005');

CREATE OR REPLACE TABLE customer_postal_example(
  customer VARCHAR,
  postal_code VARCHAR);

INSERT INTO customer_postal_example VALUES ('customer1', '94066');
INSERT INTO customer_postal_example VALUES ('customer2', '94061');
INSERT INTO customer_postal_example VALUES ('customer3', '98444');
INSERT INTO customer_postal_example VALUES ('customer4', '98005');

Copy

INTERSECT¶

返回一个查询的结果集中同样出现在另一个查询的结果集中的行，并消除重复项。

语法¶

[ ( ] <query> [ ) ]
INTERSECT
[ ( ] <query> [ ) ]

Copy

INTERSECT 运算符示例¶

要在 sales_office_postal_example 表和 customer_postal_example 表中查找邮政编码，请查询示例表：

SELECT postal_code FROM sales_office_postal_example
INTERSECT
SELECT postal_code FROM customer_postal_example
ORDER BY postal_code;

Copy

+-------------+
| POSTAL_CODE |
|-------------|
| 94061       |
| 98005       |
+-------------+

MINUS、EXCEPT¶

返回第一个查询返回的行不是第二个查询也返回的行。

MINUS 和 EXCEPT 关键字具有相同含义，可以互换使用。

语法¶

[ ( ] <query> [ ) ]
MINUS
[ ( ] <query> [ ) ]

[ ( ] <query> [ ) ]
EXCEPT
[ ( ] <query> [ ) ]

Copy

MINUS 运算符示例¶

查询示例表以查找在 sales_office_postal_example 表中却不在 customer_postal_example 表中的邮政编码：

SELECT postal_code FROM sales_office_postal_example
MINUS
SELECT postal_code FROM customer_postal_example
ORDER BY postal_code;

Copy

+-------------+
| POSTAL_CODE |
|-------------|
| 94070       |
| 98116       |
+-------------+

查询示例表以查找在 customer_postal_example 表中却不在 sales_office_postal_example 表中的邮政编码：

SELECT postal_code FROM customer_postal_example
MINUS
SELECT postal_code FROM sales_office_postal_example
ORDER BY postal_code;

Copy

+-------------+
| POSTAL_CODE |
|-------------|
| 94066       |
| 98444       |
+-------------+

UNION [ { DISTINCT | ALL } ] [ BY NAME ]¶

合并两个查询的结果集：

UNION [ DISTINCT ] 按列位置合并行，并消除重复项。
UNION ALL 按列位置合并行，不消除重复项。
UNION [ DISTINCT ] BY NAME 按列名合并行，并消除重复项。
UNION ALL BY NAME 按列名合并行，不消除重复项。

默认值为 UNION DISTINCT（即按列位置合并行，并消除重复项）。DISTINCT 关键字是可选的。DISTINCT 关键字和 ALL 关键字互斥。

当要合并的表中的列位置匹配时，使用 UNION 或 UNION ALL。对于以下用例，请使用 UNION BY NAME 或 UNION ALL BY NAME：

要合并的表中的列具有不同的顺序。
要合并的表具有不断演化的架构，其中列是新增或重新排序的。
您希望合并的列子集在各个表中位置不同。

语法¶

[ ( ] <query> [ ) ]
UNION [ { DISTINCT | ALL } ] [ BY NAME ]
[ ( ] <query> [ ) ]

Copy

BY NAME 子句的使用说明¶

除了通用的使用说明外，以下内容适用于 UNION BY NAME 和 UNION ALL BY NAME：

具有相同标识符的列将进行匹配并合并。对于未加引号的标识符，匹配时不区分大小写；对于加了引号的标识符，匹配时区分大小写。
输入的列数不需要相同。如果某列存在于一个输入中，但不存在于另一个输入中，那么在合并结果集中，该列将以 NULL 值填充那些缺失的行。
合并结果集中列的顺序由从左到右首次出现的唯一列顺序决定。

UNION 运算符示例¶

以下示例使用 UNION 运算符：

按列位置合并两个查询的结果
按列名合并两个查询的结果
使用别名将两个具有不同列名的查询的结果合并在一起
使用 UNION 运算符和转换不匹配的数据类型

按列位置合并两个查询的结果¶

要根据列的位置合并两个针对示例表的查询结果集，可以使用 UNION 运算符：

SELECT office_name office_or_customer, postal_code FROM sales_office_postal_example
UNION
SELECT customer, postal_code FROM customer_postal_example
ORDER BY postal_code;

Copy

+--------------------+-------------+
| OFFICE_OR_CUSTOMER | POSTAL_CODE |
|--------------------+-------------|
| sales1             | 94061       |
| customer2          | 94061       |
| customer1          | 94066       |
| sales2             | 94070       |
| sales4             | 98005       |
| customer4          | 98005       |
| sales3             | 98116       |
| customer3          | 98444       |
+--------------------+-------------+

按列名合并两个查询的结果¶

使用不同的列顺序创建两个表并插入数据：

CREATE OR REPLACE TABLE union_demo_column_order1 (
  a INTEGER,
  b VARCHAR);

INSERT INTO union_demo_column_order1 VALUES
  (1, 'one'),
  (2, 'two'),
  (3, 'three');

CREATE OR REPLACE TABLE union_demo_column_order2 (
  B VARCHAR,
  A INTEGER);

INSERT INTO union_demo_column_order2 VALUES
  ('three', 3),
  ('four', 4);

Copy

要按列名合并两个查询中的结果集，请使用 UNION BY NAME 运算符：

SELECT * FROM union_demo_column_order1
UNION BY NAME
SELECT * FROM union_demo_column_order2
ORDER BY a;

Copy

+---+-------+
| A | B     |
|---+-------|
| 1 | one   |
| 2 | two   |
| 3 | three |
| 4 | four  |
+---+-------+

输出显示查询消除了重复行（即 A 列为 3、B 列为 three 的那一行）。

要合并表而不消除重复行，请使用 UNION ALL BY NAME 运算符：

SELECT * FROM union_demo_column_order1
UNION ALL BY NAME
SELECT * FROM union_demo_column_order2
ORDER BY a;

Copy

+---+-------+
| A | B     |
|---+-------|
| 1 | one   |
| 2 | two   |
| 3 | three |
| 3 | three |
| 4 | four  |
+---+-------+

请注意，两个表中列名的大小写不匹配。union_demo_column_order1 表中的列名是小写的，union_demo_column_order2 表中的列名是大写的。如果在列名前后使用引号运行查询，则会返回错误，因为带引号的标识符的匹配区分大小写。例如，以下查询在列名前后加上了引号：

SELECT 'a', 'b' FROM union_demo_column_order1
UNION ALL BY NAME
SELECT 'B', 'A' FROM union_demo_column_order2
ORDER BY a;

Copy

000904 (42000): SQL compilation error: error line 4 at position 9
invalid identifier 'A'

使用别名将两个具有不同列名的查询的结果合并在一起¶

当您使用 UNION BY NAME 运算符按列名合并两个示例表的查询结果时，由于列名不匹配，结果集中的行会显示 NULL 值。

SELECT office_name, postal_code FROM sales_office_postal_example
UNION BY NAME
SELECT customer, postal_code FROM customer_postal_example
ORDER BY postal_code;

Copy

+-------------+-------------+-----------+
| OFFICE_NAME | POSTAL_CODE | CUSTOMER  |
|-------------+-------------+-----------|
| sales1      | 94061       | NULL      |
| NULL        | 94061       | customer2 |
| NULL        | 94066       | customer1 |
| sales2      | 94070       | NULL      |
| sales4      | 98005       | NULL      |
| NULL        | 98005       | customer4 |
| sales3      | 98116       | NULL      |
| NULL        | 98444       | customer3 |
+-------------+-------------+-----------+

输出显示，具有不同标识符的列不会被合并，而只存在于某一个表中的列在结果集中会显示 NULL 值。由于 postal_code 列在两个表中都存在，因此在输出中 postal_code 列没有 NULL 值。

以下查询使用别名 office_or_customer，以便在查询执行期间，具有不同名称的列被统一为相同的列名：

SELECT office_name AS office_or_customer, postal_code FROM sales_office_postal_example
UNION BY NAME
SELECT customer AS office_or_customer, postal_code FROM customer_postal_example
ORDER BY postal_code;

Copy

+--------------------+-------------+
| OFFICE_OR_CUSTOMER | POSTAL_CODE |
|--------------------+-------------|
| sales1             | 94061       |
| customer2          | 94061       |
| customer1          | 94066       |
| sales2             | 94070       |
| sales4             | 98005       |
| customer4          | 98005       |
| sales3             | 98116       |
| customer3          | 98444       |
+--------------------+-------------+

使用 UNION 运算符和转换不匹配的数据类型¶

本示例演示了数据类型不匹配时使用 UNION 运算符的潜在问题，然后提供了解决方案。

首先创建表并插入一些数据：

CREATE OR REPLACE TABLE union_test1 (v VARCHAR);
CREATE OR REPLACE TABLE union_test2 (i INTEGER);

INSERT INTO union_test1 (v) VALUES ('Smith, Jane');
INSERT INTO union_test2 (i) VALUES (42);

Copy

执行按列位置进行的 UNION 操作，两个表中对应列的数据类型不同（union_test1 中为 VARCHAR 值，union_test2 中为 INTEGER 值）：

SELECT v FROM union_test1
UNION
SELECT i FROM union_test2;

Copy

此查询会返回错误：

100038 (22018): Numeric value 'Smith, Jane' is not recognized

现在使用显式类型转换，将输入转换为兼容类型：

SELECT v::VARCHAR FROM union_test1
UNION
SELECT i::VARCHAR FROM union_test2;

Copy

+-------------+
| V::VARCHAR  |
|-------------|
| Smith, Jane |
| 42          |
+-------------+