Working with Subqueries¶
A subquery is a query within another query. Subqueries in a FROM or WHERE clause are used to provide data that will be used to limit or compare/evaluate the data returned by the containing query.
Types of Subqueries¶
Correlated vs. Uncorrelated Subqueries¶
Subqueries can be categorized as correlated or uncorrelated:
- A correlated subquery refers to one or more columns from outside of
the subquery. (The columns are typically referenced inside the
WHEREclause of the subquery.) A correlated subquery can be thought of as a filter on the table that it refers to, as if the subquery were evaluated on each row of the table in the outer query. - An uncorrelated subquery has no such external column references. It is an independent query, the results of which are returned to and used by the outer query once (not per row).
For example:
Scalar vs. Non-scalar Subqueries¶
Subqueries can also be categorized as scalar or non-scalar:
- A scalar subquery returns a single value (one column of one row). If no rows qualify to be returned, the subquery returns NULL.
- A non-scalar subquery returns 0, 1, or multiple rows, each of which may contain 1 or multiple columns. For each column, if there is no value to return, the subquery returns NULL. If no rows qualify to be returned, the subquery returns 0 rows (not NULLs).
Types Supported by Snowflake¶
Snowflake currently supports the following types of subqueries:
Subquery Operators¶
Subquery operators operate on nested query expressions. They can be used to compute values that are:
Differences Between Correlated and Non-Correlated Subqueries¶
The following query demonstrates an uncorrelated subquery in a WHERE clause. The subquery gets the per capita GDP of Brazil, and the outer query selects all the jobs (in any country) that pay less than the per-capita GDP of Brazil. The subquery is uncorrelated because the value that it returns does not depend upon any column of the outer query. The subquery only needs to be called once during the entire execution of the outer query.
The next query demonstrates a correlated subquery in a WHERE clause.
The query lists jobs where the annual pay of the job is less than the
per-capita GDP in that country.
This subquery is correlated because it is called once for each row in the
outer query and is passed a value, p.country (country name), from the row.
Note
The MAX aggregate function is not logically necessary in this case because the
international_GDP table has only one row per country; however, because the server doesn’t know that, and because the server
requires that the subquery return no more than one row, the query uses the aggregate function to force the server to recognize that the
subquery will return only one row each time that the subquery is executed.
The functions MIN and AVG also work because applying either of these to a single value returns that value unchanged.
Scalar Subqueries¶
A scalar subquery is a subquery that returns at most one row. A scalar subquery can appear anywhere that a value expression can appear, including the SELECT list, GROUP BY clause, or as an argument to a function in a WHERE or HAVING clause.
Usage Notes¶
- A scalar subquery can contain only one item in the SELECT list.
- If a scalar subquery returns more than one row, a runtime error is generated.
- Correlated scalar subqueries are currently supported only if they can be statically determined to return one row (e.g. if the SELECT list contains an aggregate function with no GROUP BY).
- Uncorrelated scalar subqueries are supported anywhere that a value expression is allowed.
- Subqueries with a correlation inside of FLATTEN are currently unsupported.
- The LIMIT / FETCH clause is allowed only in uncorrelated scalar subqueries.
Examples¶
This example shows a basic uncorrelated subquery in a WHERE clause:
This example shows an uncorrelated subquery in a FROM clause; this basic subquery
returns a subset of the information in the international_GDP table.
The overall query lists jobs in “high-wage” countries where the annual pay
of the job is the same as the per_capita_GDP in that country.
Limitations¶
Although subqueries can contain a wide range of SELECT statements, they have the following limitations:
- Some clauses are not allowed inside of ANY/ALL/NOT EXISTS subqueries.
- The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery.