类别：: 查询语法

RESAMPLE¶

Returns a data set that includes both input rows and generated rows for missing data points, given a user-defined time-based granularity.

语法¶

FROM <object_reference> [ [ AS ] <alias_name> ]
  RESAMPLE(
    USING <time_series_column>
    INCREMENT BY <time_series_constant>
    [ PARTITION BY <partition_column> [ , ... ] ]
    [ METADATA_COLUMNS
        { IS_GENERATED() | BUCKET_START() } [ [ AS ] <alias_name> ] [ , ... ] ]
    )

必填参数¶

FROM object_reference

Specifies the name of a table or another object reference that contains the input data set, such as a subquery. For more information about object references, see FROM.

USING time_series_column

Specifies the column that contains time-based values in the time series. The column must be a date and time data type or a numeric data type. For example, UNIX timestamp values might be stored in NUMBER(38,0) columns, where 1743447600 is equivalent to 2025-3-31 12:00:00.

INCREMENT BY time_series_constant

Specifies an INTERVAL constant or a numeric constant, depending on the data type of the USING column. This constant represents the width of each time interval. The slices are aligned relative to midnight on January 1, 1970 (1970-01-01 00:00:00). The TIME_SLICE function uses the same alignment; for more information, see the TIME_SLICE usage notes.

When the USING parameter specifies a date or time column, the INCREMENT BY expression must be an INTERVAL constant.
When the USING parameter specifies a numeric column, the INCREMENT BY expression must also be numeric.

The starting point for a generated time series is based on the minimum time of the time_series_constant.

If this constant is a numeric constant, it must be positive (greater than 0).

可选参数¶

[ AS ] alias_name

Specifies an alternative name for the object reference. The alias can be used in any other subclause within the FROM clause. Alias names must follow the rules for 对象标识符.

PARTITION BY partition_column

Partitions the result set on one or more input columns and generates new rows within each partition.

METADATA_COLUMNS {function} [ [ AS ] {alias_name} ]

Adds one or more metadata columns to the resampled result set. To add the columns, call one or both of the following functions:

IS_GENERATED(): Adds an is_generated column to the result set that marks which rows are new (generated by the RESAMPLE operation) and which rows already existed.
BUCKET_START(): Adds a bucket_start column to the result set. This column returns the value that marks the beginning of the current bucket or interval that the RESAMPLE operation produces, based on the values in the column specified in the USING clause. You can use the BUCKET_START column to identify which interval a particular row belongs to after resampling.

If you specify both metadata columns, separate them with a comma.

Generated columns can have aliases. Alias names must follow the rules for 对象标识符.

使用说明¶

An INTERVAL constant in the INCREMENT BY clause has the following requirements:
- The constant must be equal to or greater than one second. Smaller units (millisecond, microsecond, nanosecond) aren't supported.
- When the USING column is a DATE data type, you can't specify a unit in the interval that is more granular than day (hour, minute, second). For example, the constants 'INTERVAL 1 day, 2 hours' and 'INTERVAL 25 hours' aren't allowed.
- To avoid ambiguity, certain date parts can't be mixed. The supported date parts fall into three discrete groups:
  - year, quarter, month
  - week
  - day, hour, minute, second
  For example, the following intervals, which cross these group boundaries, aren't allowed:
  - INTERVAL '1 week, 3 days'
  - INTERVAL '2 weeks, 12 hours'
  - INTERVAL '3 months, 1 week'

With respect to joins, the RESAMPLE construct works in a similar way to the SAMPLE / TABLESAMPLE construct. Resampling applies to only one table, not all preceding tables or the entire expression prior to the RESAMPLE clause. To resample the result of a join, use a subquery for the join, then resample the resulting table. See 联接采样.
The RESAMPLE clause is evaluated before WHERE clause conditions are applied. If you want to resample a filtered data set, filter it first (for example, by creating a new table that you can resample or by using a subquery that is computed first inside the main RESAMPLE query). The following query resamples the whole table, then discards everything but the rows for Atlanta and Boston.
```
SELECT *
  FROM heavy_weather
    RESAMPLE(
      USING start_time
      INCREMENT BY INTERVAL '1 day')
  WHERE city IN('Atlanta','Boston')
  ORDER BY start_time, city, county;
```
A potential rewrite with a subquery would be:
```
SELECT *
  FROM (SELECT * FROM heavy_weather WHERE city IN('Atlanta','Boston'))
    RESAMPLE(
      USING start_time
      INCREMENT BY INTERVAL '1 day')
  ORDER BY start_time, city, county;
```
When resampled rows are generated, they contain NULL values for columns that aren't partitioned. If you use a WHERE clause to filter columns that aren't partitioned, your filter might be applied to generated rows in unexpected ways.

For example, because the following query does not have a PARTITION BY clause, it does not work as expected. In the generated rows, city and county are NULL. The WHERE clause then filters out all generated rows; only the original rows that match the filter criteria are preserved.
```
SELECT *
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes')
  WHERE city = 'Big Bear City' AND county = 'San Bernardino'
  ORDER BY observed;
```
To solve the problem, the following query partitions by city and county. The RESAMPLE clause generates rows for combinations of city and county that exist in the source data. The WHERE clause condition preserves the appropriate generated rows because city and county have non-NULL values in those rows.
```
SELECT *
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes'
      PARTITION BY city, county)
  WHERE city = 'Big Bear City' AND county = 'San Bernardino'
  ORDER BY observed;
```
In comparison,

示例¶

The following examples show how to use the RESAMPLE construct in queries.

RESAMPLE example that uses a numeric column¶

The following example has a UNIX timestamp in the source table. This numeric column is specified in the RESAMPLE clause as the USING column. Create and load the following table:

CREATE OR REPLACE TABLE sensor_data_unixtime (device_id VARCHAR(10), unixtime NUMBER(38,0), avg_temp NUMBER(6,4), vibration NUMBER (5,4), motor_rpm INT);

INSERT INTO sensor_data_unixtime VALUES
  ('DEVICE3', 1696150802, 36.1103, 0.4226, 1560),
  ('DEVICE3', 1696150803, 35.2987, 0.4326, 1561),
  ('DEVICE3', 1696150804, 40.0001, 0.3221, 1562),
  ('DEVICE3', 1696150805, 38.0422, 0.3333, 1589),
  ('DEVICE3', 1696150807, 33.1524, 0.4865, 1499),
  ('DEVICE3', 1696150808, 32.0422, 0.4221, 1498),
  ('DEVICE3', 1696150809, 31.1519, 0.4751, 1600),
  ('DEVICE3', 1696150810, 29.1524, 0.4639, 1605),
  ('DEVICE3', 1696150812, 35.2987, 0.4336, 1585),
  ('DEVICE3', 1696150813, 40.0000, 0.4226, 1560)
;

Now run the following RESAMPLE query:

SELECT * FROM sensor_data_unixtime
  RESAMPLE(USING unixtime INCREMENT BY 1) ORDER BY unixtime;

+-----------+------------+----------+-----------+-----------+
| DEVICE_ID |   UNIXTIME | AVG_TEMP | VIBRATION | MOTOR_RPM |
|-----------+------------+----------+-----------+-----------|
| DEVICE3   | 1696150802 |  36.1103 |    0.4226 |      1560 |
| DEVICE3   | 1696150803 |  35.2987 |    0.4326 |      1561 |
| DEVICE3   | 1696150804 |  40.0001 |    0.3221 |      1562 |
| DEVICE3   | 1696150805 |  38.0422 |    0.3333 |      1589 |
| DEVICE3   | 1696150806 |     NULL |      NULL |      NULL |
| DEVICE3   | 1696150807 |  33.1524 |    0.4865 |      1499 |
| DEVICE3   | 1696150808 |  32.0422 |    0.4221 |      1498 |
| DEVICE3   | 1696150809 |  31.1519 |    0.4751 |      1600 |
| DEVICE3   | 1696150810 |  29.1524 |    0.4639 |      1605 |
| DEVICE3   | 1696150811 |     NULL |      NULL |      NULL |
| DEVICE3   | 1696150812 |  35.2987 |    0.4336 |      1585 |
| DEVICE3   | 1696150813 |  40.0000 |    0.4226 |      1560 |
+-----------+------------+----------+-----------+-----------+

The following query fails because the INCREMENT BY expression must be a positive numeric constant when the USING column is numeric:

SELECT * FROM sensor_data_unixtime
  RESAMPLE(USING unixtime INCREMENT BY INTERVAL '1 second') ORDER BY unixtime;

009954 (42601): SQL compilation error:
RESAMPLE INCREMENT BY has to be numeric type when USING parameter is numeric.

RESAMPLE example that returns generated rows only¶

The following example resamples the march_temps table (as created in Using the RESAMPLE clause) and includes metadata columns named generated_row and bucket_start in the result:

CREATE OR REPLACE TABLE march_temps_every_five_mins AS
  SELECT * FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes'
      PARTITION BY city, county
      METADATA_COLUMNS IS_GENERATED() AS generated_row, BUCKET_START()
      )
  ORDER BY observed;

The following query returns only the generated rows from the march_temps_every_five_mins table:

SELECT * FROM march_temps_every_five_mins
  WHERE generated_row = 'True';

+-------------------------+-------------+------------------+----------------+---------------+-------------------------+
| OBSERVED                | TEMPERATURE | CITY             | COUNTY         | GENERATED_ROW | BUCKET_START            |
|-------------------------+-------------+------------------+----------------+---------------+-------------------------|
| 2025-03-15 09:45:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 09:45:00.000 |
| 2025-03-15 09:50:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 09:50:00.000 |
| 2025-03-15 10:00:00.000 |        NULL | South Lake Tahoe | El Dorado      | True          | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:00:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:05:00.000 |        NULL | South Lake Tahoe | El Dorado      | True          | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:05:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:15:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:15:00.000 |
+-------------------------+-------------+------------------+----------------+---------------+-------------------------+

RESAMPLE example that uses BUCKET_START() to aggregate resampled rows¶

The following example uses the bucket_start metadata column to aggregate resampled rows. The query counts the number of observations per city that have the same bucket start time, given a resampled result set that is incremented by a 1-day interval. To run this example, create the march_temps table, as described in Using the RESAMPLE clause.

SELECT bucket_start, county, COUNT(*)
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '1 day'
      METADATA_COLUMNS IS_GENERATED(), BUCKET_START()
      )
  WHERE IS_GENERATED = 'False'
  GROUP BY bucket_start, county;

+-------------------------+----------------+----------+
| BUCKET_START            | COUNTY         | COUNT(*) |
|-------------------------+----------------+----------|
| 2025-03-15 00:00:00.000 | El Dorado      |        4 |
| 2025-03-15 00:00:00.000 | San Bernardino |        4 |
+-------------------------+----------------+----------+

RESAMPLE example that uses BUCKET_START() to filter out non-uniform rows¶

You can use the bucket_start metadata column to filter out non-uniform data from a resampled result set. For example:

SELECT *
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes'
      METADATA_COLUMNS BUCKET_START() AS bucket_first_row
      )
  WHERE observed = bucket_first_row
  ORDER BY observed;

This query resamples the table, then removes two original rows that don't conform to the 5-minute interval (those with values 09:49:00 and 10:18:00).