Sample data: TPC-H

如 TPC Benchmark™ H (TPC-H) (http://www.tpc.org/tpch/) 规范中所述:

“TPC-H 是一项决策支持基准。它由一套面向业务的临时查询和并发数据修改组成。查询和用于填充数据库的数据被选为具有广泛的行业相关性。这项基准说明了决策支持系统,它可以检查大量数据,执行高度复杂的查询,并回答关键的业务问题。”

Database and schemas

TPC-H 提供了各种数据集大小来测试不同的缩放因子。出于演示目的,我们共享了四个版本的 TPC-H 数据。在 SNOWFLAKE_SAMPLE_DATA 共享数据库中,按下列架构提供数据:

  • TPCH_SF1:由基行大小(几百万个元素)组成。

  • TPCH_SF10:由基行大小 x 10 组成。

  • TPCH_SF100:由基行大小 x 100(数亿个元素)组成。

  • TPCH_SF1000:由基行大小 x 1000(数十亿个元素)组成。

Database entities, relationships, and characteristics

TPC-H 的组件由八个单独的表(基表)组成。以下 ER 图说明了这些表中各列之间的关系:

TPC-H 基准测试数据架构

(来源:TPC Benchmark H 标准规范 (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf))

Query definitions

Each TPC-H query asks a business question and includes the corresponding query to answer the question. Some of the TPC-H queries are included in Snowflake's Get Started tutorials.

This section describes one of the queries. For more information about TPC-H and all the queries that are involved, see the official TPC Benchmark H Standard Specification (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf).

Q1: Pricing summary report query

此查询报告已开票、发货和退货的业务量。

Business question

The Pricing Summary Report Query provides a summary pricing report for all line items that were shipped as of a given date. The date is within 60-120 days of the greatest ship date contained in the database.

Functional query definition

该查询列出扩展价格、折扣扩展价格、折扣扩展价格加税、平均数量、平均扩展价格和平均折扣的合计。这些汇总按 RETURNFLAG 和 LINESTATUS 分组,并按 RETURNFLAG 和 LINESTATUS 的升序列出。包括每组中的行项目数量计数:

use schema snowflake_sample_data.tpch_sf1;   -- or snowflake_sample_data.{tpch_sf10 | tpch_sf100 | tpch_sf1000}

select
       l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1-l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
 from
       lineitem
 where
       l_shipdate <= dateadd(day, -90, to_date('1998-12-01'))
 group by
       l_returnflag,
       l_linestatus
 order by
       l_returnflag,
       l_linestatus;
Copy
语言: 中文