Sample data: TPC-H¶
如 TPC Benchmark™ H (TPC-H) (http://www.tpc.org/tpch/) 规范中所述:
“TPC-H 是一项决策支持基准。它由一套面向业务的临时查询和并发数据修改组成。查询和用于填充数据库的数据被选为具有广泛的行业相关性。这项基准说明了决策支持系统,它可以检查大量数据,执行高度复杂的查询,并回答关键的业务问题。”
Database and schemas¶
TPC-H 提供了各种数据集大小来测试不同的缩放因子。出于演示目的,我们共享了四个版本的 TPC-H 数据。在 SNOWFLAKE_SAMPLE_DATA 共享数据库中,按下列架构提供数据:
TPCH_SF1:由基行大小(几百万个元素)组成。
TPCH_SF10:由基行大小 x 10 组成。
TPCH_SF100:由基行大小 x 100(数亿个元素)组成。
TPCH_SF1000:由基行大小 x 1000(数十亿个元素)组成。
Database entities, relationships, and characteristics¶
TPC-H 的组件由八个单独的表(基表)组成。以下 ER 图说明了这些表中各列之间的关系:
(来源:TPC Benchmark H 标准规范 (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf))
Query definitions¶
Each TPC-H query asks a business question and includes the corresponding query to answer the question. Some of the TPC-H queries are included in Snowflake's Get Started tutorials.
This section describes one of the queries. For more information about TPC-H and all the queries that are involved, see the official TPC Benchmark H Standard Specification (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf).
Q1: Pricing summary report query¶
此查询报告已开票、发货和退货的业务量。
Business question¶
The Pricing Summary Report Query provides a summary pricing report for all line items that were shipped as of a given date. The date is within 60-120 days of the greatest ship date contained in the database.
Functional query definition¶
该查询列出扩展价格、折扣扩展价格、折扣扩展价格加税、平均数量、平均扩展价格和平均折扣的合计。这些汇总按 RETURNFLAG 和 LINESTATUS 分组,并按 RETURNFLAG 和 LINESTATUS 的升序列出。包括每组中的行项目数量计数:
use schema snowflake_sample_data.tpch_sf1; -- or snowflake_sample_data.{tpch_sf10 | tpch_sf100 | tpch_sf1000} select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1-l_discount)) as sum_disc_price, sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= dateadd(day, -90, to_date('1998-12-01')) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;