表设计注意事项

本主题提供设计和管理表时的最佳实践、一般准则和重要注意事项。

列的日期/时间数据类型

When defining columns to contain dates or timestamps, Snowflake recommends choosing a date or timestamp data type rather than a character data type. Snowflake stores DATE and TIMESTAMP data more efficiently than VARCHAR, resulting in better query performance. Choose an appropriate date or timestamp data type, depending on the level of granularity required.

参照完整性约束

When they are created on standard tables, referential integrity constraints, as defined by primary-key/foreign-key relationships, are informational; they are not enforced. NOT NULL constraints are enforced, but other constraints are not. However, constraints on hybrid tables are enforced; see Overview of constraints.

一般来说,约束会提供宝贵的元数据。主键和外键使您的项目团队能够理解架构设计,并查看表及其列之间的关系。

此外,大多数商业智能 (BI) 和可视化工具将外键定义与表一起导入,并构建适当的联接条件。这种方法可以节省时间,而且相较于不得不猜测如何联接表并手动配置工具,更不容易出错。基于主键和外键的联接也有助于确保设计的完整性,因为联接不会留给不同的开发者去解释。一些 BI 和可视化工具还利用约束信息更高效地重写查询,例如使用联接消除。

Specify a constraint when creating or modifying a table using the CREATE | ALTER TABLE … CONSTRAINT commands.

In the following example, the CREATE TABLE statement for the second table (salesorders) defines an out-of-line foreign key constraint that references a column in the first table (salespeople):

CREATE OR REPLACE TABLE salespeople (
  sp_id INT NOT NULL UNIQUE,
  name VARCHAR DEFAULT NULL,
  region VARCHAR,
  constraint pk_sp_id PRIMARY KEY (sp_id)
);
CREATE OR REPLACE TABLE salesorders (
  order_id INT NOT NULL UNIQUE,
  quantity INT DEFAULT NULL,
  description VARCHAR,
  sp_id INT NOT NULL UNIQUE,
  constraint pk_order_id PRIMARY KEY (order_id),
  constraint fk_sp_id FOREIGN KEY (sp_id) REFERENCES salespeople(sp_id)
);

Query the GET_DDL function to retrieve a DDL statement that could be executed to recreate the specified table. The statement includes the constraints currently set on a table.

例如:

SELECT GET_DDL('TABLE', 'mydb.public.salesorders');
+-----------------------------------------------------------------------------------------------------+
| GET_DDL('TABLE', 'MYDB.PUBLIC.SALESORDERS')                                                         |
%-----------------------------------------------------------------------------------------------------%
| create or replace TABLE SALESORDERS (                                                               |
|   ORDER_ID NUMBER(38,0) NOT NULL,                                                                   |
|   QUANTITY NUMBER(38,0),                                                                            |
|   DESCRIPTION VARCHAR(16777216),                                                                    |
|   SP_ID NUMBER(38,0) NOT NULL,                                                                      |
|   unique (SP_ID),                                                                                   |
|   constraint PK_ORDER_ID primary key (ORDER_ID),                                                    |
|   constraint FK_SP_ID foreign key (SP_ID) references MYDATABASE.PUBLIC.SALESPEOPLE(SP_ID)           |
| );                                                                                                  |
+-----------------------------------------------------------------------------------------------------+

Alternatively, retrieve a list of all table constraints by schema (or across all schemas in a database) by querying the TABLE_CONSTRAINTS view in the Information Schema.

例如:

SELECT table_name, constraint_type, constraint_name
  FROM mydb.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
  WHERE constraint_schema = 'PUBLIC'
  ORDER BY table_name;
+-------------+-----------------+-----------------------------------------------------+
| TABLE_NAME  | CONSTRAINT_TYPE | CONSTRAINT_NAME                                     |
|-------------+-----------------+-----------------------------------------------------|
| SALESORDERS | UNIQUE          | SYS_CONSTRAINT_fce2257e-c343-4e66-9bea-fc1c041b00a6 |
| SALESORDERS | FOREIGN KEY     | FK_SP_ID                                            |
| SALESORDERS | PRIMARY KEY     | PK_ORDER_ID                                         |
| SALESORDERS | UNIQUE          | SYS_CONSTRAINT_bf90e2b3-fd4a-4764-9576-88fb487fe989 |
| SALESPEOPLE | PRIMARY KEY     | PK_SP_ID                                            |
+-------------+-----------------+-----------------------------------------------------+

何时设置群集密钥

Specifying a clustering key is not necessary for most tables. Snowflake performs automatic tuning via the optimization engine and micro-partitioning. In many cases, data is loaded and organized into micro-partitions by date or timestamp, and is queried along the same dimension.

何时应为表指定群集密钥?首先要注意的是,对小型表进行聚类通常不会显著提高查询性能。

对于较大的数据集,在以下情况下,可以考虑为表指定群集密钥:

  • The order in which the data is loaded doesn’t match the dimension by which it is most commonly queried (for example, the data is loaded by date, but reports filter the data by ID). If your existing scripts or reports query the data by both date and ID (and potentially a third or fourth column), you might see some performance improvement by creating a multi-column clustering key.
  • Query Profile indicates that a significant percentage of the total duration time for typical queries against the table is spent scanning. This applies to queries that filter on one or more specific columns.

请注意,重聚类会以不同的顺序重写现有数据。之前的排序将存储 7 天,以提供故障安全保护。对表进行重聚类会产生与重新排序的数据大小相关的计算成本。

For more information, see Automatic Clustering.

何时指定列长度

Snowflake compresses column data effectively; therefore, creating columns larger than necessary has minimal impact on the size of data tables. Likewise, there is no query performance difference between a column with a maximum length declaration (for example, VARCHAR(134217728)), and a smaller precision.

但是,当列数据的大小可预测时,Snowflake 建议定义适当的列长度,原因如下:

  • Data loading operations are more likely to detect issues such as columns loaded out of order; for example, a 50-character string loaded erroneously into a VARCHAR(10) column. Such issues produce errors.
  • When the column length is unspecified, some third-party tools might anticipate consuming the maximum size value, which can translate into increased client-side memory usage or unusual behavior.

在 VARIANT 列中存储半结构化数据与展平嵌套结构

如果还不确定要对半结构化数据执行哪种类型的操作,Snowflake 建议暂时将数据存储在 VARIANT 列中。对于大多数常规数据,并且只使用本机数据类型(字符串和整数),对关系数据和 VARIANT 列中的数据的操作的存储要求和查询性能非常相似。

为了更好地削减和减少存储消耗,如果半结构化数据包含以下内容,Snowflake 建议对象和关键数据平展到单独的关系列中:

  • 日期和时间戳,尤其是非 ISO 8601 日期和时间戳,作为字符串值
  • 字符串中的数字
  • 数组

非本机值(如日期和时间戳)在加载到 VARIANT 列中时存储为字符串,因此对这些值的操作可能比存储在具有相应数据类型的关系列中时更慢,并且占用更多空间。

如果您知道数据的用例,请对典型数据集执行测试。将数据集加载到表中的 VARIANT 列中。使用 FLATTEN 函数将计划查询的对象和键提取到单独的表中。对这两个表运行一组典型的查询,查看哪种结构可提供最佳性能。

将永久表转换为瞬态表或反之

Currently, it is not possible to change a permanent table to a transient table using the ALTER TABLE command. The TRANSIENT property is set at table creation and cannot be modified.

同样,也无法直接将瞬态表更改为永久表。

要将现有的永久表转换为瞬态表(或反之),同时保留数据和其他特征(如列默认值和授予的权限),可以使用以下示例中描述的界面之一创建一个新表:

使用 CREATE TABLE 命令的 COPY GRANTS 子句:

CREATE TRANSIENT TABLE my_new_table LIKE my_old_table COPY GRANTS;

然后使用 INSERT 命令复制数据:

INSERT INTO my_new_table SELECT * FROM my_old_table;

如果要保留所有数据,但不保留授予的权限和其他特征,可以使用以下界面中的一个:

Use a CREATE TABLE AS SELECT (CTAS) statement:

CREATE TRANSIENT TABLE my_transient_table AS SELECT * FROM mytable;

另一种复制表的方法(但将生命周期从永久更改为瞬态)是使用以下界面之一克隆表:

使用 CREATE TABLE 命令的 CLONE 子句:

CREATE TRANSIENT TABLE foo CLONE bar COPY GRANTS;

旧分区 受影响(它们不会成为瞬态),但克隆中添加的新分区将遵循瞬态生命周期。

不能将瞬态表克隆到永久表。