Configuring Snowflake for Spark in Qubole

To configure Snowflake for Spark in Qubole, you simply add Snowflake as a Qubole data store. This topic provides step-by-step instructions for performing this task using the Qubole Data Service (QDS) UI.

Note

You can also use the QDS REST API to add Snowflake as a data store. For step-by-step instructions, see Adding a Snowflake Data Warehouse as a Data Store (http://docs.qubole.com/en/latest/partner-integration/snowflake-integration/add-a-snowflake-data-warehouse.html) (in the Qubole Documentation).

Prerequisites

  • You must be a QDS system administrator to add a data store.
  • You must have a Qubole Enterprise Edition account.
  • The role used in the connection needs USAGE and CREATE STAGE privileges on the schema that contains the table that you will read from or write to via Qubole.
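
As a rough illustration of the last prerequisite, the following Python helper builds the two GRANT statements that a Snowflake administrator might run for the connection role. The function name and the example database, schema, and role names are hypothetical; the statements follow the standard Snowflake GRANT ... ON SCHEMA ... TO ROLE form, and other privileges your environment may require are not covered here.

```python
def grant_statements(database: str, schema: str, role: str) -> list[str]:
    """Build GRANT statements for the USAGE and CREATE STAGE privileges.

    All three names are placeholders supplied by the caller. They are not
    validated or quoted here, so pass trusted identifiers only.
    """
    target = f"{database}.{schema}"
    return [
        f"GRANT USAGE ON SCHEMA {target} TO ROLE {role};",
        f"GRANT CREATE STAGE ON SCHEMA {target} TO ROLE {role};",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in grant_statements("MY_DB", "PUBLIC", "QUBOLE_ROLE"):
        print(statement)
```

The generated statements would then be run in Snowflake by a role that is allowed to grant privileges on the schema.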

Preparing an External Location for Long-running Queries

If some of your jobs run longer than 36 hours, consider preparing an external location for exchanging data between Snowflake and Spark. For more information, see Preparing an External Location For Files.

Adding Snowflake as a Data Store in the QDS UI

  1. From the Home menu, click Explore.

  2. In the dropdown list on the Explore page, select + Add Data Store.

  3. Enter the required information in the following fields:

    • Data Store Name: Enter the name of the data store to be created.
    • Database Type: Select ‘Snowflake’.
    • Catalog Name: Enter the name of the Snowflake catalog.
    • Database Name: Enter the name of the database in Snowflake where the data is stored.
    • Warehouse Name: Enter the name of the Snowflake virtual warehouse to use for queries.
    • Host Address: Enter the base URL of your Snowflake account (e.g. myorganization-myaccount.snowflakecomputing.cn). See Configuring a client, driver, library, or third-party application to connect to Snowflake for details on specifying your account identifier in this URL.
    • Username: Enter the login name for your Snowflake user (used to connect to the host).
    • Password: Enter the password for your Snowflake user (used to connect to the host).

    Note that all the values are case-sensitive, except for Host Address.

  4. Click Save to create the data store.

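Repeat these steps for each Snowflake database that you want to add as a data store. Alternatively, you can edit the data store to change the Snowflake database or any other property of the data store (e.g. change the virtual warehouse used for queries).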
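
Before typing values into the form, it can help to collect and check them in one place. The sketch below is only an illustration: the SnowflakeDataStore class and the build_host_address helper are hypothetical names, not part of QDS or Snowflake. The host format follows the example given for the Host Address field, and the host is lowercased because it is the only value that is not case-sensitive.

```python
from dataclasses import dataclass


def build_host_address(organization: str, account: str,
                       domain: str = "snowflakecomputing.cn") -> str:
    """Join organization and account names into a host address.

    Host Address is not case-sensitive, so the result is lowercased.
    """
    return f"{organization}-{account}.{domain}".lower()


@dataclass(frozen=True)
class SnowflakeDataStore:
    """Hypothetical container mirroring the fields of the QDS form."""
    data_store_name: str
    catalog_name: str
    database_name: str
    warehouse_name: str
    host: str
    username: str
    password: str
    database_type: str = "Snowflake"

    def missing_fields(self) -> list[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in vars(self).items()
                if not value.strip()]


if __name__ == "__main__":
    store = SnowflakeDataStore(
        data_store_name="sales_store",   # placeholder values throughout
        catalog_name="sales_catalog",
        database_name="SALES",
        warehouse_name="",               # left empty on purpose
        host=build_host_address("myorganization", "myaccount"),
        username="qubole_user",
        password="example-password",
    )
    print(store.host)
    print(store.missing_fields())
```

Apart from the host, the values are kept exactly as entered, because all other fields are case-sensitive.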

Note

After adding a Snowflake data store, restart the Spark cluster (if you are using an already-running Spark cluster). Restarting the Spark cluster installs the .jar files for the Snowflake Connector for Spark and the Snowflake JDBC Driver.
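
Once the cluster has restarted with the connector and driver .jar files in place, a Spark job can read from Snowflake. The following PySpark sketch uses the generic Snowflake Connector for Spark data source name and its sfURL, sfUser, sfPassword, sfDatabase, sfSchema, and sfWarehouse options. It assumes that credentials are passed explicitly; Qubole may offer its own way to reference a data store, which is not shown here. The function names and all argument values are placeholders.

```python
# Data source name registered by the Snowflake Connector for Spark.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def connector_options(host: str, user: str, password: str,
                      database: str, schema: str, warehouse: str) -> dict:
    """Map connection details onto the connector's option names."""
    return {
        "sfURL": host,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options: dict, table: str):
    """Load a Snowflake table as a Spark DataFrame.

    `spark` is an existing SparkSession on a cluster where the connector
    .jar files are installed; this function is not run in isolation.
    """
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


if __name__ == "__main__":
    opts = connector_options(
        host="myorganization-myaccount.snowflakecomputing.cn",
        user="qubole_user",            # placeholder
        password="example-password",   # placeholder
        database="SALES",
        schema="PUBLIC",
        warehouse="LOAD_WH",
    )
    # Print the option names only, so the password is not echoed.
    print(sorted(opts))
```

In a notebook or job on the cluster, read_table(spark, opts, "ORDERS") would return a DataFrame, where ORDERS is a hypothetical table name.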

Verifying the Snowflake Data Store in Qubole

To verify that the Snowflake data store was created and has been activated, click on the dropdown list in the upper-left of the Explore page. A green dot indicates that the data store has been activated.

You should also verify that the table explorer widget in the left pane of the Explore page displays all of the tables in the Snowflake database specified in the data store.

Query Pushdown in Qubole

Spark queries benefit from Snowflake's automatic query pushdown optimization, which improves performance. By default, Snowflake query pushdown is enabled in Qubole.
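
To compare a query with and without pushdown, the Snowflake Connector for Spark accepts an autopushdown option with the values "on" and "off". The helper below is a minimal sketch that adds this option to an existing options dictionary. The function name is hypothetical, and whether the setting behaves the same way in a Qubole-managed session is an assumption that has not been verified here.

```python
def with_pushdown(options: dict, enabled: bool) -> dict:
    """Return a copy of the connector options with autopushdown set.

    The input dictionary is left unchanged so that the default
    configuration can still be reused.
    """
    updated = dict(options)
    updated["autopushdown"] = "on" if enabled else "off"
    return updated


if __name__ == "__main__":
    base = {"sfDatabase": "SALES"}  # placeholder option
    print(with_pushdown(base, enabled=False))
```

Because pushdown is already enabled by default, a helper like this is mainly useful for switching it off temporarily when investigating a query.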

For more details about query pushdown, see Pushing Spark Query Processing to Snowflake (Snowflake Blog).