从云存储加载数据:Microsoft Azure

简介

This tutorial shows you how to load data from Microsoft Azure cloud storage into Snowflake using SQL. You can access a pre-loaded Snowsight template worksheet to complete these tasks.

Note

Snowflake bills a minimal amount for the on-disk storage used for any sample data in this tutorial. The tutorial provides steps to drop objects and minimize storage cost. Snowflake requires a virtual warehouse to load the data and execute queries. A running virtual warehouse consumes Snowflake credits.

If you are using a 30-day trial account (https://signup.snowflake.com/), which provides free credits, you won’t incur any costs.

您将学习的内容

在本教程中,您将学习如何:

  • 使用具有创建和使用本教程所需的 Snowflake 对象的权限的角色。
  • 使用仓库访问资源。
  • 选择会话使用的数据库和架构。
  • 创建表。
  • 为您的云平台创建存储集成。
  • 为您的存储集成创建暂存区。
  • 将数据从暂存区加载到表中。
  • 查询表中的数据。

先决条件

本教程假定满足以下条件:

Step 1. Sign in using Snowsight

To access Snowsight over the public Internet, do the following:

  1. In a supported web browser, navigate to https://app.snowflake.cn.
  2. Provide your account identifier or account URL. If you’ve previously signed in to Snowsight, you might see an account name that you can select.
  3. Sign in using your Snowflake account credentials.

第 2 步:打开用于从 Microsoft Azure 加载数据的 SQL 工作表

You can use worksheets to write and run SQL commands on your database. You can access a pre-loaded template worksheet for this tutorial. The worksheet has the SQL commands that you will run to create database objects, load data, and query the data. Because it is a template worksheet, you will be invited to enter your own values for certain SQL parameters. For more information about worksheets, see Getting started with worksheets.

本教程的工作表未预加载到试用账户中。要打开本教程的工作表,请按照下列步骤操作:

要打开预加载的模板工作表,请按照以下步骤操作:

  1. In the navigation menu, select Projects » Templates.
  2. Find and open Load data from Microsoft Azure.

工作表的开头部分如下图所示:

SQL load from cloud template worksheet, which contains the SQL commands for this tutorial, along with descriptive comments.

第 3 步:设置要使用的角色和仓库

The role you use determines the privileges you have. In this tutorial, use the ACCOUNTADMIN system role so that you can view and manage objects in your account. For more information, see Using the ACCOUNTADMIN Role.

A warehouse provides the compute resources that you need to execute DML operations, load data, and run queries. These resources include CPU, memory, and temporary storage. You can use the SNOWFLAKE_LEARNING_WH warehouse for this tutorial. For more information, see Virtual warehouses.

要设置要使用的角色和仓库,请执行以下操作:

  1. In the open worksheet, place your cursor in the USE ROLE line.

    USE ROLE ACCOUNTADMIN;
  2. At the top of the worksheet, select Run.

    Note

    In this tutorial, run SQL statements one at a time. Don’t select Run all.

  3. Place your cursor in the USE WAREHOUSE line, then select Run.

    USE WAREHOUSE SNOWFLAKE_LEARNING_WH;

第 4 步:设置可以加载的表

A database is a repository for your data. The data is stored in tables that you can manage and query. A schema is a logical grouping of database objects, such as tables and views. For example, a schema might contain the database objects required for a specific application. For more information, see Databases, Tables and Views - Overview.

In this tutorial, you use the database SNOWFLAKE_LEARNING_DB, a schema that concatenates your username with _LOAD_DATA_FROM_MICROSOFT_AZURE, and a table that you create named calendar.

要选择此数据库和架构以在会话中使用并创建表,请执行以下操作:

  1. In the open worksheet, place your cursor in the USE DATABASE line, then select Run.

    USE DATABASE SNOWFLAKE_LEARNING_DB;
  2. Place your cursor in each SET line, then select Run.

    SET user_name = current_user();
    SET schema_name = CONCAT($user_name, '_LOAD_DATA_FROM_MICROSOFT_AZURE');
  3. Place your cursor in the USE SCHEMA IDENTIFIER line, then select Run.

    USE SCHEMA IDENTIFIER($schema_name);
  4. Place your cursor in the CREATE TABLE lines, complete the table definition, add an optional comment, and select Run. For example, the following table contains six columns:

    CREATE OR REPLACE TABLE calendar
      (
      full_date DATE
      ,day_name VARCHAR(10)
      ,month_name VARCHAR(10)
      ,day_number VARCHAR(2)
      ,full_year VARCHAR(4)
      ,holiday BOOLEAN
      )
      COMMENT = 'Table to be loaded from Azure calendar data file';
  5. To confirm that the table was created successfully, place your cursor in the SELECT line, then select Run.

    SELECT * FROM calendar;

输出将显示您创建的表的列。目前,该表中没有任何行。

第 5 步:创建存储集成

在您可以从云存储加载数据之前,您必须配置特定于您的云提供商的存储集成。以下示例特定于 Microsoft Azure 存储。

存储集成是已命名的第一类 Snowflake 对象,无需传递显式云提供商凭据(如密钥或访问令牌)。集成对象会存储一个被称为“应用程序注册”的 Azure Identity and Access Management (IAM) 用户 ID。

要为 Azure 创建存储集成,请执行以下操作:

  1. Use the Azure portal to configure an Azure container for loading data. For details, see Configure an Azure container for loading data.

  2. In the open worksheet, place your cursor in the in the CREATE STORAGE INTEGRATION lines, define the required parameters, and select Run. For example:

    CREATE OR REPLACE STORAGE INTEGRATION azure_data_integration
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      AZURE_TENANT_ID = '075f576e-6f9b-4955-8e99-4086736225d9'
      ENABLED = TRUE
      STORAGE_ALLOWED_LOCATIONS = ('azure://tutorial99.blob.core.windows.net/snow-tutorial-container/');

    Set AZURE_TENANT_ID to the Office 365 tenant ID for the storage account that contains the allowed storage locations that you want to use. You can find this ID in the Azure portal under Microsoft Entra ID > Properties > Tenant ID. (Microsoft Entra ID is the new name for Azure Active Directory.)

    Set STORAGE_ALLOWED_LOCATIONS to the path for the Azure container where your source data file is stored. Use the format shown in this example, where tutorial99 is the storage account name and snow-tutorial-container is the container name.

  3. Place your cursor in the DESCRIBE INTEGRATION line, specify the name of the storage integration you created, and select Run.

    DESCRIBE INTEGRATION azure_data_integration;

此命令会检索自动为您的 Snowflake 账户创建的客户端应用程序的 AZURE_CONSENT_URL 和 AZURE_MULTI_TENANT_APP_NAME。您将使用这些值在 Azure 门户中配置 Snowflake 的权限。

此命令的输出类似于以下内容:

Output of DESCRIBE INTEGRATION command, with the following columns: property, property_type, property_value, property_default.
  1. Place your cursor in the SHOW INTEGRATIONS line and select Run. This command returns information about the storage integration you created.

    SHOW INTEGRATIONS;

此命令的输出类似于以下内容:

Output of SHOW INTEGRATIONS command, with the following columns: name, type, category, enabled, comment, created_on.
  1. Use the Azure portal to configure permissions for the client application (which was created automatically for your trial account) to access storage containers. Follow Step 2: Grant Snowflake Access to the Storage Locations under Configure an Azure container for loading data.

第 6 步:创建暂存区

暂存区是保存要加载到 Snowflake 数据库的数据文件的位置。本教程会创建一个暂存区,该暂存区可以从特定类型的云存储(例如 Azure 容器)加载数据。

要创建暂存区,请执行以下操作:

  1. In the open worksheet, place your cursor in the CREATE STAGE lines, specify a name, the storage integration you created, the bucket URL, and the correct file format, then select Run. For example:

    CREATE OR REPLACE STAGE cloud_data_db.azure_data.azuredata_stage
      STORAGE_INTEGRATION = azure_data_integration
      URL = 'azure://tutorial99.blob.core.windows.net/snow-tutorial-container/'
      FILE_FORMAT = (TYPE = CSV);
  2. 返回有关您创建的暂存区的信息:

    SHOW STAGES;

此命令的输出类似于以下内容:

Output of SHOW STAGES command, with the following columns: created_on, name, database_name, schema_name, url.

第 7 步:从暂存区加载数据

Load the table from the stage you created by using the COPY INTO <table> command. For more information about loading from Azure containers, see Copy data from an Azure stage.

To load the data into the table, place your cursor in the COPY INTO lines, specify the table name, the stage you created, and name of the file (or files) you want to load, then select Run. For example:

COPY INTO cloud_data_db.azure_data.calendar
  FROM @cloud_data_db.azure_data.azuredata_stage
    FILES = ('calendar.txt');

您的输出结果类似于下图:

Five rows are copied into the table. The output has the following columns: file, status, rows_parsed, rows_loaded, error_limit.

第 8 步:查询表

Now that the data is loaded, you can run queries on the calendar table.

To run a query in the open worksheet, select the line or lines of the SELECT command, and then select Run. For example, run the following query:

SELECT * FROM calendar;

您的输出结果类似于下图:

All the rows in the table are selected. This example has full_date, day_name, month_name, day_num, year_num, and holiday columns.

第 9 步:清理、总结和其他资源

恭喜!您已成功完成此试用账户教程。

请花几分钟时间查看简短的总结和教程中涵盖的要点。您可能还需要考虑删除在教程中创建的任何对象进行清理。例如,您可能希望删除已创建和加载的表:

DROP TABLE calendar;

As long as they are no longer needed, you can also drop the other objects you created, such as the storage integration and stage. For details, see Data Definition Language (DDL) commands.

总结和要点

In summary, you used a pre-loaded template worksheet in Snowsight to complete the following steps:

  1. 设置要使用的角色和仓库。
  2. 选择会话使用的数据库和架构。
  3. 创建表。
  4. 创建存储集成并配置云存储的权限。
  5. 创建暂存区,并将该暂存区的数据加载到表中。
  6. 查询数据。

以下是有关加载和查询数据的一些要点:

  • You need the required permissions to create and manage objects in your account. In this tutorial, you use the ACCOUNTADMIN system role for these privileges.

    This role is not normally used to create objects. Instead, we recommend creating a hierarchy of roles aligned with business functions in your organization. For more information, see Using the ACCOUNTADMIN Role.

  • You need a warehouse for the resources required to create and manage objects and run SQL commands. This tutorial uses the SNOWFLAKE_LEARNING_WH warehouse included with the template environment.

  • 您使用了数据库来存储数据,并使用架构对数据库对象进行逻辑分组。

  • 您创建了一个存储集成和一个暂存区,用于从存储在 Azure 容器中的 CSV 文件加载数据。

  • 将数据加载到数据库后,您可以使用 SELECT 语句查询数据。

下一步是什么?

请使用以下资源继续了解 Snowflake :