Snowflake in 20 minutes¶

Introduction¶

This tutorial uses the Snowflake command line client, SnowSQL, to introduce key concepts and tasks, including:

Creating Snowflake objects—You create a database and a table for storing data.
Loading data—We provide small sample CSV data files for you to load into the table.
Querying—You explore sample queries.

Note

Snowflake bills a minimal amount for the on-disk storage used for the sample data in this tutorial. The tutorial provides steps to drop the database and minimize storage cost.

Snowflake requires a virtual warehouse to load the data and execute queries. A running virtual warehouse consumes Snowflake credits. In this tutorial, you will be using a 30-day trial account (https://signup.snowflake.com/), which provides free credits, so you won’t incur any costs.

What you’ll learn¶

In this tutorial you’ll learn how to:

Create Snowflake objects—You create a database and a table for storing data.
Install SnowSQL—You install and use SnowSQL, the Snowflake command line query tool.

Users of Visual Studio Code might consider using the Snowflake Extension for Visual Studio Code instead of SnowSQL.
Load CSV data files—You use various mechanisms to load data into tables from CSV files.
Write and execute sample queries—You write and execute a variety of queries against newly loaded data.

Prerequisites¶

This tutorial requires a database, table, and virtual warehouse to load and query data. Creating these Snowflake objects requires a Snowflake user with a role with the necessary access control privileges. In addition, SnowSQL is required to execute the SQL statements in the tutorial. Lastly, the tutorial requires CSV files that contain sample data to load.

You can complete this tutorial using an existing Snowflake warehouse, database, and table, and your own local data files, but we recommend using the Snowflake objects and the set of provided data.

To set up Snowflake for this tutorial, complete the following before continuing:

There are no blank spaces before or after the commas separating the fields in each record. This is the default that Snowflake expects when loading CSV data.

Log in to SnowSQL¶

After you have SnowSQL, start SnowSQL to connect to Snowflake:

Open a command line window.
Start SnowSQL:
```
$ snowsql -a <account_identifier> -u <user_name>
```
Copy
Where:
- <account_identifier> is the unique identifier for your Snowflake account.
  The preferred format of the account identifier is as follows:
  
  organization_name-account_name
  Names of your Snowflake organization and account. For more information, see Format 1 (preferred): Account name in your organization.
- <user_name> is the login name for your Snowflake user.
Note

If your account has an identity provider (IdP) that has been defined for your account, you can use a web browser to authenticate instead of a password, as the following example demonstrates:
$ snowsql -a <account_identifier> -u <user_name> --authenticator externalbrowser
Copy
For more information, see Using a web browser for federated authentication/SSO.
When SnowSQL prompts you, enter the password for your Snowflake user.

If you log in successfully, SnowSQL displays a command prompt that includes your current warehouse, database, and schema.

Note

If you get locked out of the account and can’t obtain the account identifier, you can find it in the Welcome email that Snowflake sent to you when you signed up for the trial account, or you can work with your ORGADMIN to get the account details. You can also find the values for locator, cloud, and region in the Welcome email.

If your Snowflake user doesn’t have a default warehouse, database, and schema, or if you didn’t configure SnowSQL to specify a default warehouse, database, and schema, the prompt displays no warehouse, no database, and no schema. For example:

user-name#(no warehouse)@(no database).(no schema)>

Copy

This prompt indicates that there is no warehouse, database, and schema selected for the current session. You create these objects in the next step. As you follow the next steps in this tutorial to create these objects, the prompt automatically updates to include the names of these objects.

For more information, see Connecting through SnowSQL.

Create Snowflake objects¶

During this step you create the following Snowflake objects:

A database (sf_tuts) and a table (emp_basic). You load sample data into this table.
A virtual warehouse (sf_tuts_wh). This warehouse provides the compute resources needed to load data into the table and query the table. For this tutorial, you create an X-Small warehouse.

At the completion of this tutorial, you will remove these objects.

Create a database¶

Create the sf_tuts database using the CREATE DATABASE command:

CREATE OR REPLACE DATABASE sf_tuts;

Copy

In this tutorial, you use the default schema (public) available for each database, rather than creating a new schema.

Note that the database and schema you just created are now in use for your current session, as reflected in the SnowSQL command prompt. You can also use the context functions to get this information.

SELECT CURRENT_DATABASE(), CURRENT_SCHEMA();

Copy

The following is an example result:

+--------------------+------------------+
| CURRENT_DATABASE() | CURRENT_SCHEMA() |
|--------------------+------------------|
| SF_TUTS            | PUBLIC           |
+--------------------+------------------+

Create a table¶

Create a table named emp_basic in sf_tuts.public using the CREATE TABLE command:

CREATE OR REPLACE TABLE emp_basic (
   first_name STRING ,
   last_name STRING ,
   email STRING ,
   streetaddress STRING ,
   city STRING ,
   start_date DATE
   );

Copy

Note that the number of columns in the table, their positions, and their data types correspond to the fields in the sample CSV data files that you stage in the next step in this tutorial.

Create a virtual warehouse¶

Create an X-Small warehouse named sf_tuts_wh using the CREATE WAREHOUSE command:

CREATE OR REPLACE WAREHOUSE sf_tuts_wh WITH
   WAREHOUSE_SIZE='X-SMALL'
   AUTO_SUSPEND = 180
   AUTO_RESUME = TRUE
   INITIALLY_SUSPENDED=TRUE;

Copy

The sf_tuts_wh warehouse is initially suspended, but the DML statement also sets AUTO_RESUME = true. The AUTO_RESUME setting causes a warehouse to automatically start when SQL statements that require compute resources are executed.

After you create the warehouse, it’s now in use for your current session. This information is displayed in your SnowSQL command prompt. You can also retrieve the name of the warehouse by using the following context function:

SELECT CURRENT_WAREHOUSE();

Copy

The following is an example result:

+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| SF_TUTS_WH          |
+---------------------+

Stage data files¶

A Snowflake stage is a location in cloud storage that you use to load and unload data from a table. Snowflake supports the following types of stages:

Internal stages—Used to store data files internally within Snowflake. Each user and table in Snowflake gets an internal stage by default for staging data files.
External stages—Used to store data files externally in Amazon S3, Google Cloud Storage, or Microsoft Azure. If your data is already stored in these cloud storage services, you can use an external stage to load data in Snowflake tables.

In this tutorial, we upload the sample data files (downloaded in Prerequisites) to the internal stage for the emp_basic table that you created earlier. You use the PUT command to upload the sample data files to that stage.

Staging sample data files¶

Execute the PUT command in SnowSQL to upload local data files to the table stage provided for the emp_basic table you created.

PUT file://<file-path>[/\]employees0*.csv @sf_tuts.public.%emp_basic;

Copy

For example:

Linux or macOS

PUT file:///tmp/employees0*.csv @sf_tuts.public.%emp_basic;

Copy

Windows

PUT file://C:\temp\employees0*.csv @sf_tuts.public.%emp_basic;

Copy

Let’s take a closer look at the command:

file://<file-path>[/]employees0*.csv specifies the full directory path and names of the files on your local machine to stage. Note that file system wildcards are allowed, and if multiple files fit the pattern they are all displayed.
@<namespace>.%<table_name> indicates to use the stage for the specified table, in this case the emp_basic table.

The command returns the following result, showing the staged files:

+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source          | target             | source_size | target_size | source_compression | target_compression | status   | message |
|-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| employees01.csv | employees01.csv.gz |         360 |         287 | NONE               | GZIP               | UPLOADED |         |
| employees02.csv | employees02.csv.gz |         355 |         274 | NONE               | GZIP               | UPLOADED |         |
| employees03.csv | employees03.csv.gz |         397 |         295 | NONE               | GZIP               | UPLOADED |         |
| employees04.csv | employees04.csv.gz |         366 |         288 | NONE               | GZIP               | UPLOADED |         |
| employees05.csv | employees05.csv.gz |         394 |         299 | NONE               | GZIP               | UPLOADED |         |
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+

The PUT command compresses files by default using gzip, as indicated in the TARGET_COMPRESSION column.

Listing the staged files (Optional)¶

You can list the staged files using the LIST command.

LIST @sf_tuts.public.%emp_basic;

Copy

The following is an example result:

+--------------------+------+----------------------------------+------------------------------+
| name               | size | md5                              | last_modified                |
|--------------------+------+----------------------------------+------------------------------|
| employees01.csv.gz |  288 | a851f2cc56138b0cd16cb603a97e74b1 | Tue, 9 Jan 2018 15:31:44 GMT |
| employees02.csv.gz |  288 | 125f5645ea500b0fde0cdd5f54029db9 | Tue, 9 Jan 2018 15:31:44 GMT |
| employees03.csv.gz |  304 | eafee33d3e62f079a054260503ddb921 | Tue, 9 Jan 2018 15:31:45 GMT |
| employees04.csv.gz |  304 | 9984ab077684fbcec93ae37479fa2f4d | Tue, 9 Jan 2018 15:31:44 GMT |
| employees05.csv.gz |  304 | 8ad4dc63a095332e158786cb6e8532d0 | Tue, 9 Jan 2018 15:31:44 GMT |
+--------------------+------+----------------------------------+------------------------------+

Copy data into target tables¶

To load your staged data into the target table, execute COPY INTO <table>.

The COPY INTO <table> command uses the virtual warehouse you created in Create Snowflake objects to copy files.

COPY INTO emp_basic
  FROM @%emp_basic
  FILE_FORMAT = (type = csv field_optionally_enclosed_by='"')
  PATTERN = '.*employees0[1-5].csv.gz'
  ON_ERROR = 'skip_file';

Copy

Where:

The FROM clause specifies the location containing the data files (the internal stage for the table).
The FILE_FORMAT clause specifies the file type as CSV, and specifies the double-quote character (") as the character used to enclose strings. Snowflake supports diverse file types and options. These are described in CREATE FILE FORMAT.
The PATTERN clause specifies that the command should load data from the filenames matching this regular expression (.*employees0[1-5].csv.gz).
The ON_ERROR clause specifies what to do when the COPY command encounters errors in the files. By default, the command stops loading data when the first error is encountered. This example skips any file containing an error and moves on to loading the next file. (None of the files in this tutorial contain errors; this is included for illustration purposes.)

The COPY command also provides an option for validating files before they are loaded. For more information about additional error checking and validation instructions, see the COPY INTO <table> topic and the other data loading tutorials.

The COPY command returns a result showing the list of files copied and related information:

+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file               | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| employees02.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
| employees04.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
| employees05.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
| employees03.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
| employees01.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+

Query loaded data¶

You can query the data loaded in the emp_basic table using standard SQL and any supported functions and operators.

You can also manipulate the data, such as updating the loaded data or inserting more data, using standard DML commands.

Retrieve all data¶

Return all rows and columns from the table:

SELECT * FROM emp_basic;

Copy

The following is a partial result:

+------------+--------------+---------------------------+-----------------------------+--------------------+------------+
| FIRST_NAME | LAST_NAME    | EMAIL                     | STREETADDRESS               | CITY               | START_DATE |
|------------+--------------+---------------------------+-----------------------------+--------------------+------------|
| Arlene     | Davidovits   | adavidovitsk@sf_tuts.com  | 7571 New Castle Circle      | Meniko             | 2017-05-03 |
| Violette   | Shermore     | vshermorel@sf_tuts.com    | 899 Merchant Center         | Troitsk            | 2017-01-19 |
| Ron        | Mattys       | rmattysm@sf_tuts.com      | 423 Lien Pass               | Bayaguana          | 2017-11-15 |
 ...
 ...
 ...
| Carson     | Bedder       | cbedderh@sf_tuts.co.au    | 71 Clyde Gallagher Place    | Leninskoye         | 2017-03-29 |
| Dana       | Avory        | davoryi@sf_tuts.com       | 2 Holy Cross Pass           | Wenlin             | 2017-05-11 |
| Ronny      | Talmadge     | rtalmadgej@sf_tuts.co.uk  | 588 Chinook Street          | Yawata             | 2017-06-02 |
+------------+--------------+---------------------------+-----------------------------+--------------------+------------+

Copy

Insert additional data rows¶

In addition to loading data from staged files into a table, you can insert rows directly into a table using the INSERT DML command.

For example, to insert two additional rows into the table:

INSERT INTO emp_basic VALUES
   ('Clementine','Adamou','cadamou@sf_tuts.com','10510 Sachs Road','Klenak','2017-9-22') ,
   ('Marlowe','De Anesy','madamouc@sf_tuts.co.uk','36768 Northfield Plaza','Fangshan','2017-1-26');

Copy

Query rows based on email address¶

Return a list of email addresses with United Kingdom top-level domains using the [ NOT ] LIKE function:

SELECT email FROM emp_basic WHERE email LIKE '%.uk';

Copy

The following is an example result:

+--------------------------+
| EMAIL                    |
|--------------------------|
| gbassfordo@sf_tuts.co.uk |
| rtalmadgej@sf_tuts.co.uk |
| madamouc@sf_tuts.co.uk   |
+--------------------------+

Query rows based on start date¶

For example, to calculate when certain employee benefits might start, add 90 days to employee start dates using the DATEADD function. Filter the list by employees whose start date occurred earlier than January 1, 2017:

SELECT first_name, last_name, DATEADD('day',90,start_date) FROM emp_basic WHERE start_date <= '2017-01-01';

Copy

The following is an example result:

+------------+-----------+------------------------------+
| FIRST_NAME | LAST_NAME | DATEADD('DAY',90,START_DATE) |
|------------+-----------+------------------------------|
| Granger    | Bassford  | 2017-03-30                   |
| Catherin   | Devereu   | 2017-03-17                   |
| Cesar      | Hovie     | 2017-03-21                   |
| Wallis     | Sizey     | 2017-03-30                   |
+------------+-----------+------------------------------+

Summary, clean up, and additional resources¶

Congratulations! You’ve successfully completed this introductory tutorial.

Take a few minutes to review a short summary and the key points covered in the tutorial. You might also want to consider cleaning up by dropping any objects you created in the tutorial. Learn more by reviewing other topics in the Snowflake Documentation.

Summary and key points¶

In summary, data loading is performed in two steps:

Stage the data files to load. The files can be staged internally (in Snowflake) or in an external location. In this tutorial, you stage files internally.
Copy data from the staged files into an existing target table. A running warehouse is required for this step.

Remember the following key points about loading CSV files:

A CSV file consists of 1 or more records, with 1 or more fields in each record, and sometimes a header record.
Records and fields in each file are separated by delimiters. The default delimiters are:

Records:

newline characters

Fields:

commas

In other words, Snowflake expects each record in a CSV file to be separated by new lines and the fields (i.e. individual values) in each record to be separated by commas. If different characters are used as record and field delimiters, you must explicitly specify this as part of the file format when loading.
There is a direct correlation between the fields in the files and the columns in the table you will be loading, in terms of:
- Number of fields (in the file) and columns (in the target table).
- Positions of the fields and columns within their respective file/table.
- Data types, such as string, number, or date, for fields and columns.
The records will not be loaded if the numbers, positions, and data types don’t align with the data.

Note

Snowflake supports loading files in which the fields don’t exactly align with the columns in the target table; however, this is a more advanced data loading topic (covered in Transforming data during a load).

Tutorial cleanup (Optional)¶

If the objects you created in this tutorial are no longer needed, you can remove them from the system with DROP <object> statements.

DROP DATABASE IF EXISTS sf_tuts;

DROP WAREHOUSE IF EXISTS sf_tuts_wh;

Copy

Exit the connection¶

To exit a connection, use the !exit command for SnowSQL (or its alias, !disconnect).

Exit drops the current connection and quits SnowSQL if it is the last connection.

What’s next?¶

Continue learning about Snowflake using the following resources:

Complete the other tutorials provided by Snowflake:
- Snowflake Tutorials
Familiarize yourself with key Snowflake concepts and features, as well as the SQL commands to perform queries and insert/update data: