Tutorial: Testing Python Snowpark¶
Introduction¶
This tutorial introduces the basics of testing your Snowpark Python code.
What You Will Learn¶
In this tutorial, you will learn how to:
- Test your Snowpark code while connected to Snowflake.
  You can use standard testing utilities, like PyTest, to test your Snowpark Python UDFs, DataFrame transformations, and stored procedures.
- Test your Snowpark Python DataFrames locally without connecting to a Snowflake account by using the local testing framework.
  You can use the local testing framework to test locally, on your development machine, before deploying code changes.
Prerequisites¶
To use the local testing framework, you must use version 1.11.1 or higher of the Snowpark Python library. The supported versions of Python are:
Generally available versions:
- 3.9 (deprecated)
- 3.10
- 3.11
- 3.12
- 3.13
Set Up the Project¶
In this section, you’ll clone the project repository and set up the environment you’ll need for the tutorial.
- Clone the project repository.
  If you do not have git installed, go to the repository page and download the contents by clicking Code » Download Contents.
- Set environment variables with your account credentials. The Snowpark API will use these to authenticate to your Snowflake account.
  Optional: You can set these environment variables permanently by editing your shell profile (on Linux/macOS) or using the System Properties menu (on Windows).
- Create and activate a conda environment using Anaconda:
- Create the sample table in your account by running setup/create_table.py. This Python script will create a database called CITIBIKE, a schema called PUBLIC, and a small table called TRIPS.
You’re now ready to move to the next section. In this section you:
- Cloned the tutorial repository.
- Created environment variables with your account information.
- Created a conda environment for the project.
- Connected to Snowflake using the Snowpark API and created a sample database, schema, and table.
Try the Stored Procedure¶
The sample project includes a stored procedure handler (sproc.py) and three DataFrame transformer methods (transformers.py).
The stored procedure handler uses the UDF and DataFrame transformers to read from the source table, CITIBIKE.PUBLIC.TRIPS, and to create two fact tables: MONTH_FACTS and BIKE_FACTS.
You can execute the stored procedure from the command line by running this command.
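To make the handler's shape concrete, here is a sketch of what a handler like sproc.py could look like. The function name create_fact_tables and the calc_bike_facts transformer are illustrative assumptions; only calc_month_facts and the table names come from the tutorial.

```python
# Sketch of a stored procedure handler like sproc.py; names other than
# calc_month_facts and the table names are assumptions.
from snowflake.snowpark import Session

from transformers import calc_month_facts, calc_bike_facts  # hypothetical import


def create_fact_tables(session: Session, source_table: str) -> str:
    # Read the source table, e.g. CITIBIKE.PUBLIC.TRIPS.
    df = session.table(source_table)
    # Apply the DataFrame transformers, then persist the two fact tables.
    calc_month_facts(df).write.save_as_table("MONTH_FACTS", mode="overwrite")
    calc_bike_facts(df).write.save_as_table("BIKE_FACTS", mode="overwrite")
    return "SUCCESS"
```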
Now that you’ve familiarized yourself with the project, in the next section you will set up the test directory and create a PyTest Fixture for the Snowflake session.
Create a PyTest Fixture for the Snowflake Session¶
PyTest fixtures (https://docs.pytest.org/en/6.2.x/fixture.html) are functions that run before a test (or a module of tests), typically to provide data or connections to the tests.
For this project, you will create a PyTest fixture which returns a Snowpark Session object. Your test cases will use this session to connect to Snowflake.
- Create a test directory under the project root directory.
- Under the test directory, create a new Python file named conftest.py. Within conftest.py, create a PyTest fixture for the Session object:
Add Unit Tests for DataFrame Transformers¶
- In the test directory, create a new Python file named test_transformers.py.
- In the test_transformers.py file, import the transformer methods.
- Next, create unit tests for these transformers. The typical convention is to create a method for each test with the name test_<name of method>. In our case, the tests will be:
  The session parameter in each test case refers to the PyTest fixture that you created in the previous section.
- Now implement the test cases for each transformer. Use the following pattern.
1. Create an input DataFrame.
2. Create the expected output DataFrame.
3. Pass the input DataFrame from step 1 into the transformer method.
4. Compare the output of step 3 to the expected output from step 2.
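An implementation of this pattern for one transformer might look like the sketch below. The transformer name calc_bike_facts, its import path, and the column names are assumptions, not taken from the repository.

```python
# test_transformers.py -- sketch of one unit test following the pattern above.
# calc_bike_facts, its import path, and all column names are assumptions.
from transformers import calc_bike_facts  # hypothetical import


def test_calc_bike_facts(session):
    # 1. Create an input DataFrame.
    input_df = session.create_dataframe(
        [[1, 10], [1, 20], [2, 30]],
        schema=["BIKEID", "TRIPDURATION"],
    )
    # 2. Create the expected output DataFrame.
    expected_df = session.create_dataframe(
        [[1, 15.0], [2, 30.0]],
        schema=["BIKEID", "AVG_TRIPDURATION"],
    )
    # 3. Pass the input DataFrame into the transformer method.
    actual_df = calc_bike_facts(input_df)
    # 4. Compare the output to the expected output.
    assert sorted(actual_df.collect()) == sorted(expected_df.collect())
```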
- You can now run PyTest to run all of the unit tests.
Add Integration Tests for Stored Procedures¶
Now that we have unit tests for the DataFrame transformer methods, let’s add an integration test for the stored procedure. The test case will follow this pattern:
1. Create a table representing the input data to the stored procedure.
2. Create two DataFrames with the expected contents of the stored procedure's two output tables.
3. Call the stored procedure.
4. Compare the actual output tables to the DataFrames from step 2.
5. Clean up: delete the input table from step 1 and the output tables from step 3.
Create a Python file named test_sproc.py under the test directory.
Import the stored procedure handler from the project directory and create a test case.
Implement the test case, starting with the creation of the input table.
Next, create DataFrames for the expected output tables.
And finally, call the stored procedure and read the output tables. Compare the actual tables against the DataFrame contents.
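Putting those steps together, the test case might look like the following sketch. The handler name, its import path, and the table schemas are assumptions, not taken from the repository.

```python
# test_sproc.py -- sketch of the integration test described above.
# Handler name, import path, and table schemas are assumptions.
from sproc import create_fact_tables  # hypothetical import


def test_create_fact_tables(session):
    test_table = "CITIBIKE.PUBLIC.TEST_TRIPS"
    try:
        # 1. Create a table representing the input data.
        session.create_dataframe(
            [[60, "2021-01-01 12:00:00", 1]],
            schema=["TRIPDURATION", "STARTTIME", "BIKEID"],
        ).write.save_as_table(test_table, mode="overwrite")

        # 2. Expected-output DataFrames would be built here the same way
        # (omitted for brevity in this sketch).

        # 3. Call the stored procedure handler against the test table.
        create_fact_tables(session, test_table)

        # 4. Compare the actual output tables to the expected DataFrames.
        assert len(session.table("MONTH_FACTS").collect()) > 0
        assert len(session.table("BIKE_FACTS").collect()) > 0
    finally:
        # 5. Clean up the input table and the output tables.
        for t in (test_table, "MONTH_FACTS", "BIKE_FACTS"):
            session.sql(f"DROP TABLE IF EXISTS {t}").collect()
```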
To run the test case, run pytest from the terminal.
To run all the tests in the project, run pytest without any other options.
Configure Local Testing¶
At this point you have a PyTest test suite for your DataFrame transformers and stored procedure.
In each test case, the Session fixture is used to connect to your Snowflake account, send the SQL from the Snowpark Python API, and retrieve the response.
Alternatively, you can use the local testing framework to run the transformations locally without a connection to Snowflake. In large test suites, this can add up to significantly faster test execution. This section shows how to update the test suite to use the local testing framework functionality.
- Begin by updating the PyTest Session fixture. We will add a command-line option to PyTest to switch between local and live testing modes.
- We must first patch this method because not all built-in functions are supported by the local testing framework, for example the monthname() function used in the calc_month_facts() transformer. Create a file named patches.py under the test directory. In this file, paste the following code.
  The patch above accepts a single parameter, column, which is a pandas.Series-like object containing the rows of data within the column. We then use a combination of methods from the Python modules datetime and calendar to emulate the functionality of the built-in monthname() function. Finally, we set the return type to String, as the built-in method returns strings corresponding to the months (“Jan”, “Feb”, “Mar”, etc.).
- Next, import this method into the tests for the DataFrame transformer and the stored procedure.
- Rerun pytest with the local flag.
- Now apply the same patch to the stored procedure test.
- Re-run pytest with the local flag.
- To wrap things up, let's compare the time taken to run the full test suite locally versus with a live connection. We will use the time command to measure the time taken for both runs. Let's start with the live connection.
  In this case, the test suite took 7.89 seconds to run. (Your exact time may differ depending on your computer, network connection, and other factors.)
  Now let's try with the local testing framework:
  With the local testing framework, the test suite execution took only 1 second!
Learn More¶
You finished! Nicely done.
In this tutorial, you got an end-to-end view of how to test your Snowpark Python code. Along the way, you:
- Created a PyTest fixture and added unit tests and integration tests.
  For more information, see Writing Tests for Snowpark Python.
- Configured local testing.
  For more information, see Local testing framework.