Writing Snowpark Code in Python Worksheets¶
Write Snowpark code in Python worksheets to process data using Snowpark Python in Snowsight. By writing code in Python worksheets, you can perform your development and testing in Snowflake without needing to install dependent libraries.
To develop with Python worksheets, do the following:
For example, you might write code in a Python worksheet that extracts data from stages or database objects in Snowflake, transforms the data, and stores the transformed data in Snowflake. You could then deploy that code as a stored procedure and build a data pipeline, all without leaving Snowflake.
About Python Worksheets¶
Python worksheets let you use Snowpark Python in Snowsight to perform data manipulations and transformations. You can use third-party packages listed in the Snowflake Anaconda channel (https://repo.anaconda.com/pkgs/snowflake/) or import your own Python files from stages to use in scripts.
After running a Python worksheet, review the results and output returned by your script. The results display as a string, variant, or a table, depending on your code. See Running Python Worksheets.
Note
Because Python worksheets run inside Snowflake rather than in your local development environment, you cannot use session.add_import
to add a file that your Python code depends on, or session.add_packages
or session.add_requirements
to add packages that you need
to use in your Python code. Instead, you add those files to a stage and reference them in your code.
See Staging files using Snowsight.
Python worksheets have the following limitations:
Log levels lower than WARN do not appear in the Output for a Python worksheet by default. To log lower level messages to the output, use a logging library such as the
logging
module to set the level of messages logged.No support for breakpoints or running only portions of the Python code in a worksheet.
No support for images or webpages. Images or webpages generated by Python code cannot be displayed in Python worksheets.
Python worksheets use Python 3.11 by default, but you can choose another supported version in Packages.
If you require support for any of these options, consider using your local development environment instead. See Setting Up Your Development Environment for Snowpark Python.
Prerequisites for Python Worksheets¶
To use Python worksheets, you must do the following:
Review and accept the Anaconda Terms of Service in Snowsight. See Review and accept the Anaconda Terms of Service.
(Optional) Add Python files and packages that are not included with Anaconda (https://repo.anaconda.com/pkgs/snowflake/) that you want to use in a Python worksheet to a named stage. See Add a Python File from a Stage to a Worksheet.
Choose a warehouse to use for Python worksheets. Snowflake recommends using an X-Small warehouse for development. If you’re running a very large Snowpark workload, use a Snowpark-optimized warehouse. See Warehouse size for additional details about warehouse sizes.
Review and accept the Anaconda Terms of Service¶
Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the External Offerings Terms.
Note
You must use the ORGADMIN role to accept the terms. You only need to accept the External Offerings Terms once for your Snowflake account. If you do not have access to the ORGADMIN role, see Enabling the ORGADMIN role in an account.
Sign in to Snowsight.
Select Admin » Billing & Terms.
In the Anaconda section, select Enable.
In the Anaconda Packages dialog, click the link to review the External Offerings Terms page.
If you agree to the terms, select Acknowledge & Continue.
If you encounter an error when attempting to accept the External Offerings Terms, it may be due to missing information in your user profile, such as a first name, last name, or email address. If you have administrator privileges, see Add user details to your user profile to update your profile using Snowsight. Otherwise, contact an administrator to update your account.
Add a Python File from a Stage to a Worksheet¶
Snowflake includes the Snowpark packages from the Snowflake Anaconda channel (https://repo.anaconda.com/pkgs/snowflake/) in Python worksheets.
If you want to use Python files or packages other than those included in Anaconda in your Python worksheet, you must upload the files to a named stage in Snowflake and then add them to the list of installed packages for your Python worksheet.
To use a Python package in your worksheet, do the following:
Sign in to Snowsight.
Open Projects » Worksheets.
Select + » Python Worksheet.
Select a database and schema.
Select Packages » Stage Packages.
Enter the path to the package in the stage:
If the selected database and schema for the worksheet contain the stage where the package is located, you can reference the stage using an unqualified name. For example,
@YourStage/path/to/example_package.py
.To reference a stage in a different database and schema, fully qualify the name of the stage. For example,
@Database.Schema.Stage/path/to/other_package.py
.
Select Import to add your package to the list of installed packages.
In your code, use
import
statements to use the package in your Python worksheet. For example, after importing packages from theexample_package.py
andother_package.py
files, write the following code to import a function calledfunction
from theexample_package
, and import the packageother_package
for use in your code:from example_package import function import other_package
Note
Packages that you add to a worksheet are available only to that worksheet. If you want to use the same package in a different Python worksheet, use this procedure to add the package to that worksheet.
For more details, see Making dependencies available to your code.
Start Developing with Python Worksheets¶
To open a worksheet and configure your development environment, do the following:
Sign in to Snowsight.
Open Projects » Worksheets.
Select + » Python Worksheet.
Select a database and schema.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected.
Python worksheets require a running warehouse to load Python packages and run Python code.
(Optional) Select Packages to install Python libraries.
The
snowflake-snowpark-python
package is required and always installed for Python worksheets.Search for packages listed in the Snowflake Anaconda channel (https://repo.anaconda.com/pkgs/snowflake/), such as numpy, pandas, requests, and urllib3. Select a package to install it for use in your worksheet, and optionally change the default package version in the list of Installed Packages.
Add your own packages and Python files by selecting Stage Packages and specifying the file path of the stage and package, then selecting Import. See Add a Python File from a Stage to a Worksheet.
Packages installed by you appear under Installed Packages.
If you installed Python libraries for your worksheet, add
import
statements to your code to use the installed libraries.For example, if you install the package scikit-learn for your Python worksheet, add an
import
statement for that package at the beginning of your codeimport scikit-learn
Run the sample Python code to validate your configuration.
Error messages or the return value from your code appears in the Results section. To view log messages, select Output. See Running Python Worksheets.
Writing Snowpark Code in Python Worksheets¶
After you follow the steps to start developing with Python worksheets, you can replace the sample code with your own.
Write your Snowpark Python code inside the handler function:
import snowflake.snowpark as snowpark
def main(session: snowpark.Session):
# your code goes here
The default handler function is main
, but you can change it in the Settings for the worksheet.
The active handler is highlighted in the worksheet.
Use the session
object provided in the boilerplate code to access data in Snowflake with the Snowpark API libraries.
For example, you can create a DataFrame for a table or execute a SQL
statement. See the Snowpark Developer Guide for Python.
As you type, you see autocomplete for Python methods, defined variables, and more. You do not see autocomplete for some third-party packages or files imported from a stage. Python worksheets also include syntax highlighting and guidance for method parameters. You can configure linting and line wrapping in the Settings for the worksheet.
Return Results of a Different Data Type¶
When you write your Python code, consider which type of data is returned by the return
statement in your code and adjust how the
worksheet returns results. By default, a Python worksheet has a return type of Table() because the placeholder code returns a DataFrame.
Depending on what your Python code returns, you might want to change the worksheet settings to display the output differently:
If your handler function returns a
DataFrame
, use the default return type of Table().If your handler function returns a list of
Row
objects, such as with thecollect
method, change the return type to Variant.If your handler function returns a string, such as
return "Hello Python"
, or a value that you want to cast as a string, change the return type to String.If your handler function returns an integer, such as with the
count
method, use a return type of Variant or String.
For details about the return type of some DataFrame methods, see Performing an Action to Evaluate a DataFrame.
To update the worksheet settings to return results of a different type, do the following:
Sign in to Snowsight.
Open Projects » Worksheets.
Open the Python worksheet for which you want to display the results as a table.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.
Select Settings and for the Return type, select the type returned by the handler function.
Run your Python worksheet.
Review the results in the Results panel.
Passing Additional Arguments to the Handler Function¶
With a Python worksheet, you can test a Python function that takes a single argument (a Snowpark Session
object) by designating that
function as the handler for the worksheet. Every function defined in a Python worksheet needs to pass in the session: snowpark.Session
argument.
To test a function that passes in additional arguments, do the following:
Add the arguments to your function.
Define a separate, single-argument function that passes in a Snowpark
Session
. In this function, call the multi-argument function, passing in values for the additional arguments, then return the value of the function.For example, to write Snowpark Python code that filters a table of packages by the package language column, you can write the following code:
import snowflake.snowpark as snowpark from snowflake.snowpark.functions import col # Add parameters with optional type hints to the main handler function def main(session: snowpark.Session, language: str): # Your code goes here, inside the "main" handler. table_name = 'information_schema.packages' dataFrame = session.table(table_name).filter(col("language") == language) # Print a sample of the dataFrame to standard output dataFrame.show() # The return value appears in the Results tab return dataFrame # Add a second function to supply a value for the language parameter to validate that your main handler function runs. def test_language(session: snowpark.Session): return main(session, 'java')
In this example, the
main
function is the multi-argument function and thetest_language
function is the single-argument function used to validate that your code runs with the passed argument values.Set the single-argument function as the handler function to run the worksheet and validate that your code runs with the argument values.
In this example, change the handler to the
test_language
function and then select Run. You can change the handler in the worksheet Settings, or select the Show actions lightbulb next to the handler function and select Set function “test_language” as handler.
When you deploy your Python worksheet as a stored procedure, you can choose the main handler function and review the arguments and the mapped types for your stored procedure.
Running Python Worksheets¶
After you write your Python worksheet, select Run to run your Python worksheet. Running your worksheet executes all of the code in your Python worksheet. Partial or incremental execution of code is not supported.
Note
If you use a package listed in the Snowflake Anaconda channel (https://repo.anaconda.com/pkgs/snowflake/) and have not yet accepted the Anaconda terms, you might see an error about missing packages. See Using third-party packages from Anaconda.
Review Output Generated by Your Code¶
You can review standard output (stdout) or standard error (stderr) messages for your Python code in the Output panel for a Python worksheet.
You can see the output from the following types of functions in the Output panel:
Functions that write to the console, such as
print()
.Functions that print a DataFrame, such as the
show
method of the DataFrame class in Snowpark Python.
Note
Output appears after all Python processes finish running, rather than appearing in a stream as the code runs.
Log output is written to a temporary stage and is only captured if the following are true:
You select a database and schema for the worksheet.
The selected database was not created from a share.
You run the worksheet using a role that has USAGE privileges on the selected database and schema.
Review the Query History for a Python Worksheet¶
When a Python worksheet runs in Snowsight, an anonymous stored procedure runs the code and generates queries that execute the Snowpark commands in the code.
You can use the Query History page in Snowsight to review the queries that ran. See Review Query History in Snowsight.
For example, after running a worksheet, you can review the queries that ran by doing the following:
Review the Results of the worksheet.
In the Query Details for the worksheet, select » Copy Query ID
Select Projects » Worksheets to return to the list of worksheets.
Select Monitoring » Query History.
On the Query History page, display only the queries from your Python worksheet:
Select Filters, and enable the Query ID option.
Enter the Query ID of your Python worksheet.
Select Apply Filters.
Review the queries run for the worksheet.
Example Code for Python Worksheets¶
When you write Python worksheets, you can perform data transformation and manipulation tasks, including reading data from a named stage.
You can review additional examples in Working with DataFrames in Snowpark Python.
Example: Write a Simple Snowpark Program¶
In this example, write a Snowpark Python program that generates a small range of numbers and writes the range to a table that your code creates, or overwrites if it already exists, in Snowflake. To run this code example, you must have the CREATE TABLE privilege on the database schema to which you want to add the table.
Sign in to Snowsight.
Open Projects » Worksheets.
Select + » Python Worksheet.
Select a database and schema that you want to add the table to.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.
Write the Snowpark Python code as part of the
main
function:import snowflake.snowpark as snowpark def main(session: snowpark.Session): tableName = "range_table" df_range = session.range(1, 10, 2).to_df('a') df_range.write.mode("overwrite").save_as_table(tableName) return tableName + " table successfully created"
Select Settings and for the Return type, select String for the type returned by the handler function.
Run the code.
Example: Transform Data in a Python Worksheet¶
In this example, write Python code that aggregates the entries in the TASK_HISTORY view in the ACCOUNT_USAGE schema of the SNOWFLAKE
database by scheduled time and state and saves the aggregated output to a table, aggregate_task_history
.
Note
Because this example queries account usage data, you must use a role with:
Access to query the views in the ACCOUNT_USAGE schema. See Enabling other roles to use schemas in the SNOWFLAKE database.
The CREATE TABLE privilege on the database schema to which you want to add the table.
Sign in to Snowsight.
Open Projects » Worksheets.
Select + » Python Worksheet.
Select a database and schema that you want to add the table to.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.
Write the Snowpark Python code as part of the
main
function:import snowflake.snowpark as snowpark from snowflake.snowpark.functions import col from snowflake.snowpark.dataframe_reader import * from snowflake.snowpark.functions import * def main(session: snowpark.Session): inputTableName = "snowflake.account_usage.task_history" outputTableName = "aggregate_task_history" df = session.table(inputTableName) df.filter(col("STATE") != "SKIPPED")\ .group_by(("SCHEDULED_TIME"), "STATE").count()\ .write.mode("overwrite").save_as_table(outputTableName) return outputTableName + " table successfully written"
Select Settings and for the Return type, select String for the type returned by the handler function.
Run the code.
After you run your code in a Python worksheet, you can open a SQL worksheet and query the table. See Querying data using worksheets.
Example: Read Files from a Stage with Python Worksheets¶
Snowpark Python lets you read files from a stage and write the contents to a table or save them as a view in Snowflake.
In this example, the Python code reads the contents of a compressed CSV-formatted file containing employee data,
data_0_0_0.csv.gz
from the db1.public.files
named stage and writes the contents to a table called employees
.
Note
To run this code example, you must use a role that has:
The USAGE privilege on the stage, database, and schema used in the code.
The CREATE TABLE privilege on the database schema to which you want to add the table.
Sign in to Snowsight.
Open Projects » Worksheets.
Select + » Python Worksheet.
Select a database and schema that you want to add the table to.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.
Write the Snowpark Python code as part of the
main
function:import snowflake.snowpark as snowpark from snowflake.snowpark.types import * schema_for_file = StructType([ StructField("name", StringType()), StructField("role", StringType()) ]) fileLocation = "@DB1.PUBLIC.FILES/data_0_0_0.csv.gz" outputTableName = "employees" def main(session: snowpark.Session): df_reader = session.read.schema(schema_for_file) df = df_reader.csv(fileLocation) df.write.mode("overwrite").save_as_table(outputTableName) return outputTableName + " table successfully written from stage"
Select Settings and for the Return type, select String for the type returned by the handler function.
Run the code.
After you run your code in a Python worksheet, you can open a SQL worksheet and query the table. See Querying data using worksheets.
For more details about working with files in a stage using Snowpark, see Working with DataFrames in Snowpark Python.