Develop with a local IDE¶
You can run Spark workloads interactively from Jupyter Notebooks, VS Code, IntelliJ, or any Python/Java/Scala interface without managing a Spark cluster. The workloads run on Snowflake infrastructure.
There are two ways to connect:
- Snowpark Connect package (recommended): Install the `snowpark-connect` Python package, which is required for all languages (Python, Java, and Scala). For Java and Scala projects, also add the `snowpark-connect-java-client` Maven dependency. To establish a connection, use a TOML connection file. This approach handles server lifecycle, authentication, and session management automatically.
- Direct endpoint (server-side): Connect to Snowflake's hosted Spark Connect endpoint using standard PySpark or Spark Java/Scala clients with programmatic access tokens (PATs). No Snowflake-specific packages are required.
Prerequisites¶
- You have a Snowflake account with access to Snowpark Connect for Spark.
- Python 3.10 or later (earlier than 3.13) is installed. Confirm your version by running `python3 --version`.
- Your Java and Python installations use the same CPU architecture. For example, if Python is arm64, install an arm64 build of Java (not x86_64).
Connection configuration¶
Snowpark Connect for Spark connects to Snowflake using a TOML connection file. You can create this file manually or by using Snowflake CLI.
If you have Snowflake CLI installed, you can use it to define a connection. Otherwise, you can write the connection parameters manually in a `connections.toml` file.
Add a connection by using Snowflake CLI¶
You can use Snowflake CLI to add connection properties that Snowpark Connect for Spark uses to connect to Snowflake. Your changes are saved to a `config.toml` file.
Run the following command to add a connection:
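```bash
snow connection add
```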
Follow the prompts to define a connection.
Specify `spark-connect` as the connection name. This command adds a connection to your `config.toml` file:
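For example, with placeholder values (your prompts produce your own account identifier and credentials):

```toml
[connections.spark-connect]
account = "myorg-myaccount"
user = "jdoe"
password = "my-password"
warehouse = "my_wh"
database = "my_db"
schema = "public"
```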
Confirm the connection works:
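```bash
snow connection test -c spark-connect
```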
Add a connection manually¶
You can write or update a connections.toml file so that your code can connect to Snowpark Connect for Spark on Snowflake.
Ensure that the file permissions allow only the owner to read and write:
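For example, assuming the default file location:

```bash
chmod 0600 ~/.snowflake/connections.toml
```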
Edit the file to contain a `[spark-connect]` connection with your specifics:
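For example, with placeholder values (use your own account identifier and credentials):

```toml
[spark-connect]
account = "myorg-myaccount"
user = "jdoe"
password = "my-password"
warehouse = "my_wh"
database = "my_db"
schema = "public"
```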
Install Snowpark Connect for Spark¶
Create a Python virtual environment and install the Snowpark Connect for Spark package:
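For example (the `scos-venv` name is illustrative):

```bash
python3 -m venv scos-venv
source scos-venv/bin/activate
pip install snowpark-connect
```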
Note
The Java client for Snowpark Connect for Spark is a preview feature.
The Java/Scala client library manages the Snowpark Connect for Spark Python gRPC server as a child process. You
need both the library and a Python virtual environment with snowpark-connect installed.
Create a Python virtual environment:
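For example (the `scos-venv` path is illustrative and matches the `/path/to/scos-venv` placeholder used below):

```bash
python3 -m venv scos-venv
source scos-venv/bin/activate
pip install snowpark-connect
```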
Add the following dependencies to your `pom.xml`. The library is available on Maven Central: snowpark-connect-java-client_2.12 (https://central.sonatype.com/artifact/com.snowflake/snowpark-connect-java-client_2.12) and snowpark-connect-java-client_2.13 (https://central.sonatype.com/artifact/com.snowflake/snowpark-connect-java-client_2.13).
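A sketch of the dependency entry for Scala 2.12 (the version is a placeholder; use the latest release from Maven Central):

```xml
<dependency>
    <groupId>com.snowflake</groupId>
    <artifactId>snowpark-connect-java-client_2.12</artifactId>
    <!-- Replace with the latest version published on Maven Central -->
    <version>x.y.z</version>
</dependency>
```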
On Java 9+, add the required `--add-opens` JVM arguments for Apache Arrow compatibility. See JVM module system arguments for the full list and how to configure them in Maven, IntelliJ, or on the command line.

Point the library to the venv using one of these methods (in order of precedence):
- Code API: `.pythonVenv("/path/to/scos-venv")` on the session builder
- Environment variable: `SNOWPARK_CONNECT_PYTHON_VENV=/path/to/scos-venv`
If neither is set, the library falls back to the system `python3` (or `python` on Windows) and checks whether `snowpark-connect` is importable.
Note
The Scala client for Snowpark Connect for Spark is a preview feature.
The Java/Scala client library manages the Snowpark Connect for Spark Python gRPC server as a child process. You
need both the library and a Python virtual environment with snowpark-connect installed.
Create a Python virtual environment:
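For example (the `scos-venv` path is illustrative and matches the `/path/to/scos-venv` placeholder used below):

```bash
python3 -m venv scos-venv
source scos-venv/bin/activate
pip install snowpark-connect
```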
Add the following dependencies to your `build.sbt`. The library is available on Maven Central: snowpark-connect-java-client_2.12 (https://central.sonatype.com/artifact/com.snowflake/snowpark-connect-java-client_2.12) and snowpark-connect-java-client_2.13 (https://central.sonatype.com/artifact/com.snowflake/snowpark-connect-java-client_2.13).
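A sketch of the dependency line (the version is a placeholder; use the latest release from Maven Central):

```scala
// %% appends the Scala binary version (_2.12 or _2.13) to the artifact name
libraryDependencies += "com.snowflake" %% "snowpark-connect-java-client" % "x.y.z"
```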
On Java 9+, add the required `--add-opens` JVM arguments for Apache Arrow compatibility. See JVM module system arguments for the full list and how to configure them in sbt, IntelliJ, or on the command line.

Point the library to the venv using one of these methods (in order of precedence):
- Code API: `.pythonVenv("/path/to/scos-venv")` on the session builder
- Environment variable: `SNOWPARK_CONNECT_PYTHON_VENV=/path/to/scos-venv`
If neither is set, the library falls back to the system `python3` (or `python` on Windows) and checks whether `snowpark-connect` is importable.
Start a session and run code¶
Once you have Snowpark Connect for Spark installed and an authenticated connection in place, start a session and run Spark code.
Start the Snowpark Connect for Spark server and create a session:
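A minimal Python sketch, assuming the package exposes `start_session`/`get_session` helpers under `snowflake.snowpark_connect` and selecting the `spark-connect` TOML connection defined earlier:

```python
import os
from snowflake import snowpark_connect

# Select the TOML connection by name via the standard Snowflake
# default-connection environment variable.
os.environ["SNOWFLAKE_DEFAULT_CONNECTION_NAME"] = "spark-connect"

snowpark_connect.start_session()        # starts the local Spark Connect server
spark = snowpark_connect.get_session()  # returns a PySpark SparkSession
```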
Then run Spark DataFrame code:
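For example:

```python
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.filter(df.id > 1).show()
```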
Note
The Java client for Snowpark Connect for Spark is a preview feature.
Compile and run:
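For example, with Maven (the main class is a placeholder; on Java 9+ also pass the `--add-opens` flags described above):

```bash
mvn compile exec:java -Dexec.mainClass="com.example.App"
```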
UDF support
When using user-defined functions or custom code with Java, do one of the following:
- Register a class finder to monitor and upload class files.
- Upload JAR dependencies. You can include the workload JAR itself if a class finder isn't used.
- Use a staged JAR.
Using Scala 2.13
By default, Snowpark Connect for Spark uses Scala 2.12. If your dependencies are built with Scala 2.13, you
must specify the Scala version using the snowpark.connect.scala.version configuration option.
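A hedged sketch of setting the option; it is shown on a generic Spark session builder, and the exact builder API for the client may differ:

```java
// Assumes org.apache.spark.sql.SparkSession is on the classpath
SparkSession spark = SparkSession.builder()
    .config("snowpark.connect.scala.version", "2.13")
    .getOrCreate();
```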
Note
The Scala client for Snowpark Connect for Spark is a preview feature.
Compile and run:
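For example (on Java 9+, also pass the `--add-opens` flags described above, for instance via `javaOptions` with forking enabled):

```bash
sbt run
```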
UDF support
When using user-defined functions or custom code with Scala, do one of the following:
- Register a class finder to monitor and upload class files.
- Upload JAR dependencies. You can include the workload JAR itself if a class finder isn't used.
- Use a staged JAR.
Using Scala 2.13
By default, Snowpark Connect for Spark uses Scala 2.12. Workloads built with Scala 2.13 must specify the Scala
version using the snowpark.connect.scala.version configuration option.
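A hedged sketch of setting the option; it is shown on a generic Spark session builder, and the exact builder API for the client may differ:

```scala
// Assumes org.apache.spark.sql.SparkSession is on the classpath
val spark = SparkSession.builder()
  .config("snowpark.connect.scala.version", "2.13")
  .getOrCreate()
```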
Common installation issues¶
Use the following checks to resolve common Snowpark Connect for Spark installation issues.
- Ensure that Java and Python are based on the same architecture.
- Use the most recent Snowpark Connect for Spark package, as described in Install Snowpark Connect for Spark.
- Confirm that the `python` command runs PySpark code correctly for local execution, without Snowflake connectivity. For example, execute a command such as the following:
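A minimal local smoke test (it assumes a full local pyspark installation that supports the classic `local[*]` master):

```python
from pyspark.sql import SparkSession

# Runs entirely on the local machine; no Snowflake connection involved.
spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.range(5).show()
```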
Connect directly to Snowflake’s Spark Connect endpoint¶
You can connect to Snowflake’s hosted Spark Connect endpoint using standard, off-the-shelf Spark client packages such as PySpark or Spark clients for Java and Scala. You don’t need to install any Snowflake-specific packages.
With this approach, all Spark processing runs on Snowflake’s infrastructure. Your client sends Spark Connect protocol messages directly to Snowflake, which executes the workload and returns results. Authentication uses programmatic access tokens (PATs).
This option is useful when you want to:
- Avoid installing Snowflake-specific packages in your environment.
- Use your existing Spark tooling (Jupyter, VS Code, terminals) with Snowflake compute and governance.
- Simplify dependency management by relying only on the standard PySpark package.
Step 1: Install required packages¶
Install the Spark Connect client for your language. You don’t need to install any Snowflake packages.
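For Python, one option is the PySpark package with the Spark Connect extra, which pulls in the gRPC client dependencies (pick a version compatible with the endpoint):

```bash
pip install "pyspark[connect]"
```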
Note
The Java client for Snowpark Connect for Spark is a preview feature.
Add the Spark Connect client dependency to your pom.xml:
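A sketch of the dependency entry (the Spark version shown is illustrative; match it to what the endpoint supports):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-connect-client-jvm_2.12</artifactId>
    <version>3.5.1</version>
</dependency>
```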
Note
The Scala client for Snowpark Connect for Spark is a preview feature.
Add the Spark Connect client dependency to your build.sbt file:
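A sketch of the dependency line (the Spark version shown is illustrative; match it to what the endpoint supports):

```scala
// %% appends the Scala binary version to the artifact name
libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.1"
```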
Step 2: Set up authentication¶
Generate a programmatic access token (PAT).
For more information, see the Snowflake documentation on programmatic access tokens.
The following example adds a PAT named `TEST_PAT` for the user `sysadmin` and sets the expiration to 30 days.
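A sketch of the SQL, based on the `ALTER USER … ADD PROGRAMMATIC ACCESS TOKEN` syntax:

```sql
ALTER USER sysadmin ADD PROGRAMMATIC ACCESS TOKEN TEST_PAT
  DAYS_TO_EXPIRY = 30;
```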
Find your Snowflake Spark Connect host URL. Run the following SQL in Snowflake to find the hostname for your account:
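One possible query, assuming the `SYSTEM$ALLOWLIST` function is available in your account; it lists the deployment hostname among the account's endpoints:

```sql
SELECT t.VALUE:host::VARCHAR AS host
FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t
WHERE t.VALUE:type::VARCHAR = 'SNOWFLAKE_DEPLOYMENT';
```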
Step 3: Connect and run Spark code¶
Connect to the Snowflake Spark Connect endpoint using the host URL and PAT from the previous steps.
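A minimal PySpark sketch; the environment variable names are placeholders, and the exact connection-string parameters (port, options) are assumptions, so check the endpoint documentation for your account:

```python
import os
from pyspark.sql import SparkSession

host = os.environ["SNOWFLAKE_HOST"]  # e.g. myorg-myaccount.snowflakecomputing.com
pat = os.environ["SNOWFLAKE_PAT"]    # the programmatic access token from step 2

# Standard Spark Connect URL: TLS plus bearer-token authentication.
spark = (
    SparkSession.builder
    .remote(f"sc://{host}:443/;use_ssl=true;token={pat}")
    .getOrCreate()
)
```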
Once connected, you can write regular Spark DataFrame code:
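For example:

```python
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
df.where("id >= 2").show()
```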
Note
The Java client for Snowpark Connect for Spark is a preview feature.
Compile and run:
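For example, with Maven (the main class is a placeholder):

```bash
mvn compile exec:java -Dexec.mainClass="com.example.SparkConnectApp"
```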
Note
The Scala client for Snowpark Connect for Spark is a preview feature.
Compile and run:
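For example:

```bash
sbt run
```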