Reading Data from External Data Sources using Snowpark Python DB-API¶
With Snowpark Python DB-API, Snowpark Python users can programmatically pull data from external databases into Snowflake. It includes:
Python DB-API Support: Connect to external databases using Python’s standard DB-API 2.0 drivers.
Streamlined setup: Use
pip
to install the necessary drivers, with no need to manage additional dependencies.
With these APIs, you can seamlessly pull data into Snowflake tables and transform it using Snowpark DataFrames for advanced analytics.
Using Snowpark Python DB-API¶
The DB-API can be used in a similar way as the Spark JDBC API (https://spark.apache.org/docs/3.5.4/sql-data-sources-jdbc.html). Most parameters are designed to be identical or similar for better parity. At the same time, Snowpark emphasizes a Python-first design with intuitive naming conventions, avoiding JDBC-specific configurations. This provides Python developers with a familiar experience. For more information that compares Snowpark Python DB-API with the Spark JDBC API, see DB-API parameters.
DB-API parameters¶
Parameter |
Snowpark DB-API |
---|---|
|
Function to create a Python DB-API connection. |
|
Specifies the table in the source database. |
|
SQL query wrapped as a subquery for reading data. |
|
Partitioning column for parallel reads. |
|
Lower bound for partitioning. |
|
Upper bound for partitioning. |
|
Number of partitions for parallelism. |
|
Timeout for SQL execution (in seconds). |
|
Number of rows fetched per round trip. |
|
Custom schema for pulling data from external databases. |
|
Number of workers for parallel fetching and pulling data from external databases. |
|
List of conditions for WHERE clause partitions. |
|
Executes a SQL or PL/SQL statement upon session initialization. |
|
Execute the workload using a Snowflake UDTF for better performance. |
|
Number of fetched batches to be merged into a single Parquet file before uploading it. |
Understanding parallelism¶
Snowpark Python DB-API uses two independent forms of parallelism based on user input:
Partition-based parallelism
When users specify partitioning information (e.g.,
column
,lower_bound
,upper_bound
,num_partitions
) or predicates, Snowflake splits the query into multiple partitions. These are processed in parallel using multiprocessing, with each worker fetching and writing its own partition.Fetch-size-based parallelism within each partition
Within a partition, the API fetches rows in chunks defined by
fetch_size
. These rows are written to Snowflake in parallel as they are fetched, allowing reading and writing to overlap and maximize throughput.
These two forms of parallelism operate independently. If neither partitioning nor fetch_size
is specified, the function loads the entire source table into memory before writing to Snowflake. This can increase memory usage and reduce performance for large datasets.
SQL Server¶
Using DB-API to connect to SQL Server from a Snowpark client¶
To connect to SQL Server from Snowpark, you will need the following three packages:
Snowpark: snowflake-snowpark-python[pandas] (https://pypi.org/project/snowflake-snowpark-python/)
SQL Server ODBC Driver: Microsoft ODBC Driver for SQL Server (https://learn.microsoft.com/en-us/sql/connect/odbc/microsoft-odbc-driver-for-sql-server). By installing the driver, you agree to Microsoft’s EULA.
The open source pyodbc library: pyodbc (https://pypi.org/project/pyodbc/)
Below are the code examples needed to connect to SQL Server from Snowpark client and a stored procedure.
Install Python SQL Driver
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release brew update HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18
Install
snowflake-snowpark-python[pandas]
andpyodbc
pip install snowflake-snowpark-python[pandas] pip install pyodbc
Define the factory method for creating connection to SQL Server
def create_sql_server_connection(): import pyodbc HOST = "mssql_host" PORT = "mssql_port" USERNAME = "mssql_username" PASSWORD = "mssql_password" DATABASE = "mssql_db" connection_str = ( "DRIVER={{ODBC Driver 18 for SQL Server}};" "SERVER={HOST},{PORT};" "DATABASE={DATABASE};" "UID={USERNAME};" "PWD={PASSWORD};" ) connection = pyodbc.connect(connection_str) return connection # Call dbapi to pull data from mssql_table df = session.read.dbapi( create_sql_server_connection, table="mssql_table")
Using DB-API to connect to SQL Server from a stored procedure¶
Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint, and external access integration.
CREATE OR REPLACE SECRET mssql_secret TYPE = PASSWORD USERNAME = 'mssql_username' PASSWORD = 'mssql_password'; -- Configure a network rule to allow egress to the source endpoint CREATE OR REPLACE NETWORK RULE mssql_network_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('mssql_host:mssql_port'); -- Configure an external access integration CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION mssql_access_integration ALLOWED_NETWORK_RULES = (mssql_network_rule) ALLOWED_AUTHENTICATION_SECRETS = (mssql_secret) ENABLED = true; -- Create or replace a Python stored procedure CREATE OR REPLACE PROCEDURE sp_mssql_dbapi() RETURNS TABLE() LANGUAGE PYTHON RUNTIME_VERSION='3.11' HANDLER='run' PACKAGES=('snowflake-snowpark-python', 'pyodbc', 'msodbcsql') EXTERNAL_ACCESS_INTEGRATIONS = (mssql_access_integration) SECRETS = ('cred' = mssql_secret ) AS $$ # Get user name and password from mssql_secret import _snowflake username_password_object = _snowflake.get_username_password('cred') USER = username_password_object.username PASSWORD = username_password_object.password # Define a method to connect to SQL server_hostname from snowflake.snowpark import Session def create_sql_server_connection(): import pyodbc host = "mssql_host" port = mssql_port username = USER password = PASSWORD database = "mssql_database" connection_str = ( "DRIVER={{ODBC Driver 18 for SQL Server}};" "SERVER={host},{port};" "DATABASE={database};" "UID={username};" "PWD={password};" ) connection = pyodbc.connect(connection_str) return connection def run(session: Session): df = session.read.dbapi( create_sql_server_connection, table="mssql_table" ) return df $$; CALL sp_mssql_dbapi();
Oracle¶
To connect to Oracle from Snowpark, you will need the following two packages:
Snowpark: snowflake-snowpark-python[pandas] (https://pypi.org/project/snowflake-snowpark-python/)
The open source oracledb library: oracledb (https://pypi.org/project/oracledb/)
Below are the code examples needed to connect to Oracle from a Snowpark client, stored procedures, and Snowflake Notebooks.
Using DB-API to connect to Oracle from a Snowpark client¶
Install
snowflake-snowpark-python[pandas]
andoracledb
pip install snowflake-snowpark-python[pandas] pip install oradb
Use DB-API to pull data from Oracle and define the factory method for creating a connection to Oracle
def create_oracle_db_connection(): import oracledb HOST = "myhost" PORT = "myport" SERVICE_NAME = "myservice" USER = "myuser" PASSWORD = "mypassword" DSN = "{HOST}:{PORT}/{SERVICE_NAME}" connection = oracledb.connect( user=USER, password=PASSWORD, dsn=DSN ) return connection # Call dbapi to pull data from mytable df = session.read.dbapi( create_oracle_db_connection, table="mytable")
Using DB-API to connect to Oracle from a stored procedure¶
External Access Integration is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
-- Configure the secret, a network rule to allow egress to the source endpoint and external access integration. CREATE OR REPLACE SECRET ora_secret TYPE = PASSWORD USERNAME = 'ora_username' PASSWORD = 'ora_password'; -- Configure a network rule to allow egress to the source endpoint CREATE OR REPLACE NETWORK RULE ora_network_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('ora_host:ora_port'); -- Configure an external access integration CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ora_access_integration ALLOWED_NETWORK_RULES = (ora_network_rule) ALLOWED_AUTHENTICATION_SECRETS = (ora_secret) ENABLED = true;
Using Snowpark Python DB-API to pull data from Oracle in a Python stored procedure¶
CREATE OR REPLACE PROCEDURE sp_ora_dbapi()
RETURNS TABLE()
LANGUAGE PYTHON
RUNTIME_VERSION='3.11'
HANDLER='run'
PACKAGES=('snowflake-snowpark-python', 'oracledb')
EXTERNAL_ACCESS_INTEGRATIONS = (ora_access_integration)
SECRETS = ('cred' = ora_secret )
AS $$
# Get user name and password from ora_secret
import _snowflake
username_password_object = _snowflake.get_username_password('cred')
USER = username_password_object.username
PASSWORD = username_password_object.password
# Define the factory method for creating a connection to Oracle
from snowflake.snowpark import Session
def create_oracle_db_connection():
import oracledb
host = "ora_host"
port = "ora_port"
service_name = "ora_service"
user = USER
password = PASSWORD
DSN = "{host}:{port}/{service_name}"
connection = oracledb.connect(
user=USER,
password=PASSWORD,
dsn=DSN
)
return connection
def run(session: Session):
df = session.read.dbapi(
create_ora_connection,
table="ora_table"
)
return df
$$;
CALL sp_ora_dbapi();
Using DB-API to connect to Oracle from a Snowflake Notebook¶
Select
snowflake-snowpark-python
andoracledb
from Notebook packages.Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
CREATE OR REPLACE SECRET ora_secret TYPE = PASSWORD USERNAME = 'ora_username' PASSWORD = 'ora_password'; ALTER NOTEBOOK mynotebook SET SECRETS = ('snowflake-secret-object' = ora_secret); -- Configure a network rule to allow egress to the source endpoint CREATE OR REPLACE NETWORK RULE ora_network_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('ora_host:ora_port'); -- Configure an external access integration. CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ora_access_integration ALLOWED_NETWORK_RULES = (ora_network_rule) ALLOWED_AUTHENTICATION_SECRETS = (ora_secret) ENABLED = true;
Set up external access for Snowflake Notebooks and then restart Notebook session.
Using Snowpark Python DB-API to pull data from Oracle in a Python cell of Snowflake Notebook¶
# Get user name and password from ora_secret import _snowflake username_password_object = _snowflake.get_username_password('snowflake-secret-object') USER = username_password_object.username PASSWORD = username_password_object.password import snowflake.snowpark.context session = snowflake.snowpark.context.get_active_session() # Define the factory method for creating a connection to Oracle def create_oracle_db_connection(): import oracledb host = "ora_host" port = "ora_port" service_name = "ora_service" user = USER password = PASSWORD DSN = "{host}:{port}/{service_name}" connection = oracledb.connect( user=USER, password=PASSWORD, dsn=DSN ) return connection # Use dbapi to read data from ora_table df_ora = session.read.dbapi( create_oracle_db_connection, table='ora_table' ) # Save data into sf_table df_ora.write.mode("overwrite").save_as_table('sf_table')
PostgreSQL¶
To connect to PostgreSQL from Snowpark, you will need the following two packages:
Snowpark: snowflake-snowpark-python[pandas] (https://pypi.org/project/snowflake-snowpark-python/)
The open source psycopg2 library: psycopg2 (https://pypi.org/project/psycopg2/)
Below are the code examples needed to connect to PostgreSQL from Snowpark client, stored procedures, and Snowflake Notebooks.
Using DB-API to connect to PostgreSQL from a Snowpark client¶
Install
psycopg2
pip install psycopg2
Define the factory method for creating a connection to PostgreSQL
def create_pg_connection(): import psycopg2 connection = psycopg2.connect( host="pg_host", port=pg_port, dbname="pg_dbname", user="pg_user", password="pg_password", ) return connection # Call dbapi to pull data from pg_table df = session.read.dbapi( create_pg_connection, table="pg_table")
Using DB-API to connect to PostgreSQL from a stored procedure¶
Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
CREATE OR REPLACE SECRET pg_secret
TYPE = PASSWORD
USERNAME = 'pg_username'
PASSWORD = 'pg_password';
-- Configure a network rule.
CREATE OR REPLACE NETWORK RULE pg_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('pg_host:pg_port');
-- Configure an external access integration.
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pg_access_integration
ALLOWED_NETWORK_RULES = (pg_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (pg_secret)
ENABLED = true;
Use Snowpark Python DB-API to pull data from PostgreSQL in a Python stored procedure
CREATE OR REPLACE PROCEDURE sp_pg_dbapi()
RETURNS TABLE()
LANGUAGE PYTHON
RUNTIME_VERSION='3.11'
HANDLER='run'
PACKAGES=('snowflake-snowpark-python', 'psycopg2')
EXTERNAL_ACCESS_INTEGRATIONS = (pg_access_integration)
SECRETS = ('cred' = pg_secret )
AS $$
# Get user name and password from pg_secret
import _snowflake
username_password_object = _snowflake.get_username_password('cred')
USER = username_password_object.username
PASSWORD = username_password_object.password
# Define the factory method for creating a connection to PostgreSQL
from snowflake.snowpark import Session
def create_pg_connection():
import psycopg2
connection = psycopg2.connect(
host="pg_host",
port=pg_port,
dbname="pg_dbname",
user=USER,
password=PASSWORD,
)
return connection
def run(session: Session):
df = session.read.dbapi(
create_pg_connection,
table="pg_table"
)
return df
$$;
CALL sp_pg_dbapi();
Using DB-API to connect to PostgreSQL from a Snowflake Notebook¶
Select
snowflake-snowpark-python
andpsycopg2
from Snowflake Notebook packages.Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
-- Configure the secret
CREATE OR REPLACE SECRET pg_secret
TYPE = PASSWORD
USERNAME = 'pg_username'
PASSWORD = 'pg_password';
ALTER NOTEBOOK pg_notebook SET SECRETS = ('snowflake-secret-object' = pg_secret);
-- Configure the network rule to allow egress to the source endpoint
CREATE OR REPLACE NETWORK RULE pg_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('pg_host:pg_port');
-- Configure external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pg_access_integration
ALLOWED_NETWORK_RULES = (pg_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (pg_secret)
ENABLED = true;
Set up external access for Snowflake Notebooks and then restart Notebook session.
# Get the user name and password from :code:`pg_secret` import _snowflake username_password_object = _snowflake.get_username_password('snowflake-secret-object') USER = username_password_object.username PASSWORD = username_password_object.password import snowflake.snowpark.context session = snowflake.snowpark.context.get_active_session() # Define the factory method for creating a connection to PostgreSQL def create_pg_connection(): import psycopg2 connection = psycopg2.connect( host="pg_host", port=pg_port, dbname="pg_dbname", user=USER, password=PASSWORD, ) return connection # Use dbapi to read and save data from pg_table df = session.read.dbapi( create_pg_connection, table="pg_table" ) # Save data into sf_table df.write.mode("overwrite").save_as_table('sf_table')
MySQL¶
To connect to MySQL from Snowpark, you will need the following two packages:
Snowpark: snowflake-snowpark-python[pandas] (https://pypi.org/project/snowflake-snowpark-python/)
The open source pymysql library: PyMySQL (https://pypi.org/project/PyMySQL/)
Below are the code examples needed to connect to MySQL from Snowpark client, stored procedures, and Snowflake Notebook.
Using DB-API to connect to MySQL from a Snowpark client¶
Install pymysql
pip install snowflake-snowpark-python[pandas] pip install pymysql
Define the factory method for creating a connection to MySQL
def create_mysql_connection():
import pymysql
connection = pymysql.connect(
host="mysql_host",
port=mysql_port,
database="mysql_db",
user="mysql_user",
password="mysql_password"
)
return connection
# Call dbapi to pull data from mysql_table
df = session.read.dbapi(
create_mysql_connection,
table="mysql_table"
)
Using DB-API to connect to MySQL from a stored procedure¶
Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
CREATE OR REPLACE SECRET mysql_secret
TYPE = PASSWORD
USERNAME = 'mysql_username'
PASSWORD = 'mysql_password';
-- Configure a network rule.
CREATE OR REPLACE NETWORK RULE mysql_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('mysql_host:mysql_port');
-- Configure an external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION mysql_access_integration
ALLOWED_NETWORK_RULES = (mysql_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (mysql_secret)
ENABLED = true;
Use Snowpark Python DB-API to pull data from MySQL in a Python stored procedure.
CREATE OR REPLACE PROCEDURE sp_mysql_dbapi()
RETURNS TABLE()
LANGUAGE PYTHON
RUNTIME_VERSION='3.11'
HANDLER='run'
PACKAGES=('snowflake-snowpark-python', 'pymysql')
EXTERNAL_ACCESS_INTEGRATIONS = (mysql_access_integration)
SECRETS = ('cred' = mysql_secret )
AS $$
# Get user name and password from mysql_secret
import _snowflake
username_password_object = _snowflake.get_username_password('cred')
USER = username_password_object.username
PASSWORD = username_password_object.password
# Define the factory method for creating a connection to MySQL
from snowflake.snowpark import session
def create_mysql_connection():
import pymysql
connection = pymysql.connect(
host="mysql_host",
port=mysql_port,
dbname="mysql_dbname",
user=USER,
password=PASSWORD,
)
return connection
def run(session: Session):
df = session.read.dbapi(
create_mysql_connection,
table="mysql_table"
)
return df
$$;
CALL sp_mysql_dbapi();
Using DB-API to connect to MySQL from a Snowflake Notebook¶
Select
snowflake-snowpark-python
andpymysql
from Snowflake Notebook packages.Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret and add it to the Snowflake Notebook.
CREATE OR REPLACE SECRET mysql_secret
TYPE = PASSWORD
USERNAME = 'mysql_username'
PASSWORD = 'mysql_password';
ALTER NOTEBOOK mynotebook SET SECRETS = ('snowflake-secret-object' = mysql_secret);
Configure a network rule and an external access integration.
CREATE OR REPLACE NETWORK RULE mysql_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('mysql_host:mysql_port');
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION mysql_access_integration
ALLOWED_NETWORK_RULES = (mysql_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (mysql_secret)
ENABLED = true;
Set up external access for Snowflake Notebooks and then restart Notebook session.
# Get user name and password from mysql_secret
import _snowflake
username_password_object = _snowflake.get_username_password('snowflake-secret-object')
USER = username_password_object.username
PASSWORD = username_password_object.password
import snowflake.snowpark.context
session = snowflake.snowpark.context.get_active_session()
# Define the factory method for creating a connection to MySQL
def create_mysql_connection():
import pymysql
connection = pymysql.connect(
host="mysql_host",
port=mysql_port,
dbname="mysql_dbname",
user=USER,
password=PASSWORD,
)
return connection
# Call dbapi to pull data from mysql_table
df = session.read.dbapi(
create_mysql_connection,
table="mysql_table")
# Save data into sf_table
df.write.mode("overwrite").save_as_table('sf_table')
Databricks¶
To connect to Databricks from Snowpark, you will need the following two packages:
Snowpark: snowflake-snowpark-python[pandas] (https://pypi.org/project/snowflake-snowpark-python/)
The open source psycopg2 library: databricks-sql-connector (https://pypi.org/project/databricks-sql-connector/)
Below are the code examples needed to connect to Databricks from Snowpark client, stored procedures, and Snowflake Notebook.
Using DB-API to connect to Databricks from a Snowpark client¶
Install
databricks-sql-connector
:
pip install snowflake-snowpark-python[pandas]
pip install databricks-sql-connector
Define the factory method for creating a connection to Databricks
def create_dbx_connection():
import databricks.sql
connection = databricks.sql.connect(
server_hostname=HOST,
http_path=PATH,
access_token=ACCESS_TOKEN
)
return connection
#Call dbapi to pull data from mytable
df = session.read.dbapi(
create_dbx_connection,
table="dbx_table")
Using DB-API to connect to Databricks from a stored procedure¶
Select
snowflake-snowpark-python
andpymysql
from Snowflake Notebook packages.External Access Integration is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret, a network rule to allow egress to the source endpoint and external access integration.
CREATE OR REPLACE SECRET dbx_secret
TYPE = GENERIC_STRING
SECRET_STRING = 'dbx_access_token';
-- Configure a network rule
CREATE OR REPLACE NETWORK RULE dbx_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('dbx_host:dbx_port');
-- Configure an external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION dbx_access_integration
ALLOWED_NETWORK_RULES = (dbx_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (dbx_secret)
ENABLED = true;
Use Snowpark Python DB-API to pull data from Databricks in a Python stored procedure
CREATE OR REPLACE PROCEDURE sp_dbx_dbapi() RETURNS TABLE() LANGUAGE PYTHON RUNTIME_VERSION='3.11' HANDLER='run' PACKAGES=('snowflake-snowpark-python', 'databricks-sql-connector') EXTERNAL_ACCESS_INTEGRATIONS = (dbx_access_integration) SECRETS = ('cred' = dbx_secret ) AS $$ # Get user name and password from dbx_secret import _snowflake ACCESS_TOKEN = _snowflake.get_generic_secret_string('cred') from snowflake.snowpark import Session # define the method for creating a connection to Databricks def create_dbx_connection(): import databricks.sql connection = databricks.sql.connect( server_hostname="dbx_host", http_path="dbx_path", access_token=ACCESS_TOKEN, ) return connection def run(session: Session): df = session.read.dbapi( create_dbx_connection, table="dbx_table" ) return df $$; CALL sp_dbx_dbapi();
Using DB-API to connect to Databricks from a Snowflake Notebook¶
Select
snowflake-snowpark-python
andpymysql
from Snowflake Notebook packages.Configure External Access Integration, which is required to allow Snowflake to connect to the source endpoint.
Note
PrivateLink is recommended for secure data transfer, especially when dealing with sensitive information. Ensure that your Snowflake account has the necessary PrivateLink privileges enabled, and the PrivateLink feature is configured and active in your Snowflake Notebook environment.
Configure the secret and add it to the Snowflake Notebook.
CREATE OR REPLACE SECRET dbx_secret TYPE = GENERIC_STRING SECRET_STRING = 'dbx_access_token'; ALTER NOTEBOOK mynotebook SET SECRETS = ('snowflake-secret-object' = dbx_secret);
Configure
CREATE OR REPLACE NETWORK RULE dbx_network_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('dbx_host:dbx_port'); CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION dbx_access_integration ALLOWED_NETWORK_RULES = (dbx_network_rule) ALLOWED_AUTHENTICATION_SECRETS = (dbx_secret) ENABLED = true;
Set up external access for Snowflake Notebooks and then restart Notebook session.
# Get user name and password from dbx_secret import _snowflake ACCESS_TOKEN = _snowflake.get_generic_secret_string('cred') import snowflake.snowpark.context session = snowflake.snowpark.context.get_active_session() # Define the factory method for creating a connection to Databricks def create_dbx_connection(): import databricks.sql connection = databricks.sql.connect( server_hostname="dbx_host", http_path="dbx_path", access_token=ACCESS_TOKEN, ) return connection # use dbapi to read data from dbx_table df = session.read.dbapi( create_dbx_connection, table="dbx_table" ) # save data into sf_table df.write.mode("overwrite").save_as_table('sf_table')
Limitations¶
Drivers¶
Snowpark Python DB-API supports only Python DB-API 2.0–compliant drivers (e.g., pyodbc
, oracledb
). JDBC drivers are not supported in this release.