Run Spark workloads from Snowflake Notebooks
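You can run Spark workloads interactively from Snowflake Notebooks without managing a Spark cluster. The workloads run on Snowflake infrastructure.
To use Snowflake Notebooks as a client for developing Spark workloads that run on Snowflake, do the following:
- Launch a Snowflake notebook.
- Start a Spark session in the notebook.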
- Write PySpark code to load, transform, and analyze data, for example to filter high-value customer orders or to aggregate revenue.
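As an illustration of the last step, here is a minimal sketch that loads an orders table, keeps the high-value orders, and aggregates revenue per customer with the PySpark DataFrame API. The table name `ORDERS`, the columns `customer_id` and `amount`, and the threshold of 1000 are hypothetical; adapt them to your data. The functions take an already started Spark session (`spark`) or DataFrame as input.

```python
# Hypothetical table layout: ORDERS(customer_id, amount).
ORDERS_TABLE = "ORDERS"       # hypothetical table name
HIGH_VALUE_THRESHOLD = 1000   # hypothetical cutoff for a "high-value" order


def load_orders(spark, table_name=ORDERS_TABLE):
    """Load the orders table as a Spark DataFrame."""
    return spark.table(table_name)


def high_value_orders(orders, threshold=HIGH_VALUE_THRESHOLD):
    """Keep only the orders whose amount exceeds the threshold (SQL expression filter)."""
    return orders.filter(f"amount > {threshold}")


def revenue_by_customer(orders):
    """Sum order amounts per customer, largest revenue first."""
    return (
        orders.groupBy("customer_id")
        .agg({"amount": "sum"})
        # toDF renames columns by position, whatever name the engine gave the aggregate.
        .toDF("customer_id", "revenue")
        .orderBy("revenue", ascending=False)
    )


# In a notebook cell, with an active Spark session named `spark`:
# revenue_by_customer(high_value_orders(load_orders(spark))).show(10)
```

Keeping each transformation in its own function lets you test or reuse the filter and the aggregation separately before chaining them.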
Use a Snowflake notebook that runs on a warehouse
For more information about Snowflake Notebooks, see Create a notebook.
- Create a Snowflake notebook by completing the following steps:
  - Sign in to Snowsight.
  - At the top of the navigation menu, select (Create) » Notebook » New Notebook.
  - In the Create notebook dialog, enter a name, database, and schema for the new notebook.
    For more information, see Create a notebook.
  - For Runtime, select Run on warehouse.
  - For Runtime version, select Snowflake Warehouse Runtime 2.0.
    Version 2.0 provides the dependency support you need, including Python 3.10. For more information, see Legacy Notebook runtimes.
  - For Query warehouse and Notebook warehouse, select the warehouses that run your query code and your kernel and Python code, respectively, as described in Create a notebook.
  - Select Create.
- In the notebook you created, under Packages, make sure that the following packages are listed to support the code in your notebook:
  - Python 3.10 or later
  - snowpark-connect, latest version
  If you need to add these packages, do the following:
  - Under Anaconda Packages, type the package name in the search box.
  - Select the package you need.
  - Select Save.
- To connect to the Snowpark Connect for Spark server and test the connection, run code in a Python cell of the notebook you created that starts a Spark session and runs a simple test query.
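A minimal sketch of such a connection cell follows. It assumes that the installed `snowpark_connect` package exposes `start_session(snowpark_session=...)` and `get_session()` as its entry points; these names are an assumption, so check the reference for the version you installed. `get_active_session()` returns the Snowpark session that the notebook already holds, and the test query `SELECT 1 AS ok` is only an illustrative round trip.

```python
def start_spark_session():
    """Start Snowpark Connect for Spark on the notebook's Snowpark session.

    ASSUMPTION: start_session(snowpark_session=...) and get_session() are the
    entry points of the installed snowpark_connect version; verify in its docs.
    """
    # Imported inside the function so the packages are needed only when it runs.
    from snowflake import snowpark_connect
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()  # Snowpark session the notebook already holds
    snowpark_connect.start_session(snowpark_session=session)
    return snowpark_connect.get_session()


def check_connection(spark):
    """Run a trivial Spark SQL query and fail loudly if the result is unexpected."""
    rows = spark.sql("SELECT 1 AS ok").collect()
    if len(rows) != 1 or rows[0][0] != 1:
        raise RuntimeError(f"Unexpected test query result: {rows!r}")
    return rows


# In a Python cell of the notebook:
# spark = start_spark_session()
# check_connection(spark)
```

If `check_connection` returns without raising, Spark SQL statements sent from the notebook are reaching the server and returning results.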
Use a Snowflake notebook that runs in a workspace
For more information about Snowflake Notebooks in Workspaces, see Snowflake Notebooks in Workspaces.
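- Create a PyPI external access integration.
  You must use the ACCOUNTADMIN role and have a database that you can access.
  From a SQL file in the workspace, run the SQL commands that create the integration.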
- Enable the PyPI integration in the notebook.
  - In the notebook, for Service name, select a service.
  - For External access integrations, select the PyPI integration you created.
  - For Python version, select Python 3.11.
  - Select Create.
- Install the snowpark_connect package from PyPI in the notebook.
- Restart the kernel.
  - From the Connect button, select Restart kernel.
- Start the snowpark_connect server from a Python cell in the notebook.
- Run your Spark code.
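For the step that creates the PyPI external access integration, a minimal sketch of the SQL follows. It creates an egress network rule for the PyPI hosts and an external access integration that allows that rule. The object names `pypi_network_rule` and `pypi_access_integration`, the database argument, and the exact host list are assumptions; adjust them for your account. The statements are built as Python strings so you can print them and paste them into the SQL file.

```python
# Hypothetical host list for pip downloads; adjust if your installs need other hosts.
PYPI_HOSTS = ("pypi.org", "pypi.python.org", "pythonhosted.org", "files.pythonhosted.org")


def pypi_integration_sql(database, rule_name="pypi_network_rule",
                         integration_name="pypi_access_integration"):
    """Return the SQL statements that let notebooks reach PyPI.

    A network rule is a schema-level object, so it is created in the current
    schema of `database`. All object names here are hypothetical defaults.
    """
    hosts = ", ".join(f"'{host}'" for host in PYPI_HOSTS)
    return [
        "USE ROLE ACCOUNTADMIN",
        f"USE DATABASE {database}",
        f"CREATE OR REPLACE NETWORK RULE {rule_name} "
        f"MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ({hosts})",
        f"CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION {integration_name} "
        f"ALLOWED_NETWORK_RULES = ({rule_name}) ENABLED = TRUE",
    ]


# Print the statements, then paste them into a SQL file in the workspace:
# print(";\n".join(pypi_integration_sql("MY_DB")) + ";")
```

The integration name you choose here is the one you later select under External access integrations in the notebook.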
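For the step that installs the snowpark_connect package, one way to install it from a notebook cell is to run pip with the kernel's own interpreter, as in this minimal sketch. It assumes the PyPI distribution name is `snowpark-connect`; pip treats `-` and `_` in package names as equivalent, so `snowpark_connect` resolves to the same package.

```python
import subprocess
import sys

# Assumed PyPI name; pip normalizes "-" and "_", so "snowpark_connect" also matches.
PACKAGE = "snowpark-connect"


def pip_install_command(package=PACKAGE):
    """Build a pip command that targets the kernel's own Python interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package=PACKAGE):
    """Install the package into the running kernel's environment."""
    subprocess.check_call(pip_install_command(package))


# In a notebook cell; afterward, restart the kernel as described in the steps above:
# install()
```

Using `sys.executable` instead of a bare `pip` avoids installing into a different Python environment than the one the kernel runs.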
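For the steps that start the snowpark_connect server and run Spark code after the kernel restart, here is a minimal sketch. It assumes that the package exposes `start_session(snowpark_session=...)` and `get_session()` and that `get_active_session()` is available in the workspace notebook; these are assumptions to verify against the package's documentation. The sample rows and the threshold are hypothetical and serve only as a quick check that Spark code runs.

```python
def start_spark_session():
    """Start the Snowpark Connect for Spark server and return a Spark session.

    ASSUMPTION: the installed snowpark_connect exposes start_session(snowpark_session=...)
    and get_session(); confirm against the package's documentation.
    """
    from snowflake import snowpark_connect
    from snowflake.snowpark.context import get_active_session

    snowpark_connect.start_session(snowpark_session=get_active_session())
    return snowpark_connect.get_session()


# Hypothetical sample rows: (customer_id, amount).
SAMPLE_ORDERS = [("c1", 120.0), ("c2", 1500.0), ("c1", 2300.0)]


def count_high_value(spark, threshold=1000):
    """Build a small in-memory DataFrame and count the orders above the threshold."""
    orders = spark.createDataFrame(SAMPLE_ORDERS, ["customer_id", "amount"])
    return orders.filter(f"amount > {threshold}").count()


# After restarting the kernel:
# spark = start_spark_session()
# count_high_value(spark)  # two of the sample orders exceed 1000
```

Once this works, replace the in-memory sample with reads from your own tables.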