Submit Spark applications

You can run Spark workloads in a non-interactive, asynchronous way directly on Snowflake’s infrastructure while you use familiar Spark semantics. With Snowpark Submit, you can submit production-ready Spark applications, such as ETL pipelines and scheduled data transformations, by using a simple command-line interface (CLI). In this way, you can maintain your existing Spark development workflows without a dedicated Spark cluster.

For example, you can package your PySpark ETL script, then use the Snowpark Submit CLI to run the script as a batch job on a Snowpark Container Services container. This method lets you automate nightly data pipelines with Apache Airflow or CI/CD tools. Your Spark code runs in cluster mode on Snowpark Container Services, scaling seamlessly with built-in dependency and resource management.
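
As a rough sketch of the automation described above, the following Python helper assembles a `snowpark-submit` command line that a scheduler such as Apache Airflow or a CI/CD job could invoke. The file names (`etl_job.py`, `helpers.zip`), the `--run-date` application argument, and both function names are hypothetical. The `--py-files` option is assumed to behave as it does in `spark-submit`. The Snowflake-specific options that a real submission needs (for example, connection and compute pool settings) are deliberately left out, so check the Snowpark Submit reference for the actual option list.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_submit_command(
    script: str,
    py_files: Sequence[str] = (),
    app_args: Sequence[str] = (),
    executable: str = "snowpark-submit",
) -> List[str]:
    """Assemble an argument list for a batch submission.

    ASSUMPTION: --py-files mirrors the spark-submit option of the same
    name; verify it against the Snowpark Submit reference before use.
    """
    command = [executable]
    if py_files:
        # spark-submit expects extra Python dependencies as one
        # comma-separated value.
        command += ["--py-files", ",".join(py_files)]
    command.append(script)
    # Anything after the script path is passed through to the application.
    command.extend(app_args)
    return command


def run_nightly_job(command: Sequence[str], dry_run: bool = True) -> Optional[int]:
    """Print the command and, unless dry_run is set, execute it."""
    print(shlex.join(command))
    if dry_run:
        return None
    # check=True raises CalledProcessError on a non-zero exit code, which
    # lets a scheduler mark the task as failed.
    return subprocess.run(list(command), check=True).returncode


# Hypothetical file names and application argument, for illustration only.
cmd = build_submit_command(
    "etl_job.py",
    py_files=["helpers.zip"],
    app_args=["--run-date", "2024-01-01"],
)
run_nightly_job(cmd)  # dry run: prints the command without executing it
```

By default the helper only prints the command. Pass `dry_run=False` after `snowpark-submit` is installed and configured in the environment that runs the scheduler.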

For examples of Snowpark Submit in use, see Snowpark Submit examples.

Snowpark Submit runs Spark workloads on Snowflake by using Snowpark Connect for Spark. For more information about Snowpark Connect for Spark, see Run Apache Spark™ workloads on Snowflake with Snowpark Connect for Spark.

Snowpark Submit offers the following benefits:

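  • Run in cluster mode on Snowflake-managed infrastructure, with no external Spark setup required
  • Workflow integration, with support for automation through CI/CD pipelines, Apache Airflow, or cron-based scheduling
  • Python support, with support for reusing existing Spark applications across languages
  • Dependency management, with support for packaging external Python modules or JARs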

Note

snowpark-submit supports much of the same functionality as spark-submit. However, some functionality has been omitted because it is not needed when running Spark workloads on Snowflake.

Get started with Snowpark Submit

To get started using Snowpark Submit, follow these steps:

  1. Install Snowpark Submit by following the steps in Install Snowpark Submit.
  2. Study the Snowpark Submit examples.
  3. Learn how to use Snowpark Submit by consulting the Snowpark Submit reference.