Snowpark Migration Accelerator: Code Extraction

The Snowpark Migration Accelerator (SMA) processes every file in the specified directory. While it creates an inventory entry for each file, it only analyzes files with specific extensions for Spark API references.

There are several ways to add files to this directory.

You can place all relevant code files into a single directory before running the migration process.

To extract notebooks from an existing environment (such as Databricks), you can use extraction scripts to help with the migration process.

Extraction Scripts

Snowflake provides publicly available extraction scripts that you can find on the Snowflake Labs GitHub page (https://github.com/Snowflake-Labs/SC.DDLExportScripts/tree/main). For Spark migrations, these scripts support various platforms.

Databricks

For Jupyter (.ipynb) or Databricks (.dbc) notebooks that run in Databricks, you can place them directly in a directory for SMA analysis without any extraction. To learn how to export your Databricks notebook files, see the Databricks documentation: https://docs.databricks.com/en/notebooks/notebook-export-import.html#export-notebooks.

As an alternative, you can follow the instructions and use the scripts available in the Databricks folder of the SC.DDLExportScripts repository: https://github.com/Snowflake-Labs/SC.DDLExportScripts/tree/main/Databricks

Additional information about data extraction will be added later.