Snowpark Migration Accelerator: Readiness Scores

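The Snowpark Migration Accelerator (SMA) evaluates your code and generates detailed assessment data. To make this information easier to use, SMA calculates readiness scores that measure how easily your code can be migrated to Snowflake. These scores act as compatibility indicators: the higher the score, the more compatible your code is with the Snowflake platform. To obtain these scores, you only need to run the SMA tool.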

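SMA generates the following readiness scores:

  • Snowpark API Readiness Score

  • Snowpark Connect Readiness Score

  • Third-Party API Readiness Score

  • SQL Readiness Score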

The Readiness Scores indicate how compatible your code is with Snowflake, not how much work remains to be done. Even with a high readiness score, the remaining incompatible code might still require significant effort to migrate. To accurately estimate the work needed for migration, review the complete assessment report. If you need help creating a migration plan or estimating the effort required, reach out to our team.

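Readiness Levels

The Snowpark Migration Accelerator (SMA) uses a color-coded scoring system similar to a traffic light:

  • Red – A critical issue was detected. Stop and resolve it immediately, because it will severely affect the migration process or prevent accurate code analysis. Follow the provided action steps before continuing.

  • Yellow – A warning was detected. Review the action steps carefully to understand the potential impact on your migration. Once you understand the implications, you can continue to the next step.

  • Green – No major issues were detected. Although this indicates no obvious migration blockers, the code may still need adjustments. Review the action steps and continue with the migration process.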

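How to Interpret the Scores

For each score, you receive:

  • A numeric value

  • A status indicator (red, yellow, or green, as described above)

  • Recommended next steps

We strongly recommend that you:

  • Review the scores in order – when you encounter a red score, investigate and resolve the issue immediately

  • Review all recommended actions for each score – read the suggested next steps for every result, including green scores, because they contain important action items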
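The review order recommended above can be sketched in a few lines of Python. This is only an illustration: the `ReadinessScore` class, the `first_blocker` helper, and every sample value below are hypothetical and are not part of SMA. The status is entered by hand because it is the tool that assigns the color.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReadinessScore:
    """Hypothetical container for what SMA reports per score."""
    name: str
    value: float                      # percentage, 0 to 100
    status: str                       # "red", "yellow", or "green"
    next_steps: List[str] = field(default_factory=list)


def first_blocker(scores: List[ReadinessScore]) -> Optional[ReadinessScore]:
    """Walk the scores in order and return the first red one, if any."""
    for score in scores:
        if score.status == "red":
            return score
    return None


# Made-up sample values, listed in the order they would be reviewed.
scores = [
    ReadinessScore("Snowpark API", 92.0, "green", ["Consider a proof of concept"]),
    ReadinessScore("Third-Party API", 40.0, "red", ["Review the Import Usages Inventory"]),
    ReadinessScore("SQL", 75.0, "yellow", ["Check the EWIs in the issues output"]),
]

blocker = first_blocker(scores)
if blocker is not None:
    print(f"Resolve '{blocker.name}' first: {blocker.next_steps}")

# Green and yellow scores still carry action items worth reading.
for score in scores:
    print(score.name, score.status, score.next_steps)
```

Stopping at the first red score mirrors the advice to investigate critical issues before reading further, while the final loop reflects the advice to read the next steps for every score.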

Let's look at the readiness scores currently available in the tool.

Snowpark API Readiness Score

The Snowpark Migration Accelerator (SMA) generates a Snowpark API Readiness Score, which indicates how ready your code is for migration. It's important to note that this score only evaluates the usage of Spark API components and does not assess other elements such as third-party libraries or external dependencies in your code.

When SMA analyzes your code, it identifies all Spark API references, including both import statements and function calls. These references are documented in the Spark API Usages Inventory, which you can find in your local output directory. Each reference is classified as either "supported" or "not supported" according to the Spark Reference Categories. The readiness score is calculated by dividing the number of supported references by the total number of references found in your code.

Snowpark API Readiness Score = (number of supported Spark API references / total number of Spark API references) × 100%

This score is displayed as a percentage, indicating how well Snowflake supports the Spark API references found in your code. A higher percentage means better compatibility with Snowflake. You can view this score in both the detailed report and the assessment summary sections of the application.
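As a rough illustration of the arithmetic, the ratio can be written as a small Python function. The function name `readiness_score` is hypothetical, and returning `None` when no references are found is an assumption made for this sketch, not documented SMA behavior.

```python
from typing import Optional


def readiness_score(supported: int, total: int) -> Optional[float]:
    """Return supported / total as a percentage.

    Assumption: with zero references there is nothing to score,
    so this sketch returns None instead of a number.
    """
    if supported < 0 or total < 0 or supported > total:
        raise ValueError("expected 0 <= supported <= total")
    if total == 0:
        return None
    return 100.0 * supported / total


# Made-up counts: 45 supported references out of 60 found.
print(readiness_score(45, 60))
```

The same ratio applies to any of the readiness scores; only the definition of "supported" and "total" changes from one score to the next.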

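The readiness score shown here is the original score generated by SMA. In newer SMA versions that display only one readiness score, that score specifically measures Spark API compatibility.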

Snowpark API Readiness Levels

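Based on the calculated score, the result falls into one of three categories: green, yellow, or red. The application and the output report provide specific recommendations for your score category.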

The Snowpark API Readiness Score will be assigned one of these levels:

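  • Green: Most Spark API references are supported, which makes this workload a strong candidate for migration. If the other indicators are also green, consider proceeding to a proof of concept.

  • Yellow: Some Spark API references are not supported, which will require additional migration effort. Next steps should include building an inventory of the unsupported items and estimating the conversion effort required.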

Snowpark Connect Readiness Score

The Snowpark Connect Readiness Score measures the percentage of Spark API references in your codebase that are supported by Snowpark Connect. This score provides an assessment of your existing Spark API code's readiness for execution within the Snowpark Connect environment.

How It's Calculated

During its execution, the SMA scans your codebase to identify all references to the Spark API. Examples of such references include import statements, function calls, and class instantiations. All discovered references are then logged in the Spark API Usages Inventory. This inventory is generated as a file in your local output directory. For every reference listed in the inventory, the SMA populates the IsSnowparkConnectToolSupported column, setting it to True if the API usage is supported by Snowpark Connect, or False if it is not.

To calculate the readiness score, the SMA takes all of the supported references and divides them by the total references found in the codebase:

Snowpark Connect Readiness Score = (Spark API references supported by Snowpark Connect / total Spark API references) × 100%

For example, if your codebase has 100 Spark API references and 90 of them are supported by Snowpark Connect, your Snowpark Connect Readiness Score would be 90%.

A higher percentage for the Snowpark Connect Readiness Score indicates a greater degree of compatibility with Snowpark Connect, suggesting that a larger portion of your Spark code aligns with functionalities supported by Snowpark Connect.
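The calculation described above can be approximated from the inventory file itself. The sketch below is a minimal example under several assumptions: only the `IsSnowparkConnectToolSupported` column name comes from the description above, while the `Element` column, the placeholder element names, the CSV layout, and the one-row-per-reference reading are all hypothetical. It reads from an in-memory string so that it runs without any SMA output on disk.

```python
import csv
import io
from typing import Optional

# Hypothetical inventory excerpt. Only the IsSnowparkConnectToolSupported
# column name is taken from the text; everything else is a placeholder.
SAMPLE_INVENTORY = """Element,IsSnowparkConnectToolSupported
example.spark.api_a,True
example.spark.api_b,True
example.spark.api_c,False
example.spark.api_d,True
"""


def connect_readiness(csv_text: str) -> Optional[float]:
    """Percentage of rows whose support flag is True (one row per reference)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None  # assumption: no references means no score
    supported = sum(
        1
        for row in rows
        if row["IsSnowparkConnectToolSupported"].strip().lower() == "true"
    )
    return 100.0 * supported / len(rows)


print(connect_readiness(SAMPLE_INVENTORY))
```

To run this against a real inventory, you would pass the file's text instead of `SAMPLE_INVENTORY`, after checking the actual column layout in your output directory.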

Readiness Levels

The compatibility analysis yields a readiness score, which is categorized into one of three distinct levels: Green, Yellow, or Red. Both the application's assessment summary and the generated detailed report will display this readiness level, accompanied by specific guidance tailored to the findings.

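Third-Party API Readiness Score

The Third-Party Readiness Score shows how many of your imported libraries are available in Snowflake. To better understand this score, we first explain what we mean by "third party":

Third-party library: Any package or library that is not developed, maintained, or controlled by Snowflake (or by Snowpark within Snowflake).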

The readiness score indicates the percentage of external libraries and packages that are compatible with Snowflake. For Python code, compatibility means the package is available through the Anaconda package collection in Snowpark. For Scala or Java code, compatibility means the package is already included in Snowpark's core functionality.

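The readiness score is calculated by dividing the number of supported third-party library imports by the total number of third-party library imports in your code.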

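Third-Party API Readiness Score = (supported third-party library imports / total third-party library imports) × 100%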

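Important information about the readiness score:

  • Third-party libraries supported by Snowpark: This includes all libraries supported by Snowpark (including org.apache.spark).

  • Total third-party library calls: The sum of all third-party library calls in your code, including both Spark and non-Spark libraries, regardless of whether Snowpark supports them.

  • Only imports labeled "ThirdPartyLib" in the Import Usages Inventory are counted. Internal dependencies and imports from within the codebase are excluded.

  • This metric counts the total number of calls, not unique library references. For example, if your code makes 100 library calls in total, of which 80 are to unsupported libraries and 20 are to supported libraries, the support score is 20%. This reflects how often supported and unsupported libraries are actually used in the code, rather than the proportion of unique library references.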
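The difference between counting calls and counting unique libraries is easy to see in a short sketch. The library names (`lib_a`, `lib_b`, `lib_c`), the per-library split of calls, and both function names are hypothetical; only the 100-call example with 80 unsupported and 20 supported calls comes from the text above.

```python
from collections import Counter
from typing import Optional, Set

# Hypothetical call counts per library: 100 calls in total.
calls = Counter({"lib_a": 80, "lib_b": 15, "lib_c": 5})
# Hypothetical set of libraries treated as supported.
supported_libs = {"lib_b", "lib_c"}


def call_weighted_score(calls: Counter, supported: Set[str]) -> Optional[float]:
    """Share of all calls that go to supported libraries (the metric described above)."""
    total = sum(calls.values())
    if total == 0:
        return None  # assumption: nothing to score
    supported_calls = sum(n for lib, n in calls.items() if lib in supported)
    return 100.0 * supported_calls / total


def unique_library_ratio(calls: Counter, supported: Set[str]) -> Optional[float]:
    """Share of distinct libraries that are supported (shown only for contrast)."""
    if not calls:
        return None
    return 100.0 * len(set(calls) & supported) / len(calls)


print(call_weighted_score(calls, supported_libs))
print(round(unique_library_ratio(calls, supported_libs), 1))
```

In this made-up data two of the three libraries are supported, yet most calls go to the unsupported one, so the call-weighted score is much lower than the unique-library ratio. That gap is what the metric is designed to expose.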

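Third-Party API Readiness Levels

Based on the calculated score, the result falls into one of three categories: green, yellow, or red. The application and the output report provide specific recommendations for your score category.

The Third-Party API Readiness Score will be assigned one of these levels:

  • Green – The codebase uses Python libraries that are fully supported by Snowflake. No additional configuration is needed.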

  • Yellow – The codebase contains at least one Python package or library that is not currently supported in Snowpark. You can add unsupported third-party packages using several methods described in the third-party package documentation. To identify unsupported packages, review the Import Usages Inventory generated by SMA. Then analyze how these packages are used in your code and plan their implementation in Snowflake.

  • Red – The codebase heavily relies on packages or libraries not supported in Snowpark. This could mean either a single unsupported library is used extensively throughout the code, or multiple unsupported libraries are used across different parts of the codebase. A thorough assessment of these import statements is necessary to understand their impact. For guidance or assistance with package support, contact sma-support@snowflake.com.

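SQL Readiness Score

The SQL Readiness Score indicates the percentage of SQL elements in your source code that the Snowpark Migration Accelerator (SMA) can automatically convert to Snowflake SQL. A higher score means more of the code can be converted automatically, which makes the migration easier and faster.

The readiness score is calculated by dividing the number of SQL elements that can be converted by the total number of SQL elements found in the source code.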

SQL Readiness Score = (convertible SQL elements / total SQL elements) × 100%

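SQL Readiness Score Levels

The SQL Readiness Score will be assigned one of these levels:

  • Green – Most of the SQL in this codebase is either directly supported by Snowflake or can be converted automatically by SMA. Although no conversion is perfect, this workload should need only minimal manual adjustment to complete the migration to Snowflake.

  • Yellow – Some SQL elements in this codebase are not supported by Snowflake and will require additional effort to migrate. Review the SQL Element Inventory for unsupported features and examine the EWIs in the issues output to develop a plan of action. You may need to make minor code adjustments or partially redesign some components.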

  • Red – A large portion of the SQL in this codebase is not compatible with Snowflake, suggesting that significant redesign may be necessary. To proceed, review the SQL Element Inventory for unsupported features and examine the EWIs in the issues output to develop a migration strategy. For assistance, contact sma-support@snowflake.com.


While readiness scores provide valuable insights, they should not be the only factor in determining a workload's migration readiness. Consider multiple aspects of your migration plan alongside these scores, as they serve as an initial assessment rather than a complete evaluation. If you notice any readiness metrics that could be improved or aren't accurately represented in the tool, let us know. The SMA team continuously works to enhance and refine these readiness measurements.