Snowpark Migration Accelerator: Issue Codes for Spark – Scala

SPRKSCL1126

Message: org.apache.spark.sql.functions.covar_pop has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.covar_pop (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Input

Below is an example of the org.apache.spark.sql.functions.covar_pop function, first used with column names as the arguments and then with column objects.

val df = Seq(
  (10.0, 100.0),
  (20.0, 150.0),
  (30.0, 200.0),
  (40.0, 250.0),
  (50.0, 300.0)
).toDF("column1", "column2")

val result1 = df.select(covar_pop("column1", "column2").as("covariance_pop"))
val result2 = df.select(covar_pop(col("column1"), col("column2")).as("covariance_pop"))

Output

The SMA adds the EWI SPRKSCL1126 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  (10.0, 100.0),
  (20.0, 150.0),
  (30.0, 200.0),
  (40.0, 250.0),
  (50.0, 300.0)
).toDF("column1", "column2")

/*EWI: SPRKSCL1126 => org.apache.spark.sql.functions.covar_pop has a workaround, see documentation for more info*/
val result1 = df.select(covar_pop("column1", "column2").as("covariance_pop"))
/*EWI: SPRKSCL1126 => org.apache.spark.sql.functions.covar_pop has a workaround, see documentation for more info*/
val result2 = df.select(covar_pop(col("column1"), col("column2")).as("covariance_pop"))

Recommended fix

Snowpark has an equivalent covar_pop function that receives two column objects as arguments. For that reason, the Spark overload that receives two column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives two string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  (10.0, 100.0),
  (20.0, 150.0),
  (30.0, 200.0),
  (40.0, 250.0),
  (50.0, 300.0)
).toDF("column1", "column2")

val result1 = df.select(covar_pop(col("column1"), col("column2")).as("covariance_pop"))
val result2 = df.select(covar_pop(col("column1"), col("column2")).as("covariance_pop"))

Additional recommendations

SPRKSCL1112

Message: *spark element* is not supported

Category: Conversion error

Description

This issue appears when the SMA detects the use of a Spark element that is not supported by Snowpark and does not have its own associated error code. This is the generic error code that the SMA uses for any unsupported Spark element.

Scenarios

Input

Below is an example of a Spark element that is not supported by Snowpark and therefore generates this EWI.

val df = session.range(10)
val result = df.isLocal

Output

The SMA adds the EWI SPRKSCL1112 to the output code to let you know that this element is not supported by Snowpark.

val df = session.range(10)
/*EWI: SPRKSCL1112 => org.apache.spark.sql.Dataset.isLocal is not supported*/
val result = df.isLocal

Recommended fix

Since this is a generic error code that applies to a range of unsupported functions, there is no single, specific fix. The appropriate action depends on the particular element being used.

Note that even though the element is not supported, this does not necessarily mean that no solution or workaround can be found. It only means that the SMA itself cannot find one.

Additional recommendations

SPRKSCL1143

Message: An error occurred when loading the symbol table

Category: Conversion error

Description

This issue appears when there is an error loading the symbols of the SMA's symbol table. The symbol table is part of the SMA's underlying infrastructure and enables more complex conversions.

Additional recommendations

  • This is unlikely to be an error in the source code itself, but rather is an error in how the SMA processes the source code. The best resolution would be to post an issue in the SMA.

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

SPRKSCL1153

Warning

This issue code has been deprecated since Spark Conversion Core Version 4.3.2

Message: org.apache.spark.sql.functions.max has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.max (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.max function, first used with a column name as an argument and then with a column object.

val df = Seq(10, 12, 20, 15, 18).toDF("value")
val result1 = df.select(max("value"))
val result2 = df.select(max(col("value")))

Output

The SMA adds the EWI SPRKSCL1153 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(10, 12, 20, 15, 18).toDF("value")
/*EWI: SPRKSCL1153 => org.apache.spark.sql.functions.max has a workaround, see documentation for more info*/
val result1 = df.select(max("value"))
/*EWI: SPRKSCL1153 => org.apache.spark.sql.functions.max has a workaround, see documentation for more info*/
val result2 = df.select(max(col("value")))

Recommended fix

Snowpark has an equivalent max function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(10, 12, 20, 15, 18).toDF("value")
val result1 = df.select(max(col("value")))
val result2 = df.select(max(col("value")))

Additional recommendations

SPRKSCL1102

This issue code has been deprecated since Spark Conversion Core 2.3.22

Message: Explode is not supported

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.explode (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which is not supported by Snowpark.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.explode function used to get the consolidated information of the array fields of the dataset.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

    dfExplode.select(explode(dfExplode("Translation").alias("exploded")))

Output

The SMA adds the EWI SPRKSCL1102 to the output code to let you know that this function is not supported by Snowpark.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

    /*EWI: SPRKSCL1102 => Explode is not supported */
    dfExplode.select(explode(dfExplode("Translation").alias("exploded")))

Recommended fix

Since explode is not supported by Snowpark, the flatten function can be used as a substitute.

The following fix first flattens the dfExplode DataFrame, and then performs the query to replicate the result.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

    val dfFlatten = dfExplode.flatten(col("Translation")).alias("exploded")
                             .select(col("exploded.value").alias("Translation"))

Additional recommendations

SPRKSCL1136

Warning

This issue code is deprecated since Spark Conversion Core 4.3.2

Message: org.apache.spark.sql.functions.min has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.min (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.min function, first used with a column name as an argument and then with a column object.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
val result1 = df.select(min("value"))
val result2 = df.select(min(col("value")))

Output

The SMA adds the EWI SPRKSCL1136 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
/*EWI: SPRKSCL1136 => org.apache.spark.sql.functions.min has a workaround, see documentation for more info*/
val result1 = df.select(min("value"))
/*EWI: SPRKSCL1136 => org.apache.spark.sql.functions.min has a workaround, see documentation for more info*/
val result2 = df.select(min(col("value")))

Recommended fix

Snowpark has an equivalent min function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that takes a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
val result1 = df.select(min(col("value")))
val result2 = df.select(min(col("value")))

Additional recommendations

SPRKSCL1167

Message: Project file not found on input folder

Category: Warning

Description

This issue appears when the SMA detects that the input folder does not contain any project configuration file. The project configuration files supported by the SMA are:

  • build.sbt

  • build.gradle

  • pom.xml
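For example, a minimal build.sbt at the root of the input folder is enough for the SMA to recognize a Scala project. This is only an illustrative sketch; the project name and versions below are hypothetical placeholders, so adjust them to your project.

```scala
// build.sbt — illustrative sketch; the name and versions are placeholders
name := "my-spark-project"
version := "0.1.0"
scalaVersion := "2.12.18"

// Spark dependency so the project's Spark references resolve
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0"
```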

Additional recommendations

SPRKSCL1147

Message: org.apache.spark.sql.functions.tanh has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.tanh (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.tanh function, first used with a column name as an argument and then with a column object.

val df = Seq(-1.0, 0.5, 1.0, 2.0).toDF("value")
val result1 = df.withColumn("tanh_value", tanh("value"))
val result2 = df.withColumn("tanh_value", tanh(col("value")))

Output

The SMA adds the EWI SPRKSCL1147 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(-1.0, 0.5, 1.0, 2.0).toDF("value")
/*EWI: SPRKSCL1147 => org.apache.spark.sql.functions.tanh has a workaround, see documentation for more info*/
val result1 = df.withColumn("tanh_value", tanh("value"))
/*EWI: SPRKSCL1147 => org.apache.spark.sql.functions.tanh has a workaround, see documentation for more info*/
val result2 = df.withColumn("tanh_value", tanh(col("value")))

Recommended fix

Snowpark has an equivalent tanh function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(-1.0, 0.5, 1.0, 2.0).toDF("value")
val result1 = df.withColumn("tanh_value", tanh(col("value")))
val result2 = df.withColumn("tanh_value", tanh(col("value")))

Additional recommendations

SPRKSCL1116

Warning

This issue code has been deprecated since Spark Conversion Core Version 2.40.1

Message: org.apache.spark.sql.functions.split has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.split (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.split function that generates this EWI.

val df = Seq("apple,banana,orange", "grape,lemon,lime", "cherry,blueberry,strawberry").toDF("values")
val result1 = df.withColumn("split_values", split(col("values"), ","))
val result2 = df.withColumn("split_values", split(col("values"), ",", 0))

Output

The SMA adds the EWI SPRKSCL1116 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("apple,banana,orange", "grape,lemon,lime", "cherry,blueberry,strawberry").toDF("values")
/*EWI: SPRKSCL1116 => org.apache.spark.sql.functions.split has a workaround, see documentation for more info*/
val result1 = df.withColumn("split_values", split(col("values"), ","))
/*EWI: SPRKSCL1116 => org.apache.spark.sql.functions.split has a workaround, see documentation for more info*/
val result2 = df.withColumn("split_values", split(col("values"), ",", 0))

Recommended fix

For the Spark overload that receives two arguments, you can convert the second argument into a column object using the com.snowflake.snowpark.functions.lit function as a workaround.

The overload that receives three arguments is not yet supported by Snowpark, and there is no workaround.

val df = Seq("apple,banana,orange", "grape,lemon,lime", "cherry,blueberry,strawberry").toDF("values")
val result1 = df.withColumn("split_values", split(col("values"), lit(",")))
val result2 = df.withColumn("split_values", split(col("values"), ",", 0)) // This overload is not supported yet

Additional recommendations

SPRKSCL1122

Message: org.apache.spark.sql.functions.corr has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.corr (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.corr function, first used with column names as the arguments and then with column objects.

val df = Seq(
  (10.0, 20.0),
  (20.0, 40.0),
  (30.0, 60.0)
).toDF("col1", "col2")

val result1 = df.select(corr("col1", "col2"))
val result2 = df.select(corr(col("col1"), col("col2")))

Output

The SMA adds the EWI SPRKSCL1122 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  (10.0, 20.0),
  (20.0, 40.0),
  (30.0, 60.0)
).toDF("col1", "col2")

/*EWI: SPRKSCL1122 => org.apache.spark.sql.functions.corr has a workaround, see documentation for more info*/
val result1 = df.select(corr("col1", "col2"))
/*EWI: SPRKSCL1122 => org.apache.spark.sql.functions.corr has a workaround, see documentation for more info*/
val result2 = df.select(corr(col("col1"), col("col2")))

Recommended fix

Snowpark has an equivalent corr function that receives two column objects as arguments. For that reason, the Spark overload that receives column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives two string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  (10.0, 20.0),
  (20.0, 40.0),
  (30.0, 60.0)
).toDF("col1", "col2")

val result1 = df.select(corr(col("col1"), col("col2")))
val result2 = df.select(corr(col("col1"), col("col2")))

Additional recommendations

SPRKSCL1173

Message: SQL embedded code cannot be processed.

Category: Warning.

Description

This issue appears when the SMA detects SQL embedded code that cannot be processed. As a result, the SQL embedded code cannot be converted to Snowflake.

Scenarios

Input

Below is an example of SQL embedded code that cannot be processed.

spark.sql("CREATE VIEW IF EXISTS My View" + "AS Select * From my Table WHERE date < current_date()")

Output

The SMA adds the EWI SPRKSCL1173 to the output code to let you know that the SQL embedded code cannot be processed.

/*EWI: SPRKSCL1173 => SQL embedded code cannot be processed.*/
spark.sql("CREATE VIEW IF EXISTS My View" + "AS Select * From my Table WHERE date < current_date()")

Recommended fix

Make sure the SQL embedded code is a plain string literal, without interpolations, variables, or string concatenations.
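As an illustrative sketch (the view and table names are hypothetical, taken from the scenario above), the embedded SQL could be rewritten as a single string literal with no concatenation, so the SMA can process it:

```scala
// Hypothetical fix: one plain string literal, no "+" concatenation or interpolation
spark.sql("CREATE VIEW IF NOT EXISTS MyView AS SELECT * FROM myTable WHERE date < current_date()")
```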

Additional recommendations

SPRKSCL1163

Message: An element is not a literal and cannot be evaluated.

Category: Conversion error.

Description

This issue appears when the element currently being processed is not a literal, so it cannot be evaluated by the SMA.

Scenarios

Input

Below is an example where the element to be processed is not a literal, so it cannot be evaluated by the SMA.

val format_type = "csv"
spark.read.format(format_type).load(path)

Output

The SMA adds the EWI SPRKSCL1163 to the output code to let you know that the format_type parameter is not a literal and cannot be evaluated by the SMA.

/*EWI: SPRKSCL1163 => format_type is not a literal and can't be evaluated*/
val format_type = "csv"
spark.read.format(format_type).load(path)

Recommended fix

  • Make sure the value of the variable is valid to avoid unexpected behaviors.
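For instance, if the format really is fixed, one way to let the SMA evaluate the element is to pass the literal directly instead of a variable (a sketch based on the scenario above):

```scala
// Sketch: pass the literal "csv" directly so the SMA can evaluate it
spark.read.format("csv").load(path)
```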

Additional recommendations

SPRKSCL1132

Message: org.apache.spark.sql.functions.grouping_id has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.grouping_id (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.grouping_id function, first used with multiple column names as arguments and then with column objects.

val df = Seq(
  ("Store1", "Product1", 100),
  ("Store1", "Product2", 150),
  ("Store2", "Product1", 200),
  ("Store2", "Product2", 250)
).toDF("store", "product", "amount")

val result1 = df.cube("store", "product").agg(sum("amount"), grouping_id("store", "product"))
val result2 = df.cube("store", "product").agg(sum("amount"), grouping_id(col("store"), col("product")))

Output

The SMA adds the EWI SPRKSCL1132 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Store1", "Product1", 100),
  ("Store1", "Product2", 150),
  ("Store2", "Product1", 200),
  ("Store2", "Product2", 250)
).toDF("store", "product", "amount")

/*EWI: SPRKSCL1132 => org.apache.spark.sql.functions.grouping_id has a workaround, see documentation for more info*/
val result1 = df.cube("store", "product").agg(sum("amount"), grouping_id("store", "product"))
/*EWI: SPRKSCL1132 => org.apache.spark.sql.functions.grouping_id has a workaround, see documentation for more info*/
val result2 = df.cube("store", "product").agg(sum("amount"), grouping_id(col("store"), col("product")))

Recommended fix

Snowpark has an equivalent grouping_id function that receives multiple column objects as arguments. For that reason, the Spark overload that receives multiple column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives multiple string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("Store1", "Product1", 100),
  ("Store1", "Product2", 150),
  ("Store2", "Product1", 200),
  ("Store2", "Product2", 250)
).toDF("store", "product", "amount")

val result1 = df.cube("store", "product").agg(sum("amount"), grouping_id(col("store"), col("product")))
val result2 = df.cube("store", "product").agg(sum("amount"), grouping_id(col("store"), col("product")))

Additional recommendations

SPRKSCL1106

Warning

This issue code has been deprecated

Message: Writer option is not supported.

Category: Conversion error.

Description

This issue appears when the tool detects, in a writer statement, the use of an option that is not supported by Snowpark.

Scenarios

Input

Below is an example of the org.apache.spark.sql.DataFrameWriter.option (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html) method, used to add options to a writer statement.

df.write.format("net.snowflake.spark.snowflake").option("dbtable", tablename)

Output

The SMA adds the EWI SPRKSCL1106 to the output code to let you know that the option method is not supported by Snowpark.

df.write.saveAsTable(tablename)
/*EWI: SPRKSCL1106 => Writer option is not supported .option("dbtable", tablename)*/

Recommended fix

There is no recommended fix for this scenario.

Additional recommendations

SPRKSCL1157

Message: org.apache.spark.sql.functions.kurtosis has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.kurtosis (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.kurtosis function that generates this EWI. In this example, the kurtosis function is used to calculate the kurtosis of the selected column.

val df = Seq("1", "2", "3").toDF("elements")
val result1 = kurtosis(col("elements"))
val result2 = kurtosis("elements")

Output

The SMA adds the EWI SPRKSCL1157 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("1", "2", "3").toDF("elements")
/*EWI: SPRKSCL1157 => org.apache.spark.sql.functions.kurtosis has a workaround, see documentation for more info*/
val result1 = kurtosis(col("elements"))
/*EWI: SPRKSCL1157 => org.apache.spark.sql.functions.kurtosis has a workaround, see documentation for more info*/
val result2 = kurtosis("elements")

Recommended fix

Snowpark has an equivalent kurtosis function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq("1", "2", "3").toDF("elements")
val result1 = kurtosis(col("elements"))
val result2 = kurtosis(col("elements"))

Additional recommendations

SPRKSCL1146

Message: org.apache.spark.sql.functions.tan has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.tan (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.tan function, first used with a column name as an argument and then with a column object.

val df = Seq(math.Pi / 4, math.Pi / 3, math.Pi / 6).toDF("angle")
val result1 = df.withColumn("tan_value", tan("angle"))
val result2 = df.withColumn("tan_value", tan(col("angle")))

Output

The SMA adds the EWI SPRKSCL1146 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(math.Pi / 4, math.Pi / 3, math.Pi / 6).toDF("angle")
/*EWI: SPRKSCL1146 => org.apache.spark.sql.functions.tan has a workaround, see documentation for more info*/
val result1 = df.withColumn("tan_value", tan("angle"))
/*EWI: SPRKSCL1146 => org.apache.spark.sql.functions.tan has a workaround, see documentation for more info*/
val result2 = df.withColumn("tan_value", tan(col("angle")))

Recommended fix

Snowpark has an equivalent tan function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(math.Pi / 4, math.Pi / 3, math.Pi / 6).toDF("angle")
val result1 = df.withColumn("tan_value", tan(col("angle")))
val result2 = df.withColumn("tan_value", tan(col("angle")))

Additional recommendations

SPRKSCL1117

Warning

This issue code is deprecated since Spark Conversion Core 2.40.1

Message: org.apache.spark.sql.functions.translate has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.translate (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.translate function that generates this EWI. In this example, the translate function is used to replace the characters 'a', 'e' and 'o' in each word with '1', '2' and '3', respectively.

val df = Seq("hello", "world", "scala").toDF("word")
val result = df.withColumn("translated_word", translate(col("word"), "aeo", "123"))

Output

The SMA adds the EWI SPRKSCL1117 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("hello", "world", "scala").toDF("word")
/*EWI: SPRKSCL1117 => org.apache.spark.sql.functions.translate has a workaround, see documentation for more info*/
val result = df.withColumn("translated_word", translate(col("word"), "aeo", "123"))

Recommended fix

As a workaround, you can convert the second and third arguments into column objects using the com.snowflake.snowpark.functions.lit function.

val df = Seq("hello", "world", "scala").toDF("word")
val result = df.withColumn("translated_word", translate(col("word"), lit("aeo"), lit("123")))

Additional recommendations

SPRKSCL1123

Message: org.apache.spark.sql.functions.cos has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.cos (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.cos function, first used with a column name as an argument and then with a column object.

val df = Seq(0.0, Math.PI / 4, Math.PI / 2, Math.PI).toDF("angle_radians")
val result1 = df.withColumn("cosine_value", cos("angle_radians"))
val result2 = df.withColumn("cosine_value", cos(col("angle_radians")))

Output

The SMA adds the EWI SPRKSCL1123 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(0.0, Math.PI / 4, Math.PI / 2, Math.PI).toDF("angle_radians")
/*EWI: SPRKSCL1123 => org.apache.spark.sql.functions.cos has a workaround, see documentation for more info*/
val result1 = df.withColumn("cosine_value", cos("angle_radians"))
/*EWI: SPRKSCL1123 => org.apache.spark.sql.functions.cos has a workaround, see documentation for more info*/
val result2 = df.withColumn("cosine_value", cos(col("angle_radians")))

Recommended fix

Snowpark has an equivalent cos function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(0.0, Math.PI / 4, Math.PI / 2, Math.PI).toDF("angle_radians")
val result1 = df.withColumn("cosine_value", cos(col("angle_radians")))
val result2 = df.withColumn("cosine_value", cos(col("angle_radians")))

Additional recommendations

SPRKSCL1172

Message: Snowpark does not support StructFiled with metadata parameter.

Category: Warning

Description

This issue appears when the SMA detects a use of org.apache.spark.sql.types.StructField.apply (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/StructField.html) with org.apache.spark.sql.types.Metadata (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/Metadata.html) as a parameter. This is because Snowpark does not support the metadata parameter.

Scenarios

Input

Below is an example of the org.apache.spark.sql.types.StructField.apply function that generates this EWI. In this example, the apply function is used to generate an instance of StructField.

val result = StructField("f1", StringType, true, metadata)

输出

The SMA adds the EWI SPRKSCL1172 to the output code to let you know that the metadata parameter is not supported by Snowpark.

/*EWI: SPRKSCL1172 => Snowpark does not support StructFiled with metadata parameter.*/
val result = StructField("f1", StringType, true, metadata)

推荐修复方法

Snowpark has an equivalent com.snowflake.snowpark.types.StructField.apply function that receives three parameters. Therefore, as a workaround, you can remove the metadata argument.

val result = StructField("f1", StringType, true)

Additional recommendations

SPRKSCL1162

Note

This issue code has been deprecated

Message: An error occurred when extracting the dbc files.

Category: Warning.

Description

This issue appears when a dbc file cannot be extracted. This warning could be caused by one or more of the following reasons: the file is too heavy, it is inaccessible, it is read-only, etc.

其他建议

  • As a workaround, you can check the size of the file if it is too heavy to be processed. Also, analyze whether the tool can access the file, to avoid any access issues.

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

SPRKSCL1133

Message: org.apache.spark.sql.functions.least has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.least (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.least function, first used with multiple column names as arguments and then with column objects.

val df = Seq((10, 20, 5), (15, 25, 30), (7, 14, 3)).toDF("value1", "value2", "value3")
val result1 = df.withColumn("least", least("value1", "value2", "value3"))
val result2 = df.withColumn("least", least(col("value1"), col("value2"), col("value3")))

Output

The SMA adds the EWI SPRKSCL1133 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq((10, 20, 5), (15, 25, 30), (7, 14, 3)).toDF("value1", "value2", "value3")
/*EWI: SPRKSCL1133 => org.apache.spark.sql.functions.least has a workaround, see documentation for more info*/
val result1 = df.withColumn("least", least("value1", "value2", "value3"))
/*EWI: SPRKSCL1133 => org.apache.spark.sql.functions.least has a workaround, see documentation for more info*/
val result2 = df.withColumn("least", least(col("value1"), col("value2"), col("value3")))

Recommended fix

Snowpark has an equivalent least function that receives multiple column objects as arguments. For that reason, the Spark overload that receives multiple column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives multiple string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq((10, 20, 5), (15, 25, 30), (7, 14, 3)).toDF("value1", "value2", "value3")
val result1 = df.withColumn("least", least(col("value1"), col("value2"), col("value3")))
val result2 = df.withColumn("least", least(col("value1"), col("value2"), col("value3")))

Additional recommendations

SPRKSCL1107

Warning

This issue code has been deprecated

Message: Writer save is not supported.

Category: Conversion error.

Description

This issue appears when the tool detects, in a writer statement, the use of the writer save method, which is not supported by Snowpark.

Scenarios

Input

Below is an example of the org.apache.spark.sql.DataFrameWriter.save (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html) method, used to save the content of the DataFrame.

df.write.format("net.snowflake.spark.snowflake").save()

Output

The SMA adds the EWI SPRKSCL1107 to the output code to let you know that the save method is not supported by Snowpark.

df.write.saveAsTable(tablename)
/*EWI: SPRKSCL1107 => Writer method is not supported .save()*/

Recommended fix

There is no recommended fix for this scenario.

Additional recommendations

SPRKSCL1156

Message: org.apache.spark.sql.functions.degrees has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.degrees (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.degrees function, first used with a column name as an argument and then with a column object.

val df = Seq(math.Pi, math.Pi / 2, math.Pi / 4, math.Pi / 6).toDF("radians")
val result1 = df.withColumn("degrees", degrees("radians"))
val result2 = df.withColumn("degrees", degrees(col("radians")))

Output

The SMA adds the EWI SPRKSCL1156 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(math.Pi, math.Pi / 2, math.Pi / 4, math.Pi / 6).toDF("radians")
/*EWI: SPRKSCL1156 => org.apache.spark.sql.functions.degrees has a workaround, see documentation for more info*/
val result1 = df.withColumn("degrees", degrees("radians"))
/*EWI: SPRKSCL1156 => org.apache.spark.sql.functions.degrees has a workaround, see documentation for more info*/
val result2 = df.withColumn("degrees", degrees(col("radians")))

Recommended fix

Snowpark has an equivalent degrees function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(math.Pi, math.Pi / 2, math.Pi / 4, math.Pi / 6).toDF("radians")
val result1 = df.withColumn("degrees", degrees(col("radians")))
val result2 = df.withColumn("degrees", degrees(col("radians")))

Additional recommendations

SPRKSCL1127

Message: org.apache.spark.sql.functions.covar_samp has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.covar_samp (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

场景

输入

Below is an example of the org.apache.spark.sql.functions.covar_samp function, first used with column names as the arguments and then with column objects.

val df = Seq(
  (10.0, 20.0),
  (15.0, 25.0),
  (20.0, 30.0),
  (25.0, 35.0),
  (30.0, 40.0)
).toDF("value1", "value2")

val result1 = df.select(covar_samp("value1", "value2").as("sample_covariance"))
val result2 = df.select(covar_samp(col("value1"), col("value2")).as("sample_covariance"))

Output

The SMA adds the EWI SPRKSCL1127 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  (10.0, 20.0),
  (15.0, 25.0),
  (20.0, 30.0),
  (25.0, 35.0),
  (30.0, 40.0)
).toDF("value1", "value2")

/*EWI: SPRKSCL1127 => org.apache.spark.sql.functions.covar_samp has a workaround, see documentation for more info*/
val result1 = df.select(covar_samp("value1", "value2").as("sample_covariance"))
/*EWI: SPRKSCL1127 => org.apache.spark.sql.functions.covar_samp has a workaround, see documentation for more info*/
val result2 = df.select(covar_samp(col("value1"), col("value2")).as("sample_covariance"))

Recommended fix

Snowpark has an equivalent covar_samp function that receives two column objects as arguments. For that reason, the Spark overload that receives two column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives two string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  (10.0, 20.0),
  (15.0, 25.0),
  (20.0, 30.0),
  (25.0, 35.0),
  (30.0, 40.0)
).toDF("value1", "value2")

val result1 = df.select(covar_samp(col("value1"), col("value2")).as("sample_covariance"))
val result2 = df.select(covar_samp(col("value1"), col("value2")).as("sample_covariance"))

Additional recommendations

SPRKSCL1113

Message: org.apache.spark.sql.functions.next_day has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.next_day (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.next_day function, first used with a string as the second argument and then with a column object.

val df = Seq("2024-11-06", "2024-11-13", "2024-11-20").toDF("date")
val result1 = df.withColumn("next_monday", next_day(col("date"), "Mon"))
val result2 = df.withColumn("next_monday", next_day(col("date"), lit("Mon")))

Output

The SMA adds the EWI SPRKSCL1113 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("2024-11-06", "2024-11-13", "2024-11-20").toDF("date")
/*EWI: SPRKSCL1113 => org.apache.spark.sql.functions.next_day has a workaround, see documentation for more info*/
val result1 = df.withColumn("next_monday", next_day(col("date"), "Mon"))
/*EWI: SPRKSCL1113 => org.apache.spark.sql.functions.next_day has a workaround, see documentation for more info*/
val result2 = df.withColumn("next_monday", next_day(col("date"), lit("Mon")))

Recommended fix

Snowpark has an equivalent next_day function that receives two column objects as arguments. For that reason, the Spark overload that receives two column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives a column object and a string, you can convert the string into a column object using the com.snowflake.snowpark.functions.lit function as a workaround.

val df = Seq("2024-11-06", "2024-11-13", "2024-11-20").toDF("date")
val result1 = df.withColumn("next_monday", next_day(col("date"), lit("Mon")))
val result2 = df.withColumn("next_monday", next_day(col("date"), lit("Mon")))

Additional recommendations

SPRKSCL1002

Message: This code section has recovery from parsing errors *statement*

Category: Parsing error.

Description

This issue appears when the SMA detects a statement in the code of the file that cannot be correctly read or understood, known as a parsing error, but the SMA can recover from that parsing error and continue analyzing the code of the file. In this case, the SMA is able to process the rest of the file's code without errors.

Scenarios

Input

Below is an example of invalid Scala code that the SMA can recover from.

Class myClass {

    def function1() & = { 1 }

    def function2() = { 2 }

    def function3() = { 3 }

}

Output

The SMA adds the EWI SPRKSCL1002 to the output code to let you know that the code of the file has parsing errors; however, the SMA can recover from that error and continue analyzing the code of the file.

class myClass {

    def function1();//EWI: SPRKSCL1002 => Unexpected end of declaration. Failed token: '&' @(3,21).
    & = { 1 }

    def function2() = { 2 }

    def function3() = { 3 }

}

Recommended fix

Since the message points out the error in the statement, you can try to identify the invalid syntax and remove it, or comment out the statement to avoid the parsing error.

class myClass {

    def function1() = { 1 }

    def function2() = { 2 }

    def function3() = { 3 }

}
class myClass {

    // def function1() & = { 1 }

    def function2() = { 2 }

    def function3() = { 3 }

}

Additional recommendations

SPRKSCL1142

Message: *spark element* is not defined

Category: Conversion error

Description

This issue appears when the SMA could not determine an appropriate mapping status for a given element. This means that the SMA does not yet know whether the element is supported by Snowpark. Note that this is a generic error code used by the SMA for any undefined element.

Scenarios

Input

Below is an example of a function for which the SMA could not determine an appropriate mapping status, and therefore it generated this EWI. In this case, you should assume that notDefinedFunction() is a valid Spark function and that the code runs.

val df = session.range(10)
val result = df.notDefinedFunction()

Output

The SMA adds the EWI SPRKSCL1142 to the output code to let you know that this element is not defined.

val df = session.range(10)
/*EWI: SPRKSCL1142 => org.apache.spark.sql.DataFrame.notDefinedFunction is not defined*/
val result = df.notDefinedFunction()

Recommended fix

To try to figure out the problem, you can perform the following checks:

  • Check whether it is a valid Spark element.

  • Check that the element has the correct syntax and is spelled correctly.

  • Check that you are using a Spark version supported by the SMA.

If this is a valid Spark element, please report that you encountered a conversion error on that particular element using the Report an Issue option of the SMA, and include any additional information that you think may be helpful.

Please note that if an element is not defined by the SMA, it does not necessarily mean that it is not supported by Snowpark. You should check the Snowpark documentation to verify whether an equivalent element exists.

Additional recommendations

SPRKSCL1152

Message: org.apache.spark.sql.functions.variance has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.variance (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.variance function, first used with a column name as an argument and then with a column object.

val df = Seq(10, 20, 30, 40, 50).toDF("value")
val result1 = df.select(variance("value"))
val result2 = df.select(variance(col("value")))

Output

The SMA adds the EWI SPRKSCL1152 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(10, 20, 30, 40, 50).toDF("value")
/*EWI: SPRKSCL1152 => org.apache.spark.sql.functions.variance has a workaround, see documentation for more info*/
val result1 = df.select(variance("value"))
/*EWI: SPRKSCL1152 => org.apache.spark.sql.functions.variance has a workaround, see documentation for more info*/
val result2 = df.select(variance(col("value")))

Recommended fix

Snowpark has an equivalent variance function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(10, 20, 30, 40, 50).toDF("value")
val result1 = df.select(variance(col("value")))
val result2 = df.select(variance(col("value")))

Additional recommendations

SPRKSCL1103

This issue code has been deprecated

Message: SparkBuilder method is not supported *method name*

Category: Conversion error

Description

This issue appears when the SMA detects a method in the SparkBuilder method chaining that is not supported by Snowflake. As a result, it may affect the migration of the reader statement.

The following SparkBuilder methods are not supported:

  • master

  • appName

  • enableHiveSupport

  • withExtensions

Scenarios

Input

Below is an example of a SparkBuilder method chaining with several methods that are not supported by Snowflake.

val spark = SparkSession.builder()
           .master("local")
           .appName("testApp")
           .config("spark.sql.broadcastTimeout", "3600")
           .enableHiveSupport()
           .getOrCreate()

Output

The SMA adds the EWI SPRKSCL1103 to the output code to let you know that the master, appName and enableHiveSupport methods are not supported by Snowpark. This may affect the migration of the Spark session statement.

val spark = Session.builder.configFile("connection.properties")
/*EWI: SPRKSCL1103 => SparkBuilder Method is not supported .master("local")*/
/*EWI: SPRKSCL1103 => SparkBuilder Method is not supported .appName("testApp")*/
/*EWI: SPRKSCL1103 => SparkBuilder method is not supported .enableHiveSupport()*/
.create

Recommended fix

To create the session, you need to add the proper Snowflake Snowpark configuration.

In this example, a configuration variable is used.

    val configs = Map (
      "URL" -> "https://<myAccount>.snowflakecomputing.cn:<port>",
      "USER" -> <myUserName>,
      "PASSWORD" -> <myPassword>,
      "ROLE" -> <myRole>,
      "WAREHOUSE" -> <myWarehouse>,
      "DB" -> <myDatabase>,
      "SCHEMA" -> <mySchema>
    )
    val session = Session.builder.configs(configs).create

It is also recommended to use a configFile (profile.properties) containing the connection information:

## profile.properties file (a text file)
URL = https://<account_identifier>.snowflakecomputing.cn
USER = <username>
PRIVATEKEY = <unencrypted_private_key_from_the_private_key_file>
ROLE = <role_name>
WAREHOUSE = <warehouse_name>
DB = <database_name>
SCHEMA = <schema_name>

The session can then be created with Session.builder.configFile:

val session = Session.builder.configFile("/path/to/properties/file").create

Additional recommendations

SPRKSCL1137

Message: org.apache.spark.sql.functions.sin has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.sin (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.sin function, first used with a column name as an argument and then with a column object.

val df = Seq(Math.PI / 2, Math.PI, Math.PI / 6).toDF("angle")
val result1 = df.withColumn("sin_value", sin("angle"))
val result2 = df.withColumn("sin_value", sin(col("angle")))

Output

The SMA adds the EWI SPRKSCL1137 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(Math.PI / 2, Math.PI, Math.PI / 6).toDF("angle")
/*EWI: SPRKSCL1137 => org.apache.spark.sql.functions.sin has a workaround, see documentation for more info*/
val result1 = df.withColumn("sin_value", sin("angle"))
/*EWI: SPRKSCL1137 => org.apache.spark.sql.functions.sin has a workaround, see documentation for more info*/
val result2 = df.withColumn("sin_value", sin(col("angle")))

Recommended fix

Snowpark has an equivalent sin function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(Math.PI / 2, Math.PI, Math.PI / 6).toDF("angle")
val result1 = df.withColumn("sin_value", sin(col("angle")))
val result2 = df.withColumn("sin_value", sin(col("angle")))

Additional recommendations

SPRKSCL1166

Note

This issue code has been deprecated

Message: org.apache.spark.sql.DataFrameReader.format is not supported.

Category: Warning.

Description

This issue appears when the org.apache.spark.sql.DataFrameReader.format (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html) has an argument that is not supported by Snowpark.

Scenarios

There are several scenarios depending on the type of format you are trying to load; it can be a supported or a non-supported format.

Scenario 1

Input

The tool analyzes the type of format you are trying to load. The supported formats are:

  • csv

  • json

  • orc

  • parquet

  • text

The below example shows how the tool transforms the format method when passing a csv value.

spark.read.format("csv").load(path)

Output

The tool transforms the format method into a csv method call when the load function has one parameter.

spark.read.csv(path)

Recommended fix

In this scenario, the tool does not show the EWI, meaning no fix is needed.

Scenario 2

Input

The below example shows how the tool transforms the format method when passing a net.snowflake.spark.snowflake value.

spark.read.format("net.snowflake.spark.snowflake").load(path)

Output

The tool shows the EWI SPRKSCL1166 indicating that the value net.snowflake.spark.snowflake is not supported.

/*EWI: SPRKSCL1166 => The parameter net.snowflake.spark.snowflake is not supported for org.apache.spark.sql.DataFrameReader.format
  EWI: SPRKSCL1112 => org.apache.spark.sql.DataFrameReader.load(scala.String) is not supported*/
spark.read.format("net.snowflake.spark.snowflake").load(path)

Recommended fix

For the unsupported scenarios there is no specific fix, since it depends on the files that are being read.

Scenario 3

Input

The below example shows how the tool transforms the format method when passing a csv, but using a variable instead.

val myFormat = "csv"
spark.read.format(myFormat).load(path)

Output

Since the tool cannot determine the value of the variable at runtime, it shows the EWI SPRKSCL1163 indicating that the value is not supported.

/*EWI: SPRKSCL1163 => myFormat is not a literal and can't be evaluated
  EWI: SPRKSCL1112 => org.apache.spark.sql.DataFrameReader.load(scala.String) is not supported*/
spark.read.format(myFormat).load(path)

Recommended fix

As a workaround, you can check the value of the variable and add it as a string to the format call.
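A minimal sketch of that workaround, assuming the variable always holds the literal value "csv":

```scala
// Before: the format is hidden behind a variable, so the SMA cannot evaluate it.
// val myFormat = "csv"
// spark.read.format(myFormat).load(path)

// After: pass the string literal directly, so the tool can recognize the
// supported format and transform the call (e.g. into spark.read.csv(path)).
spark.read.format("csv").load(path)
```

If the variable can hold different formats at runtime, you would instead need to branch on its value and call the corresponding reader method explicitly.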

Additional recommendations

SPRKSCL1118

Message: org.apache.spark.sql.functions.trunc has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.trunc (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.trunc function that generates this EWI.

val df = Seq(
  Date.valueOf("2024-10-28"),
  Date.valueOf("2023-05-15"),
  Date.valueOf("2022-11-20"),
).toDF("date")

val result = df.withColumn("truncated", trunc(col("date"), "month"))

Output

The SMA adds the EWI SPRKSCL1118 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  Date.valueOf("2024-10-28"),
  Date.valueOf("2023-05-15"),
  Date.valueOf("2022-11-20"),
).toDF("date")

/*EWI: SPRKSCL1118 => org.apache.spark.sql.functions.trunc has a workaround, see documentation for more info*/
val result = df.withColumn("truncated", trunc(col("date"), "month"))

Recommended fix

As a workaround, you can convert the second argument into a column object using the com.snowflake.snowpark.functions.lit function.

val df = Seq(
  Date.valueOf("2024-10-28"),
  Date.valueOf("2023-05-15"),
  Date.valueOf("2022-11-20"),
).toDF("date")

val result = df.withColumn("truncated", trunc(col("date"), lit("month")))

Additional recommendations

SPRKSCL1149

Message: org.apache.spark.sql.functions.toRadians has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.toRadians (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.toRadians function, first used with a column name as an argument and then with a column object.

val df = Seq(0, 45, 90, 180, 270).toDF("degrees")
val result1 = df.withColumn("radians", toRadians("degrees"))
val result2 = df.withColumn("radians", toRadians(col("degrees")))

Output

The SMA adds the EWI SPRKSCL1149 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(0, 45, 90, 180, 270).toDF("degrees")
/*EWI: SPRKSCL1149 => org.apache.spark.sql.functions.toRadians has a workaround, see documentation for more info*/
val result1 = df.withColumn("radians", toRadians("degrees"))
/*EWI: SPRKSCL1149 => org.apache.spark.sql.functions.toRadians has a workaround, see documentation for more info*/
val result2 = df.withColumn("radians", toRadians(col("degrees")))

Recommended fix

As a workaround, you can use the radians function. For the Spark overload that receives a string argument, you additionally have to convert the string into a column object using the com.snowflake.snowpark.functions.col function.

val df = Seq(0, 45, 90, 180, 270).toDF("degrees")
val result1 = df.withColumn("radians", radians(col("degrees")))
val result2 = df.withColumn("radians", radians(col("degrees")))

Additional recommendations

SPRKSCL1159

Message: org.apache.spark.sql.functions.stddev_samp has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.stddev_samp (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.stddev_samp function that generates this EWI. In this example, the stddev_samp function is used to calculate the sample standard deviation of the selected column.

val df = Seq("1.7", "2.1", "3.0", "4.4", "5.2").toDF("elements")
val result1 = stddev_samp(col("elements"))
val result2 = stddev_samp("elements")

Output

The SMA adds the EWI SPRKSCL1159 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("1.7", "2.1", "3.0", "4.4", "5.2").toDF("elements")
/*EWI: SPRKSCL1159 => org.apache.spark.sql.functions.stddev_samp has a workaround, see documentation for more info*/
val result1 = stddev_samp(col("elements"))
/*EWI: SPRKSCL1159 => org.apache.spark.sql.functions.stddev_samp has a workaround, see documentation for more info*/
val result2 = stddev_samp("elements")

Recommended fix

Snowpark has an equivalent stddev_samp function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq("1.7", "2.1", "3.0", "4.4", "5.2").toDF("elements")
val result1 = stddev_samp(col("elements"))
val result2 = stddev_samp(col("elements"))

Additional recommendations

SPRKSCL1108

Note

This issue code has been deprecated.

Message: org.apache.spark.sql.DataFrameReader.format is not supported.

Category: Warning.

Description

This issue appears when the org.apache.spark.sql.DataFrameReader.format (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html) has an argument that is not supported by Snowpark.

Scenarios

There are several scenarios depending on the type of format you are trying to load; it can be a supported or a non-supported format.

Scenario 1

Input

The tool analyzes the type of format you are trying to load. The supported formats are:

  • csv

  • json

  • orc

  • parquet

  • text

The below example shows how the tool transforms the format method when passing a csv value.

spark.read.format("csv").load(path)

Output

The tool transforms the format method into a csv method call when the load function has one parameter.

spark.read.csv(path)

Recommended fix

In this scenario, the tool does not show the EWI, meaning no fix is needed.

Scenario 2

Input

The below example shows how the tool transforms the format method when passing a net.snowflake.spark.snowflake value.

spark.read.format("net.snowflake.spark.snowflake").load(path)

Output

The tool shows the EWI SPRKSCL1108 indicating that the value net.snowflake.spark.snowflake is not supported.

/*EWI: SPRKSCL1108 => The parameter net.snowflake.spark.snowflake is not supported for org.apache.spark.sql.DataFrameReader.format
  EWI: SPRKSCL1112 => org.apache.spark.sql.DataFrameReader.load(scala.String) is not supported*/
spark.read.format("net.snowflake.spark.snowflake").load(path)

Recommended fix

For the unsupported scenarios there is no specific fix, since it depends on the files that are being read.

场景 3

输入

The below example shows how the tool transforms the format method when passing a csv, but using a variable instead.

val myFormat = "csv"
spark.read.format(myFormat).load(path)

Output

Since the tool cannot determine the value of the variable at runtime, it shows the EWI SPRKSCL1108 indicating that the value is not supported.

/*EWI: SPRKSCL1108 => myFormat is not a literal and can't be evaluated
  EWI: SPRKSCL1112 => org.apache.spark.sql.DataFrameReader.load(scala.String) is not supported*/
spark.read.format(myFormat).load(path)

Recommended fix

As a workaround, you can check the value of the variable and add it as a string to the format call.
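A minimal sketch of that workaround, assuming the variable always holds the literal value "csv":

```scala
// Before: the format is hidden behind a variable, so the SMA cannot evaluate it.
// val myFormat = "csv"
// spark.read.format(myFormat).load(path)

// After: pass the string literal directly, so the tool can recognize the
// supported format and transform the call (e.g. into spark.read.csv(path)).
spark.read.format("csv").load(path)
```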

Additional recommendations

SPRKSCL1128

Message: org.apache.spark.sql.functions.exp has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.exp (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.exp function, first used with a column name as an argument and then with a column object.

val df = Seq(1.0, 2.0, 3.0).toDF("value")
val result1 = df.withColumn("exp_value", exp("value"))
val result2 = df.withColumn("exp_value", exp(col("value")))

Output

The SMA adds the EWI SPRKSCL1128 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(1.0, 2.0, 3.0).toDF("value")
/*EWI: SPRKSCL1128 => org.apache.spark.sql.functions.exp has a workaround, see documentation for more info*/
val result1 = df.withColumn("exp_value", exp("value"))
/*EWI: SPRKSCL1128 => org.apache.spark.sql.functions.exp has a workaround, see documentation for more info*/
val result2 = df.withColumn("exp_value", exp(col("value")))

Recommended fix

Snowpark has an equivalent exp function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(1.0, 2.0, 3.0).toDF("value")
val result1 = df.withColumn("exp_value", exp(col("value")))
val result2 = df.withColumn("exp_value", exp(col("value")))

Additional recommendations

SPRKSCL1169

Message: *Spark element* is missing on the method chaining.

Category: Warning.

Description

This issue appears when the SMA detects that a Spark element call is missing from the method chaining. The SMA needs to know the Spark element in order to analyze the statement.

Scenarios

Input

Below is an example where the load function call is missing from the method chaining.

val reader = spark.read.format("json")
val df = reader.load(path)

Output

The SMA adds the EWI SPRKSCL1169 to the output code to let you know that the load function call is missing from the method chaining and the SMA cannot analyze the statement.

/*EWI: SPRKSCL1169 => Function 'org.apache.spark.sql.DataFrameReader.load' is missing on the method chaining*/
val reader = spark.read.format("json")
val df = reader.load(path)

Recommended fix

Make sure all the function calls of the method chaining are in the same statement.

val reader = spark.read.format("json").load(path)

Additional recommendations

SPRKSCL1138

Message: org.apache.spark.sql.functions.sinh has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.sinh (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.sinh function, first used with a column name as an argument and then with a column object.

val df = Seq(0.0, 1.0, 2.0, 3.0).toDF("value")
val result1 = df.withColumn("sinh_value", sinh("value"))
val result2 = df.withColumn("sinh_value", sinh(col("value")))

Output

The SMA adds the EWI SPRKSCL1138 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(0.0, 1.0, 2.0, 3.0).toDF("value")
/*EWI: SPRKSCL1138 => org.apache.spark.sql.functions.sinh has a workaround, see documentation for more info*/
val result1 = df.withColumn("sinh_value", sinh("value"))
/*EWI: SPRKSCL1138 => org.apache.spark.sql.functions.sinh has a workaround, see documentation for more info*/
val result2 = df.withColumn("sinh_value", sinh(col("value")))

Recommended fix

Snowpark has an equivalent sinh function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(0.0, 1.0, 2.0, 3.0).toDF("value")
val result1 = df.withColumn("sinh_value", sinh(col("value")))
val result2 = df.withColumn("sinh_value", sinh(col("value")))

Additional recommendations

SPRKSCL1129

Message: org.apache.spark.sql.functions.floor has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.floor (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.floor function, first used with a column name as an argument, then with a column object and finally with two column objects.

val df = Seq(4.75, 6.22, 9.99).toDF("value")
val result1 = df.withColumn("floor_value", floor("value"))
val result2 = df.withColumn("floor_value", floor(col("value")))
val result3 = df.withColumn("floor_value", floor(col("value"), lit(1)))

Output

The SMA adds the EWI SPRKSCL1129 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(4.75, 6.22, 9.99).toDF("value")
/*EWI: SPRKSCL1129 => org.apache.spark.sql.functions.floor has a workaround, see documentation for more info*/
val result1 = df.withColumn("floor_value", floor("value"))
/*EWI: SPRKSCL1129 => org.apache.spark.sql.functions.floor has a workaround, see documentation for more info*/
val result2 = df.withColumn("floor_value", floor(col("value")))
/*EWI: SPRKSCL1129 => org.apache.spark.sql.functions.floor has a workaround, see documentation for more info*/
val result3 = df.withColumn("floor_value", floor(col("value"), lit(1)))

Recommended fix

Snowpark has an equivalent floor function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

For the overload that receives a column object and a scale, you can use the callBuiltin function to invoke the Snowflake builtin FLOOR function. To use it, you should pass the string "floor" as the first argument, the column as the second argument and the scale as the third argument.

val df = Seq(4.75, 6.22, 9.99).toDF("value")
val result1 = df.withColumn("floor_value", floor(col("value")))
val result2 = df.withColumn("floor_value", floor(col("value")))
val result3 = df.withColumn("floor_value", callBuiltin("floor", col("value"), lit(1)))

Additional recommendations

SPRKSCL1168

Message: *Spark element* with argument(s) value(s) *given arguments* is not supported.

Category: Warning.

Description

This issue appears when the SMA detects that a Spark element with the given arguments is not supported.

Scenarios

Input

Below is an example of a Spark element whose argument is not supported.

spark.read.format("text").load(path)

Output

The SMA adds the EWI SPRKSCL1168 to the output code to let you know that Spark element with the given parameter is not supported.

/*EWI: SPRKSCL1168 => org.apache.spark.sql.DataFrameReader.format(scala.String) with argument(s) value(s) (spark.format) is not supported*/
spark.read.format("text").load(path)

Recommended fix

There is no specific fix for this scenario.

Additional recommendations

SPRKSCL1139

Message: org.apache.spark.sql.functions.sqrt has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.sqrt (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.sqrt function, first used with a column name as an argument and then with a column object.

val df = Seq(4.0, 16.0, 25.0, 36.0).toDF("value")
val result1 = df.withColumn("sqrt_value", sqrt("value"))
val result2 = df.withColumn("sqrt_value", sqrt(col("value")))

Output

The SMA adds the EWI SPRKSCL1139 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(4.0, 16.0, 25.0, 36.0).toDF("value")
/*EWI: SPRKSCL1139 => org.apache.spark.sql.functions.sqrt has a workaround, see documentation for more info*/
val result1 = df.withColumn("sqrt_value", sqrt("value"))
/*EWI: SPRKSCL1139 => org.apache.spark.sql.functions.sqrt has a workaround, see documentation for more info*/
val result2 = df.withColumn("sqrt_value", sqrt(col("value")))

Recommended fix

Snowpark has an equivalent sqrt function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(4.0, 16.0, 25.0, 36.0).toDF("value")
val result1 = df.withColumn("sqrt_value", sqrt(col("value")))
val result2 = df.withColumn("sqrt_value", sqrt(col("value")))

Additional recommendations

SPRKSCL1119

Message: org.apache.spark.sql.Column.endsWith has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.Column.endsWith (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Column.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.Column.endsWith function, first used with a literal string argument and then with a column object argument.

val df1 = Seq(
  ("Alice", "alice@example.com"),
  ("Bob", "bob@example.org"),
  ("David", "david@example.com")
).toDF("name", "email")
val result1 = df1.filter(col("email").endsWith(".com"))

val df2 = Seq(
  ("Alice", "alice@example.com", ".com"),
  ("Bob", "bob@example.org", ".org"),
  ("David", "david@example.org", ".com")
).toDF("name", "email", "suffix")
val result2 = df2.filter(col("email").endsWith(col("suffix")))

Output

The SMA adds the EWI SPRKSCL1119 to the output code to let you know that this function is not directly supported by Snowpark, but it has a workaround.

val df1 = Seq(
  ("Alice", "alice@example.com"),
  ("Bob", "bob@example.org"),
  ("David", "david@example.com")
).toDF("name", "email")
/*EWI: SPRKSCL1119 => org.apache.spark.sql.Column.endsWith has a workaround, see documentation for more info*/
val result1 = df1.filter(col("email").endsWith(".com"))

val df2 = Seq(
  ("Alice", "alice@example.com", ".com"),
  ("Bob", "bob@example.org", ".org"),
  ("David", "david@example.org", ".com")
).toDF("name", "email", "suffix")
/*EWI: SPRKSCL1119 => org.apache.spark.sql.Column.endsWith has a workaround, see documentation for more info*/
val result2 = df2.filter(col("email").endsWith(col("suffix")))

Recommended fix

As a workaround, you can use the com.snowflake.snowpark.functions.endswith function, where the first argument would be the column whose values will be checked and the second argument the suffix to check against the column values. Please note that if the argument of the Spark's endswith function is a literal string, you should convert it into a column object using the com.snowflake.snowpark.functions.lit function.

val df1 = Seq(
  ("Alice", "alice@example.com"),
  ("Bob", "bob@example.org"),
  ("David", "david@example.com")
).toDF("name", "email")
val result1 = df1.filter(endswith(col("email"), lit(".com")))

val df2 = Seq(
  ("Alice", "alice@example.com", ".com"),
  ("Bob", "bob@example.org", ".org"),
  ("David", "david@example.org", ".com")
).toDF("name", "email", "suffix")
val result2 = df2.filter(endswith(col("email"), col("suffix")))

Additional recommendations

SPRKSCL1148

Message: org.apache.spark.sql.functions.toDegrees has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.toDegrees (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.toDegrees function, first used with a column name as an argument and then with a column object.

val df = Seq(Math.PI, Math.PI / 2, Math.PI / 4).toDF("angle_in_radians")
val result1 = df.withColumn("angle_in_degrees", toDegrees("angle_in_radians"))
val result2 = df.withColumn("angle_in_degrees", toDegrees(col("angle_in_radians")))

Output

The SMA adds the EWI SPRKSCL1148 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(Math.PI, Math.PI / 2, Math.PI / 4).toDF("angle_in_radians")
/*EWI: SPRKSCL1148 => org.apache.spark.sql.functions.toDegrees has a workaround, see documentation for more info*/
val result1 = df.withColumn("angle_in_degrees", toDegrees("angle_in_radians"))
/*EWI: SPRKSCL1148 => org.apache.spark.sql.functions.toDegrees has a workaround, see documentation for more info*/
val result2 = df.withColumn("angle_in_degrees", toDegrees(col("angle_in_radians")))

Recommended fix

As a workaround, you can use the com.snowflake.snowpark.functions.degrees function. For the Spark overload that receives a string argument, you additionally have to convert the string into a column object using the com.snowflake.snowpark.functions.col function.

val df = Seq(Math.PI, Math.PI / 2, Math.PI / 4).toDF("angle_in_radians")
val result1 = df.withColumn("angle_in_degrees", degrees(col("angle_in_radians")))
val result2 = df.withColumn("angle_in_degrees", degrees(col("angle_in_radians")))

Additional recommendations

SPRKSCL1158

Message: org.apache.spark.sql.functions.skewness has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.skewness (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.skewness function that generates this EWI. In this example, the skewness function is used to calculate the skewness of the selected column.

val df = Seq("1", "2", "3").toDF("elements")
val result1 = skewness(col("elements"))
val result2 = skewness("elements")

Output

The SMA adds the EWI SPRKSCL1158 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("1", "2", "3").toDF("elements")
/*EWI: SPRKSCL1158 => org.apache.spark.sql.functions.skewness has a workaround, see documentation for more info*/
val result1 = skewness(col("elements"))
/*EWI: SPRKSCL1158 => org.apache.spark.sql.functions.skewness has a workaround, see documentation for more info*/
val result2 = skewness("elements")

Recommended fix

Snowpark has an equivalent skew function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq("1", "2", "3").toDF("elements")
val result1 = skew(col("elements"))
val result2 = skew(col("elements"))

Additional recommendations

SPRKSCL1109

Note

This issue code has been deprecated.

Message: The parameter is not defined for org.apache.spark.sql.DataFrameReader.option

Category: Warning

Description

This issue appears when the SMA detects that the given parameter of the org.apache.spark.sql.DataFrameReader.option (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html) function is not defined.

Scenario

Input

Below is an example of an undefined parameter for the org.apache.spark.sql.DataFrameReader.option function.

spark.read.option("header", true).json(path)

Output

The SMA adds the EWI SPRKSCL1109 to the output code to let you know that the given parameter of the org.apache.spark.sql.DataFrameReader.option function is not defined.

/*EWI: SPRKSCL1109 => The parameter header=true is not supported for org.apache.spark.sql.DataFrameReader.option*/
spark.read.option("header", true).json(path)

Recommended fix

Check the Snowpark documentation for the reader format options here to identify the defined options.
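As an illustrative sketch (not taken from the SMA output), the following shows how a defined Snowpark reader option, such as the SKIP_HEADER format option, could be used instead when reading a staged CSV file. The stage name, column name, and option value are hypothetical examples.

```scala
import com.snowflake.snowpark.types.{StructType, StructField, StringType}

// Hypothetical sketch: read a staged CSV file with a format option that
// Snowpark does define (SKIP_HEADER). The stage and column names are examples.
val schema = StructType(Seq(StructField("NAME", StringType)))
val df = session.read
  .schema(schema)
  .option("SKIP_HEADER", 1)
  .csv("@my_stage/data.csv")
```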

Additional recommendations

SPRKSCL1114

Message: org.apache.spark.sql.functions.repeat has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.repeat (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.repeat function that generates this EWI.

val df = Seq("Hello", "World").toDF("word")
val result = df.withColumn("repeated_word", repeat(col("word"), 3))

Output

The SMA adds the EWI SPRKSCL1114 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("Hello", "World").toDF("word")
/*EWI: SPRKSCL1114 => org.apache.spark.sql.functions.repeat has a workaround, see documentation for more info*/
val result = df.withColumn("repeated_word", repeat(col("word"), 3))

Recommended fix

As a workaround, you can convert the second argument into a column object using the com.snowflake.snowpark.functions.lit function.

val df = Seq("Hello", "World").toDF("word")
val result = df.withColumn("repeated_word", repeat(col("word"), lit(3)))

Additional recommendations

SPRKSCL1145

Message: org.apache.spark.sql.functions.sumDistinct has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.sumDistinct (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.sumDistinct function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Alice", 10),
  ("Alice", 20),
  ("Bob", 15)
).toDF("name", "value")

val result1 = df.groupBy("name").agg(sumDistinct("value"))
val result2 = df.groupBy("name").agg(sumDistinct(col("value")))

Output

The SMA adds the EWI SPRKSCL1145 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Alice", 10),
  ("Alice", 20),
  ("Bob", 15)
).toDF("name", "value")

/*EWI: SPRKSCL1145 => org.apache.spark.sql.functions.sumDistinct has a workaround, see documentation for more info*/
val result1 = df.groupBy("name").agg(sumDistinct("value"))
/*EWI: SPRKSCL1145 => org.apache.spark.sql.functions.sumDistinct has a workaround, see documentation for more info*/
val result2 = df.groupBy("name").agg(sumDistinct(col("value")))

Recommended fix

As a workaround, you can use the sum_distinct function. For the Spark overload that receives a string argument, you additionally have to convert the string into a column object using the com.snowflake.snowpark.functions.col function.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Alice", 10),
  ("Alice", 20),
  ("Bob", 15)
).toDF("name", "value")

val result1 = df.groupBy("name").agg(sum_distinct(col("value")))
val result2 = df.groupBy("name").agg(sum_distinct(col("value")))

Additional recommendations

SPRKSCL1171

Message: Snowpark does not support split functions with more than two parameters or containing regex pattern. See documentation for more info.

Category: Warning.

Description

This issue appears when the SMA detects that org.apache.spark.sql.functions.split (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) has more than two parameters or contains a regex pattern.

Scenarios

The split function is used to separate the given column around matches of the given pattern. This Spark function has three overloads.

Scenario 1

Input

Below is an example of the org.apache.spark.sql.functions.split function that generates this EWI. In this example, the split function has two parameters and the second argument is a string, not a regex pattern.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
val result = df.select(split(col("words"), "Snow"))

Output

The SMA adds the EWI SPRKSCL1171 to the output code to let you know that this function is not fully supported by Snowpark.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
/* EWI: SPRKSCL1171 => Snowpark does not support split functions with more than two parameters or containing regex pattern. See documentation for more info. */
val result = df.select(split(col("words"), "Snow"))

Recommended fix

Snowpark has an equivalent split function that receives a column object as the second argument. For that reason, for the Spark overload whose second argument is a string that is not a regex pattern, you can convert the string into a column object using the com.snowflake.snowpark.functions.lit function as a workaround.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
val result = df.select(split(col("words"), lit("Snow")))

Scenario 2

Input

Below is an example of the org.apache.spark.sql.functions.split function that generates this EWI. In this example, the split function has two parameters and the second argument is a regex pattern.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
val result = df.select(split(col("words"), "^([\\d]+-[\\d]+-[\\d])"))

Output

The SMA adds the EWI SPRKSCL1171 to the output code to let you know that this function is not fully supported by Snowpark because regex patterns are not supported by Snowflake.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
/* EWI: SPRKSCL1171 => Snowpark does not support split functions with more than two parameters or containing regex pattern. See documentation for more info. */
val result = df.select(split(col("words"), "^([\\d]+-[\\d]+-[\\d])"))

Recommended fix

Since Snowflake does not support regular expression patterns, try replacing the pattern with a non-regex string.
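For instance, if the regex was only matching a fixed delimiter, a possible manual rewrite is to pass the plain delimiter string as a column object. The sample data and delimiter below are hypothetical examples, not SMA output.

```scala
// Hypothetical sketch: the regex pattern is replaced by a plain delimiter
// string wrapped with lit, which the Snowpark split function supports.
val df = Seq("2024-01-15", "2023-11-30").toDF("dates")
val result = df.select(split(col("dates"), lit("-")))
```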

Scenario 3

Input

Below is an example of the org.apache.spark.sql.functions.split function that generates this EWI. In this example, the split function has more than two parameters.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
val result = df.select(split(df("words"), "Snow", 3))

Output

The SMA adds the EWI SPRKSCL1171 to the output code to let you know that this function is not fully supported by Snowpark, because Snowflake does not have a split function with more than two parameters.

val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
/* EWI: SPRKSCL1171 => Snowpark does not support split functions with more than two parameters or containing regex pattern. See documentation for more info. */
val result = df.select(split(df("words"), "Snow", 3))

Recommended fix

Since Snowflake does not support split functions with more than two parameters, try using a split function supported by Snowflake.
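For example, when the limit argument does not affect the intended result, one possible manual rewrite is the supported two-argument form. This sketch is an assumption, not SMA output, and it is not equivalent to the original when the limit actually matters, because the two-argument form returns all parts.

```scala
// Hypothetical sketch: drop the third (limit) parameter and wrap the
// delimiter with lit; only valid when the limit does not change the result.
val df = Seq("Snowflake", "Snowpark", "Snow", "Spark").toDF("words")
val result = df.select(split(df("words"), lit("Snow")))
```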

Additional recommendations

SPRKSCL1120

Message: org.apache.spark.sql.functions.asin has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.asin (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.asin function, first used with a column name as an argument and then with a column object.

val df = Seq(0.5, 0.6, -0.5).toDF("value")
val result1 = df.select(col("value"), asin("value").as("asin_value"))
val result2 = df.select(col("value"), asin(col("value")).as("asin_value"))

Output

The SMA adds the EWI SPRKSCL1120 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(0.5, 0.6, -0.5).toDF("value")
/*EWI: SPRKSCL1120 => org.apache.spark.sql.functions.asin has a workaround, see documentation for more info*/
val result1 = df.select(col("value"), asin("value").as("asin_value"))
/*EWI: SPRKSCL1120 => org.apache.spark.sql.functions.asin has a workaround, see documentation for more info*/
val result2 = df.select(col("value"), asin(col("value")).as("asin_value"))

Recommended fix

Snowpark has an equivalent asin function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(0.5, 0.6, -0.5).toDF("value")
val result1 = df.select(col("value"), asin(col("value")).as("asin_value"))
val result2 = df.select(col("value"), asin(col("value")).as("asin_value"))

Additional recommendations

SPRKSCL1130

Message: org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.greatest (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.greatest function, first used with multiple column names as arguments and then with multiple column objects.

val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")

val result1 = df.withColumn("greatest", greatest("value1", "value2", "value3"))
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))

Output

The SMA adds the EWI SPRKSCL1130 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")

/*EWI: SPRKSCL1130 => org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info*/
val result1 = df.withColumn("greatest", greatest("value1", "value2", "value3"))
/*EWI: SPRKSCL1130 => org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info*/
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))

Recommended fix

Snowpark has an equivalent greatest function that receives multiple column objects as arguments. For that reason, the Spark overload that receives column objects as arguments is directly supported by Snowpark and does not require any changes.

For the overload that receives multiple string arguments, you can convert the strings into column objects using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")

val result1 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))

Additional recommendations




SPRKSCL1161

Message: Unable to add dependencies.

Category: Conversion error.

Description

This issue appears when the SMA detects a Spark version in the project configuration file that is not supported by the SMA, so the SMA cannot add the Snowpark and Snowpark Extensions dependencies to the corresponding project configuration file. If the Snowpark dependencies are not added, the migrated code will not compile.

Scenarios

There are three possible scenarios: sbt, gradle, and pom.xml. The SMA tries to process the project configuration file by removing the Spark dependencies and adding the Snowpark and Snowpark Extensions dependencies.

Scenario 1

Input

Below is an example of the dependencies section of an sbt project configuration file.

...
libraryDependencies += "org.apache.spark" % "spark-core_2.13" % "3.5.3"
libraryDependencies += "org.apache.spark" % "spark-sql_2.13" % "3.5.3"
...

Output

The SMA adds the EWI SPRKSCL1161 to the issues inventory since the Spark version is not supported and keeps the output the same.

...
libraryDependencies += "org.apache.spark" % "spark-core_2.13" % "3.5.3"
libraryDependencies += "org.apache.spark" % "spark-sql_2.13" % "3.5.3"
...

Recommended fix

Manually remove the Spark dependencies and add the Snowpark and Snowpark Extensions dependencies to the sbt project configuration file.

...
libraryDependencies += "com.snowflake" % "snowpark" % "1.14.0"
libraryDependencies += "net.mobilize.snowpark-extensions" % "snowparkextensions" % "0.0.18"
...

Make sure to use the Snowpark version that best fits your project requirements.

Scenario 2

Input

Below is an example of the dependencies section of a gradle project configuration file.

dependencies {
    implementation group: 'org.apache.spark', name: 'spark-core_2.13', version: '3.5.3'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.13', version: '3.5.3'
    ...
}

Output

The SMA adds the EWI SPRKSCL1161 to the issues inventory since the Spark version is not supported and keeps the output the same.

dependencies {
    implementation group: 'org.apache.spark', name: 'spark-core_2.13', version: '3.5.3'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.13', version: '3.5.3'
    ...
}

Recommended fix

Manually remove the Spark dependencies and add the Snowpark and Snowpark Extensions dependencies to the gradle project configuration file.

dependencies {
    implementation 'com.snowflake:snowpark:1.14.2'
    implementation 'net.mobilize.snowpark-extensions:snowparkextensions:0.0.18'
    ...
}

Make sure the dependency versions fit your project requirements.

Scenario 3

Input

Below is an example of the dependencies section of a pom.xml project configuration file.

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.5.3</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.5.3</version>
    <scope>compile</scope>
  </dependency>
  ...
</dependencies>

Output

The SMA adds the EWI SPRKSCL1161 to the issues inventory since the Spark version is not supported and keeps the output the same.

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.5.3</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.5.3</version>
    <scope>compile</scope>
  </dependency>
  ...
</dependencies>

Recommended fix

Manually remove the Spark dependencies and add the Snowpark and Snowpark Extensions dependencies to the pom.xml project configuration file.

<dependencies>
  <dependency>
    <groupId>com.snowflake</groupId>
    <artifactId>snowpark</artifactId>
    <version>1.14.2</version>
  </dependency>

  <dependency>
    <groupId>net.mobilize.snowpark-extensions</groupId>
    <artifactId>snowparkextensions</artifactId>
    <version>0.0.18</version>
  </dependency>
  ...
</dependencies>

Make sure the dependency versions fit your project requirements.

Additional recommendations

  • Make sure the input includes the project configuration file:

    • build.sbt

    • build.gradle

    • pom.xml

  • The Spark version supported by the SMA is 2.12:3.1.2

  • You can check the latest Snowpark version here (https://github.com/snowflakedb/snowpark-java-scala/releases/latest).

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

SPRKSCL1155

Warning

This issue code has been deprecated since Spark Conversion Core Version 4.3.2

Message: org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.countDistinct (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.countDistinct function, first used with column names as arguments and then with column objects.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

val result1 = df.select(countDistinct("name", "value"))
val result2 = df.select(countDistinct(col("name"), col("value")))

Output

The SMA adds the EWI SPRKSCL1155 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

/*EWI: SPRKSCL1155 => org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info*/
val result1 = df.select(countDistinct("name", "value"))
/*EWI: SPRKSCL1155 => org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info*/
val result2 = df.select(countDistinct(col("name"), col("value")))

Recommended fix

As a workaround, you can use the count_distinct function. For the Spark overload that receives string arguments, you additionally have to convert the strings into column objects using the com.snowflake.snowpark.functions.col function.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

val result1 = df.select(count_distinct(col("name"), col("value")))
val result2 = df.select(count_distinct(col("name"), col("value")))

Additional recommendations

SPRKSCL1104

This issue code has been deprecated.

Message: The Spark Session builder option is not supported.

Category: Conversion error.

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.SparkSession.Builder.config (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/SparkSession$$Builder.html) function, which sets an option of the Spark Session that is not supported by Snowpark.

Scenario

Input

Below is an example of the org.apache.spark.sql.SparkSession.Builder.config function used to set an option in the Spark Session.

val spark = SparkSession.builder()
           .master("local")
           .appName("testApp")
           .config("spark.sql.broadcastTimeout", "3600")
           .getOrCreate()

Output

The SMA adds the EWI SPRKSCL1104 to the output code to let you know that the config method is not supported by Snowpark. It is therefore not possible to set options in the Spark Session via the config function, and this might affect the migration of the Spark Session statement.

val spark = Session.builder.configFile("connection.properties")
/*EWI: SPRKSCL1104 => SparkBuilder Option is not supported .config("spark.sql.broadcastTimeout", "3600")*/
.create()

Recommended fix

To create the session, you need to add the proper Snowflake Snowpark configuration.

In this example, a configs variable is used.

    val configs = Map (
      "URL" -> "https://<myAccount>.snowflakecomputing.cn:<port>",
      "USER" -> <myUserName>,
      "PASSWORD" -> <myPassword>,
      "ROLE" -> <myRole>,
      "WAREHOUSE" -> <myWarehouse>,
      "DB" -> <myDatabase>,
      "SCHEMA" -> <mySchema>
    )
    val session = Session.builder.configs(configs).create

It is also recommended to use a configFile (profile.properties) containing the connection information:

## profile.properties file (a text file)
URL = https://<account_identifier>.snowflakecomputing.cn
USER = <username>
PRIVATEKEY = <unencrypted_private_key_from_the_private_key_file>
ROLE = <role_name>
WAREHOUSE = <warehouse_name>
DB = <database_name>
SCHEMA = <schema_name>

With Session.builder.configFile, the session can be created:

val session = Session.builder.configFile("/path/to/properties/file").create

Additional recommendations

SPRKSCL1124

Message: org.apache.spark.sql.functions.cosh has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.cosh (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.cosh function, first used with a column name as an argument and then with a column object.

val df = Seq(0.0, 1.0, 2.0, -1.0).toDF("value")
val result1 = df.withColumn("cosh_value", cosh("value"))
val result2 = df.withColumn("cosh_value", cosh(col("value")))

Output

The SMA adds the EWI SPRKSCL1124 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(0.0, 1.0, 2.0, -1.0).toDF("value")
/*EWI: SPRKSCL1124 => org.apache.spark.sql.functions.cosh has a workaround, see documentation for more info*/
val result1 = df.withColumn("cosh_value", cosh("value"))
/*EWI: SPRKSCL1124 => org.apache.spark.sql.functions.cosh has a workaround, see documentation for more info*/
val result2 = df.withColumn("cosh_value", cosh(col("value")))

Recommended fix

Snowpark has an equivalent cosh function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(0.0, 1.0, 2.0, -1.0).toDF("value")
val result1 = df.withColumn("cosh_value", cosh(col("value")))
val result2 = df.withColumn("cosh_value", cosh(col("value")))

Additional recommendations

SPRKSCL1175

Message: The two-parameter udf function is not supported in Snowpark. It should be converted into a single-parameter udf function. Please check the documentation to learn how to manually modify the code to make it work in Snowpark.

Category: Conversion error.

Description

This issue appears when the SMA detects a use of the two-parameter org.apache.spark.sql.functions.udf (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function in the source code. Because Snowpark does not have an equivalent two-parameter udf function, the output code might not compile.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.udf function that generates this EWI. In this example, the udf function has two parameters.

val myFuncUdf = udf(new UDF1[String, Integer] {
  override def call(s: String): Integer = s.length()
}, IntegerType)

Output

The SMA adds the EWI SPRKSCL1175 to the output code to let you know that the udf function is not supported, because it has two parameters.

/*EWI: SPRKSCL1175 => The two-parameter udf function is not supported in Snowpark. It should be converted into a single-parameter udf function. Please check the documentation to learn how to manually modify the code to make it work in Snowpark.*/
val myFuncUdf = udf(new UDF1[String, Integer] {
  override def call(s: String): Integer = s.length()
}, IntegerType)

Recommended fix

Snowpark only supports the single-parameter udf function (without the return type parameter), so you should convert your two-parameter udf function into a single-parameter udf function in order to make it work in Snowpark.

For example, for the sample code shown above, you have to manually convert it into the following code:

val myFuncUdf = udf((s: String) => s.length())

Please note that there are some caveats about creating udfs in Snowpark that might require you to make some additional manual changes to your code. Please check the other recommendations here, related to creating single-parameter udf functions in Snowpark, for more details.

Additional recommendations

  • To learn more about how to create user-defined functions in Snowpark, please refer to the following documentation: Creating User-Defined Functions (UDFs) for DataFrames in Scala

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

SPRKSCL1001

Message: This code section has parsing errors. The parsing error was found at: line *line number*, column *column number*. When trying to parse *statement*. This file was not converted, so it is expected to still have references to the Spark API.

Category: Parsing error.

Description

This issue appears when the SMA detects statements in the file's code that it cannot correctly read or understand, known as parsing errors. This issue appears whenever a file has one or more parsing errors.

Scenario

Input

Below is an example of invalid Scala code.

/#/(%$"$%

class myClass {

    def function1() = { 1 }

}

Output

The SMA adds the EWI SPRKSCL1001 to the output code to let you know that the code of the file has parsing errors. Therefore, the SMA is not able to process a file with this error.

// **********************************************************************************************************************
// EWI: SPRKSCL1001 => This code section has parsing errors
// The parsing error was found at: line 0, column 0. When trying to parse ''.
// This file was not converted, so it is expected to still have references to the Spark API
// **********************************************************************************************************************
/#/(%$"$%

class myClass {

    def function1() = { 1 }

}

Recommended fix

Since the message points out the offending statement, you can try to identify the invalid syntax and remove it, or comment out the statement to avoid the parsing error.

class myClass {

    def function1() = { 1 }

}
// /#/(%$"$%

class myClass {

    def function1() = { 1 }

}

Additional recommendations

SPRKSCL1141

Message: org.apache.spark.sql.functions.stddev_pop has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.stddev_pop (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.stddev_pop function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("Alice", 23),
  ("Bob", 30),
  ("Carol", 27),
  ("David", 25),
).toDF("name", "age")

val result1 = df.select(stddev_pop("age"))
val result2 = df.select(stddev_pop(col("age")))

Output

The SMA adds the EWI SPRKSCL1141 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", 23),
  ("Bob", 30),
  ("Carol", 27),
  ("David", 25),
).toDF("name", "age")

/*EWI: SPRKSCL1141 => org.apache.spark.sql.functions.stddev_pop has a workaround, see documentation for more info*/
val result1 = df.select(stddev_pop("age"))
/*EWI: SPRKSCL1141 => org.apache.spark.sql.functions.stddev_pop has a workaround, see documentation for more info*/
val result2 = df.select(stddev_pop(col("age")))

Recommended fix

Snowpark has an equivalent stddev_pop function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("Alice", 23),
  ("Bob", 30),
  ("Carol", 27),
  ("David", 25),
).toDF("name", "age")

val result1 = df.select(stddev_pop(col("age")))
val result2 = df.select(stddev_pop(col("age")))

Additional recommendations

SPRKSCL1110

Note

This issue code has been deprecated.

Message: Reader method not supported *method name*.

Category: Warning

Description

This issue appears when the SMA detects a method in a DataFrameReader method chain that is not supported by Snowflake. This might then affect the migration of the reader statement.

Scenario

Input

Below is an example of a DataFrameReader method chain in which the load method is not supported by Snowflake.

spark.read.
    format("net.snowflake.spark.snowflake").
    option("query", s"select * from $tablename").
    load()

Output

The SMA adds the EWI SPRKSCL1110 to the output code to let you know that the load method is not supported by Snowpark. This might then affect the migration of the reader statement.

session.sql(s"select * from $tablename")
/*EWI: SPRKSCL1110 => Reader method not supported .load()*/

Recommended fix

Check the Snowpark documentation for the reader here to learn which methods are supported by Snowflake.
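As an illustrative sketch (an assumption, not SMA output), when the reader chain was only loading a whole table, reading it through session.table is one supported alternative in Snowpark. The tablename variable below mirrors the example above:

```scala
// Hypothetical sketch: read the table directly through a supported Snowpark
// method instead of the unsupported load() chain.
val df = session.table(tablename)
```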

Additional recommendations

SPRKSCL1100

This issue code has been deprecated since Spark Conversion Core 2.3.22

Message: Repartition is not supported.

Category: Parsing error.

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.DataFrame.repartition (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html) function, which is not supported by Snowpark. Snowflake manages the storage and the workload on the clusters, making the repartition operation inapplicable.

Scenario

Input

Below is an example of the org.apache.spark.sql.DataFrame.repartition function used to return a new DataFrame partitioned by the given partitioning expressions.

    var nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
    var jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")
    var ageData = Seq(40, 38, 34, 27, 29, 55)

    val dfName = nameData.toDF("name")
    val dfJob = jobData.toDF("job")
    val dfAge = ageData.toDF("age")

    val dfRepartitionByExpresion = dfName.repartition($"name")

    val dfRepartitionByNumber = dfJob.repartition(3)

    val dfRepartitionByBoth = dfAge.repartition(3, $"age")

    val joinedDf = dfRepartitionByExpresion.join(dfRepartitionByNumber)

Output

The SMA adds the EWI SPRKSCL1100 to the output code to let you know that this function is not supported by Snowpark.

    var nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
    var jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")
    var ageData = Seq(40, 38, 34, 27, 29, 55)

    val dfName = nameData.toDF("name")
    val dfJob = jobData.toDF("job")
    val dfAge = ageData.toDF("age")

    /*EWI: SPRKSCL1100 => Repartition is not supported*/
    val dfRepartitionByExpression = dfName.repartition($"name")

    /*EWI: SPRKSCL1100 => Repartition is not supported*/
    val dfRepartitionByNumber = dfJob.repartition(3)

    /*EWI: SPRKSCL1100 => Repartition is not supported*/
    val dfRepartitionByBoth = dfAge.repartition(3, $"age")

    val joinedDf = dfRepartitionByExpression.join(dfRepartitionByNumber)

Recommended fix

Since Snowflake manages the storage and the workload on the clusters, the repartition operation is not applicable. This means that partitioning before the join is simply not needed.

    var nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
    var jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")
    var ageData = Seq(40, 38, 34, 27, 29, 55)

    val dfName = nameData.toDF("name")
    val dfJob = jobData.toDF("job")
    val dfAge = ageData.toDF("age")

    val dfRepartitionByExpression = dfName

    val dfRepartitionByNumber = dfJob

    val dfRepartitionByBoth = dfAge

    val joinedDf = dfRepartitionByExpression.join(dfRepartitionByNumber)

Additional recommendations

SPRKSCL1151

Message: org.apache.spark.sql.functions.var_samp has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.var_samp (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.var_samp function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("A", 10),
  ("A", 20),
  ("A", 30),
  ("B", 40),
  ("B", 50),
  ("B", 60)
).toDF("category", "value")

val result1 = df.groupBy("category").agg(var_samp("value"))
val result2 = df.groupBy("category").agg(var_samp(col("value")))

Output

The SMA adds the EWI SPRKSCL1151 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("A", 10),
  ("A", 20),
  ("A", 30),
  ("B", 40),
  ("B", 50),
  ("B", 60)
).toDF("category", "value")

/*EWI: SPRKSCL1151 => org.apache.spark.sql.functions.var_samp has a workaround, see documentation for more info*/
val result1 = df.groupBy("category").agg(var_samp("value"))
/*EWI: SPRKSCL1151 => org.apache.spark.sql.functions.var_samp has a workaround, see documentation for more info*/
val result2 = df.groupBy("category").agg(var_samp(col("value")))

Recommended fix

Snowpark has an equivalent var_samp function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("A", 10),
  ("A", 20),
  ("A", 30),
  ("B", 40),
  ("B", 50),
  ("B", 60)
).toDF("category", "value")

val result1 = df.groupBy("category").agg(var_samp(col("value")))
val result2 = df.groupBy("category").agg(var_samp(col("value")))

Additional recommendations




SPRKSCL1165

Message: Reader format on DataFrameReader method chaining can't be defined

Category: Warning

Description

This issue appears when the SMA detects that the format of the reader in a DataFrameReader method chain is not one of the following formats supported by Snowpark: avro, csv, json, orc, parquet and xml. Therefore, the SMA cannot determine whether the setting options are defined.

Scenario

Input

Below is an example of a DataFrameReader method chain where the SMA is unable to determine the format of the reader.

spark.read.format("net.snowflake.spark.snowflake")
                 .option("query", s"select * from $tableName")
                 .load()

Output

The SMA adds the EWI SPRKSCL1165 to the output code to let you know that the format of the reader cannot be determined in the given DataFrameReader method chain.

/*EWI: SPRKSCL1165 => Reader format on DataFrameReader method chaining can't be defined*/
spark.read.option("query", s"select * from $tableName")
                 .load()

Recommended fix

Check the Snowpark documentation here to get more information about the format of the reader.
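For the input above, where the net.snowflake.spark.snowflake format only wraps a Snowflake query, one possible manual rewrite (a sketch, not SMA output) is to run the query directly through the Snowpark session:

```scala
// Sketch: replaces the spark.read.format("net.snowflake.spark.snowflake")...load() chain.
// `session` is assumed to be an existing com.snowflake.snowpark.Session,
// and `tableName` is assumed to be defined elsewhere.
val df = session.sql(s"select * from $tableName")
```

This mirrors the replacement the SMA itself produces for Snowflake-format readers (see SPRKSCL1110 above).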

Additional recommendations

SPRKSCL1134

Message: org.apache.spark.sql.functions.log has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.log (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.log function that generates this EWI.

val df = Seq(10.0, 20.0, 30.0, 40.0).toDF("value")
val result1 = df.withColumn("log_value", log(10, "value"))
val result2 = df.withColumn("log_value", log(10, col("value")))
val result3 = df.withColumn("log_value", log("value"))
val result4 = df.withColumn("log_value", log(col("value")))

Output

The SMA adds the EWI SPRKSCL1134 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(10.0, 20.0, 30.0, 40.0).toDF("value")
/*EWI: SPRKSCL1134 => org.apache.spark.sql.functions.log has a workaround, see documentation for more info*/
val result1 = df.withColumn("log_value", log(10, "value"))
/*EWI: SPRKSCL1134 => org.apache.spark.sql.functions.log has a workaround, see documentation for more info*/
val result2 = df.withColumn("log_value", log(10, col("value")))
/*EWI: SPRKSCL1134 => org.apache.spark.sql.functions.log has a workaround, see documentation for more info*/
val result3 = df.withColumn("log_value", log("value"))
/*EWI: SPRKSCL1134 => org.apache.spark.sql.functions.log has a workaround, see documentation for more info*/
val result4 = df.withColumn("log_value", log(col("value")))

Recommended fix

Below are the different workarounds for all the overloads of the log function.

1. def log(base:Double, columnName:String):Column

You can convert the base into a column object using the com.snowflake.snowpark.functions.lit function and convert the column name into a column object using the com.snowflake.snowpark.functions.col function.

val result1 = df.withColumn("log_value", log(lit(10), col("value")))

2. def log(base:Double, a:Column):Column

You can convert the base into a column object using the com.snowflake.snowpark.functions.lit function.

val result2 = df.withColumn("log_value", log(lit(10), col("value")))

3. def log(columnName:String):Column

You can pass lit(Math.E) as the first argument and convert the column name into a column object using the com.snowflake.snowpark.functions.col function and pass it as the second argument.

val result3 = df.withColumn("log_value", log(lit(Math.E), col("value")))

4. def log(e:Column):Column

You can pass lit(Math.E) as the first argument and the column object as the second argument.

val result4 = df.withColumn("log_value", log(lit(Math.E), col("value")))

Additional recommendations

SPRKSCL1125

Warning

This issue code is deprecated since Spark Conversion Core 2.9.0

Message: org.apache.spark.sql.functions.count has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.count (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.count function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("Alice", "Math"),
  ("Bob", "Science"),
  ("Alice", "Science"),
  ("Bob", null)
).toDF("name", "subject")

val result1 = df.groupBy("name").agg(count("subject").as("subject_count"))
val result2 = df.groupBy("name").agg(count(col("subject")).as("subject_count"))

Output

The SMA adds the EWI SPRKSCL1125 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", "Math"),
  ("Bob", "Science"),
  ("Alice", "Science"),
  ("Bob", null)
).toDF("name", "subject")

/*EWI: SPRKSCL1125 => org.apache.spark.sql.functions.count has a workaround, see documentation for more info*/
val result1 = df.groupBy("name").agg(count("subject").as("subject_count"))
/*EWI: SPRKSCL1125 => org.apache.spark.sql.functions.count has a workaround, see documentation for more info*/
val result2 = df.groupBy("name").agg(count(col("subject")).as("subject_count"))

Recommended fix

Snowpark has an equivalent count function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("Alice", "Math"),
  ("Bob", "Science"),
  ("Alice", "Science"),
  ("Bob", null)
).toDF("name", "subject")

val result1 = df.groupBy("name").agg(count(col("subject")).as("subject_count"))
val result2 = df.groupBy("name").agg(count(col("subject")).as("subject_count"))

Additional recommendations

SPRKSCL1174

Message: The single-parameter udf function is supported in Snowpark but it might require manual intervention. Please check the documentation to learn how to manually modify the code to make it work in Snowpark.

Category: Warning.

Description

This issue appears when the SMA detects a use of the single-parameter org.apache.spark.sql.functions.udf (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function in the code, which might require manual intervention.

The Snowpark API provides an equivalent com.snowflake.snowpark.functions.udf function that allows you to create a user-defined function from a lambda or function in Scala. However, there are some caveats about creating UDFs in Snowpark that might require you to make some manual changes to your code in order to make it work properly.

Scenarios

The Snowpark udf function should work as intended for a wide range of cases without requiring manual intervention. However, there are some scenarios that would require you to manually modify your code in order to get it to work in Snowpark. Some of those scenarios are listed below:

Scenario 1

Input

Below is an example of creating UDFs in an object with the App trait.

Scala's App trait simplifies creating executable programs by providing a main method that automatically runs the code within the object definition. Extending App delays the initialization of the fields until the main method is executed, which can affect the UDF definitions if they rely on initialized fields. This means that if an object extends App and the udf references an object field, the udf definition uploaded to Snowflake will not include the initialized value of the field. This can result in null values being returned by the udf.

For example, in the following code the variable myValue will resolve to null in the udf definition:

object Main extends App {
  ...
  val myValue = 10
  val myUdf = udf((x: Int) => x + myValue) // myValue in the `udf` definition will resolve to null
  ...
}

Output

The SMA adds the EWI SPRKSCL1174 to the output code to let you know that the single-parameter udf function is supported in Snowpark but it requires manual intervention.

object Main extends App {
  ...
  val myValue = 10
  /*EWI: SPRKSCL1174 => The single-parameter udf function is supported in Snowpark but it might require manual intervention. Please check the documentation to learn how to manually modify the code to make it work in Snowpark.*/
  val myUdf = udf((x: Int) => x + myValue) // myValue in the `udf` definition will resolve to null
  ...
}

Recommended fix

To avoid this issue, it is recommended not to extend App and to implement a separate main method for your code. This ensures that object fields are initialized before udf definitions are created and uploaded to Snowflake.

object Main {
  ...
  def main(args: Array[String]): Unit = {
    val myValue = 10
    val myUdf = udf((x: Int) => x + myValue)
  }
  ...
}

For more details about this topic, see Caveat About Creating UDFs in an Object With the App Trait.

Scenario 2

Input

Below is an example of creating UDFs in a Jupyter Notebook.

def myFunc(s: String): String = {
  ...
}

val myFuncUdf = udf((x: String) => myFunc(x))
df1.select(myFuncUdf(col("name"))).show()

Output

The SMA adds the EWI SPRKSCL1174 to the output code to let you know that the single-parameter udf function is supported in Snowpark but it requires manual intervention.

def myFunc(s: String): String = {
  ...
}

/*EWI: SPRKSCL1174 => The single-parameter udf function is supported in Snowpark but it might require manual intervention. Please check the documentation to learn how to manually modify the code to make it work in Snowpark.*/
val myFuncUdf = udf((x: String) => myFunc(x))
df1.select(myFuncUdf(col("name"))).show()

Recommended fix

To create a udf in a Jupyter Notebook, you should define the implementation of your function in an object that extends Serializable. For example, you should manually convert it into this:

object ConvertedUdfFuncs extends Serializable {
  def myFunc(s: String): String = {
    ...
  }

  val myFuncAsLambda = ((x: String) => ConvertedUdfFuncs.myFunc(x))
}

val myFuncUdf = udf(ConvertedUdfFuncs.myFuncAsLambda)
df1.select(myFuncUdf(col("name"))).show()

For more details about how to create UDFs in Jupyter Notebooks, see Creating UDFs in Jupyter Notebooks.

Additional recommendations

SPRKSCL1000

Message: Source project spark-core version is *version number*, the spark-core version supported by snowpark is 2.12:3.1.2 so there may be functional differences between the existing mappings

Category: Warning

Description

This issue appears when the SMA detects a version of spark-core that is not supported by the SMA. Therefore, there may be functional differences between the existing mappings, and the output might have unexpected behaviors.

Additional recommendations

  • The spark-core version supported by the SMA is 2.12:3.1.2. Consider changing the version of your source code.

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.
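As a sketch of the suggested version change, the spark-core dependency can be pinned in build.sbt to the supported coordinate (the Scala binary version 2.12 matches the 2.12:3.1.2 coordinate above; the scalaVersion patch level shown is illustrative):

```scala
// build.sbt (sketch): align the project with the spark-core version the SMA supports.
scalaVersion := "2.12.15"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2"
```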

SPRKSCL1140

Message: org.apache.spark.sql.functions.stddev has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.stddev (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.stddev function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Charlie", 20),
  ("David", 25),
).toDF("name", "score")

val result1 = df.select(stddev("score"))
val result2 = df.select(stddev(col("score")))

Output

The SMA adds the EWI SPRKSCL1140 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Charlie", 20),
  ("David", 25),
).toDF("name", "score")

/*EWI: SPRKSCL1140 => org.apache.spark.sql.functions.stddev has a workaround, see documentation for more info*/
val result1 = df.select(stddev("score"))
/*EWI: SPRKSCL1140 => org.apache.spark.sql.functions.stddev has a workaround, see documentation for more info*/
val result2 = df.select(stddev(col("score")))

Recommended fix

Snowpark has an equivalent stddev function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("Alice", 10),
  ("Bob", 15),
  ("Charlie", 20),
  ("David", 25),
).toDF("name", "score")

val result1 = df.select(stddev(col("score")))
val result2 = df.select(stddev(col("score")))

Additional recommendations

SPRKSCL1111

Note

This issue code has been deprecated.

Message: CreateDecimalType is not supported.

Category: Conversion error.

Description

This issue appears when the SMA detects a usage of the org.apache.spark.sql.types.DataTypes.CreateDecimalType (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/DecimalType.html) function.

Scenario

Input

Below is an example of a usage of the org.apache.spark.sql.types.DataTypes.CreateDecimalType function.

var result = DataTypes.createDecimalType(18, 8)

Output

The SMA adds the EWI SPRKSCL1111 to the output code to let you know that CreateDecimalType function is not supported by Snowpark.

/*EWI: SPRKSCL1111 => CreateDecimalType is not supported*/
var result = createDecimalType(18, 8)

Recommended fix

There is no recommended fix at this time.
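Although there is no official fix, one possible manual replacement (a sketch; verify that this type exists with this shape in your Snowpark version) is Snowpark's own DecimalType from com.snowflake.snowpark.types:

```scala
import com.snowflake.snowpark.types.DecimalType

// Sketch: a manual stand-in for DataTypes.createDecimalType(18, 8).
// DecimalType(precision, scale) is assumed to match your Snowpark version's API.
var result = DecimalType(18, 8)
```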

SPRKSCL1104

Message: Spark Session builder option is not supported.

Category: Conversion error.

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.SparkSession.Builder.config (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/SparkSession$$Builder.html) function, which sets an option of the Spark Session and is not supported by Snowpark.

Scenario

Input

Below is an example of the org.apache.spark.sql.SparkSession.Builder.config function used to set an option in the Spark Session.

val spark = SparkSession.builder()
           .master("local")
           .appName("testApp")
           .config("spark.sql.broadcastTimeout", "3600")
           .getOrCreate()

Output

The SMA adds the EWI SPRKSCL1104 to the output code to let you know that the config method is not supported by Snowpark. Options cannot be set on the Spark Session through the config function, which might affect the migration of the Spark Session statement.

val spark = Session.builder.configFile("connection.properties")
/*EWI: SPRKSCL1104 => SparkBuilder Option is not supported .config("spark.sql.broadcastTimeout", "3600")*/
.create()

Recommended fix

To create the session, you need to add the proper Snowflake Snowpark configuration.

In this example, a configs variable is used.

    val configs = Map (
      "URL" -> "https://<myAccount>.snowflakecomputing.cn:<port>",
      "USER" -> <myUserName>,
      "PASSWORD" -> <myPassword>,
      "ROLE" -> <myRole>,
      "WAREHOUSE" -> <myWarehouse>,
      "DB" -> <myDatabase>,
      "SCHEMA" -> <mySchema>
    )
    val session = Session.builder.configs(configs).create

Alternatively, it is also recommended to use a configFile (profile.properties) containing the connection information:

## profile.properties file (a text file)
URL = https://<account_identifier>.snowflakecomputing.cn
USER = <username>
PRIVATEKEY = <unencrypted_private_key_from_the_private_key_file>
ROLE = <role_name>
WAREHOUSE = <warehouse_name>
DB = <database_name>
SCHEMA = <schema_name>

And the session can be created with Session.builder.configFile:

val session = Session.builder.configFile("/path/to/properties/file").create

Additional recommendations

SPRKSCL1101

This issue code has been deprecated since Spark Conversion Core 2.3.22

Message: Broadcast is not supported

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.broadcast (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which is not supported by Snowpark. This function is not supported because Snowflake does not support broadcast variables (https://spark.apache.org/docs/latest/api/java/org/apache/spark/broadcast/Broadcast.html).

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.broadcast function used to create a broadcast object to use on each Spark cluster:

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business")
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      broadcast(dfCollege),
      Seq("CollegeName")
    )

Output

The SMA adds the EWI SPRKSCL1101 to the output code to let you know that this function is not supported by Snowpark.

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business")
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      /*EWI: SPRKSCL1101 => Broadcast is not supported*/
      broadcast(dfCollege),
      Seq("CollegeName")
    )

Recommended fix

Since Snowflake manages the storage and the workload on the clusters, broadcast objects are not applicable. This means that broadcasting may not be required at all, but each case requires further analysis.

The recommended approach is to replace a Spark DataFrame broadcast with a regular Snowpark DataFrame, or to use a DataFrame method such as join.

For the proposed input, the fix is to adapt the join to use the dfCollege DataFrame directly, without broadcasting it.

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business")
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      dfCollege,
      Seq("CollegeName")
    ).show()

Additional recommendations

SPRKSCL1150

Message: org.apache.spark.sql.functions.var_pop has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.var_pop (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.var_pop function, first used with a column name as an argument and then with a column object.

val df = Seq(
  ("A", 10.0),
  ("A", 20.0),
  ("A", 30.0),
  ("B", 40.0),
  ("B", 50.0),
  ("B", 60.0)
).toDF("group", "value")

val result1 = df.groupBy("group").agg(var_pop("value"))
val result2 = df.groupBy("group").agg(var_pop(col("value")))

Output

The SMA adds the EWI SPRKSCL1150 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("A", 10.0),
  ("A", 20.0),
  ("A", 30.0),
  ("B", 40.0),
  ("B", 50.0),
  ("B", 60.0)
).toDF("group", "value")

/*EWI: SPRKSCL1150 => org.apache.spark.sql.functions.var_pop has a workaround, see documentation for more info*/
val result1 = df.groupBy("group").agg(var_pop("value"))
/*EWI: SPRKSCL1150 => org.apache.spark.sql.functions.var_pop has a workaround, see documentation for more info*/
val result2 = df.groupBy("group").agg(var_pop(col("value")))

Recommended fix

Snowpark has an equivalent var_pop function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(
  ("A", 10.0),
  ("A", 20.0),
  ("A", 30.0),
  ("B", 40.0),
  ("B", 50.0),
  ("B", 60.0)
).toDF("group", "value")

val result1 = df.groupBy("group").agg(var_pop(col("value")))
val result2 = df.groupBy("group").agg(var_pop(col("value")))

Additional recommendations




SPRKSCL1164

Note

This issue code has been deprecated.

Message: The parameter is not defined for org.apache.spark.sql.DataFrameReader.option

Category: Warning

Description

This issue appears when the SMA detects that the given parameter of the org.apache.spark.sql.DataFrameReader.option (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html) function is not defined.

Scenario

Input

Below is an example of an undefined parameter for the org.apache.spark.sql.DataFrameReader.option function.

spark.read.option("header", true).json(path)

Output

The SMA adds the EWI SPRKSCL1164 to the output code to let you know that the given parameter of the org.apache.spark.sql.DataFrameReader.option function is not defined.

/*EWI: SPRKSCL1164 => The parameter header=true is not supported for org.apache.spark.sql.DataFrameReader.option*/
spark.read.option("header", true).json(path)

Recommended fix

Check the Snowpark documentation for the reader format options here, in order to identify the defined options.
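As a sketch (assuming the original header option was meant to skip a header row): Snowflake's CSV file format exposes a SKIP_HEADER option, which can be passed through the Snowpark reader instead. Verify against the reader documentation that the option applies to your file format.

```scala
// Sketch: SKIP_HEADER is a Snowflake CSV file format option.
// `session` is assumed to be an existing com.snowflake.snowpark.Session,
// and `path` is assumed to point to a staged file location.
val df = session.read.option("SKIP_HEADER", 1).csv(path)
```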

Additional recommendations

SPRKSCL1135

Warning

This issue code is deprecated since Spark Conversion Core 4.3.2

Message: org.apache.spark.sql.functions.mean has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.mean (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.mean function, first used with a column name as an argument and then with a column object.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
val result1 = df.select(mean("value"))
val result2 = df.select(mean(col("value")))

Output

The SMA adds the EWI SPRKSCL1135 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
/*EWI: SPRKSCL1135 => org.apache.spark.sql.functions.mean has a workaround, see documentation for more info*/
val result1 = df.select(mean("value"))
/*EWI: SPRKSCL1135 => org.apache.spark.sql.functions.mean has a workaround, see documentation for more info*/
val result2 = df.select(mean(col("value")))

Recommended fix

Snowpark has an equivalent mean function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(1, 3, 10, 1, 3).toDF("value")
val result1 = df.select(mean(col("value")))
val result2 = df.select(mean(col("value")))

Additional recommendations

SPRKSCL1115

Warning

This issue code has been deprecated since Spark Conversion Core Version 4.6.0

Message: org.apache.spark.sql.functions.round has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.round (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.round function that generates this EWI.

val df = Seq(3.9876, 5.673, 8.1234).toDF("value")
val result1 = df.withColumn("rounded_value", round(col("value")))
val result2 = df.withColumn("rounded_value", round(col("value"), 2))

Output

The SMA adds the EWI SPRKSCL1115 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(3.9876, 5.673, 8.1234).toDF("value")
/*EWI: SPRKSCL1115 => org.apache.spark.sql.functions.round has a workaround, see documentation for more info*/
val result1 = df.withColumn("rounded_value", round(col("value")))
/*EWI: SPRKSCL1115 => org.apache.spark.sql.functions.round has a workaround, see documentation for more info*/
val result2 = df.withColumn("rounded_value", round(col("value"), 2))

Recommended fix

Snowpark has an equivalent round function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a column object and a scale, you can convert the scale into a column object using the com.snowflake.snowpark.functions.lit function as a workaround.

val df = Seq(3.9876, 5.673, 8.1234).toDF("value")
val result1 = df.withColumn("rounded_value", round(col("value")))
val result2 = df.withColumn("rounded_value", round(col("value"), lit(2)))

Additional recommendations

SPRKSCL1144

Message: Unable to load the symbol table

Category: Parsing error

Description

This issue appears when there is a critical error in the SMA's execution process. Since the symbol table cannot be loaded, the SMA cannot start the assessment or conversion process.

Additional recommendations

  • This is unlikely to be an error in the source code itself, but rather is an error in how the SMA processes the source code. The best resolution would be to post an issue in the SMA.

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

SPRKSCL1170

Note

This issue code has been deprecated.

Message: sparkConfig member key is not supported with platform-specific key.

Category: Conversion error

Description

If you are using an older version, please upgrade to the latest one.

Additional recommendations

SPRKSCL1121

Message: org.apache.spark.sql.functions.atan has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.atan (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.atan function, first used with a column name as an argument and then with a column object.

val df = Seq(1.0, 0.5, -1.0).toDF("value")
val result1 = df.withColumn("atan_value", atan("value"))
val result2 = df.withColumn("atan_value", atan(col("value")))

Output

The SMA adds the EWI SPRKSCL1121 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(1.0, 0.5, -1.0).toDF("value")
/*EWI: SPRKSCL1121 => org.apache.spark.sql.functions.atan has a workaround, see documentation for more info*/
val result1 = df.withColumn("atan_value", atan("value"))
/*EWI: SPRKSCL1121 => org.apache.spark.sql.functions.atan has a workaround, see documentation for more info*/
val result2 = df.withColumn("atan_value", atan(col("value")))

Recommended fix

Snowpark has an equivalent atan function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(1.0, 0.5, -1.0).toDF("value")
val result1 = df.withColumn("atan_value", atan(col("value")))
val result2 = df.withColumn("atan_value", atan(col("value")))

Additional recommendations

SPRKSCL1131

Message: org.apache.spark.sql.functions.grouping has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.grouping (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.grouping function, first used with a column name as an argument and then with a column object.

val df = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")
val result1 = df.cube("name").agg(grouping("name"), sum("age"))
val result2 = df.cube("name").agg(grouping(col("name")), sum("age"))

Output

The SMA adds the EWI SPRKSCL1131 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")
/*EWI: SPRKSCL1131 => org.apache.spark.sql.functions.grouping has a workaround, see documentation for more info*/
val result1 = df.cube("name").agg(grouping("name"), sum("age"))
/*EWI: SPRKSCL1131 => org.apache.spark.sql.functions.grouping has a workaround, see documentation for more info*/
val result2 = df.cube("name").agg(grouping(col("name")), sum("age"))

Recommended fix

Snowpark has an equivalent grouping function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")
val result1 = df.cube("name").agg(grouping(col("name")), sum("age"))
val result2 = df.cube("name").agg(grouping(col("name")), sum("age"))

Additional recommendations

SPRKSCL1160

Note

This issue code has been deprecated since Spark Conversion Core 4.1.0

Message: org.apache.spark.sql.functions.sum has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.sum (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.sum function that generates this EWI. In this example, the sum function is used to calculate the sum of the selected column.

val df = Seq("1", "2", "3", "4", "5").toDF("elements")
val result1 = sum(col("elements"))
val result2 = sum("elements")

Output

The SMA adds the EWI SPRKSCL1160 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq("1", "2", "3", "4", "5").toDF("elements")
/*EWI: SPRKSCL1160 => org.apache.spark.sql.functions.sum has a workaround, see documentation for more info*/
val result1 = sum(col("elements"))
/*EWI: SPRKSCL1160 => org.apache.spark.sql.functions.sum has a workaround, see documentation for more info*/
val result2 = sum("elements")

Recommended fix

Snowpark has an equivalent sum function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

val df = Seq("1", "2", "3", "4", "5").toDF("elements")
val result1 = sum(col("elements"))
val result2 = sum(col("elements"))

Additional recommendations

SPRKSCL1154

Message: org.apache.spark.sql.functions.ceil has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.ceil (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html) function, which has a workaround.

Scenarios

Input

Below is an example of the org.apache.spark.sql.functions.ceil function, first used with a column name as an argument, then with a column object, and finally with a column object and a scale.

val df = Seq(2.33, 3.88, 4.11, 5.99).toDF("value")
val result1 = df.withColumn("ceil", ceil("value"))
val result2 = df.withColumn("ceil", ceil(col("value")))
val result3 = df.withColumn("ceil", ceil(col("value"), lit(1)))

Output

The SMA adds the EWI SPRKSCL1154 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(2.33, 3.88, 4.11, 5.99).toDF("value")
/*EWI: SPRKSCL1154 => org.apache.spark.sql.functions.ceil has a workaround, see documentation for more info*/
val result1 = df.withColumn("ceil", ceil("value"))
/*EWI: SPRKSCL1154 => org.apache.spark.sql.functions.ceil has a workaround, see documentation for more info*/
val result2 = df.withColumn("ceil", ceil(col("value")))
/*EWI: SPRKSCL1154 => org.apache.spark.sql.functions.ceil has a workaround, see documentation for more info*/
val result3 = df.withColumn("ceil", ceil(col("value"), lit(1)))

Recommended fix

Snowpark has an equivalent ceil function that receives a column object as an argument. For that reason, the Spark overload that receives a column object as an argument is directly supported by Snowpark and does not require any changes.

For the overload that receives a string argument, you can convert the string into a column object using the com.snowflake.snowpark.functions.col function as a workaround.

For the overload that receives a column object and a scale, you can use the callBuiltin function to invoke the Snowflake built-in CEIL function. To use it, pass the string "ceil" as the first argument, the column as the second argument, and the scale as the third argument.

val df = Seq(2.33, 3.88, 4.11, 5.99).toDF("value")
val result1 = df.withColumn("ceil", ceil(col("value")))
val result2 = df.withColumn("ceil", ceil(col("value")))
val result3 = df.withColumn("ceil", callBuiltin("ceil", col("value"), lit(1)))
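For reference, the following plain-Scala sketch mimics what CEIL with a scale argument computes, namely rounding up to the given number of decimal places. The helper ceilWithScale is hypothetical and not part of Snowpark or Snowflake; it only illustrates the semantics.

```scala
object CeilScaleDemo extends App {
  // Hypothetical helper mimicking CEIL(value, scale):
  // rounds `value` up to `scale` decimal places.
  def ceilWithScale(value: Double, scale: Int): Double = {
    val factor = math.pow(10, scale)
    math.ceil(value * factor) / factor
  }

  println(ceilWithScale(2.33, 1)) // → 2.4
}
```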

Additional recommendations

SPRKSCL1105

This issue code has been deprecated

Message: Writer format value is not supported.

Category: Conversion error

Description

This issue appears when the org.apache.spark.sql.DataFrameWriter.format (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html) has an argument that is not supported by Snowpark.

Scenarios

There are several scenarios depending on the type of format you are trying to save. It can be a supported or an unsupported format.

Scenario 1

Input

The tool analyzes the format type that you are trying to save. The supported formats are:

  • csv

  • json

  • orc

  • parquet

  • text

    dfWrite.write.format("csv").save(path)

Output

The tool transforms the format method into a csv method call when the save function has one parameter.

    dfWrite.write.csv(path)

Recommended fix

In this example, the tool does not show an EWI, which means no fix is needed.

Scenario 2

Input

The example below shows how the tool handles the format method when a net.snowflake.spark.snowflake value is passed.

dfWrite.write.format("net.snowflake.spark.snowflake").save(path)

Output

The tool shows the EWI SPRKSCL1105 indicating that the value net.snowflake.spark.snowflake is not supported.

/*EWI: SPRKSCL1105 => Writer format value is not supported .format("net.snowflake.spark.snowflake")*/
dfWrite.write.format("net.snowflake.spark.snowflake").save(path)

Recommended fix

For the unsupported scenarios there is no specific fix, since it depends on the files that are being written.
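As one possible manual rewrite (a sketch, not a definitive fix): when the intent of format("net.snowflake.spark.snowflake") was to write the DataFrame to Snowflake, Snowpark writes to Snowflake natively, so you can usually target a table directly. The table name below is illustrative.

```scala
import com.snowflake.snowpark.SaveMode

// Snowpark writes to Snowflake natively, so no connector format is needed.
// "my_table" is an illustrative table name.
df.write.mode(SaveMode.Overwrite).saveAsTable("my_table")
```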

Scenario 3

Input

The example below shows how the tool handles the format method when "csv" is passed through a variable instead of a literal.

val myFormat = "csv"
dfWrite.write.format(myFormat).save(path)

Output

Since the tool cannot determine the value of the variable at runtime, it shows the EWI SPRKSCL1163 indicating that the value is not supported.

val myFormat = "csv"
/*EWI: SPRKSCL1163 => format_type is not a literal and can't be evaluated*/
dfWrite.write.format(myFormat).save(path)

Recommended fix

As a workaround, you can check the value of the variable and pass it as a string literal to the format call.
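The workaround described above can be sketched as follows, assuming you have verified that the variable always holds "csv":

```scala
// Before: the format type is hidden behind a variable the SMA cannot evaluate
// val myFormat = "csv"
// dfWrite.write.format(myFormat).save(path)

// After: inline the literal so the SMA can convert the call
dfWrite.write.format("csv").save(path)
```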

Additional recommendations