Snowpark Migration Accelerator: Release Notes¶
Note that the release notes below are organized by release date. Version numbers for both the application and the conversion core appear below.
Version 2.10.0 (Sep 24, 2025)¶
Application & CLI Version 2.10.0¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.62
Added¶
Added functionality to migrate SQL embedded with Python format interpolation.
Added support for DataFrame.select and DataFrame.sort transformations for greater data processing flexibility.
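As a sketch of the pattern the SQL-interpolation item targets, the snippet below builds an embedded SQL statement with Python format interpolation. The table name, column, and filter value are made up for illustration; only the interpolation pattern itself reflects the release note.

```python
# Hypothetical example of SQL embedded via Python format interpolation,
# the shape of code this release can now migrate.
table_name = "sales"
region = "EMEA"

# f-string interpolation producing an embedded SQL statement
query = f"SELECT SUM(amount) FROM {table_name} WHERE region = '{region}'"

# str.format interpolation, another common shape of embedded SQL
query_fmt = "SELECT SUM(amount) FROM {} WHERE region = '{}'".format(table_name, region)

print(query)
```

Both forms produce the same SQL text; what matters for migration is that the statement is assembled through interpolation rather than written as a single literal.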
Changed¶
Bumped the supported versions of Snowpark Python API and Snowpark Pandas API to 1.36.0.
Updated the mapping status of pandas.core.frame.DataFrame.boxplot from Not Supported to Direct.
Updated the mapping status of DataFrame.select, Dataset.select, DataFrame.sort and Dataset.sort from Direct to Transformation. Snowpark Scala allows a sequence of columns to be passed directly to the select and sort functions, so this transformation changes all usages such as df.select(cols: _*) to df.select(cols) and df.sort(cols: _*) to df.sort(cols).
Bumped Python AST and Parser version to 149.1.9.
Updated the status to Direct for pandas functions:
pandas.core.frame.DataFrame.to_excel
pandas.core.series.Series.to_excel
pandas.io.feather_format.read_feather
pandas.io.orc.read_orc
pandas.io.stata.read_stata
Updated the status of pyspark.sql.pandas.map_ops.PandasMapOpsMixin.mapInPandas to Workaround, using EWI SPRKPY1102.
Fixed¶
Fixed issue that affected SqlEmbedded transformations when using chained method calls.
Fixed transformations involving PySqlExpr using the new PyLiteralSql to avoid losing Tails.
Resolved internal stability issues to improve tool robustness and reliability.
Version 2.7.7 (Aug 28, 2025)¶
Application & CLI Version 2.7.7¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.46
Added¶
Added new Pandas EWI documentation PNDSPY1011.
Added support for the following Pandas functions:
pandas.core.algorithms.unique
pandas.core.dtypes.missing.isna
pandas.core.dtypes.missing.isnull
pandas.core.dtypes.missing.notna
pandas.core.dtypes.missing.notnull
pandas.core.resample.Resampler.count
pandas.core.resample.Resampler.max
pandas.core.resample.Resampler.mean
pandas.core.resample.Resampler.median
pandas.core.resample.Resampler.min
pandas.core.resample.Resampler.size
pandas.core.resample.Resampler.sum
pandas.core.arrays.timedeltas.TimedeltaArray.total_seconds
pandas.core.series.Series.get
pandas.core.series.Series.to_frame
pandas.core.frame.DataFrame.assign
pandas.core.frame.DataFrame.get
pandas.core.frame.DataFrame.to_numpy
pandas.core.indexes.base.Index.is_unique
pandas.core.indexes.base.Index.has_duplicates
pandas.core.indexes.base.Index.shape
pandas.core.indexes.base.Index.array
pandas.core.indexes.base.Index.str
pandas.core.indexes.base.Index.equals
pandas.core.indexes.base.Index.identical
pandas.core.indexes.base.Index.unique
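To illustrate a few of the newly supported mappings, the snippet below exercises some of the listed functions in plain pandas; under the SMA mapping these same calls run through the Snowpark pandas API (via modin.pandas). The sample data is made up for the example.

```python
import pandas as pd

# Missing-value helpers now mapped: pandas.isna / pandas.notna
s = pd.Series([1.0, None, 3.0], name="x")
missing = pd.isna(s)   # elementwise mask of missing values
present = pd.notna(s)

# Series.to_frame and DataFrame.assign, both newly supported
frame = s.to_frame()                               # single-column DataFrame "x"
frame = frame.assign(doubled=frame["x"] * 2)       # add a derived column

# Index.is_unique / Index.has_duplicates, also in the new list
idx = pd.Index([1, 2, 2])
print(missing.tolist(), list(frame.columns), idx.is_unique)
```

The point of the mapping work is that code like this should behave the same after migration, with the heavy lifting pushed down to Snowflake.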
Added support for the following Spark Scala functions:
org.apache.spark.sql.functions.format_number
org.apache.spark.sql.functions.from_unixtime
org.apache.spark.sql.functions.instr
org.apache.spark.sql.functions.months_between
org.apache.spark.sql.functions.pow
org.apache.spark.sql.functions.to_unix_timestamp
org.apache.spark.sql.Row.getAs
Changed¶
Bumped the version of Snowpark Pandas API supported by the SMA to 1.33.0.
Bumped the version of Snowpark Scala API supported by the SMA to 1.16.0.
Updated the mapping status of pyspark.sql.group.GroupedData.pivot from Transformation to Direct.
Updated the mapping status of org.apache.spark.sql.Builder.master from NotSupported to Transformation. This transformation removes all the identified usages of this element during code conversion.
Updated the mapping status of org.apache.spark.sql.types.StructType.fieldIndex from NotSupported to Direct.
Updated the mapping status of org.apache.spark.sql.Row.fieldIndex from NotSupported to Direct.
Updated the mapping status of org.apache.spark.sql.SparkSession.stop from NotSupported to Rename. All the identified usages of this element are renamed to com.snowflake.snowpark.Session.close during code conversion.
Updated the mapping status of org.apache.spark.sql.DataFrame.unpersist and org.apache.spark.sql.Dataset.unpersist from NotSupported to Transformation. This transformation removes all the identified usages of this element during code conversion.
Fixed¶
Fixed continuation backslash on removed tailed functions.
Fixed the LIBRARY_PREFIX column in the ConversionStatusLibraries.csv file to use the right identifier for the scikit-learn library family (scikit-*).
Fixed a bug where multiline grouped operations were not parsed.
Version 2.9.0 (Sep 09, 2025)¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.53
Added¶
The following mappings are now performed for org.apache.spark.sql.Dataset[T]:
org.apache.spark.sql.Dataset.union is now com.snowflake.snowpark.DataFrame.unionAll
org.apache.spark.sql.Dataset.unionByName is now com.snowflake.snowpark.DataFrame.unionAllByName
Added support for org.apache.spark.sql.functions.broadcast as a transformation.
Changed¶
Increased the supported Snowpark Python API version for SMA from 1.27.0 to 1.33.0.
The status of the pyspark.sql.functions.randn function has been updated to Direct.
Fixed¶
Resolved an issue where org.apache.spark.SparkContext.parallelize was not resolving; it is now supported as a transformation.
Fixed the Dataset.persist transformation to work with any type of Dataset, not just Dataset[Row].
Version 2.7.6 (Jul 17, 2025)¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.30
Added¶
Adjusted mappings for spark.DataReader methods:
DataFrame.union is now DataFrame.unionAll.
DataFrame.unionByName is now DataFrame.unionAllByName.
Added multi-level artifact dependency columns to the artifact inventory.
Added new Pandas EWI documentation, from PNDSPY1005 to PNDSPY1010.
Added a specific EWI for pandas.core.series.Series.apply.
Changed¶
Bumped the version of Snowpark Pandas API supported by the SMA from 1.27.0 to 1.30.0.
Fixed¶
Fixed an issue with missing values in the formula to get the SQL readiness score.
Fixed a bug that was causing some Pandas elements to have the default EWI message from PySpark.
Version 2.7.5 (Jul 2, 2025)¶
Application & CLI Version 2.7.5¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.19
Changed¶
Refactored Pandas imports: Pandas imports now use `modin.pandas` instead of `snowflake.snowpark.modin.pandas`.
Improved `dbutils` and Magic Commands transformation:
A new `sfutils.py` file is now generated, and all `dbutils` prefixes are replaced with `sfutils`.
For Databricks (DBX) notebooks, an implicit import for `sfutils` is automatically added.
The `sfutils` module simulates various `dbutils` methods, including file system operations (`dbutils.fs`) via a defined Snowflake FileSystem (SFFS) stage, and handles notebook execution (`dbutils.notebook.run`) by transforming it to `EXECUTE NOTEBOOK` SQL functions.
`dbutils.notebook.exit` is removed as it is not required in Snowflake.
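A minimal sketch, not SMA's actual implementation, of the prefix replacement described above: every standalone `dbutils` reference is rewritten to `sfutils`, which the generated `sfutils.py` file then emulates. The function name and regex here are assumptions for illustration.

```python
import re

def rewrite_dbutils_calls(source: str) -> str:
    # \b word boundaries keep identifiers like `my_dbutils` untouched;
    # only bare `dbutils` references are rewritten to `sfutils`.
    return re.sub(r"\bdbutils\b", "sfutils", source)

cell = "files = dbutils.fs.ls('/mnt/raw')\ndbutils.notebook.run('etl', 60)"
print(rewrite_dbutils_calls(cell))
```

In the real tool this rewrite is paired with an implicit `import sfutils` added to DBX notebooks, so the renamed calls resolve against the generated shim.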
Fixed¶
Updates in SnowConvert Reports: SnowConvert reports now include the CellId column when instances originate from SMA, and the FileName column displays the full path.
Updated Artifacts Dependency for SnowConvert Reports: The SMA's artifact inventory report, which was previously impacted by the integration of SnowConvert, has been restored. This update enables the SMA tool to accurately capture and analyze Object References and Missing Object References directly from SnowConvert reports, thereby ensuring the correct retrieval of SQL dependencies for the inventory.
Version 2.7.4 (Jun 26, 2025)¶
Application & CLI Version 2.7.4¶
Desktop App
Added¶
Added telemetry improvements.
Fixed¶
Fixed documentation links in the conversion settings pop-up and Pandas EWIs.
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.16
Added¶
Transforming Spark XML to Snowpark.
Databricks SQL option in the SQL source language.
Transforming JDBC read connections.
Changed¶
All the SnowConvert reports are copied to the backup Zip file.
The folder is renamed from SqlReports to SnowConvertReports.
SqlFunctionsInventory is moved to the folder Reports.
All the SnowConvert Reports are sent to Telemetry.
Fixed¶
Fixed a non-deterministic issue with the SQL Readiness Score.
Fixed a false-positive critical result that made the desktop crash.
Fixed issue causing the Artifacts dependency report not to show the SQL objects.
Version 2.7.2 (Jun 10, 2025)¶
Application & CLI Version 2.7.2¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.2
Fixed¶
Addressed an issue with SMA execution on the latest Windows OS, as previously reported. This fix resolves the issues encountered in version 2.7.1.
Version 2.7.1 (Jun 9, 2025)¶
Application & CLI Version 2.7.1¶
Included SMA Core Versions¶
Snowpark Conversion Core 8.0.1
Added¶
The Snowpark Migration Accelerator (SMA) now orchestrates SnowConvert (https://docs.snowconvert.com/sc/general/about) to process SQL found in user workloads, including embedded SQL in Python / Scala code, Notebook SQL cells, .sql files, and .hql files.
SnowConvert now enhances the previous SMA capabilities:
Spark SQL (https://docs.snowconvert.com/sc/translation-references/spark-dbx)
A new folder in Reports, called SQL Reports, contains the reports generated by SnowConvert.
Known Issues¶
SQL reports from the previous SMA version will appear empty for the following:
Reports/SqlElementsInventory.csv, partially covered by Reports/SqlReports/Elements.yyyymmdd.hhmmss.csv.
For Reports/SqlFunctionsInventory.csv, refer to the new location with the same name at Reports/SqlReports/SqlFunctionsInventory.csv.
The artifact dependency inventory:
In the ArtifactDependencyInventory, the column for the SQL object will appear empty.
Version 2.6.10 (May 5, 2025)¶
Application & CLI Version 2.6.10¶
Included SMA Core Versions¶
Snowpark Conversion Core 7.4.0
Fixed¶
Fixed wrong values in the 'checkpoints.json' file.
The 'sample' value was without decimals (for integer values) and quotes.
The 'entryPoint' value had dots instead of slashes and was missing the file extension.
Updated the default value to TRUE for the setting 'Convert DBX notebooks to Snowflake notebooks'
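To make the two checkpoints.json fixes above concrete, the snippet below builds an entry with the corrected shapes: a 'sample' value written with decimals and quotes, and an 'entryPoint' using slashes and keeping the file extension. The surrounding schema and the example path are assumptions; only the 'sample' and 'entryPoint' fields come from the release note.

```python
import json

# Hypothetical checkpoints.json entry showing the corrected value formats.
checkpoint = {
    # integer sample rates are now written with decimals, as quoted values
    "sample": "1.0",
    # entry points now use slashes (not dots) and include the file extension
    "entryPoint": "notebooks/etl_job.py",
}

print(json.dumps(checkpoint))
```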
Version 2.6.8 (Apr 28, 2025)¶
Application & CLI Version 2.6.8¶
Desktop App¶
Added checkpoints execution settings mechanism recognition.
Added a mechanism to collect DBX magic commands into DbxElementsInventory.csv.
Added 'checkpoints.json' generation into the input directory.
Added a new EWI for all unsupported magic commands.
Added the collection of dbutils into DbxElementsInventory.csv from Scala source notebooks.
Included SMA Core Versions¶
Snowpark Conversion Core 7.2.53
Changed¶
Updates made to handle transformations from DBX Scala elements to Jupyter Python elements, commenting out the entire code of the cell.
Updates made to handle transformations of dbutils.notebook.run and "r" commands; for the latter, the entire code of the cell is also commented out.
Updated the name and the key letter used for the conversion of notebook files.
Fixed¶
Fixed the bug that was causing the transformation of DBX notebooks into .ipynb files to have the wrong format.
Fixed the bug that was causing .py DBX notebooks to not be transformable into .ipynb files.
Fixed a bug that was causing comments to be missing in the output code of DBX notebooks.
Fixed a bug that was causing raw Scala files to be converted into ipynb files.
Version 2.6.7 (Apr 21, 2025)¶
Application & CLI Version 2.6.7¶
Included SMA Core Versions¶
Snowpark Conversion Core 7.2.42
Changed¶
Updated the DataFramesInventory to fill the EntryPoints column.
Version 2.6.6 (Apr 7, 2025)¶
Application & CLI Version 2.6.6¶
Desktop App¶
Added¶
Updated the DBx EWI link in the UI results page.
Included SMA Core Versions¶
Snowpark Conversion Core 7.2.39
Added¶
Added Execution Flow inventory generation.
Added implicit session setup in every DBx notebook transformation
Changed¶
Renamed the DbUtilsUsagesInventory.csv to DbxElementsInventory.csv
Fixed¶
Fixed a bug that caused a Parsing error when a backslash came after a type hint.
Fixed relative imports that do not start with a dot and relative imports with a star.
Version 2.6.5 (Mar 27, 2025)¶
Application & CLI Version 2.6.5¶
Desktop App¶
Added¶
Added a new conversion setting toggle to enable or disable the Sma-Checkpoints feature.
Fixed a report issue so the app does not crash when the POST API returns 500.
Included SMA Core Versions¶
Snowpark Conversion Core 7.2.26
Added¶
Added generation of the checkpoints.json file into the output folder based on the DataFramesInventory.csv.
Added "disableCheckpoints" flag into the CLI commands and additional parameters of the code processor.
Added a new replacer for Python to transform the dbutils.notebook.run node.
Added new replacers to transform the magic %run command.
Added new replacers (Python and Scala) to remove the dbutils.notebook.exit node.
Added Location column to artifacts inventory.
Changed¶
Refactored the normalized directory separator used in some parts of the solution.
Centralized the DBC extraction working folder name handling.
Updated Snowpark and Pandas version to v1.27.0
Updated the artifacts inventory columns to:
Name -> Dependency
File -> FileId
Status -> Status_detail
Added new column to the artifacts inventory:
Success
Fixed¶
Dataframes inventory was not being uploaded to the stage correctly.
Version 2.6.4 (Mar 12, 2025)¶
Application & CLI Version 2.6.4¶
Included SMA Core Versions¶
Snowpark Conversion Core 7.2.0
Added¶
An Artifact Dependency Inventory
A replacer and EWI for the pyspark.sql.types.StructType.fieldNames method, mapped to the snowflake.snowpark.types.StructType.fieldNames attribute.
The following PySpark functions with the status:
Direct Status
pyspark.sql.functions.bitmap_bit_position
pyspark.sql.functions.bitmap_bucket_number
pyspark.sql.functions.bitmap_construct_agg
pyspark.sql.functions.equal_null
pyspark.sql.functions.ifnull
pyspark.sql.functions.localtimestamp
pyspark.sql.functions.max_by
pyspark.sql.functions.min_by
pyspark.sql.functions.nvl
pyspark.sql.functions.regr_avgx
pyspark.sql.functions.regr_avgy
pyspark.sql.functions.regr_count
pyspark.sql.functions.regr_intercept
pyspark.sql.functions.regr_slope
pyspark.sql.functions.regr_sxx
pyspark.sql.functions.regr_sxy
pyspark.sql.functions.regr
NotSupported
pyspark.sql.functions.map_contains_key
pyspark.sql.functions.position
pyspark.sql.functions.regr_r2
pyspark.sql.functions.try_to_binary
The following Pandas functions with status
pandas.core.series.Series.str.ljust
pandas.core.series.Series.str.center
pandas.core.series.Series.str.pad
pandas.core.series.Series.str.rjust
Updated the following PySpark functions with the status:
From WorkAround to Direct
pyspark.sql.functions.acosh
pyspark.sql.functions.asinh
pyspark.sql.functions.atanh
pyspark.sql.functions.instr
pyspark.sql.functions.log10
pyspark.sql.functions.log1p
pyspark.sql.functions.log2
From NotSupported to Direct
pyspark.sql.functions.bit_length
pyspark.sql.functions.cbrt
pyspark.sql.functions.nth_value
pyspark.sql.functions.octet_length
pyspark.sql.functions.base64
pyspark.sql.functions.unbase64
Updated the following Pandas functions with the status:
From NotSupported to Direct
pandas.core.frame.DataFrame.pop
pandas.core.series.Series.between
pandas.core.series.Series.pop
Version 2.6.3 (Mar 6, 2025)¶
Application & CLI Version 2.6.3¶
Included SMA Core Versions¶
Snowpark Conversion Core 7.1.13
Added¶
Added csv generator class for new inventory creation.
Added "full_name" column to import usages inventory.
Added transformation from pyspark.sql.functions.concat_ws to snowflake.snowpark.functions._concat_ws_ignore_nulls.
Added logic for generation of checkpoints.json.
Added the inventories:
DataFramesInventory.csv.
CheckpointsInventory.csv
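The concat_ws mapping above exists because Spark's concat_ws skips NULL inputs, while a naive separator join does not; the helper _concat_ws_ignore_nulls preserves that behavior. A plain-Python sketch of the two semantics (the function names below are illustrative, not the library's API):

```python
def concat_ws_ignore_nulls(sep, *values):
    # Spark-style: drop None values before joining
    return sep.join(str(v) for v in values if v is not None)

def naive_concat(sep, *values):
    # naive join: None values leak into the output
    return sep.join(str(v) for v in values)

print(concat_ws_ignore_nulls("-", "a", None, "b"))  # a-b
print(naive_concat("-", "a", None, "b"))            # a-None-b
```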
Version 2.6.0 (Feb 21, 2025)¶
Application & CLI Version 2.6.0¶
Desktop App¶
Updated the licensing agreement; acceptance is required.
Included SMA Core Versions¶
Snowpark Conversion Core 7.1.2
Added¶
Updated the mapping status for the following PySpark elements, from NotSupported to Direct:
pyspark.sql.types.ArrayType.json
pyspark.sql.types.ArrayType.jsonValue
pyspark.sql.types.ArrayType.simpleString
pyspark.sql.types.ArrayType.typeName
pyspark.sql.types.AtomicType.json
pyspark.sql.types.AtomicType.jsonValue
pyspark.sql.types.AtomicType.simpleString
pyspark.sql.types.AtomicType.typeName
pyspark.sql.types.BinaryType.json
pyspark.sql.types.BinaryType.jsonValue
pyspark.sql.types.BinaryType.simpleString
pyspark.sql.types.BinaryType.typeName
pyspark.sql.types.BooleanType.json
pyspark.sql.types.BooleanType.jsonValue
pyspark.sql.types.BooleanType.simpleString
pyspark.sql.types.BooleanType.typeName
pyspark.sql.types.ByteType.json
pyspark.sql.types.ByteType.jsonValue
pyspark.sql.types.ByteType.simpleString
pyspark.sql.types.ByteType.typeName
pyspark.sql.types.DecimalType.json
pyspark.sql.types.DecimalType.jsonValue
pyspark.sql.types.DecimalType.simpleString
pyspark.sql.types.DecimalType.typeName
pyspark.sql.types.DoubleType.json
pyspark.sql.types.DoubleType.jsonValue
pyspark.sql.types.DoubleType.simpleString
pyspark.sql.types.DoubleType.typeName
pyspark.sql.types.FloatType.json
pyspark.sql.types.FloatType.jsonValue
pyspark.sql.types.FloatType.simpleString
pyspark.sql.types.FloatType.typeName
pyspark.sql.types.FractionalType.json
pyspark.sql.types.FractionalType.jsonValue
pyspark.sql.types.FractionalType.simpleString
pyspark.sql.types.FractionalType.typeName
pyspark.sql.types.IntegerType.json
pyspark.sql.types.IntegerType.jsonValue
pyspark.sql.types.IntegerType.simpleString
pyspark.sql.types.IntegerType.typeName
pyspark.sql.types.IntegralType.json
pyspark.sql.types.IntegralType.jsonValue
pyspark.sql.types.IntegralType.simpleString
pyspark.sql.types.IntegralType.typeName
pyspark.sql.types.LongType.json
pyspark.sql.types.LongType.jsonValue
pyspark.sql.types.LongType.simpleString
pyspark.sql.types.LongType.typeName
pyspark.sql.types.MapType.json
pyspark.sql.types.MapType.jsonValue
pyspark.sql.types.MapType.simpleString
pyspark.sql.types.MapType.typeName
pyspark.sql.types.NullType.json
pyspark.sql.types.NullType.jsonValue
pyspark.sql.types.NullType.simpleString
pyspark.sql.types.NullType.typeName
pyspark.sql.types.NumericType.json
pyspark.sql.types.NumericType.jsonValue
pyspark.sql.types.NumericType.simpleString
pyspark.sql.types.NumericType.typeName
pyspark.sql.types.ShortType.json
pyspark.sql.types.ShortType.jsonValue
pyspark.sql.types.ShortType.simpleString
pyspark.sql.types.ShortType.typeName
pyspark.sql.types.StringType.json
pyspark.sql.types.StringType.jsonValue
pyspark.sql.types.StringType.simpleString
pyspark.sql.types.StringType.typeName
pyspark.sql.types.StructType.json
pyspark.sql.types.StructType.jsonValue
pyspark.sql.types.StructType.simpleString
pyspark.sql.types.StructType.typeName
pyspark.sql.types.TimestampType.json
pyspark.sql.types.TimestampType.jsonValue
pyspark.sql.types.TimestampType.simpleString
pyspark.sql.types.TimestampType.typeName
pyspark.sql.types.StructField.simpleString
pyspark.sql.types.StructField.typeName
pyspark.sql.types.StructField.json
pyspark.sql.types.StructField.jsonValue
pyspark.sql.types.DataType.json
pyspark.sql.types.DataType.jsonValue
pyspark.sql.types.DataType.simpleString
pyspark.sql.types.DataType.typeName
pyspark.sql.session.SparkSession.getActiveSession
pyspark.sql.session.SparkSession.version
pandas.io.html.read_html
pandas.io.json._normalize.json_normalize
pyspark.sql.types.ArrayType.fromJson
pyspark.sql.types.MapType.fromJson
pyspark.sql.types.StructField.fromJson
pyspark.sql.types.StructType.fromJson
pandas.core.groupby.generic.DataFrameGroupBy.pct_change
pandas.core.groupby.generic.SeriesGroupBy.pct_change
Updated the mapping status for the following Pandas elements, from NotSupported to Direct:
pandas.io.html.read_html
pandas.io.json._normalize.json_normalize
pandas.core.groupby.generic.DataFrameGroupBy.pct_change
pandas.core.groupby.generic.SeriesGroupBy.pct_change
Updated the mapping status for the following PySpark elements, from Rename to Direct:
pyspark.sql.functions.collect_list
pyspark.sql.functions.size
Fixed¶
Standardized the format of the version number in the inventories.
Version 2.5.2 (Feb 5, 2025)¶
Hotfix: Application & CLI Version 2.5.2¶
Desktop App¶
Fixed an issue that occurred when converting with the sample project option.
Included SMA Core Versions¶
Snowpark Conversion Core 5.3.0
Version 2.5.1 (Feb 4, 2025)¶
Application & CLI Version 2.5.1¶
Desktop App¶
Added a new mode for when the user does not have write permissions.
Updated the licensing agreement; users are required to accept it.
CLI¶
Fixed an issue with the year shown on the CLI screen when displaying '--version' or '-v'.
Included SMA Core Versions¶
Snowpark Conversion Core 5.3.0
Added¶
Added the following Python Third-Party libraries with Direct status:
about-time
affinegap
aiohappyeyeballs
alibi-detect
alive-progress
allure-nose2
allure-robotframework
anaconda-cloud-cli
anaconda-mirror
astropy-iers-data
asynch
asyncssh
autots
autoviml
aws-msk-iam-sasl-signer-python
azure-functions
backports.tarfile
blas
bottle
bson
cairo
capnproto
captum
categorical-distance
census
clickhouse-driver
clustergram
cma
conda-anaconda-telemetry
configspace
cpp-expected
dask-expr
data-science-utils
databricks-sdk
datetime-distance
db-dtypes
dedupe
dedupe-variable-datetime
dedupe_lehvenshtein_search
dedupe_levenshtein_search
diff-cover
diptest
dmglib
docstring_parser
doublemetaphone
dspy-ai
econml
emcee
emoji
environs
eth-abi
eth-hash
eth-typing
eth-utils
expat
filetype
fitter
flask-cors
fpdf2
frozendict
gcab
geojson
gettext
glib-tools
google-ads
google-ai-generativelanguage
google-api-python-client
google-auth-httplib2
google-cloud-bigquery
google-cloud-bigquery-core
google-cloud-bigquery-storage
google-cloud-bigquery-storage-core
google-cloud-resource-manager
google-generativeai
googlemaps
grapheme
graphene
graphql-relay
gravis
greykite
grpc-google-iam-v1
harfbuzz
hatch-fancy-pypi-readme
haversine
hiclass
hicolor-icon-theme
highered
hmmlearn
holidays-ext
httplib2
icu
imbalanced-ensemble
immutabledict
importlib-metadata
importlib-resources
inquirerpy
iterative-telemetry
jaraco.context
jaraco.test
jiter
jiwer
joserfc
jsoncpp
jsonpath
jsonpath-ng
jsonpath-python
kagglehub
keplergl
kt-legacy
langchain-community
langchain-experimental
langchain-snowflake
langchain-text-splitters
libabseil
libflac
libgfortran-ng
libgfortran5
libglib
libgomp
libgrpc
libgsf
libmagic
libogg
libopenblas
libpostal
libprotobuf
libsentencepiece
libsndfile
libstdcxx-ng
libtheora
libtiff
libvorbis
libwebp
lightweight-mmm
litestar
litestar-with-annotated-types
litestar-with-attrs
litestar-with-cryptography
litestar-with-jinja
litestar-with-jwt
litestar-with-prometheus
litestar-with-structlog
lunarcalendar-ext
matplotlib-venn
metricks
mimesis
modin-ray
momepy
mpg123
msgspec
msgspec-toml
msgspec-yaml
msitools
multipart
namex
nbconvert-all
nbconvert-core
nbconvert-pandoc
nlohmann_json
numba-cuda
numpyro
office365-rest-python-client
openapi-pydantic
opentelemetry-distro
opentelemetry-instrumentation
opentelemetry-instrumentation-system-metrics
optree
osmnx
pathlib
pdf2image
pfzy
pgpy
plumbum
pm4py
polars
polyfactory
poppler-cpp
postal
pre-commit
prompt-toolkit
propcache
py-partiql-parser
py_stringmatching
pyatlan
pyfakefs
pyfhel
pyhacrf-datamade
pyiceberg
pykrb5
pylbfgs
pymilvus
pymoo
pynisher
pyomo
pypdf
pypdf-with-crypto
pypdf-with-full
pypdf-with-image
pypng
pyprind
pyrfr
pysoundfile
pytest-codspeed
pytest-trio
python-barcode
python-box
python-docx
python-gssapi
python-iso639
python-magic
python-pandoc
python-zstd
pyuca
pyvinecopulib
pyxirr
qrcode
rai-sdk
ray-client
ray-observability
readline
rich-click
rouge-score
ruff
scikit-criteria
scikit-mobility
sentencepiece-python
sentencepiece-spm
setuptools-markdown
setuptools-scm
setuptools-scm-git-archive
shareplum
simdjson
simplecosine
sis-extras
slack-sdk
smac
snowflake-sqlalchemy
snowflake_legacy
socrata-py
spdlog
sphinxcontrib-images
sphinxcontrib-jquery
sphinxcontrib-youtube
splunk-opentelemetry
sqlfluff
squarify
st-theme
statistics
streamlit-antd-components
streamlit-condition-tree
streamlit-echarts
streamlit-feedback
streamlit-keplergl
streamlit-mermaid
streamlit-navigation-bar
streamlit-option-menu
strictyaml
stringdist
sybil
tensorflow-cpu
tensorflow-text
tiledb-ptorchaudio
torcheval
trio-websocket
trulens-connectors-snowflake
trulens-core
trulens-dashboard
trulens-feedback
trulens-otel-semconv
trulens-providers-cortex
tsdownsample
typing
typing-extensions
typing_extensions
unittest-xml-reporting
uritemplate
us
uuid6
wfdb
wsproto
zlib
zope.index
Added the following Python BuiltIn libraries with Direct status:
aifc
array
ast
asynchat
asyncio
asyncore
atexit
audioop
base64
bdb
binascii
bisect
builtins
bz2
calendar
cgi
cgitb
chunk
cmath
cmd
code
codecs
codeop
colorsys
compileall
concurrent
contextlib
contextvars
copy
copyreg
cProfile
crypt
csv
ctypes
curses
dbm
difflib
dis
distutils
doctest
email
ensurepip
enum
errno
faulthandler
fcntl
filecmp
fileinput
fnmatch
fractions
ftplib
functools
gc
getopt
getpass
gettext
graphlib
grp
gzip
hashlib
heapq
hmac
html
http
idlelib
imaplib
imghdr
imp
importlib
inspect
ipaddress
itertools
keyword
linecache
locale
lzma
mailbox
mailcap
marshal
math
mimetypes
mmap
modulefinder
msilib
multiprocessing
netrc
nis
nntplib
numbers
operator
optparse
ossaudiodev
pdb
pickle
pickletools
pipes
pkgutil
platform
plistlib
poplib
posix
pprint
profile
pstats
pty
pwd
py_compile
pyclbr
pydoc
queue
quopri
random
re
reprlib
resource
rlcompleter
runpy
sched
secrets
select
selectors
shelve
shlex
signal
site
sitecustomize
smtpd
smtplib
sndhdr
socket
socketserver
spwd
sqlite3
ssl
stat
string
stringprep
struct
subprocess
sunau
symtable
sysconfig
syslog
tabnanny
tarfile
telnetlib
tempfile
termios
test
textwrap
threading
timeit
tkinter
token
tokenize
tomllib
trace
traceback
tracemalloc
tty
turtle
turtledemo
types
unicodedata
urllib
uu
uuid
venv
warnings
wave
weakref
webbrowser
wsgiref
xdrlib
xml
xmlrpc
zipapp
zipfile
zipimport
zoneinfo
Added the following Python BuiltIn libraries with NotSupported status:
msvcrt
winreg
winsound
Changed¶
Updated the .NET version to v9.0.0.
Improved EWI SPRKPY1068.
Bumped the version of Snowpark Python API supported by the SMA from 1.24.0 to 1.25.0.
Updated the detailed report template; it now includes the Snowpark version applicable to Pandas.
Changed the following libraries from ThirdPartyLib to BuiltIn:
configparser
dataclasses
pathlib
readline
statistics
zlib
Updated the mapping status for the following Pandas elements, from Direct to Partial:
pandas.core.frame.DataFrame.add
pandas.core.frame.DataFrame.aggregate
pandas.core.frame.DataFrame.all
pandas.core.frame.DataFrame.apply
pandas.core.frame.DataFrame.astype
pandas.core.frame.DataFrame.cumsum
pandas.core.frame.DataFrame.div
pandas.core.frame.DataFrame.dropna
pandas.core.frame.DataFrame.eq
pandas.core.frame.DataFrame.ffill
pandas.core.frame.DataFrame.fillna
pandas.core.frame.DataFrame.floordiv
pandas.core.frame.DataFrame.ge
pandas.core.frame.DataFrame.groupby
pandas.core.frame.DataFrame.gt
pandas.core.frame.DataFrame.idxmax
pandas.core.frame.DataFrame.idxmin
pandas.core.frame.DataFrame.inf
pandas.core.frame.DataFrame.join
pandas.core.frame.DataFrame.le
pandas.core.frame.DataFrame.loc
pandas.core.frame.DataFrame.lt
pandas.core.frame.DataFrame.mask
pandas.core.frame.DataFrame.merge
pandas.core.frame.DataFrame.mod
pandas.core.frame.DataFrame.mul
pandas.core.frame.DataFrame.ne
pandas.core.frame.DataFrame.nunique
pandas.core.frame.DataFrame.pivot_table
pandas.core.frame.DataFrame.pow
pandas.core.frame.DataFrame.radd
pandas.core.frame.DataFrame.rank
pandas.core.frame.DataFrame.rdiv
pandas.core.frame.DataFrame.rename
pandas.core.frame.DataFrame.replace
pandas.core.frame.DataFrame.resample
pandas.core.frame.DataFrame.rfloordiv
pandas.core.frame.DataFrame.rmod
pandas.core.frame.DataFrame.rmul
pandas.core.frame.DataFrame.rolling
pandas.core.frame.DataFrame.round
pandas.core.frame.DataFrame.rpow
pandas.core.frame.DataFrame.rsub
pandas.core.frame.DataFrame.rtruediv
pandas.core.frame.DataFrame.shift
pandas.core.frame.DataFrame.skew
pandas.core.frame.DataFrame.sort_index
pandas.core.frame.DataFrame.sort_values
pandas.core.frame.DataFrame.sub
pandas.core.frame.DataFrame.to_dict
pandas.core.frame.DataFrame.transform
pandas.core.frame.DataFrame.transpose
pandas.core.frame.DataFrame.truediv
pandas.core.frame.DataFrame.var
pandas.core.indexes.datetimes.date_range
pandas.core.reshape.concat.concat
pandas.core.reshape.melt.melt
pandas.core.reshape.merge.merge
pandas.core.reshape.pivot.pivot_table
pandas.core.reshape.tile.cut
pandas.core.series.Series.add
pandas.core.series.Series.aggregate
pandas.core.series.Series.all
pandas.core.series.Series.any
pandas.core.series.Series.cumsum
pandas.core.series.Series.div
pandas.core.series.Series.dropna
pandas.core.series.Series.eq
pandas.core.series.Series.ffill
pandas.core.series.Series.fillna
pandas.core.series.Series.floordiv
pandas.core.series.Series.ge
pandas.core.series.Series.gt
pandas.core.series.Series.lt
pandas.core.series.Series.mask
pandas.core.series.Series.mod
pandas.core.series.Series.mul
pandas.core.series.Series.multiply
pandas.core.series.Series.ne
pandas.core.series.Series.pow
pandas.core.series.Series.quantile
pandas.core.series.Series.radd
pandas.core.series.Series.rank
pandas.core.series.Series.rdiv
pandas.core.series.Series.rename
pandas.core.series.Series.replace
pandas.core.series.Series.resample
pandas.core.series.Series.rfloordiv
pandas.core.series.Series.rmod
pandas.core.series.Series.rmul
pandas.core.series.Series.rolling
pandas.core.series.Series.rpow
pandas.core.series.Series.rsub
pandas.core.series.Series.rtruediv
pandas.core.series.Series.sample
pandas.core.series.Series.shift
pandas.core.series.Series.skew
pandas.core.series.Series.sort_index
pandas.core.series.Series.sort_values
pandas.core.series.Series.std
pandas.core.series.Series.sub
pandas.core.series.Series.subtract
pandas.core.series.Series.truediv
pandas.core.series.Series.value_counts
pandas.core.series.Series.var
pandas.core.series.Series.where
pandas.core.tools.numeric.to_numeric
Updated the mapping status for the following Pandas elements, from NotSupported to Direct:
pandas.core.frame.DataFrame.attrs
pandas.core.indexes.base.Index.to_numpy
pandas.core.series.Series.str.len
pandas.io.html.read_html
pandas.io.xml.read_xml
pandas.core.indexes.datetimes.DatetimeIndex.mean
pandas.core.resample.Resampler.indices
pandas.core.resample.Resampler.nunique
pandas.core.series.Series.items
pandas.core.tools.datetimes.to_datetime
pandas.io.sas.sasreader.read_sas
pandas.core.frame.DataFrame.attrs
pandas.core.frame.DataFrame.style
pandas.core.frame.DataFrame.items
pandas.core.groupby.generic.DataFrameGroupBy.head
pandas.core.groupby.generic.DataFrameGroupBy.median
pandas.core.groupby.generic.DataFrameGroupBy.min
pandas.core.groupby.generic.DataFrameGroupBy.nunique
pandas.core.groupby.generic.DataFrameGroupBy.tail
pandas.core.indexes.base.Index.is_boolean
pandas.core.indexes.base.Index.is_floating
pandas.core.indexes.base.Index.is_integer
pandas.core.indexes.base.Index.is_monotonic_decreasing
pandas.core.indexes.base.Index.is_monotonic_increasing
pandas.core.indexes.base.Index.is_numeric
pandas.core.indexes.base.Index.is_object
pandas.core.indexes.base.Index.max
pandas.core.indexes.base.Index.min
pandas.core.indexes.base.Index.name
pandas.core.indexes.base.Index.names
pandas.core.indexes.base.Index.rename
pandas.core.indexes.base.Index.set_names
pandas.core.indexes.datetimes.DatetimeIndex.day_name
pandas.core.indexes.datetimes.DatetimeIndex.month_name
pandas.core.indexes.datetimes.DatetimeIndex.time
pandas.core.indexes.timedeltas.TimedeltaIndex.ceil
pandas.core.indexes.timedeltas.TimedeltaIndex.days
pandas.core.indexes.timedeltas.TimedeltaIndex.floor
pandas.core.indexes.timedeltas.TimedeltaIndex.microseconds
pandas.core.indexes.timedeltas.TimedeltaIndex.nanoseconds
pandas.core.indexes.timedeltas.TimedeltaIndex.round
pandas.core.indexes.timedeltas.TimedeltaIndex.seconds
pandas.core.reshape.pivot.crosstab
pandas.core.series.Series.dt.round
pandas.core.series.Series.dt.time
pandas.core.series.Series.dt.weekday
pandas.core.series.Series.is_monotonic_decreasing
pandas.core.series.Series.is_monotonic_increasing
Updated the mapping status for the following Pandas elements, from NotSupported to Partial:
pandas.core.frame.DataFrame.align
pandas.core.series.Series.align
pandas.core.frame.DataFrame.tz_convert
pandas.core.frame.DataFrame.tz_localize
pandas.core.groupby.generic.DataFrameGroupBy.fillna
pandas.core.groupby.generic.SeriesGroupBy.fillna
pandas.core.indexes.datetimes.bdate_range
pandas.core.indexes.datetimes.DatetimeIndex.std
pandas.core.indexes.timedeltas.TimedeltaIndex.mean
pandas.core.resample.Resampler.asfreq
pandas.core.resample.Resampler.quantile
pandas.core.series.Series.map
pandas.core.series.Series.tz_convert
pandas.core.series.Series.tz_localize
pandas.core.window.expanding.Expanding.count
pandas.core.window.rolling.Rolling.count
pandas.core.groupby.generic.DataFrameGroupBy.aggregate
pandas.core.groupby.generic.SeriesGroupBy.aggregate
pandas.core.frame.DataFrame.applymap
pandas.core.series.Series.apply
pandas.core.groupby.generic.DataFrameGroupBy.bfill
pandas.core.groupby.generic.DataFrameGroupBy.ffill
pandas.core.groupby.generic.SeriesGroupBy.bfill
pandas.core.groupby.generic.SeriesGroupBy.ffill
pandas.core.frame.DataFrame.backfill
pandas.core.frame.DataFrame.bfill
pandas.core.frame.DataFrame.compare
pandas.core.frame.DataFrame.unstack
pandas.core.frame.DataFrame.asfreq
pandas.core.series.Series.backfill
pandas.core.series.Series.bfill
pandas.core.series.Series.compare
pandas.core.series.Series.unstack
pandas.core.series.Series.asfreq
pandas.core.series.Series.argmax
pandas.core.series.Series.argmin
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.microsecond
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.nanosecond
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.day_name
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.month_name
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.month_start
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.month_end
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.is_year_start
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.is_year_end
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.is_quarter_start
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.is_quarter_end
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.is_leap_year
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.floor
pandas.core.indexes.accessors.CombinedDatetimelikeProperties.ceil
pandas.core.groupby.generic.DataFrameGroupBy.idxmax
pandas.core.groupby.generic.DataFrameGroupBy.idxmin
pandas.core.groupby.generic.DataFrameGroupBy.std
pandas.core.tools.timedeltas.to_timedelta
Known Issues¶
This version contains an issue that prevents sample project conversion in this release; it will be fixed in the next version.
Version 2.4.3 (Jan 9, 2025)¶
Application & CLI Version 2.4.3¶
Desktop App¶
Added a troubleshooting guide link to the crash report modal.
Included SMA Core Versions¶
Snowpark Conversion Core 4.15.0
Added¶
Added the following PySpark elements to the ConversionStatusPySpark.csv file as NotSupported:
pyspark.sql.streaming.readwriter.DataStreamReader.table
pyspark.sql.streaming.readwriter.DataStreamReader.schema
pyspark.sql.streaming.readwriter.DataStreamReader.options
pyspark.sql.streaming.readwriter.DataStreamReader.option
pyspark.sql.streaming.readwriter.DataStreamReader.load
pyspark.sql.streaming.readwriter.DataStreamReader.format
pyspark.sql.streaming.query.StreamingQuery.awaitTermination
pyspark.sql.streaming.readwriter.DataStreamWriter.partitionBy
pyspark.sql.streaming.readwriter.DataStreamWriter.toTable
pyspark.sql.streaming.readwriter.DataStreamWriter.trigger
pyspark.sql.streaming.readwriter.DataStreamWriter.queryName
pyspark.sql.streaming.readwriter.DataStreamWriter.outputMode
pyspark.sql.streaming.readwriter.DataStreamWriter.format
pyspark.sql.streaming.readwriter.DataStreamWriter.option
pyspark.sql.streaming.readwriter.DataStreamWriter.foreachBatch
pyspark.sql.streaming.readwriter.DataStreamWriter.start
Changed¶
Updated the format of the Hive SQL EWIs:
SPRKHVSQL1001
SPRKHVSQL1002
SPRKHVSQL1003
SPRKHVSQL1004
SPRKHVSQL1005
SPRKHVSQL1006
Updated the format of the Spark SQL EWIs:
SPRKSPSQL1001
SPRKSPSQL1002
SPRKSPSQL1003
SPRKSPSQL1004
SPRKSPSQL1005
SPRKSPSQL1006
Fixed¶
Fixed a bug that prevented the tool from recognizing some PySpark elements.
Fixed a mismatch between the number of calls identified as ThirdParty and the number of ThirdParty import calls.
Version 2.4.2 (Dec 13, 2024)¶
Application & CLI Version 2.4.2¶
Included SMA Core Versions¶
Snowpark Conversion Core 4.14.0
Added¶
Added the following Spark elements to ConversionStatusPySpark.csv:
pyspark.broadcast.Broadcast.value
pyspark.conf.SparkConf.getAll
pyspark.conf.SparkConf.setAll
pyspark.conf.SparkConf.setMaster
pyspark.context.SparkContext.addFile
pyspark.context.SparkContext.addPyFile
pyspark.context.SparkContext.binaryFiles
pyspark.context.SparkContext.setSystemProperty
pyspark.context.SparkContext.version
pyspark.files.SparkFiles
pyspark.files.SparkFiles.get
pyspark.rdd.RDD.count
pyspark.rdd.RDD.distinct
pyspark.rdd.RDD.reduceByKey
pyspark.rdd.RDD.saveAsTextFile
pyspark.rdd.RDD.take
pyspark.rdd.RDD.zipWithIndex
pyspark.sql.context.SQLContext.udf
pyspark.sql.types.StructType.simpleString
Changed¶
Updated the documentation of the Pandas EWIs PNDSPY1001, PNDSPY1002, and PNDSPY1003, and of SPRKSCL1137, to align with the standardized format, ensuring consistency and clarity across all EWIs.
Updated the documentation of the following Scala EWIs: SPRKSCL1106 and SPRKSCL1107, aligning them with the standardized format to ensure consistency and clarity across all EWIs.
Fixed¶
Fixed a bug that caused UserDefined symbols to appear in the third-party usages inventory.
Version 2.4.1 (Dec 4, 2024)¶
Application & CLI Version 2.4.1¶
Included SMA Core Versions¶
Snowpark Conversion Core 4.13.1
Command Line Interface¶
Changed
Added a timestamp to the output folder.
Snowpark Conversion Core 4.13.1¶
Added¶
Added a "Source Language" column to the library mapping table.
Added Others as a new category in the Pandas API summary table of DetailedReport.docx.
Changed¶
Updated the documentation of the Python EWI SPRKPY1058.
Updated the message of the pandas EWI PNDSPY1002 to show the relevant Pandas element.
Updated how the .csv reports are created; they are now overwritten after a second run.
Fixed¶
Fixed a bug that prevented notebook files from being generated in the output.
Fixed the replacers for the get and set methods of pyspark.sql.conf.RuntimeConfig; the replacers now match the correct fully qualified name.
Fixed an incorrect query tag version.
Fixed an issue that caused UserDefined packages to be reported as ThirdPartyLib.
Version 2.3.1 (Nov 14, 2024)¶
Application & CLI Version 2.3.1¶
Included SMA Core Versions¶
Snowpark Conversion Core 4.12.0
Desktop App¶
Fixed
Fixed a case-sensitivity issue in the --sql option.
Removed
Removed the platform name from the show-ac message.
Snowpark Conversion Core 4.12.0¶
Added¶
Added support for Snowpark Python 1.23.0 and 1.24.0.
Added a new EWI for the pyspark.sql.dataframe.DataFrame.writeTo function. All usages of this function now receive the EWI SPRKPY1087.
Changed¶
Updated the documentation of the Scala EWIs from SPRKSCL1137 to SPRKSCL1156 to align with the standardized format, ensuring consistency and clarity across all EWIs.
Updated the documentation of the Scala EWIs from SPRKSCL1117 to SPRKSCL1136 to align with the standardized format, ensuring consistency and clarity across all EWIs.
Updated the messages shown for the following EWIs:
SPRKPY1082
SPRKPY1083
Updated the documentation of the Scala EWIs from SPRKSCL1100 to SPRKSCL1105, from SPRKSCL1108 to SPRKSCL1116, and from SPRKSCL1157 to SPRKSCL1175 to align with the standardized format, ensuring consistency and clarity across all EWIs.
Updated the mapping status of the following PySpark elements from NotSupported to Direct with EWI:
pyspark.sql.readwriter.DataFrameWriter.option => snowflake.snowpark.DataFrameWriter.option: all usages of this function now receive the EWI SPRKPY1088.
pyspark.sql.readwriter.DataFrameWriter.options => snowflake.snowpark.DataFrameWriter.options: all usages of this function now receive the EWI SPRKPY1089.
Updated the mapping status of the following PySpark element from Workaround to Rename:
pyspark.sql.readwriter.DataFrameWriter.partitionBy => snowflake.snowpark.DataFrameWriter.partition_by
Updated the EWI documentation: SPRKSCL1000, SPRKSCL1001, SPRKSCL1002, SPRKSCL1100, SPRKSCL1101, SPRKSCL1102, SPRKSCL1103, SPRKSCL1104, SPRKSCL1105.
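A Rename mapping like the partitionBy one above is a pure name change at the call site. The snippet below is a toy sketch, not the SMA's actual rewriting engine, showing the kind of textual rename such a mapping implies (the regex-based helper is hypothetical):

```python
import re

# Toy illustration (NOT the SMA engine): a Rename mapping rewrites the
# PySpark camelCase call into Snowpark's snake_case equivalent.
def rename_partition_by(src: str) -> str:
    return re.sub(r"\.partitionBy\(", ".partition_by(", src)

before = 'df.write.partitionBy("year").parquet(path)'
print(rename_partition_by(before))
# df.write.partition_by("year").parquet(path)
```

The real tool performs this on the parsed AST rather than on raw text, which is what lets it match the correct fully qualified name.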
Removed¶
Removed pyspark.sql.dataframe.DataFrameStatFunctions.writeTo from the conversion status, since this element no longer exists.
Deprecated¶
Deprecated the following EWI codes:
SPRKPY1081
SPRKPY1084
Version 2.3.0 (Oct 30, 2024)¶
Application & CLI Version 2.3.0¶
Snowpark Conversion Core 4.11.0
Snowpark Conversion Core 4.11.0¶
Added¶
Added a new column named Url to the Issues.csv file, which links to the corresponding EWI documentation.
Added new EWIs for the following Spark elements:
[SPRKPY1082] pyspark.sql.readwriter.DataFrameReader.load
[SPRKPY1083] pyspark.sql.readwriter.DataFrameWriter.save
[SPRKPY1084] pyspark.sql.readwriter.DataFrameWriter.option
[SPRKPY1085] pyspark.ml.feature.VectorAssembler
[SPRKPY1086] pyspark.ml.linalg.VectorUDT
Added 38 new Pandas elements:
pandas.core.frame.DataFrame.select
pandas.core.frame.DataFrame.str
pandas.core.frame.DataFrame.str.replace
pandas.core.frame.DataFrame.str.upper
pandas.core.frame.DataFrame.to_list
pandas.core.frame.DataFrame.tolist
pandas.core.frame.DataFrame.unique
pandas.core.frame.DataFrame.values.tolist
pandas.core.frame.DataFrame.withColumn
pandas.core.groupby.generic._SeriesGroupByScalar
pandas.core.groupby.generic._SeriesGroupByScalar[S1].agg
pandas.core.groupby.generic._SeriesGroupByScalar[S1].aggregate
pandas.core.indexes.datetimes.DatetimeIndex.year
pandas.core.series.Series.columns
pandas.core.tools.datetimes.to_datetime.date
pandas.core.tools.datetimes.to_datetime.dt.strftime
pandas.core.tools.datetimes.to_datetime.strftime
pandas.io.parsers.readers.TextFileReader.apply
pandas.io.parsers.readers.TextFileReader.astype
pandas.io.parsers.readers.TextFileReader.columns
pandas.io.parsers.readers.TextFileReader.copy
pandas.io.parsers.readers.TextFileReader.drop
pandas.io.parsers.readers.TextFileReader.drop_duplicates
pandas.io.parsers.readers.TextFileReader.fillna
pandas.io.parsers.readers.TextFileReader.groupby
pandas.io.parsers.readers.TextFileReader.head
pandas.io.parsers.readers.TextFileReader.iloc
pandas.io.parsers.readers.TextFileReader.isin
pandas.io.parsers.readers.TextFileReader.iterrows
pandas.io.parsers.readers.TextFileReader.loc
pandas.io.parsers.readers.TextFileReader.merge
pandas.io.parsers.readers.TextFileReader.rename
pandas.io.parsers.readers.TextFileReader.shape
pandas.io.parsers.readers.TextFileReader.to_csv
pandas.io.parsers.readers.TextFileReader.to_excel
pandas.io.parsers.readers.TextFileReader.unique
pandas.io.parsers.readers.TextFileReader.values
pandas.tseries.offsets
Version 2.2.3 (Oct 24, 2024)¶
Application Version 2.2.3¶
Included SMA Core Versions¶
Snowpark Conversion Core 4.10.0
Desktop App¶
Fixed¶
Fixed a bug that caused the SMA to display the label SnowConvert instead of Snowpark Migration Accelerator in the menu bar of the Windows version.
Fixed a bug that caused the SMA to crash when it lacked read/write permissions for the .config directory on macOS or the AppData directory on Windows.
Command Line Interface¶
Changed
Renamed the CLI executable from snowct to sma.
Removed the source-language argument, so you no longer need to specify whether you are running a Python or Scala assessment/conversion.
Expanded the command-line arguments supported by the CLI with the following new arguments:
--enableJupyter | -j: flag to indicate whether the conversion of Databricks notebooks to Jupyter is enabled.
--sql | -f: the database engine syntax to use when detecting SQL commands.
--customerEmail | -e: sets the customer email address.
--customerCompany | -c: sets the customer company.
--projectName | -p: sets the customer project.
Updated some texts to reflect the correct name of the application, ensuring consistency and clarity across all messages.
Updated the application's terms of use.
Updated and expanded the CLI documentation to reflect the latest features, enhancements, and changes.
Improved the text shown before proceeding with the execution of the SMA.
Updated the CLI to accept "yes" as a valid argument when prompting the user for confirmation.
Allowed the CLI to continue without waiting for user interaction when the -y or --yes argument is specified.
Updated the help message of the --sql argument to show the values this argument expects.
Snowpark Conversion Core Version 4.10.0¶
Added¶
Added a new EWI for the pyspark.sql.readwriter.DataFrameWriter.partitionBy function. All usages of this function now receive the EWI SPRKPY1081.
Added a new column named Technology to the ImportUsagesInventory.csv file.
Changed¶
Updated the Third-Party Libraries Readiness Score to also take Unknown libraries into account.
Updated the AssessmentFiles.zip file to include .json files instead of .pam files.
Improved the CSV-to-JSON conversion mechanism to make inventory processing faster.
Improved the documentation of the following EWIs:
SPRKPY1029
SPRKPY1054
SPRKPY1055
SPRKPY1063
SPRKPY1075
SPRKPY1076
Updated the mapping status of the following Spark Scala elements from Direct to Rename:
org.apache.spark.sql.functions.shiftLeft => com.snowflake.snowpark.functions.shiftleft
org.apache.spark.sql.functions.shiftRight => com.snowflake.snowpark.functions.shiftright
Updated the mapping status of the following Spark Scala elements from Not Supported to Direct:
org.apache.spark.sql.functions.shiftleft => com.snowflake.snowpark.functions.shiftleft
org.apache.spark.sql.functions.shiftright => com.snowflake.snowpark.functions.shiftright
Fixed¶
Fixed a bug that caused the SMA to populate the Origin column of the ImportUsagesInventory.csv file incorrectly.
Fixed a bug that caused the SMA not to classify the imported libraries io, json, logging, and unittest as Python built-in imports in the ImportUsagesInventory.csv file and the DetailedReport.docx file.
Version 2.2.2 (Oct 11, 2024)¶
Application Version 2.2.2¶
Feature updates include:
Snowpark Conversion Core 4.8.0
Snowpark Conversion Core Version 4.8.0¶
Added¶
Added the EwiCatalog.csv file and .md files to reorganize the documentation.
Added a Direct mapping status for pyspark.sql.functions.ln.
Added a transformation for pyspark.context.SparkContext.getOrCreate; see EWI SPRKPY1080 for more details.
Improved the SymbolTable to infer the types of parameters in functions.
Added SymbolTable support for static methods, which no longer assumes that the first parameter is self.
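The static-method case matters because Python static methods take no implicit instance argument, so type inference must not bind the first parameter to self. A minimal illustration (the class and names are made up):

```python
# Static methods receive no instance, so their first parameter is an
# ordinary parameter; instance methods do receive `self` first.
class Loader:
    @staticmethod
    def normalize(name):       # `name` is a plain parameter, not `self`
        return name.strip().lower()

    def load(self, name):      # instance method: first parameter IS `self`
        return "loading " + Loader.normalize(name)

print(Loader.normalize("  SALES  "))  # sales
```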
Added documentation for the following missing EWIs:
SPRKHVSQL1005
SPRKHVSQL1006
SPRKSPSQL1005
SPRKSPSQL1006
SPRKSCL1002
SPRKSCL1170
SPRKSCL1171
SPRKPY1057
SPRKPY1058
SPRKPY1059
SPRKPY1060
SPRKPY1061
SPRKPY1064
SPRKPY1065
SPRKPY1066
SPRKPY1067
SPRKPY1069
SPRKPY1070
SPRKPY1077
SPRKPY1078
SPRKPY1079
SPRKPY1101
Changed¶
Updated the mapping status of pyspark.sql.functions.array_remove from NotSupported to Direct.
Fixed¶
Fixed the "Code File Sizing" table in the Detailed Report to exclude .sql and .hql files, and added an "Extra Large" row to the table.
Fixed missing update_query_tag when SparkSession is defined across multiple lines in Python.
Fixed missing update_query_tag when SparkSession is defined across multiple lines in Scala.
Fixed missing EWI SPRKHVSQL1001 for some SQL statements with parsing errors.
Fixed preservation of newline values inside string literals.
Fixed the total lines of code shown in the "File Type Summary" table.
Fixed the "Parsing Score" showing as 0 when files were recognized successfully.
Fixed the LOC count in the inventory for Databricks Magic SQL cells.
Version 2.2.0 (Sep 26, 2024)¶
Application Version 2.2.0¶
Feature updates include:
Snowpark Conversion Core 4.6.0
Snowpark Conversion Core Version 4.6.0¶
Added¶
Added a transformation for pyspark.sql.readwriter.DataFrameReader.parquet.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.option when it is chained from a Parquet method call.
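The "chained from a Parquet method call" condition refers to the usual fluent reader style, where each option call returns the reader itself. Sketched here with a tiny stand-in class (hypothetical, so the example runs without Spark):

```python
# Minimal stand-in for a fluent reader API: option() returns the reader,
# so a transformation has to walk the chain back from the parquet() call
# to find the options that apply to it.
class FakeReader:
    def __init__(self):
        self.opts = {}

    def option(self, key, value):
        self.opts[key] = value
        return self          # returning self enables chaining

    def parquet(self, path):
        return f"read {path} with options {self.opts}"

print(FakeReader().option("mergeSchema", "true").parquet("/data/t"))
```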
Changed¶
Updated the mapping status of the following elements:
pyspark.sql.types.StructType.fields from NotSupported to Direct.
pyspark.sql.types.StructType.names from NotSupported to Direct.
pyspark.context.SparkContext.setLogLevel from Workaround to Transformation. For more details, see EWIs SPRKPY1078 and SPRKPY1079.
org.apache.spark.sql.functions.round from Workaround to Direct.
org.apache.spark.sql.functions.udf from NotDefined to Transformation. For more details, see EWIs SPRKSCL1174 and SPRKSCL1175.
Updated the mapping status of the following Spark elements from DirectHelper to Direct:
org.apache.spark.sql.functions.unhex
org.apache.spark.sql.functions.shiftleft
org.apache.spark.sql.functions.shiftright
org.apache.spark.sql.functions.reverse
org.apache.spark.sql.functions.isnull
org.apache.spark.sql.functions.unix_timestamp
org.apache.spark.sql.functions.randn
org.apache.spark.sql.functions.signum
org.apache.spark.sql.functions.sign
org.apache.spark.sql.functions.collect_list
org.apache.spark.sql.functions.log10
org.apache.spark.sql.functions.log1p
org.apache.spark.sql.functions.base64
org.apache.spark.sql.functions.unbase64
org.apache.spark.sql.functions.regexp_extract
org.apache.spark.sql.functions.expr
org.apache.spark.sql.functions.date_format
org.apache.spark.sql.functions.desc
org.apache.spark.sql.functions.asc
org.apache.spark.sql.functions.size
org.apache.spark.sql.functions.locate
org.apache.spark.sql.functions.ntile
Fixed¶
Fixed the value shown for the Pandas API total percentage.
Fixed the total percentage of the ImportCalls table in the Detailed Report.
Deprecated¶
Deprecated the following EWI code:
SPRKSCL1115
Version 2.1.7 (Sep 12, 2024)¶
Application Version 2.1.7¶
Feature updates include:
Snowpark Conversion Core 4.5.7
Snowpark Conversion Core 4.5.2
Snowpark Conversion Core Version 4.5.7¶
Hotfixed¶
Fixed an issue that added a totals row to the "Spark Usages Summaries" when there is no usage data.
Upgraded the Python assembly to Version=1.3.111, which parses trailing commas in multi-line arguments.
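Trailing commas in multi-line argument lists are valid Python, so the parser has to accept code like the following minimal sketch:

```python
# A multi-line call whose last argument is followed by a trailing comma;
# this is legal Python and must parse cleanly.
result = max(
    10,
    25,
    7,   # trailing comma after the final argument
)
print(result)  # 25
```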
Snowpark Conversion Core Version 4.5.2¶
Added¶
Added a transformation for pyspark.sql.readwriter.DataFrameReader.option:
When chained from a CSV method call.
When chained from a JSON method call.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.json.
Changed¶
Run the SMA on SQL strings passed to Python/Scala functions.
Create an AST in Scala/Python to emit temporary SQL units.
Create the SqlEmbeddedUsages.csv inventory.
Deprecate the SqlStatementsInventory.csv and SqlExtractionInventory.csv inventories.
Integrate an EWI when a SQL literal cannot be processed.
Create a new task to process SQL-embedded code.
Collect information for the SqlEmbeddedUsages.csv inventory in Python.
Replace the transformed SQL code with literals in Python.
Update test cases after their implementation.
Create tables and views for telemetry from the SqlEmbeddedUsages inventory.
Collect information for the SqlEmbeddedUsages.csv report in Scala.
Replace the transformed SQL code with literals in Scala.
Check the line-number order of the embedded SQL report.
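For context, "SQL-embedded code" means SQL carried inside Python or Scala string literals, often built with interpolation. A hypothetical example of the pattern these tasks target (the table name and query are made up):

```python
# A SQL string assembled with Python interpolation; the tool scans
# literals like this, records them in SqlEmbeddedUsages.csv, converts
# the SQL, and writes the converted text back as a literal.
table = "sales"
query = f"SELECT id, amount FROM {table} WHERE amount > 100"
print(query)  # SELECT id, amount FROM sales WHERE amount > 100
```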
Populated SqlFunctionsInfo.csv with the SQL functions documented for SparkSQL and HiveSQL.
Updated the mapping status of the following:
org.apache.spark.sql.SparkSession.sparkContext from NotSupported to Transformation.
org.apache.spark.sql.Builder.config from NotSupported to Transformation. With this new mapping status, the SMA removes all related calls to this function from the source code.
Version 2.1.6 (Sep 5, 2024)¶
Application Version 2.1.6¶
Hotfix change for Snowpark Engines Core Version 4.5.1
Spark Conversion Core Version 4.5.1¶
Hotfixed
Added a mechanism to convert the temporary Databricks notebooks generated by the SMA into exported Databricks notebooks.
Version 2.1.5 (Aug 29, 2024)¶
Application Version 2.1.5¶
Feature updates include:
Updated Spark Conversion Core to 4.3.2.
Spark Conversion Core Version 4.3.2¶
Added¶
Added a mechanism (via decoration) to get the line and column of elements identified in notebook cells.
Added an EWI for pyspark.sql.functions.from_json.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.csv.
Enabled the query tag mechanism for Scala files.
Added the Code Analysis Score and additional links to the Detailed Report.
Added a column named OriginFilePath to InputFilesInventory.csv.
Changed¶
Updated the mapping status of pyspark.sql.functions.from_json from Not Supported to Transformation.
Updated the mapping status of the following Spark elements from Workaround to Direct:
org.apache.spark.sql.functions.countDistinct
org.apache.spark.sql.functions.max
org.apache.spark.sql.functions.min
org.apache.spark.sql.functions.mean
Deprecated¶
Deprecated the following EWI codes:
SPRKSCL1135
SPRKSCL1136
SPRKSCL1153
SPRKSCL1155
Fixed¶
Fixed a bug that caused the Spark API score to be calculated incorrectly.
Fixed a bug so that empty SQL files, and SQL files containing only comments, are no longer copied to the output folder.
Fixed a bug in the Detailed Report that caused inaccurate notebook LOC and cell-count statistics.
Version 2.1.2 (Aug 14, 2024)¶
Application Version 2.1.2¶
Feature updates include:
Updated Spark Conversion Core to 4.2.0.
Spark Conversion Core Version 4.2.0¶
Added¶
Added a Technology column to the SparkUsagesInventory.
Added an EWI for undefined SQL elements.
Added the SqlFunctions inventory.
Collected information for the SqlFunctions inventory.
Changed¶
The engine now processes and prints partially parsed Python files instead of leaving the original files unmodified.
Python notebook cells with parsing errors are also processed and printed.
Fixed¶
Fixed pandas.core.indexes.datetimes.DatetimeIndex.strftime being reported incorrectly.
Fixed a mismatch between the SQL Readiness Score and the "SQL Usages by Support Status".
Fixed a bug that caused the SMA to report an incorrect mapping status for pandas.core.series.Series.empty.
Fixed a mismatch between "Spark API Usages Ready for Conversion" in DetailedReport.docx and the UsagesReadyForConversion row in Assesment.json.
Version 2.1.1 (Aug 8, 2024)¶
Application Version 2.1.1¶
Feature updates include:
Updated Spark Conversion Core to 4.1.0.
Spark Conversion Core Version 4.1.0¶
Added¶
Added the following information to the AssessmentReport.json file:
The Third-Party Libraries Readiness Score.
The number of identified third-party library calls.
The number of third-party library calls supported in Snowpark.
The color codes associated with the Third-Party Readiness Score, the Spark API Readiness Score, and the SQL Readiness Score.
Added a transformation for SqlSimpleDataType in Spark CREATE TABLE statements.
Added a direct mapping for pyspark.sql.functions.get.
Added a direct mapping for pyspark.sql.functions.to_varchar.
As part of the post-unification changes, the tool now generates an execution info file in the engine.
Added a replacer for pyspark.sql.SparkSession.builder.appName.
Changed¶
Updated the mapping status of the following Spark elements from Not Supported to Direct:
pyspark.sql.functions.sign
pyspark.sql.functions.signum
Changed the notebook cells inventory report to indicate the kind of content of each cell in the Element column.
Added a SCALA_READINESS_SCORE column that reports the readiness score related only to references to the Spark API in Scala files.
Added partial support for transforming table properties in ALTER TABLE and ALTER VIEW.
Updated the conversion status of the SqlSimpleDataType node from Pending to Transformation in Spark CREATE TABLE statements.
Updated the Snowpark Scala API version supported by the SMA from 1.7.0 to 1.12.1, updating the mapping status of the following:
org.apache.spark.sql.SparkSession.getOrCreate from Rename to Direct
org.apache.spark.sql.functions.sum from Workaround to Direct
Updated the Snowpark Python API version supported by the SMA from 1.15.0 to 1.20.0, updating the mapping status of the following:
pyspark.sql.functions.arrays_zip from Not Supported to Direct
Updated the mapping status of the following Pandas elements:
Direct mappings:
pandas.core.frame.DataFrame.any
pandas.core.frame.DataFrame.applymap
Updated the mapping status of the following Pandas elements:
From Not Supported to Direct:
pandas.core.frame.DataFrame.groupby
pandas.core.frame.DataFrame.index
pandas.core.frame.DataFrame.T
pandas.core.frame.DataFrame.to_dict
From Not Supported to Rename:
pandas.core.frame.DataFrame.map
Updated the mapping status of the following Pandas elements:
Direct mappings:
pandas.core.frame.DataFrame.where
pandas.core.groupby.generic.SeriesGroupBy.agg
pandas.core.groupby.generic.SeriesGroupBy.aggregate
pandas.core.groupby.generic.DataFrameGroupBy.agg
pandas.core.groupby.generic.DataFrameGroupBy.aggregate
pandas.core.groupby.generic.DataFrameGroupBy.apply
Not Supported mappings:
pandas.core.frame.DataFrame.to_parquet
pandas.core.generic.NDFrame.to_csv
pandas.core.generic.NDFrame.to_excel
pandas.core.generic.NDFrame.to_sql
Updated the mapping status of the following Pandas elements:
Direct mappings:
pandas.core.series.Series.empty
pandas.core.series.Series.apply
pandas.core.reshape.tile.qcut
Direct mappings with EWI:
pandas.core.series.Series.fillna
pandas.core.series.Series.astype
pandas.core.reshape.melt.melt
pandas.core.reshape.tile.cut
pandas.core.reshape.pivot.pivot_table
Updated the mapping status of the following Pandas elements:
Direct mappings:
pandas.core.series.Series.dt
pandas.core.series.Series.groupby
pandas.core.series.Series.loc
pandas.core.series.Series.shape
pandas.core.tools.datetimes.to_datetime
pandas.io.excel._base.ExcelFile
Not Supported mappings:
pandas.core.series.Series.dt.strftime
Updated the mapping status of the following Pandas elements:
From Not Supported to Direct:
pandas.io.parquet.read_parquet
pandas.io.parsers.readers.read_csv
Updated the mapping status of the following Pandas elements:
From Not Supported to Direct:
pandas.io.pickle.read_pickle
pandas.io.sql.read_sql
pandas.io.sql.read_sql_query
Updated the description of "Understanding the SQL Readiness Score".
Updated the PyProgramCollector to collect packages and populate the current packages inventory with data from Python source code.
Updated the mapping status of pyspark.sql.SparkSession.builder.appName from Rename to Transformation.
Removed the following Scala integration tests:
AssesmentReportTest_AssessmentMode.ValidateReports_AssessmentMode
AssessmentReportTest_PythonAndScala_Files.ValidateReports_PythonAndScala
AssessmentReportTestWithoutSparkUsages.ValidateReports_WithoutSparkUsages
Updated the mapping status of pandas.core.generic.NDFrame.shape from Not Supported to Direct.
Updated the mapping status of pandas.core.series from Not Supported to Direct.
Deprecated¶
Deprecated the EWI code SPRKSCL1160, since org.apache.spark.sql.functions.sum is now a Direct mapping.
Fixed¶
Fixed a bug that prevented support for custom magics without arguments in Jupyter notebook cells.
Fixed incorrect generation of EWIs in the issues.csv report when parsing errors occur.
Fixed a bug that prevented the SMA from processing notebooks exported from Databricks as Databricks notebooks.
Fixed a stack overflow error when handling name collisions of types declared inside package objects.
Fixed the handling of complex lambda type names involving generics, for example, def func[X,Y](f:(Map[Option[X], Y] => Map[Y, X]))...
Fixed a bug that could cause the SMA to add a PySpark EWI code instead of a Pandas EWI code to Pandas elements that are not yet recognized.
Fixed a typo in the Detailed Report template, renaming a column from "Percentage of all Python Files" to "Percentage of all files".
Fixed a bug that reported pandas.core.series.Series.shape incorrectly.