Snowpark Connect for Spark release notes for 2026¶
Snowflake uses semantic versioning for Snowpark Connect for Spark updates.
For documentation, see Snowpark Connect for Apache Spark and Orchestrating Snowpark Connect for Spark workloads.
1.25.0 (May 5, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Validate JSON write encoding to surface unsupported charsets eagerly
- Fix `joinWith` outer joins to return `NULL` struct for unmatched side
- Preserve generator output types in `LATERAL VIEW` alias
- Escape glob metacharacters in JSON stage paths
- Canonicalize encoding names case-insensitively for JSON reader
- Fix `BigInteger` overflow during JSON schema inference
- Normalize compression codec names case-insensitively for JSON reader
- Always treat JSON read schemas as nullable to match Spark
- Use `limit` instead of `sample` for Parquet variant schema discovery
- Anchor stage paths to prevent unintended prefix matches in stage reads
- Map Spark CSV `PERMISSIVE` mode to Snowflake `ON_ERROR=PERMISSIVE`
- Allow empty CSV files to be read without raising an error
- Treat `RESOLVED_REFERENCE_COLUMN_NOT_FOUND` as a no-op in `drop`
- Raise `COLUMN_ALREADY_EXISTS` for conflicts between `*` and aliased columns
- Honor `spark.sql.legacy.negativeIndexInArrayInsert` in `array_insert`
- Optimize `map_zip_with` using native SQL instead of a Python UDF
- Optimize `substring` and `substr` using a native Snowflake function
- Prevent temporary object name collisions in multithreaded applications
- Force nullable schemas on table creation via `CREATE TABLE AS SELECT`
New features¶
- Support the `truncate` write option to preserve table identity on overwrite
- Implicitly cast scalar types when reading JSON to match user-supplied schema
- Track nullability of grouping columns in `ROLLUP`, `CUBE`, and `GROUPING SETS`
- Propagate nullability through star (`*`) expression expansion
- Track nullability through implicit and explicit casts
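One fix above makes JSON read schemas always nullable, matching Spark's rule that a user-supplied schema is relaxed to nullable when reading JSON. A minimal sketch of that normalization over a plain dict-based schema representation (illustrative only, not the actual Snowpark Connect internals):

```python
def force_nullable(field):
    """Recursively mark a schema field and its nested struct fields as
    nullable, mirroring Spark's behavior for schemas applied to JSON reads."""
    out = dict(field, nullable=True)
    if isinstance(out.get("fields"), list):  # struct type: recurse into children
        out["fields"] = [force_nullable(f) for f in out["fields"]]
    return out

schema = {"name": "root", "nullable": False,
          "fields": [{"name": "id", "type": "long", "nullable": False}]}
normalized = force_nullable(schema)
# every field is now nullable, regardless of what the user declared
```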
1.24.0 (April 24, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Disable `filter_classpath_jars` at server startup
- Support UDT cast-to-string and reject invalid UDT casts
- Fix DataFrame `describe` and `summary` APIs
- Add `SUPPORTED_SCALES` guard to skip workloads at unsupported scales
New features¶
- Add Scala 2.13 equivalent JARs to dependency packages
- Add Hive partitioning implementation and limitations reference
- Remove 29 unused JARs from `snowpark_connect_deps` packages (~23 MB)
- Skip explicit structured cast when server supports implicit cast for Parquet
- Bump Snowpark dependency to 1.50.0
1.23.0 (April 22, 2026)¶
Snowpark Connect for Spark¶
Behavior changes¶
- Set Parquet `useLogicalType` default to `true`
Bug fixes¶
- Fix `count()` to match Spark SQL behavior
- Relax protobuf version constraint from `<6.32.0` to `<6.34.0`
- Consistently coerce to unstructured types
- Replace `snowflake.snowpark_connect.includes` import with `pyspark.sql`
- Always use vectorized Parquet scanner; remove `useVectorizedScanner` configuration option
- Fix `regexp_extract` defaults, inline flags, and PCRE handling
- Fix SQL operator compatibility gaps
- Fix `IN NULL` semantics to match Spark behavior
- Support named persistent external stage read in XML UDTF
- Preserve UDT metadata through temp views and `toDF` renames
- Use SQL path for catalog table existence checks
- Allow star expression in the map columns aggregation
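The `regexp_extract` fix above restores Spark's defaults: the group index defaults to 1, and a non-matching pattern (or an unmatched optional group) yields an empty string rather than an error. A rough Python emulation of those semantics:

```python
import re

def regexp_extract(s, pattern, idx=1):
    """Emulate Spark's regexp_extract: return capture group `idx` of the
    first match, or an empty string when the pattern (or group) does not
    match. The default group index is 1, as in Spark."""
    m = re.search(pattern, s)
    if m is None:
        return ""
    return m.group(idx) or ""

regexp_extract("100-200", r"(\d+)-(\d+)")      # "100" (group 1 by default)
regexp_extract("100-200", r"(\d+)-(\d+)", 2)   # "200"
regexp_extract("foo", r"(\d+)")                # "" (no match, no error)
```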
New features¶
- Implement sequence support for timestamp/date and interval types
- Add CTE session parameter
- Initialize tracking nullability of columns and complex types
- Track nullability for built-in functions across multiple expression categories
- Track nullable in `Set` command
- Add nullability to `range`
- Introduce performance regression gate in GitHub Actions
1.22.0 (April 18, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix CTE-qualified column refs in ORDER BY/WHERE/GROUP BY
- Fix `withColumn` on join key after using-style join
- Fix `fillna` raising immediately for missing subset column
- Fix case-sensitive read of internal stage
- Reduce window function boundary materialization
- Preserve struct/map/array schema with empty content
- Support ON_ERROR=CONTINUE for INFER_SCHEMA in CSV and JSON reads
- Fix hex compile-time type dispatch
- Avoid redundant temp table creation for `read.parquet` to `saveAsTable`
- Preserve `StructType`/`MapType` in strict mode
- Case-insensitive qualifier comparison in column resolution
- Use Snowpark builtin for `CBRT` function
- Fix XML `nullValue` and whitespace handling
- Use Decimal for `DecimalType` in strict mode
- Fix `map_concat` bug
- Fix `unionByName` to handle quotes in column names and respect `caseSensitive` config
- Remove trailing commas from JSON test resource file
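The `unionByName` fix above concerns resolving right-side columns by name while honoring `spark.sql.caseSensitive`. A simplified sketch of that name-matching step (`match_columns` is a hypothetical helper for illustration, not the real resolver):

```python
def match_columns(left_cols, right_cols, case_sensitive=False):
    """Pair each left column with the right-side column of the same name,
    honoring a caseSensitive flag the way unionByName resolution would."""
    def key(name):
        return name if case_sensitive else name.lower()
    right_by_key = {key(c): c for c in right_cols}
    pairs = []
    for c in left_cols:
        k = key(c)
        if k not in right_by_key:
            raise ValueError(f"column {c!r} not found on the right side")
        pairs.append((c, right_by_key[k]))
    return pairs

# Case-insensitive by default, matching Spark's default setting:
match_columns(["Id", "Name"], ["name", "id"])  # [('Id', 'id'), ('Name', 'name')]
```

With `case_sensitive=True` the same call would raise, since `Id` and `id` no longer resolve to the same column.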
New features¶
- Snowpark Connect Java Client library to support Spark Scala and Java workloads
- Use native implementation for `ARRAY_REPEAT` and `MAP_ENTRIES`
- Use `MAP_ENTRIES` in `map_cast`
- Reduce number of queries used for VARIANT inference in `read_parquet`
- Add cross-request sub-plan cache for `map_relation`
1.21.1 (April 10, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Implement JSON encoding validation
- Reduce query size for functions that internally rename columns
- Relax py4j version constraints to allow for broader compatibility
- Isolate artifacts by spark session
New features¶
- Add default application name for session
- Add JSON date/time format conversion
1.21.0 (April 09, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Handle glob metacharacter escaping in CSV/JSON paths
- Fix JSON non-nullable schema to match Spark behavior
- Add default column matching case for XML
- Fix TEXT `lineSep` with hex encoding for RECORD_DELIMITER
- Fix Spark XML read from an external stage
- Return an empty DataFrame for an empty CSV file
- Add default `idx` to `regexp_extract`
- Fix CSV non-nullable schema to match Spark behavior
- Fix temp stage naming collision under parallel tests
- Add fast path to regexp functions
- Schema coercion on `storeAssignmentPolicy`
- Fix CSV backslash delimiter double-escape
- Optimize `posexplode`
- Validate empty CSV `lineSep`
- Fix bug where XML could not read an external stage file
- Reduce default log verbosity for users
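The first fix above escapes glob metacharacters so a literal path such as `data[1].json` is not misread as a pattern. A minimal sketch using Python's standard `glob.escape` (the helper name here is illustrative):

```python
import glob

def escape_stage_path(path):
    """Escape glob metacharacters (*, ?, [) in a literal stage path so a
    file named e.g. 'data[1].json' is matched literally rather than being
    treated as a glob pattern. glob.escape does the per-character work."""
    return glob.escape(path)

escape_stage_path("@my_stage/data[1].json")  # "@my_stage/data[[]1].json"
escape_stage_path("a*b?.json")               # "a[*]b[?].json"
```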
New features¶
- Added support for DML row counts
- Support `overwrite(condition)` for `DataFrameWriterV2`
- Iceberg `mergeSchema` on write: top-level column evolution
- Added support for partition overwrites in `DataFrameWriterV2`
- Add `app_name` parameter to `init_spark_session`
1.20.0 (April 03, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix performance issue
- Fix merge schema for JSON
- Fix `arrays_zip` for complex types
- Fix LCAs in implicit aggregations
New features¶
- Cache result of JSON file format
- Resolve known types from `map_unresolved_function` without typer
- Support Hive partitioning for JSON copy-into mode
- Add SCOS session registration on server initialization
- Modify warmup query with distinct string for filtering
1.19.0 (March 26, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix accessing struct field from array via getItem
- Fix names for accessing array elements
- Added missing compression for TEXT format
- Reduce query size in `DataFrame.replace`, UDTF creation, and `read_parquet`
- Emulate types on create [temp] view
- Fixed casting of structured types
- Fix text write type validation
- Support reading XML directories in parallel
- Optimize `conv` function usage
- Support both Snowflake and `net.snowflake.spark.snowflake` format read and write
- Emulate types on create table
- Fix accessing nested structs with arrays
- Fix Parquet error message
- Optimize `to_number` to reduce query size
- Fix UDF cache to consider query database change
- Optimize `mask` function
- Pass PATTERN to NVS fallback reader during Parquet schema inference
- Null and structured type coercion
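The `conv` entry above refers to Spark's base-conversion function. A simplified Python emulation for non-negative inputs (Spark additionally treats negative values as unsigned 64-bit, which this sketch omits):

```python
def conv(num_str, from_base, to_base):
    """Emulate Spark's conv(): convert num_str from one base to another,
    returning the result as an uppercase string. Simplified: assumes
    non-negative input and bases between 2 and 36."""
    value = int(num_str, from_base)      # parse in the source base
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = ""
    while value > 0:                     # repeated divmod in the target base
        value, r = divmod(value, to_base)
        out = digits[r] + out
    return out or "0"

conv("ff", 16, 10)   # "255"
conv("255", 10, 16)  # "FF"
```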
New features¶
- Introduce DIRECTED join hint
- Integrate XML inferSchema
1.18.0 (March 19, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Added missing JDBC Type mapping
- Support user-provided schema in Parquet
- Handle invalid UTF-8 characters in JSON gracefully
- Resolve LCA columns only if actually used
- Optimize get_json_object query generation
- Strip semicolon from SQL query
- Make `processInBulk=True` the default for JSON reads and fix `NullType` schema inference
- Fix bug regarding incorrect stage read
- Add `None` check in UDF registration
- Tighten limit for error message
- Allow missing fields in user-provided schema
- JSON and CSV compression inference
- Fix for `coalesce(1)` creating a single file
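One fix above makes invalid UTF-8 in JSON input non-fatal. A minimal sketch of the idea, decoding with replacement characters instead of failing the whole read (`parse_json_lossy` is a hypothetical helper, not the actual reader API):

```python
import json

def parse_json_lossy(raw_bytes):
    """Decode JSON bytes while tolerating invalid UTF-8 sequences by
    substituting U+FFFD, instead of raising and aborting the read."""
    text = raw_bytes.decode("utf-8", errors="replace")
    return json.loads(text)

parse_json_lossy(b'{"name": "caf\xc3\xa9"}')  # valid UTF-8 parses normally
parse_json_lossy(b'{"name": "caf\xff"}')      # invalid byte becomes U+FFFD
```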
New features¶
- Add `execute_jar` method to launch Java/Scala workloads
Snowpark Submit¶
Bug fixes¶
- Fix error swallowing with `--wait-for-completion` flag
1.17.0 (March 13, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- JSON and CSV compression inference.
- Fix for `coalesce` creating a single file.
- Refactor JSON read to use `COPY INTO` for single-file reads and add `VariantType` schema inference.
- Allow JSON loading without explicit schema.
- Fix `multi_line` in JSON.
- Fix JSON schema inference to avoid scanning whole files.
- Correctly handle casting to timestamp `ltz`.
- Clamp hash returned value.
- Fix for `repartition` with `partitionBy`.
- Fix to use `[connections.spark-connect]` section header in `config.toml`.
- Convert Java `date`/`timestamp` format tokens to Snowflake equivalents for CSV reads.
- Calculate schema for `pivot` functions.
- Fix UDTFs in aliased lateral join.
- Align result for SQL `SET` command.
- Fix return type for `CEIL` and `FLOOR` functions.
- Improve query generation in `unbase64` v2.
- Fix some option-to-Snowflake mappings for CSV.
- Fix serialization for `POJO`.
- Improve CSV header error messages.
- Improve `mapType` detection logic with `try_cast` for Parquet reads.
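The format-token entry above maps Java `SimpleDateFormat` tokens onto Snowflake's format tokens (for example, `mm` minutes become `MI`). A sketch of that rewrite with a deliberately partial mapping table, assumed here for illustration rather than taken from the server's actual table:

```python
import re

# Partial Java SimpleDateFormat -> Snowflake token mapping (illustrative).
_TOKEN_MAP = {"yyyy": "YYYY", "MM": "MM", "dd": "DD",
              "HH": "HH24", "mm": "MI", "ss": "SS"}

def java_to_snowflake_format(fmt):
    """Rewrite a Java date/timestamp pattern into its Snowflake equivalent
    by replacing known tokens, longest first, in a single left-to-right
    pass (so already-replaced text is never rescanned)."""
    pattern = "|".join(sorted(_TOKEN_MAP, key=len, reverse=True))
    return re.sub(pattern, lambda m: _TOKEN_MAP[m.group(0)], fmt)

java_to_snowflake_format("yyyy-MM-dd HH:mm:ss")  # "YYYY-MM-DD HH24:MI:SS"
```

Replacing via one `re.sub` pass avoids the classic pitfall of sequential `str.replace` calls, where an earlier replacement's output (for example `MI`) could be corrupted by a later rule.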
New features¶
- Support for `reduceGroups` API.
- Support specifying connection name inside `init_spark_session`.
- Add config param to use UDF for `unbase64`.
1.16.0 (March 12, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Optimize SQL generation in function `unbase64`
- Fix `from_json` regression
- Fix for records that span multiple BZ2 compression block boundaries
- Fix nullability mapping in unresolved attribute
- Initialize `spark-connect` session with any connection, not just one named `spark-connect`
- Add XML options validation
- Drop CSV ESCAPE option when it matches the quote character to prevent compilation error
- Fix incorrect conversion of named tuples in `productEncoder`
- Verify `mergeSchema` for CSV and JSON is not supported
- Fix Parquet complex type round-trip (write + read)
- Fix schema for `pivot`/`unpivot`
- Fix return type for `MOD` and `PMOD` functions
- Fix CSV header extraction for files with leading blank lines
- Test timezones correctly and replace string-based date/time serialization with epoch-based
- Update Java version check for Windows
- Flatten nested `withColumn` calls
- Change logic for `Literal`/`IntegralType` in add/sub operations
- Return `LongType` for `COUNT` functions
- Read JSON: test compression = bz2/bzip2/none
- Improve performance of `to_varchar`/`to_char`
- Improve comparisons in I/O testing
- Set `multi_line` to `False` by default for copy JSON
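The ESCAPE/quote fix above drops the CSV escape option when it would collide with the quote character. Sketched over a plain options dict for illustration (not the real reader configuration object):

```python
def normalize_csv_options(options):
    """Drop the CSV escape option when it equals the quote character,
    since forwarding both identical values to the file format would
    fail compilation. Operates on a copy of a plain options dict."""
    opts = dict(options)
    if opts.get("escape") is not None and opts.get("escape") == opts.get("quote"):
        del opts["escape"]
    return opts

normalize_csv_options({"quote": '"', "escape": '"'})   # escape removed
normalize_csv_options({"quote": '"', "escape": "\\"})  # escape kept
```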
Snowpark Submit¶
Bug fixes¶
- Throw error on unspecified compute pool.
1.15.0 (March 06, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Remove result scan when calling `df.count()`
- Ensure schema inference runs on a limited number of rows when reading JSON
- Fix `createDataFrame` for interval types
- Change logic for `Literal`/`IntegralType` in multiplication and division operations
- Widen and coerce type for `Set` operations
- Fix `neo4j` multi-label support
- Modify JAR metadata so that Grype does not detect Netty vulnerability
- Return correct type for `ANY_VALUE` function
- Return widened type for sequence
- Add support for config `spark.sql.parquet.inferTimestampNTZ.enabled`
- Batch column rename/cast in `_validate_schema_and_get_writer`
- Fix JDBC hang when partitioned queries are given with a fetch size
- Return trimmed exception message when it exceeds the HTTP header limits
- Fix `map_type_to_snowflake_type` for `BigDecimal`
- Fix literal decimal precision and scale
- Improve random string generation
- Make BZ2-compressed JSON loading ignore corrupt records
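The literal decimal fix above concerns inferring `DecimalType(precision, scale)` from a literal. A sketch of the usual rule (scale = digits after the point, precision = total digits, widened so precision is at least the scale), ignoring scientific notation:

```python
from decimal import Decimal

def literal_decimal_type(text):
    """Compute (precision, scale) for a plain decimal literal: scale is
    the number of digits after the point, precision is the number of
    significant digits, widened so that precision >= scale."""
    d = Decimal(text)
    exponent = d.as_tuple().exponent
    scale = max(0, -exponent)                 # digits after the point
    digits = len(d.as_tuple().digits)         # significant digits
    precision = max(digits, scale)            # e.g. "0.05" has 1 digit, scale 2
    return precision, scale

literal_decimal_type("12.345")  # (5, 3)
literal_decimal_type("1000")    # (4, 0)
```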
New features¶
- Use staged files from config in Scala UDFs
- Use permissive `TRY_CAST` in JSON reading
- Make the number of server threads configurable
Snowpark Submit¶
Bug fixes¶
- Add back `init_spark_session()` to testing
- Update `snowpark-submit` command-line output to clarify that `snowflake-connection-name` is required.
1.14.0 (February 19, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Cache table type when running `saveAsTable`
- Optimize literal input for `substring` and type casting for `coalesce`
- Handle decimal overflow in `avg`/`mean` and fix decimal type coercion
- Iceberg: preserve grants on overwrite
- Standardize SQL passthrough mode
- Optimize `from_utc_timestamp`/`to_utc_timestamp` for literal timezone
- Handle JSON null values in structured types to match Spark semantics
- Emulate integral types on creating tables from SQL
- Fix edge case with mapping nested rows in Scala UDFs
- Fix how Parquet handles read and write of complex structured datatypes
- Support the save `ignore` argument for Parquet files
- Add support for artifact repository
- Fix array nullability in Scala UDxF
- Fix `log1p` for arguments in the (-1, 0) range
- Fix `first_value` and `last_value` in aggregate context
- Fix reading `DayTimeIntervalType` for Scala client
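The `avg`/`mean` overflow fix relates to Spark's result-type rule for decimal averages, which widens `DecimalType(p, s)` to `(p + 4, s + 4)` capped at the 38-digit maximum; stated here as an assumption about Spark's behavior, it can be sketched as:

```python
def avg_result_type(precision, scale):
    """Result DecimalType of avg() over a DecimalType(p, s) column:
    widen to (p + 4, s + 4), with both components capped at 38."""
    MAX_PRECISION = 38
    return (min(precision + 4, MAX_PRECISION),
            min(scale + 4, MAX_PRECISION))

avg_result_type(10, 2)   # (14, 6)
avg_result_type(38, 10)  # (38, 14): precision already at the cap
```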
New features¶
- Handle timezones correctly in Scala UDFs
- Support Java 11 and 17 without any configuration
Snowpark Submit updates¶
New features¶
- Support `snowpark-submit` for Python 3.9
- Enhance `init_spark_session` to be usable in the `snowpark-submit` workflow
1.13.0 (February 13, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fixed `split` function issue
- Downgraded snowflake-snowpark-python dependency to version 1.44
- Fixed `Neo4j` dialect matching to improve SQL translation
- Fixed operation ID returned in execute responses to be consistent
- Fixed `gRPC` metadata handling for TCP channel connections
New features¶
- Added support for `partition_hint` in `mapPartitions` operations
- Added XML reader support for scenarios with user-defined schemas
1.11.0 (January 28, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Preserve hidden columns after various DataFrame operators
- Fix issues for Scala UDF input types (`byte`, `binary`, `scala.math.BigDecimal`)
Other updates¶
- Add `snowpark-submit` User Defined Args to comment
1.10.0 (January 22, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix config unset error for session configuration.
- Use `COPY INTO` to load CSV files in parallel.
- Fix writes for DataFrames using outer joins.
- Handle nulls in Scala UDFs.
- Optimize CTE query generation with parameter protection.
- Avoid casting arguments of `DATEDIFF`.
- Fix appending partitioned files and reading of null partitions.
- Make a 10X performance improvement for conversion between base 10 and 16 using SQL.
New features¶
- Overwrite only modified partitions for parquet files.
Other updates¶
- Updated logic to detect if Snowpark Connect for Spark is running on XP.
- Support writing to a table with variant data type in Snowflake.
- Remove unnecessary info logs.
- Move Java tests out of Scala tests job to a separate job.
- Update the dependency version for gcsfs.
Snowpark Submit¶
None.
1.9.0 (January 14, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix serializing Scala tuples.
- Fix loading huge JSON files.
- Implement small fixes for customer issues.
- Implement fixes for struct comparisons.
- Add handling for 0-column DataFrames.
- Correct upload file path.
- Fix `Upload_files_if_needed` not running in parallel.
- Improve input type inference when UDF input types are not defined in the proto.
- Fix NA edge cases.
New features¶
- Support reading single JSON BZ2 file.
- Support Scala UDFs in server-side Snowpark Connect for Spark.
- Implement cast between string and `daytime`.
- Add support for Scala UDFs in `group_map`.
Snowpark Submit¶
Bug fixes¶
- Reduce generated workload names.
1.8.0 (January 07, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fixed JAVA_HOME handling for Windows.
New features¶
- Support `neo4j` data source via JDBC.
Snowpark Submit¶
None.