Snowpark Connect for Spark release notes for 2026¶
Snowflake uses semantic versioning for Snowpark Connect for Spark updates.
For documentation, see Snowpark Connect for Apache Spark and Orchestrating Snowpark Connect for Spark workloads.
1.25.0 (May 5, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Validate JSON write encoding to surface unsupported charsets eagerly
- Fix `joinWith` outer joins to return `NULL` struct for unmatched side
- Preserve generator output types in `LATERAL VIEW` alias
- Escape glob metacharacters in JSON stage paths
- Canonicalize encoding names case-insensitively for JSON reader
- Fix `BigInteger` overflow during JSON schema inference
- Normalize compression codec names case-insensitively for JSON reader
- Always treat JSON read schemas as nullable to match Spark
- Use `limit` instead of `sample` for Parquet variant schema discovery
- Anchor stage paths to prevent unintended prefix matches in stage reads
- Map Spark CSV `PERMISSIVE` mode to Snowflake `ON_ERROR=PERMISSIVE`
- Allow empty CSV files to be read without raising an error
- Treat `RESOLVED_REFERENCE_COLUMN_NOT_FOUND` as a no-op in `drop`
- Raise `COLUMN_ALREADY_EXISTS` for conflicts between `*` and aliased columns
- Honor `spark.sql.legacy.negativeIndexInArrayInsert` in `array_insert`
- Optimize `map_zip_with` using native SQL instead of a Python UDF
- Optimize `substring` and `substr` using a native Snowflake function
- Prevent temporary object name collisions in multithreaded applications
- Force nullable schemas on table creation via `CREATE TABLE AS SELECT`
New features¶
- Support the `truncate` write option to preserve table identity on overwrite
- Implicitly cast scalar types when reading JSON to match user-supplied schema
- Track nullability of grouping columns in `ROLLUP`, `CUBE`, and `GROUPING SETS`
- Propagate nullability through star (`*`) expression expansion
- Track nullability through implicit and explicit casts
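One fix above makes JSON read schemas always nullable, matching Spark's rule that a user-supplied schema is relaxed to nullable when reading JSON. A minimal sketch of that normalization over a plain dict-based schema representation (illustrative only, not the actual Snowpark Connect internals):

```python
def force_nullable(field):
    """Recursively mark a schema field and its nested struct fields as
    nullable, mirroring Spark's behavior for schemas applied to JSON reads."""
    out = dict(field, nullable=True)
    if isinstance(out.get("fields"), list):  # struct type: recurse into children
        out["fields"] = [force_nullable(f) for f in out["fields"]]
    return out

schema = {"name": "root", "nullable": False,
          "fields": [{"name": "id", "type": "long", "nullable": False}]}
normalized = force_nullable(schema)
# every field is now nullable, regardless of what the user declared
```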
1.24.0 (April 24, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Disable `filter_classpath_jars` at server startup
- Support UDT cast-to-string and reject invalid UDT casts
- Fix DataFrame `describe` and `summary` APIs
- Add `SUPPORTED_SCALES` guard to skip workloads at unsupported scales
New features¶
- Add Scala 2.13 equivalent JARs to dependency packages
- Add Hive partitioning implementation and limitations reference
- Remove 29 unused JARs from `snowpark_connect_deps` packages (~23 MB)
- Skip explicit structured cast when server supports implicit cast for Parquet
- Bump Snowpark dependency to 1.50.0
1.23.0 (April 22, 2026)¶
Snowpark Connect for Spark¶
Behavior changes¶
- Set Parquet `useLogicalType` default to `true`
Bug fixes¶
- Fix `count()` to match Spark SQL behavior
- Relax protobuf version constraint from `<6.32.0` to `<6.34.0`
- Consistently coerce to unstructured types
- Replace `snowflake.snowpark_connect.includes` import with `pyspark.sql`
- Always use vectorized Parquet scanner; remove `useVectorizedScanner` configuration option
- Fix `regexp_extract` defaults, inline flags, and PCRE handling
- Fix SQL operator compatibility gaps
- Fix `IN NULL` semantics to match Spark behavior
- Support named persistent external stage read in XML UDTF
- Preserve UDT metadata through temp views and `toDF` renames
- Use SQL path for catalog table existence checks
- Allow star expression in the map columns aggregation
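The `regexp_extract` fix above restores Spark's defaults: the group index defaults to 1, and a non-matching pattern (or an unmatched optional group) yields an empty string rather than an error. A rough Python emulation of those semantics:

```python
import re

def regexp_extract(s, pattern, idx=1):
    """Emulate Spark's regexp_extract: return capture group `idx` of the
    first match, or an empty string when the pattern (or group) does not
    match. The default group index is 1, as in Spark."""
    m = re.search(pattern, s)
    if m is None:
        return ""
    return m.group(idx) or ""

regexp_extract("100-200", r"(\d+)-(\d+)")      # "100" (group 1 by default)
regexp_extract("100-200", r"(\d+)-(\d+)", 2)   # "200"
regexp_extract("foo", r"(\d+)")                # "" (no match, no error)
```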
New features¶
- Implement sequence support for timestamp/date and interval types
- Add CTE session parameter
- Initialize tracking nullability of columns and complex types
- Track nullability for built-in functions across multiple expression categories
- Track nullable in `Set` command
- Add nullability to `range`
- Introduce performance regression gate in GitHub Actions
1.22.0 (April 18, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix CTE-qualified column refs in ORDER BY/WHERE/GROUP BY
- Fix `withColumn` on join key after using-style join
- Fix `fillna` raising immediately for missing subset column
- Fix case-sensitive read of internal stage
- Reduce window function boundary materialization
- Preserve struct/map/array schema with empty content
- Support ON_ERROR=CONTINUE for INFER_SCHEMA in CSV and JSON reads
- Fix hex compile-time type dispatch
- Avoid redundant temp table creation for `read.parquet` to `saveAsTable`
- Preserve `StructType`/`MapType` in strict mode
- Case-insensitive qualifier comparison in column resolution
- Use Snowpark builtin for `CBRT` function
- Fix XML `nullValue` and whitespace handling
- Use Decimal for `DecimalType` in strict mode
- Fix `map_concat` bug
- Fix `unionByName` to handle quotes in column names and respect `caseSensitive` config
- Remove trailing commas from JSON test resource file
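The `unionByName` fix above concerns resolving right-side columns by name while honoring `spark.sql.caseSensitive`. A simplified sketch of that name-matching step (`match_columns` is a hypothetical helper for illustration, not the real resolver):

```python
def match_columns(left_cols, right_cols, case_sensitive=False):
    """Pair each left column with the right-side column of the same name,
    honoring a caseSensitive flag the way unionByName resolution would."""
    def key(name):
        return name if case_sensitive else name.lower()
    right_by_key = {key(c): c for c in right_cols}
    pairs = []
    for c in left_cols:
        k = key(c)
        if k not in right_by_key:
            raise ValueError(f"column {c!r} not found on the right side")
        pairs.append((c, right_by_key[k]))
    return pairs

# Case-insensitive by default, matching Spark's default setting:
match_columns(["Id", "Name"], ["name", "id"])  # [('Id', 'id'), ('Name', 'name')]
```

With `case_sensitive=True` the same call would raise, since `Id` and `id` no longer resolve to the same column.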
New features¶
- Snowpark Connect Java Client library to support Spark Scala and Java workloads
- Use native implementation for `ARRAY_REPEAT` and `MAP_ENTRIES`
- Use `MAP_ENTRIES` in `map_cast`
- Reduce number of queries used for VARIANT inference in `read_parquet`
- Add cross-request sub-plan cache for `map_relation`
1.21.1 (April 10, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Implement JSON encoding validation
- Reduce query size for functions that internally rename columns
- Relax py4j version constraints to allow for broader compatibility
- Isolate artifacts by spark session
New features¶
- Add default application name for session
- Add JSON date/time format conversion
1.21.0 (April 09, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Handle glob metacharacter escaping in CSV/JSON paths
- Fix JSON non-nullable schema to match Spark behavior
- Add default column matching case for XML
- Fix TEXT `lineSep` with hex encoding for RECORD_DELIMITER
- Fix Spark XML read from an external stage
- Return an empty DataFrame for an empty CSV file
- Add default `idx` to `regexp_extract`
- Fix CSV non-nullable schema to match Spark behavior
- Fix temp stage naming collision under parallel tests
- Add fast path to regexp functions
- Schema coercion on `storeAssignmentPolicy`
- Fix CSV backslash delimiter double-escape
- Optimize `posexplode`
- Validate empty CSV `lineSep`
- Fix bug where XML could not read an external stage file
- Reduce default log verbosity for users
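The first fix above escapes glob metacharacters so a literal path such as `data[1].json` is not misread as a pattern. A minimal sketch using Python's standard `glob.escape` (the helper name here is illustrative):

```python
import glob

def escape_stage_path(path):
    """Escape glob metacharacters (*, ?, [) in a literal stage path so a
    file named e.g. 'data[1].json' is matched literally rather than being
    treated as a glob pattern. glob.escape does the per-character work."""
    return glob.escape(path)

escape_stage_path("@my_stage/data[1].json")  # "@my_stage/data[[]1].json"
escape_stage_path("a*b?.json")               # "a[*]b[?].json"
```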
New features¶
- Added support for DML row counts
- Support `overwrite(condition)` for `DataFrameWriterV2`
- Iceberg `mergeSchema` on write: top-level column evolution
- Added support for partition overwrites in `DataFrameWriterV2`
- Add `app_name` parameter to `init_spark_session`
1.20.0 (April 03, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix performance issue
- Fix merge schema for JSON
- Fix `arrays_zip` for complex types
- Fix LCAs in implicit aggregations
New features¶
- Cache result of JSON file format
- Resolve known types from `map_unresolved_function` without typer
- Support Hive partitioning for JSON copy-into mode
- Add SCOS session registration on server initialization
- Modify warmup query with distinct string for filtering
1.19.0 (March 26, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix accessing struct field from array via getItem
- Fix names for accessing array elements
- Added missing compression for TEXT format
- Reduce query size in `DataFrame.replace`, UDTF creation, and `read_parquet`
- Emulate types on create [temp] view
- Fixed casting of structured types
- Fix text write type validation
- Support reading XML directories in parallel
- Optimize `conv` function usage
- Support both Snowflake and `net.snowflake.spark.snowflake` format read and write
- Emulate types on create table
- Fix accessing nested structs with arrays
- Fix Parquet error message
- Optimize `to_number` to reduce query size
- Fix UDF cache to consider query database change
- Optimize `mask` function
- Pass PATTERN to NVS fallback reader during Parquet schema inference
- Null and structured type coercion
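The `conv` entry above refers to Spark's base-conversion function. A simplified Python emulation for non-negative inputs (Spark additionally treats negative values as unsigned 64-bit, which this sketch omits):

```python
def conv(num_str, from_base, to_base):
    """Emulate Spark's conv(): convert num_str from one base to another,
    returning the result as an uppercase string. Simplified: assumes
    non-negative input and bases between 2 and 36."""
    value = int(num_str, from_base)      # parse in the source base
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = ""
    while value > 0:                     # repeated divmod in the target base
        value, r = divmod(value, to_base)
        out = digits[r] + out
    return out or "0"

conv("ff", 16, 10)   # "255"
conv("255", 10, 16)  # "FF"
```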
New features¶
- Introduce DIRECTED join hint
- Integrate XML inferSchema
1.18.0 (March 19, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Added missing JDBC Type mapping
- Support user-provided schema in Parquet
- Handle invalid UTF-8 characters in JSON gracefully
- Resolve LCA columns only if actually used
- Optimize get_json_object query generation
- Strip semicolon from SQL query
- Make `processInBulk=True` the default for JSON reads and fix `NullType` schema inference
- Fix bug regarding incorrect stage read
- Add `None` check in UDF registration
- Tighten limit for error message
- Allow missing fields in user-provided schema
- JSON and CSV compression inference
- Fix for `coalesce(1)` creating a single file
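One fix above makes invalid UTF-8 in JSON input non-fatal. A minimal sketch of the idea, decoding with replacement characters instead of failing the whole read (`parse_json_lossy` is a hypothetical helper, not the actual reader API):

```python
import json

def parse_json_lossy(raw_bytes):
    """Decode JSON bytes while tolerating invalid UTF-8 sequences by
    substituting U+FFFD, instead of raising and aborting the read."""
    text = raw_bytes.decode("utf-8", errors="replace")
    return json.loads(text)

parse_json_lossy(b'{"name": "caf\xc3\xa9"}')  # valid UTF-8 parses normally
parse_json_lossy(b'{"name": "caf\xff"}')      # invalid byte becomes U+FFFD
```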
New features¶
- Add `execute_jar` method to launch Java/Scala workloads
Snowpark Submit¶
Bug fixes¶
- Fix error swallowing with `--wait-for-completion` flag
1.17.0 (March 13, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- JSON and CSV compression inference.
- Fix for `coalesce` creating a single file.
- Refactor JSON read to use `COPY INTO` for single-file reads and add `VariantType` schema inference.
- Allow JSON loading without explicit schema.
- Fix `multi_line` in JSON.
- Fix JSON schema inference to avoid scanning whole files.
- Correctly handle casting to timestamp `ltz`.
- Clamp hash returned value.
- Fix for `repartition` with `partitionBy`.
- Fix to use `[connections.spark-connect]` section header in `config.toml`.
- Convert Java `date`/`timestamp` format tokens to Snowflake equivalents for CSV reads.
- Calculate schema for `pivot` functions.
- Fix UDTFs in aliased lateral join.
- Align result for SQL `SET` command.
- Fix return type for `CEIL` and `FLOOR` functions.
- Improve query generation in `unbase64` v2.
- Fix some option-to-Snowflake mappings for CSV.
- Fix serialization for `POJO`.
- Improve CSV header error messages.
- Improve `mapType` detection logic with `try_cast` for Parquet reads.
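The format-token entry above maps Java `SimpleDateFormat` tokens onto Snowflake's format tokens (for example, `mm` minutes become `MI`). A sketch of that rewrite with a deliberately partial mapping table, assumed here for illustration rather than taken from the server's actual table:

```python
import re

# Partial Java SimpleDateFormat -> Snowflake token mapping (illustrative).
_TOKEN_MAP = {"yyyy": "YYYY", "MM": "MM", "dd": "DD",
              "HH": "HH24", "mm": "MI", "ss": "SS"}

def java_to_snowflake_format(fmt):
    """Rewrite a Java date/timestamp pattern into its Snowflake equivalent
    by replacing known tokens, longest first, in a single left-to-right
    pass (so already-replaced text is never rescanned)."""
    pattern = "|".join(sorted(_TOKEN_MAP, key=len, reverse=True))
    return re.sub(pattern, lambda m: _TOKEN_MAP[m.group(0)], fmt)

java_to_snowflake_format("yyyy-MM-dd HH:mm:ss")  # "YYYY-MM-DD HH24:MI:SS"
```

Replacing via one `re.sub` pass avoids the classic pitfall of sequential `str.replace` calls, where an earlier replacement's output (for example `MI`) could be corrupted by a later rule.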
New features¶
- Support for `reduceGroups` API.
- Support specifying connection name inside `init_spark_session`.
- Add config param to use UDF for `unbase64`.
1.16.0 (March 12, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Optimize SQL generation in function `unbase64`
- Fix `from_json` regression
- Fix for records that span multiple BZ2 compression block boundaries
- Fix nullability mapping in unresolved attribute
- Initialize `spark-connect` session with any connection, not just one named `spark-connect`
- Add XML options validation
- Drop CSV ESCAPE option when it matches the quote character to prevent compilation error
- Fix incorrect conversion of named tuples in `productEncoder`
- Verify `mergeSchema` for CSV and JSON is not supported
- Fix Parquet complex type round-trip (write + read)
- Fix schema for `pivot`/`unpivot`
- Fix return type for `MOD` and `PMOD` functions
- Fix CSV header extraction for files with leading blank lines
- Test timezones correctly and replace string-based date/time serialization with epoch-based
- Update Java version check for Windows
- Flatten nested `withColumn` calls
- Change logic for `Literal`/`IntegralType` in add/sub operations
- Return `LongType` for `COUNT` functions
- Read JSON: test compression = bz2/bzip2/none
- Improve performance of `to_varchar`/`to_char`
- Improve comparisons in I/O testing
- Set `multi_line` to `False` by default for copy JSON
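The ESCAPE/quote fix above drops the CSV escape option when it would collide with the quote character. Sketched over a plain options dict for illustration (not the real reader configuration object):

```python
def normalize_csv_options(options):
    """Drop the CSV escape option when it equals the quote character,
    since forwarding both identical values to the file format would
    fail compilation. Operates on a copy of a plain options dict."""
    opts = dict(options)
    if opts.get("escape") is not None and opts.get("escape") == opts.get("quote"):
        del opts["escape"]
    return opts

normalize_csv_options({"quote": '"', "escape": '"'})   # escape removed
normalize_csv_options({"quote": '"', "escape": "\\"})  # escape kept
```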
Snowpark Submit¶
Bug fixes¶
- Throw error on unspecified compute pool.
1.15.0 (March 06, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Remove result scan when calling `df.count()`
- Ensure schema inference runs on a limited number of rows when reading JSON
- Fix `createDataFrame` for interval types
- Change logic for `Literal`/`IntegralType` in multiplication and division operations
- Widen and coerce type for `Set` operations
- Fix `neo4j` multi-label support
- Modify JAR metadata so that Grype does not detect Netty vulnerability
- Return correct type for `ANY_VALUE` function
- Return widened type for sequence
- Add support for config `spark.sql.parquet.inferTimestampNTZ.enabled`
- Batch column rename/cast in `_validate_schema_and_get_writer`
- Fix JDBC hang when partitioned queries are given with a fetch size
- Return trimmed exception message when it exceeds the HTTP header limits
- Fix `map_type_to_snowflake_type` for `BigDecimal`
- Fix literal decimal precision and scale
- Improve random string generation
- Make BZ2-compressed JSON loading ignore corrupt records
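The literal decimal fix above concerns inferring `DecimalType(precision, scale)` from a literal. A sketch of the usual rule (scale = digits after the point, precision = total digits, widened so precision is at least the scale), ignoring scientific notation:

```python
from decimal import Decimal

def literal_decimal_type(text):
    """Compute (precision, scale) for a plain decimal literal: scale is
    the number of digits after the point, precision is the number of
    significant digits, widened so that precision >= scale."""
    d = Decimal(text)
    exponent = d.as_tuple().exponent
    scale = max(0, -exponent)                 # digits after the point
    digits = len(d.as_tuple().digits)         # significant digits
    precision = max(digits, scale)            # e.g. "0.05" has 1 digit, scale 2
    return precision, scale

literal_decimal_type("12.345")  # (5, 3)
literal_decimal_type("1000")    # (4, 0)
```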
New features¶
- Use staged files from config in Scala UDFs
- Use permissive `TRY_CAST` in JSON reading
- Make the number of server threads configurable
Snowpark Submit¶
Bug fixes¶
- Add back `init_spark_session()` to testing
- Update `snowpark-submit` command-line output to clarify that `snowflake-connection-name` is required.
1.14.0 (February 19, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Cache table type when running `saveAsTable`
- Optimize literal input for `substring` and type casting for `coalesce`
- Handle decimal overflow in `avg`/`mean` and fix decimal type coercion
- Iceberg: preserve grants on overwrite
- Standardize SQL passthrough mode
- Optimize `from_utc_timestamp`/`to_utc_timestamp` for literal timezone
- Handle JSON null values in structured types to match Spark semantics
- Emulate integral types on creating tables from SQL
- Fix edge case with mapping nested rows in Scala UDFs
- Fix how Parquet handles read and write of complex structured datatypes
- Support the save `ignore` argument for Parquet files
- Add support for artifact repository
- Fix array nullability in Scala UDxF
- Fix `log1p` for arguments in the (-1, 0) range
- Fix `first_value` and `last_value` in aggregate context
- Fix reading `DayTimeIntervalType` for Scala client
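The `avg`/`mean` overflow fix relates to Spark's result-type rule for decimal averages, which widens `DecimalType(p, s)` to `(p + 4, s + 4)` capped at the 38-digit maximum; stated here as an assumption about Spark's behavior, it can be sketched as:

```python
def avg_result_type(precision, scale):
    """Result DecimalType of avg() over a DecimalType(p, s) column:
    widen to (p + 4, s + 4), with both components capped at 38."""
    MAX_PRECISION = 38
    return (min(precision + 4, MAX_PRECISION),
            min(scale + 4, MAX_PRECISION))

avg_result_type(10, 2)   # (14, 6)
avg_result_type(38, 10)  # (38, 14): precision already at the cap
```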
New features¶
- Handle timezones correctly in Scala UDFs
- Support Java 11 and 17 without any configuration
Snowpark Submit updates¶
New features¶
- Support `snowpark-submit` for Python 3.9
- Enhance `init_spark_session` to be usable in the `snowpark-submit` workflow
1.13.0 (February 13, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fixed `split` function issue
- Downgraded snowflake-snowpark-python dependency to version 1.44
- Fixed `Neo4j` dialect matching to improve SQL translation
- Fixed operation ID returned in execute responses to be consistent
- Fixed `gRPC` metadata handling for TCP channel connections
New features¶
- Added support for `partition_hint` in `mapPartitions` operations
- Added XML reader support for scenarios with user-defined schemas
1.11.0 (January 28, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Preserve hidden columns after various DataFrame operators
- Fix issues for Scala UDF input types (`byte`, `binary`, `scala.math.BigDecimal`)
Other updates¶
- Add `snowpark-submit` User Defined Args to comment
1.10.0 (January 22, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix config unset error for session configuration.
- Use `COPY INTO` to load CSV files in parallel.
- Fix writes for DataFrames using outer joins.
- Handle nulls in Scala UDFs.
- Optimize CTE query generation with parameter protection.
- Avoid casting arguments of `DATEDIFF`.
- Fix appending partitioned files and reading of null partitions.
- Make a 10X performance improvement for conversion between base 10 and 16 using SQL.
New features¶
- Overwrite only modified partitions for parquet files.
Other updates¶
- Updated logic to detect if Snowpark Connect for Spark is running on XP.
- Support writing to a table with variant data type in Snowflake.
- Remove unnecessary info logs.
- Move Java tests out of Scala tests job to a separate job.
- Update the dependency version for gcsfs.
Snowpark Submit¶
None.
1.9.0 (January 14, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fix serializing Scala tuples.
- Fix loading huge JSON files.
- Implement small fixes for customer issues.
- Implement fixes for struct comparisons.
- Add handling for 0-column DataFrames.
- Correct upload file path.
- Fix `Upload_files_if_needed` not running in parallel.
- Improve input type inference when UDF input types are not defined in the proto.
- Fix NA edge cases.
New features¶
- Support reading single JSON BZ2 file.
- Support Scala UDFs in server-side Snowpark Connect for Spark.
- Implement cast between string and `daytime`.
- Add support for Scala UDFs in `group_map`.
Snowpark Submit¶
Bug fixes¶
- Reduce generated workload names.
1.8.0 (January 07, 2026)¶
Snowpark Connect for Spark¶
Bug fixes¶
- Fixed JAVA_HOME handling for Windows.
New features¶
- Support `neo4j` data source via JDBC.
Snowpark Submit¶
None.