Snowpark Library for Python release notes for 2024¶
This article contains the release notes for the Snowpark Library for Python, including the following when applicable:
Behavior changes
New features
Customer-facing bug fixes
Snowflake uses semantic versioning for Snowpark Library for Python updates.
See Snowpark Developer Guide for Python for documentation.
- warning:
As Python 3.8 has reached its End of Life (https://devguide.python.org/versions/), deprecation warnings will be triggered when using snowpark-python with Python 3.8. For more details, see Snowflake Python Runtime Support. Snowpark Python 1.24.0 will be the last client and server version to support Python 3.8, in accordance with Anaconda’s policy (https://www.anaconda.com/blog/python-3-8-reaches-end-of-life). Upgrade your existing Python 3.8 objects to Python 3.9 or greater.
Version 1.26.0 (2024-12-05)¶
New features¶
Added support for property
version
and class methodget_active_session
forSession
class.Added new methods and variables to enhance data type handling and JSON serialization/deserialization:
To
DataType
, its derived classes, andStructField
:type_name
: Returns the type name of the data.simple_string
: Provides a simple string representation of the data.json_value
: Returns the data as a JSON-compatible value.json
: Converts the data to a JSON string.
To
ArrayType
,MapType
,StructField
,PandasSeriesType
,PandasDataFrameType
andStructType
:from_json
: Enables these types to be created from JSON data.
To
MapType
:keyType
: keys of the mapvalueType
: values of the map
Added support for method
appName
inSessionBuilder
.Added support for
include_nulls
argument inDataFrame.unpivot
.Added support for following functions in
functions.py
:size
to get size of array, object, or map columns.collect_list
an alias ofarray_agg
.concat_ws_ignore_nulls
to concatenate strings with a separator, ignoring null values.substring
makeslen
argument optional.
Added parameter
ast_enabled
to session for internal usage (default:False
).
Improvements¶
Added support for specifying the following to
DataFrame.create_or_replace_dynamic_table
:iceberg_config
A dictionary that can hold the following iceberg configuration options:external_volume
catalog
base_location
catalog_sync
storage_serialization_policy
Added support for nested data types to
DataFrame.print_schema
Added support for
level
parameter toDataFrame.print_schema
Improved flexibility of
DataFrameReader
andDataFrameWriter
API by adding support for the following:Added
format
method toDataFrameReader
andDataFrameWriter
to specify file format when loading or unloading results.Added
load
method toDataFrameReader
to work in conjunction withformat
.Added
save
method toDataFrameWriter
to work in conjunction withformat
.Added support to read keyword arguments to
options
method forDataFrameReader
andDataFrameWriter
.
Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11,
cloudpickle==2.2.1
remains the only supported version.
Bug fixes¶
Removed warnings that dynamic pivot features were in private preview, because dynamic pivot is now generally available.
Fixed a bug in
session.read.options
whereFalse
Boolean values were incorrectly parsed asTrue
in the generated file format.
Dependency updates¶
Added a runtime dependency on
python-dateutil
.
Snowpark pandas API updates¶
New features¶
Added partial support for
Series.map
whenarg
is a pandasSeries
or acollections.abc.Mapping
. No support for instances ofdict
that implement__missing__
but are not instances ofcollections.defaultdict
.Added support for
DataFrame.align
andSeries.align
foraxis=1
andaxis=None
.Added support for
pd.json_normalize
.Added support for
GroupBy.pct_change
withaxis=0
,freq=None
, andlimit=None
.Added support for
DataFrameGroupBy.__iter__
andSeriesGroupBy.__iter__
.Added support for
np.sqrt
,np.trunc
,np.floor
, numpy trig functions,np.exp
,np.abs
,np.positive
andnp.negative
.Added partial support for the dataframe interchange protocol method
DataFrame.__dataframe__()
.
Bug fixes¶
Fixed a bug in
df.loc
where setting a single column from a series results in unexpectedNone
values.
Improvements¶
Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.
Improved documentation for
pd.read_excel
.
Version 1.25.0 (2024-11-13)¶
New features¶
Added the following new functions in
snowflake.snowpark.dataframe
:map
Improvements¶
When target stage is not set in profiler, a default stage from
Session.get_session_stage
is used instead of raisingSnowparkSQLException
.Allowed lower case or mixed case input when calling
Session.stored_procedure_profiler.set_active_profiler
.Added distributed tracing using open telemetry APIs for action function in
DataFrame
:cache_result
Removed opentelemetry warning from logging.
Bug fixes¶
Fixed the pre-action and post-action query propagation when
In
expressions were used in selects.Fixed a bug that raised error
AttributeError
while callingSession.stored_procedure_profiler.get_output
whenSession.stored_procedure_profiler
is disabled.
Dependency updates¶
Added a dependency on
protobuf>=5.28
andtzlocal
at runtime.Added a dependency on
protoc-wheel-0
for the development profile.Require
snowflake-connector-python>=3.12.0, <4.0.0
(was>=3.10.0
).
Snowpark pandas API updates¶
New features¶
Added support for
Index.to_numpy
.Added support for
DataFrame.align
andSeries.align
foraxis=0
.Added support for
snowflake.snowpark.functions.window
Added support for
pd.read_pickle
(Uses native pandas for processing).Added support for
pd.read_html
(Uses native pandas for processing).Added support for
pd.read_xml
(Uses native pandas for processing).Added support for aggregation functions
"size"
andlen
inGroupBy.aggregate
,DataFrame.aggregate
, andSeries.aggregate
.Added support for list values in
Series.str.len
.
Bug fixes¶
Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g.
pd.DataFrame([0]).agg(np.mean)
) would fail to transpose the result.Fixed bugs where
DataFrame.dropna()
would:Treat an empty
subset
(e.g.[]
) as if it specified all columns instead of no columns.Raise a
TypeError
for a scalarsubset
instead of filtering on just that column.Raise a
ValueError
for asubset
of typepandas.Index
instead of filtering on the columns in the index.
Creation of scoped read-only tables to mitigate
TableNotFoundError
when using dynamic pivot in a notebook environment.Fixed a bug when concat dataframe or series objects are coming from the same dataframe when axis = 1.
Improvements¶
Improve
np.where
with scalar x value by eliminating unnecessary join and temp table creation.Improve
get_dummies
performance by flattening the pivot with join.
Snowpark local testing updates¶
New features¶
Added support for patching functions that are unavailable in the
snowflake.snowpark.functions
module.Added support for
snowflake.snowpark.functions.any_value
Bug fixes¶
Fixed a bug where
Table.update
could not handleVariantType
,MapType
, andArrayType
data types.Fixed a bug where column aliases were incorrectly resolved in
DataFrame.join
, causing errors when selecting columns from a joined DataFrame.Fixed a bug where
Table.update
andTable.merge
could fail if the target table’s index was not the defaultRangeIndex
.
Version 1.24.0 (2024-10-28)¶
New features¶
Updated
Session
class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the sameSession
object.The feature is disabled by default and can be enabled by setting
FEATURE_THREAD_SAFE_PYTHON_SESSION
toTrue
for account.Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
When enabled, some internally created temporary table names returned from
DataFrame.queries
API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
Added support for ‘Service’ domain to
session.lineage.trace
API.Added support for the following methods in
DataFrameWriter
to support daisy-chaining:option
options
partition_by
Added support for
snowflake_cortex_summarize
.
Improvements¶
Improved the following new capability for function
snowflake.snowpark.functions.array_remove
so that it is now possible to use in python.Disables SQL simplification when sort is performed after limit.
Previously,
df.sort().limit()
anddf.limit().sort()
generated the same query with sort in front of limit. Now,df.limit().sort()
generates a query that readsdf.limit().sort()
.Improve performance of generated queries for
df.limit().sort()
because limit stops table scanning as soon as the number of records is satisfied.
Bug fixes¶
Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
Fixed a bug in
DataFrame.analytics.time_series_agg
function to handle multiple data points in same sliding interval.Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.
Deprecations¶
As Python 3.8 has reached its End of Life (https://devguide.python.org/versions/), deprecation warnings will be triggered when using snowpark-python with Python 3.8. For more details, see Snowflake Python Runtime Support.
Snowpark 1.24.0 is the last client and server version to support Python 3.8, in accordance with Anaconda’s policy (https://www.anaconda.com/blog/python-3-8-reaches-end-of-life). Upgrade your existing Python 3.8 objects to Python 3.9 or greater.
Snowpark pandas API updates¶
New features¶
Added support for
np.subtract
,np.multiply
,np.divide
, andnp.true_divide
.Added support for tracking usages of
__array_ufunc__
.Added numpy compatibility support for
np.float_power
,np.mod
,np.remainder
,np.greater
,np.greater_equal
,np.less
,np.less_equal
,np.not_equal
, andnp.equal
.Added numpy compatibility support for
np.log
,np.log2
, andnp.log10
Added support for
DataFrameGroupBy.bfill
,SeriesGroupBy.bfill
,DataFrameGroupBy.ffill
, andSeriesGroupBy.ffill
.Added support for
on
parameter withResampler
.Added support for timedelta inputs in
value_counts()
.Added support for applying Snowpark Python function
snowflake_cortex_summarize
.Added support for
DataFrame.attrs
andSeries.attrs
.Added support for
DataFrame.style
.Added numpy compatibility support for
np.full_like
Improvements¶
Improved generated SQL query for
head
andiloc
when the row key is a slice.Improved error message when passing an unknown timezone to
tz_convert
andtz_localize
inSeries
,DataFrame
,Series.dt
, andDatetimeIndex
.Improved documentation for
tz_convert
andtz_localize
inSeries
,DataFrame
,Series.dt
, andDatetimeIndex
to specify the supported timezone formats.Added additional kwargs support for
df.apply
andseries.apply
( as well asmap
andapplymap
) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.Improved generated SQL query for
iloc
andiat
when the row key is a scalar.Removed all joins in
iterrows
.Improved documentation for
Series.map
to reflect the unsupported features.Added support for
np.may_share_memory
which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.
Bug Fixes¶
Fixed a bug where
DataFrame
andSeries
pct_change()
would raiseTypeError
when input contained timedelta columns.Fixed a bug where
replace()
would sometimes propagateTimedelta
types incorrectly throughreplace()
. Instead raiseNotImplementedError
forreplace()
onTimedelta
.Fixed a bug where
DataFrame
andSeries
round()
would raiseAssertionError
forTimedelta
columns. Instead raiseNotImplementedError
forround()
onTimedelta
.Fixed a bug where
reindex
fails when the new index is a Series with non-overlapping types from the original index.Fixed a bug where calling
__getitem__
on a DataFrameGroupBy object always returned a DataFrameGroupBy object ifas_index=False
.Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising
NotImplementedError
.Fixed a bug where
DataFrame.shift()
on axis=0 and axis=1 would fail to propagate timedelta types.DataFrame.abs()
,DataFrame.__neg__()
,DataFrame.stack()
, andDataFrame.unstack()
now raiseNotImplementedError
for timedelta inputs instead of failing to propagate timedelta types.
Snowpark local testing updates¶
Bug fixes¶
Fixed a bug where
DataFrame.alias
raisesKeyError
for input column name.Fixed a bug where
to_csv
on Snowflake stage fails when data contains empty strings.
Version 1.23.0 (2024-10-09)¶
New features¶
Added the following new functions in
snowflake.snowpark.functions
:make_interval
Added support for using Snowflake Interval constants with
Window.range_between()
when the order by column is TIMESTAMP or DATE type.Added support for file writes. This feature is currently in private preview.
Added
thread_id
toQueryRecord
to track the thread id submitting the query history.Added support for
Session.stored_procedure_profiler
.
Bug fixes¶
Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning
NoneType
has nolen()
when trying to read default values from function.
Snowpark pandas API updates¶
New features¶
Added support for
TimedeltaIndex.mean
method.Added support for some cases of aggregating
Timedelta
columns onaxis=0
withagg
oraggregate
.Added support for
by
,left_by
,right_by
,left_index
, andright_index
forpd.merge_asof
.Added support for passing parameter
include_describe
toSession.query_history
.Added support for
DatetimeIndex.mean
andDatetimeIndex.std
methods.Added support for
Resampler.asfreq
,Resampler.indices
,Resampler.nunique
, andResampler.quantile
.Added support for
resample
frequencyW
,ME
,YE
withclosed = "left"
.Added support for
DataFrame.rolling.corr
andSeries.rolling.corr
forpairwise = False
and intwindow
.Added support for string time-based
window
andmin_periods = None
forRolling
.Added support for
DataFrameGroupBy.fillna
andSeriesGroupBy.fillna
.Added support for constructing
Series
andDataFrame
objects with the lazyIndex
object asdata
,index
, andcolumns
arguments.Added support for constructing
Series
andDataFrame
objects withindex
andcolumn
values not present inDataFrame
/Series
data
.Added support for
pd.read_sas
(Uses native pandas for processing).Added support for applying
rolling().count()
andexpanding().count()
toTimedelta
series and columns.Added support for
tz
in bothpd.date_range
andpd.bdate_range
.Added support for
Series.items
.Added support for
errors="ignore"
inpd.to_datetime
.Added support for
DataFrame.tz_localize
andSeries.tz_localize
.Added support for
DataFrame.tz_convert
andSeries.tz_convert
.Added support for applying Snowpark Python functions (e.g.,
sin
) inSeries.map
,Series.apply
,DataFrame.apply
andDataFrame.applymap
.
Improvements¶
Improved
to_pandas
to persist the original timezone offset for TIMESTAMP_TZ type.Improved
dtype
results for TIMESTAMP_TZ type to show correct timezone offset.Improved
dtype
results for TIMESTAMP_LTZ type to show correct timezone.Improved error message when passing non-bool value to
numeric_only
for groupby aggregations.Removed unnecessary warning about sort algorithm in
sort_values
.Use SCOPED object for internal create temp tables. The SCOPED objects will be stored sproc scoped if created within stored sproc, otherwise will be session scoped, and the object will be automatically cleaned at the end of the scope.
Improved warning messages for operations that lead to materialization with inadvertent slowness.
Removed unnecessary warning message about
convert_dtype
inSeries.apply
.
Bug fixes¶
Fixed a bug where an
Index
object created from aSeries
/DataFrame
incorrectly updates theSeries
/DataFrame
’s index name after an inplace update has been applied to the originalSeries
/DataFrame
.Suppressed an unhelpful
SettingWithCopyWarning
that sometimes appeared when printingTimedelta
columns.Fixed
inplace
argument forSeries
objects derived from otherSeries
objects.Fixed a bug where
Series.sort_values
failed if series name overlapped with index column name.Fixed a bug where transposing a dataframe would map
Timedelta
index levels to integer column levels.Fixed a bug where
Resampler
methods on timedelta columns would produce integer results.Fixed a bug where
pd.to_numeric()
would leaveTimedelta
inputs asTimedelta
instead of converting them to integers.Fixed
loc
set when setting a single row, or multiple rows, of a DataFrame with a Series value.
Version 1.22.1 (2024-09-11)¶
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
Version 1.22.0 (2024-09-10)¶
New features¶
Added the following new functions in
snowflake.snowpark.functions
:array_remove
ln
Improvements¶
Improved documentation for
Session.write_pandas
by making theuse_logical_type
option more explicit.Added support for specifying the following to
DataFrameWriter.save_as_table
:enable_schema_evolution
data_retention_time
max_data_extension_time
change_tracking
copy_grants
iceberg_config
- A dictionary that can hold the following iceberg configuration options:external_volume
catalog
base_location
catalog_sync
storage_serialization_policy
Added support for specifying the following to
DataFrameWriter.copy_into_table
:iceberg_config
- A dictionary that can hold the following iceberg configuration options:external_volume
catalog
base_location
catalog_sync
storage_serialization_policy
Added support for specifying the following parameters to
DataFrame.create_or_replace_dynamic_table
:mode
refresh_mode
initialize
clustering_keys
is_transient
data_retention_time
max_data_extension_time
Bug fixes¶
Fixed a bug in
session.read.csv
that caused an error when settingPARSE_HEADER = True
in an externally defined file format.Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
Fixed a bug in
session.get_session_stage
that referenced a non-existing stage after switching database or schema.Fixed a bug where calling
DataFrame.to_snowpark_pandas
without explicitly initializing the Snowpark pandas plugin caused an error.Fixed a bug where using the
explode
function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on theouter
parameter.
Snowpark local testing updates¶
New features¶
Added support for type coercion when passing columns as input to UDF calls.
Added support for
Index.identical
.
Bug fixes¶
Fixed a bug where the truncate mode in
DataFrameWriter.save_as_table
incorrectly handled DataFrames containing only a subset of columns from the existing table.Fixed a bug where function
to_timestamp
does not set the default timezone of the column datatype.
Snowpark pandas API updates¶
New features¶
Added limited support for the
Timedelta
type, including the following features. Snowpark pandas will raiseNotImplementedError
for unsupportedTimedelta
use cases.support for tracking the
Timedelta
type throughcopy
,cache_result
,shift
,sort_index
,assign
,bfill
,ffill
,fillna
,compare
,diff
,drop
,dropna
,duplicated
,empty
,equals
,insert
,isin
,isna
,items
,iterrows
,join
,len
,mask
,melt
,merge
,nlargest
,nsmallest
,to_pandas
.support for converting non-timedelta to timedelta via
astype
.NotImplementedError
will be raised for the rest of methods that do not supportTimedelta
.support for subtracting two timestamps to get a
Timedelta
.support indexing with
Timedelta
data columns.support for adding or subtracting timestamps and
Timedelta
.support for binary arithmetic between two
Timedelta
values.support for binary arithmetic and comparisons between
Timedelta
values and numeric values.support for lazy
TimedeltaIndex
.support for
pd.to_timedelta
.support for
GroupBy
aggregationsmin
,max
,mean
,idxmax
,idxmin
,std
,sum
,median
,count
,any
,all
,size
,nunique
,head
,tail
,aggregate
.support for
GroupBy
filtrationsfirst
andlast
.support for
TimedeltaIndex
attributes:days
,seconds
,microseconds
andnanoseconds
.support for
diff
with timestamp columns onaxis=0
andaxis=1
.support for
TimedeltaIndex
methods:ceil
,floor
andround
.support for
TimedeltaIndex.total_seconds
method.
Added support for index’s arithmetic and comparison operators.
Added support for
Series.dt.round
.Added documentation pages for
DatetimeIndex
.Added support for
Index.name
,Index.names
,Index.rename
, andIndex.set_names
.Added support for
Index.__repr__
.Added support for
DatetimeIndex.month_name
andDatetimeIndex.day_name
.Added support for
Series.dt.weekday
,Series.dt.time
, andDatetimeIndex.time
.Added support for
Index.min
andIndex.max
.Added support for
pd.merge_asof
.Added support for
Series.dt.normalize
andDatetimeIndex.normalize
.Added support for
Index.is_boolean
,Index.is_integer
,Index.is_floating
,Index.is_numeric
, andIndex.is_object
.Added support for
DatetimeIndex.round
,DatetimeIndex.floor
andDatetimeIndex.ceil
.Added support for
Series.dt.days_in_month
andSeries.dt.daysinmonth
.Added support for
DataFrameGroupBy.value_counts
andSeriesGroupBy.value_counts
.Added support for
Series.is_monotonic_increasing
andSeries.is_monotonic_decreasing
.Added support for
Index.is_monotonic_increasing
andIndex.is_monotonic_decreasing
.Added support for
pd.crosstab
.Added support for
pd.bdate_range
and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for bothpd.date_range
andpd.bdate_range
.Added support for lazy
Index
objects aslabels
inDataFrame.reindex
andSeries.reindex
.Added support for
Series.dt.days
,Series.dt.seconds
,Series.dt.microseconds
, andSeries.dt.nanoseconds
.Added support for creating a
DatetimeIndex
from anIndex
of numeric or string type.Added support for string indexing with
Timedelta
objects.Added support for
Series.dt.total_seconds
method.
Improvements¶
Improved concat and join performance when operations are performed on a series coming from the same DataFrame by avoiding unnecessary joins.
Refactored
quoted_identifier_to_snowflake_type
to avoid making metadata queries if the types have been cached locally.Improved
pd.to_datetime
to handle all local input cases.Create a lazy index from another lazy index without pulling data to client.
Raised
NotImplementedError
for Index bitwise operators.Display a more clear error message when
Index.names
is set to a non-list-like object.Raise a warning whenever
MultiIndex
values are pulled in locally.Improved warning message for
pd.read_snowflake
to include the creation reason when temp table creation is triggered.Improved performance for
DataFrame.set_index
, or settingDataFrame.index
orSeries.index
by avoiding checks that require eager evaluation. As a consequence, when the new index that does not match the currentSeries
orDataFrame
object length, aValueError
is no longer raised. Instead, when theSeries
orDataFrame
object is longer than the provided index, the new index of theSeries
orDataFrame
is filled withNaN
values for the “extra” elements. Otherwise, the extra values in the provided index are ignored.
Bug fixes¶
Stopped ignoring nanoseconds in
pd.Timedelta
scalars.Fixed
AssertionError
in tree of binary operations.Fixed bug in
Series.dt.isocalendar
using a named SeriesFixed
inplace
argument for Series objects derived from DataFrame columns.Fixed a bug where
Series.reindex
andDataFrame.reindex
did not update the result index’s name correctly.Fixed a bug where
Series.take
did not give an error whenaxis=1
was specified.
Version 1.21.1 (2024-09-05)¶
Bug fixes¶
Fixed a bug where using
to_pandas_batches
with async jobs caused an error due to improper handling of waiting for asynchronous query completion.
Version 1.21.0 (2024-08-19)¶
New features¶
Added support for
snowflake.snowpark.testing.assert_dataframe_equal
, which is a utility function to check the equality of two Snowpark DataFrames.
Improvements¶
Added support for server-side string size limitations.
Added support for creating and invoking stored procedures, UDFs and UDTFs with optional arguments.
Added support for column lineage in the
DataFrame.lineage.trace
API.Added support for passing
INFER_SCHEMA
options toDataFrameReader
viaINFER_SCHEMA_OPTIONS
.Added support for passing
parameters
parameter toColumn.rlike
andColumn.regexp
.Added support for automatically cleaning up temporary tables created by
df.cache_result()
in the current session when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature and not enabled by default. It can be enabled by settingsession.auto_clean_up_temp_table_enabled
toTrue
.Added support for string literals to the
fmt
parameter ofsnowflake.snowpark.functions.to_date
.
Bug fixes¶
Fixed a bug where the SQL generated for selecting
*
column has an incorrect subquery.Fixed a bug in
DataFrame.to_pandas_batches
where the iterator could throw an error if a certain transformation is made to the pandas DataFrame due to the wrong isolation level.Fixed a bug in
DataFrame.lineage.trace
to split the quoted feature view’s name and version correctly.Fixed a bug in
Column.isin
that caused invalid SQL generation when passed an empty list.Fixed a bug that fails to raise
NotImplementedError
while setting a cell with a list-like item.
Snowpark local testing updates¶
New features¶
Added support for the following APIs:
snowflake.snowpark.functions
rank
dense_rank
percent_rank
cume_dist
ntile
datediff
array_agg
snowflake.snowpark.column.Column.within_group
Added support for parsing flags in Regex statements for mocked plans. This maintains parity with the
rlike
andregexp
changes above.
Bug fixes¶
Fixed a bug where the window functions
LEAD
andLAG
do not handle the optionignore_nulls
properly.Fixed a bug where values were not populated into the result DataFrame during the insertion of a table merge operation.
Improvements¶
Fix pandas
FutureWarning
about integer indexing.
Snowpark pandas API updates¶
New features¶
Added support for
DataFrame.backfill
,DataFrame.bfill
,Series.backfill
, andSeries.bfill
.Added support for
DataFrame.compare
andSeries.compare
with default parameters.Added support for
Series.dt.microsecond
andSeries.dt.nanosecond
.Added support for
Index.is_unique
andIndex.has_duplicates
.Added support for
Index.equals
.Added support for
Index.value_counts
.Added support for
Series.dt.day_name
andSeries.dt.month_name
.Added support for indexing on Index, e.g.,
df.index[:10]
.Added support for
DataFrame.unstack
andSeries.unstack
.Added support for
DataFrame.asfreq
andSeries.asfreq
.Added support for
Series.dt.is_month_start
andSeries.dt.is_month_end
.Added support for
Index.all
andIndex.any
.Added support for
Series.dt.is_year_start
andSeries.dt.is_year_end
.Added support for
Series.dt.is_quarter_start
andSeries.dt.is_quarter_end
.Added support for lazy
DatetimeIndex
.Added support for
Series.argmax
andSeries.argmin
.Added support for
Series.dt.is_leap_year
.Added support for
DataFrame.items
.Added support for
Series.dt.floor
andSeries.dt.ceil
.Added support for
Index.reindex
.Added support for
DatetimeIndex
properties:year
,month
,day
,hour
,minute
,second
,microsecond
,nanosecond
,date
,dayofyear
,day_of_year
,dayofweek
,day_of_week
,weekday
,quarter
,is_month_start
,is_month_end
,is_quarter_start
,is_quarter_end
,is_year_start
,is_year_end
andis_leap_year
.Added support for
Resampler.fillna
andResampler.bfill
.Added limited support for the
Timedelta
type, including creatingTimedelta
columns andto_pandas
.Added support for
Index.argmax
andIndex.argmin
.
Improvements¶
Removed the public preview warning message when importing Snowpark pandas.
Removed unnecessary count query from the
SnowflakeQueryCompiler.is_series_like
method.Dataframe.columns
now returns a native pandas Index object instead of a Snowpark Index object.Refactored and introduced
query_compiler
argument inIndex
constructor to createIndex
from query compiler.pd.to_datetime
now returns aDatetimeIndex
object instead of aSeries
object.pd.date_range
now returns aDatetimeIndex
object instead of aSeries
object.
Bug fixes¶
Made passing an unsupported aggregation function to
pivot_table
raiseNotImplementedError
instead ofKeyError
.Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
Fixed
AssertionError
inSeries.drop_duplicates
andDataFrame.drop_duplicates
when called aftersort_values
.Fixed a bug in
Index.to_frame
where the result frame’s column name may be wrong where name is unspecified.Fixed a bug where some Index docstrings are ignored.
Fixed a bug in
Series.reset_index(drop=True)
where the result name may be wrong.Fixed a bug in
Groupby.first/last
ordering by the correct columns in the underlying window expression.
Version 1.20.0 (2024-07-17)¶
Version 1.20.0 of the Snowpark Library for Python introduces some new features.
New features¶
Added distributed tracing using open telemetry APIs for table stored procedure functions in
DataFrame
:_execute_and_get_query_id
Added support for the
arrays_zip
function.Improved performance for binary column expressions and
df._in
by avoiding unnecessary casts for numeric values. You can enable this optimization by settingsession.eliminate_numeric_sql_value_cast_enabled = True
.Improved error messages for
write_pandas
when the target table does not exist andauto_create_table=False
.Added open telemetry tracing on UDxF functions in Snowpark.
Added open telemetry tracing on stored procedure registration in Snowpark.
Added a new optional parameter called
format_json
to theSession.SessionBuilder.app_name
function that sets the app name in theSession.query_tag
in JSON format. By default, this parameter is set toFalse
.
Bug fixes¶
Fixed a bug where the SQL generated for
lag(x, 0)
was incorrect and failed with the error messageargument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'
.
Snowpark local testing updates¶
New features¶
Added support for the following APIs:
snowflake.snowpark.functions
random
Added new parameters to the
patch
function when registering a mocked function:distinct
allows an alternate function to be specified for when a SQL function should be distinct.pass_column_index
passes a named parameter,column_index
, to the mocked function that contains thepandas.Index
for the input data.pass_row_index
passes a named parameter,row_index
, to the mocked function that is the 0-indexed row number on which the function is currently operating.pass_input_data
passes a named parameter,input_data
, to the mocked function that contains the entire input dataframe for the current expression.Added support for the
column_order
parameter in theDataFrameWriter.save_as_table
method.
Bug fixes¶
Fixed a bug that caused
DecimalType
columns to be incorrectly truncated to integer precision when used inBinaryExpressions
.
Snowpark pandas API Updates¶
New features¶
Added new API support for the following:
DataFrames
DataFrame.nlargest
andDataFrame.nsmallest
DataFrame.assign
DataFrame.stack
DataFrame.pivot
DataFrame.to_csv
DataFrame.corr
DataFrame.corr
DataFrame.equals
DataFrame.reindex
DataFrame.at
andDataFrame.iat
Series
Series.nlargest
andSeries.nsmallest
Series.at
andSeries.iat
Series.dt.isocalendar
Series.equals
Series.reindex
Series.to_csv
Series.case_when
except when condition or replacement is callableseries.plot()
with data materialized the data to the local client
GroupBy
DataFrameGroupBy.all
andDataFrameGroupBy.any
DataFrameGroupBy
andSeriesGroupBy
aggregationsfirst
andlast
DataFrameGroupBy.get_group
SeriesGroupBy.all
andSeriesGroupBy.any
General
pd.pivot
read_excel
(Uses local pandas for processing)df.plot()
with data materialized the data to the local client
Extended existing APIs as follows:
Added support for
replace
andfrac > 1
inDataFrame.sample
andSeries.sample
.Added partial support for
Series.str.translate
where the values in thetable
are single-codepoint strings.Added support for
limit
parameter whenmethod
parameter is used infillna
.
Added documentation pages for
Index
and its APIs.
Bug fixes¶
Fixed an issue when using np.where and df.where when the scalar
other
is the literal 0.Fixed a bug regarding precision loss when converting to Snowpark pandas
DataFrame
orSeries
withdtype=np.uint64
.Fixed a bug where
values
is set toindex
whenindex
andcolumns
contain all columns in DataFrame duringpivot_table
.
Improvements¶
Added support for
Index.copy()
.Added support for Index APIs:
dtype
,values
,item()
,tolist()
,to_series()
andto_frame()
.Expand support for DataFrames with no rows in
pd.pivot_table
andDataFrame.pivot_table
.Added support for
inplace
parameter inDataFrame.sort_index
andSeries.sort_index
.
Version 1.19.0 (2024-06-25)¶
Version 1.19.0 of the Snowpark Library for Python introduces some new features.
New features¶
Added support for the
to_boolean
function.Added documentation pages for
Index
and its APIs.
Bug fixes¶
Fixed a bug where Python stored procedures with tables return type fails when run in a task.
Fixed a bug where
df.dropna
fails due toRecursionError: maximum recursion depth exceeded
when the DataFrame has more than 500 columns.Fixed a bug where
AsyncJob.result("no_result")
doesn’t wait for the query to finish execution.
Local testing updates¶
New features¶
Added support for the
strict
parameter when registering UDFs and Stored Procedures.
Bug fixes¶
Fixed a bug in
convert_timezone
that made setting thesource_timezone
parameter return an error.Fixed a bug where creating a DataFrame with empty data of type
DateType
raisesAttributeError
.Fixed a bug where table merge fails when an update clause exists but no update takes place.
Fixed a bug in the mock implementation of
to_char
that raisesIndexError
when an incoming column has a nonconsecutive row index.Fixed a bug in handling
CaseExpr
expressions that raisesIndexError
when an incoming column has a nonconsecutive row index.Fixed a bug in the implementation of
Column.like
that raisesIndexError
when an incoming column has a nonconsecutive row index.
Improvements¶
Added support for type coercion in the implementation of
DataFrame.replace
,DataFrame.dropna
, and the mock functioniff
.
Snowpark pandas API updates¶
New features¶
Added partial support for
DataFrame.pct_change
andSeries.pct_change
without thefreq
andlimit
parameters.Added support for
Series.str.get
.Added support for
Series.dt.dayofweek
,Series.dt.day_of_week
,Series.dt.dayofyear
, andSeries.dt.day_of_year
.Added support for
Series.str.__getitem__ (Series.str[...])
.Added support for
Series.str.lstrip
andSeries.str.rstrip
.Added support for
DataFrameGroupby.size
andSeriesGroupby.size
.Added support for
DataFrame.expanding
andSeries.expanding
for aggregationscount
,sum
,min
,max
,mean
,std
, andvar
withaxis=0
.Added support for
DataFrame.rolling
andSeries.rolling
for aggregation count withaxis=0
.Added support for
Series.str.match
.Added support for
DataFrame.resample
andSeries.resample
for aggregation size.
Bug fixes¶
Fixed a bug that causes output of
GroupBy.aggregate
columns to be ordered incorrectly.Fixed a bug where calling
DataFrame.describe
on a frame with duplicate columns of differingdtypes
could cause an error or incorrect results.Fixed a bug in
DataFrame.rolling
andSeries.rolling
sowindow=0
now throwsNotImplementedError
instead ofValueError
Improvements¶
Added support for named aggregations in
DataFrame.aggregate
andSeries.aggregate
withaxis=0
.pd.read_csv
reads using the native pandas CSV parser, then uploads data to Snowflake using parquet. This enables most of the parameters supported byread_csv
, including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.Initial work to support a
pd.Index
directly in Snowpark pandas. Support forpd.Index
as a first-class component of Snowpark pandas is under active development.Added a lazy index constructor and support for
len
,shape
,size
,empty
,to_pandas()
, andnames
. Fordf.index
, Snowpark pandas creates a lazy index object.For
df.columns
, Snowpark pandas supports a non-lazy version of anIndex
as the data is already stored locally.
Version 1.18.0 (2024-05-28)¶
Version 1.18.0 of the Snowpark library introduces some new features.
New features¶
Added the
DataFrame.cache_result
andSeries.cache_result
methods for users to persistDataFrame
andSeries
objects to a temporary table for the duration of a session to improve latency of subsequent operations.
Improvements¶
Added support for
DataFrame.pivot_table
with noindex
parameter and with themargins
parameter.Updated the signature of
DataFrame.shift
,Series.shift
,DataFrameGroupBy.shift
, andSeriesGroupBy.shift
to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument or sequence values of periods.Re-added support for
Series.str.split
.
Bug fixes¶
Fixed an issue with mixed columns for string methods (
Series.str.*
).
Local testing updates¶
New features¶
Added support for the following
DataFrameReader
read options to file formats CSV and JSON:PURGE
PATTERN
INFER_SCHEMA with value
False
ENCODING with value
UTF8
Added support for
DataFrame.analytics.moving_agg
andDataFrame.analytics.cumulative_agg_agg
.Added support for the
if_not_exists
parameter during UDF and stored procedure registration.
Bug fixes¶
Fixed a bug with processing time formats where the fractional second part was not handled properly.
Fixed a bug that caused function calls on
*
to fail.Fixed a bug that prevented the creation of
map
andstruct
type objects.Fixed a bug where the function
date_add
was unable to handle some numeric types.Fixed a bug where
TimestampType
casting resulted in incorrect data.Fixed a bug that caused
DecimalType
data to have incorrect precision in some cases.Fixed a bug where referencing a missing table or view raised an
IndexError
.Fixed a bug where the mocked function
to_timestamp_ntz
could not handleNone
data.Fixed a bug where mocked UDFs handled output data of
None
improperly.Fixed a bug where
DataFrame.with_column_renamed
ignored attributes from parentDataFrames
after join operations.Fixed a bug where the integer precision of large values was lost when converted to a pandas
DataFrame
.Fixed a bug where the schema of a
datetime
object was wrong when creating aDataFrame
from a pandasDataFrame
.Fixed a bug in the implementation of
Column.equal_nan
where null data was handled incorrectly.Fixed a bug where
DataFrame.drop
ignored attributes from parentDataFrames
after join operations.Fixed a bug in mocked function
date_part
where column type was set incorrectly.Fixed a bug where
DataFrameWriter.save_as_table
did not raise exceptions when inserting null data into non-nullable columns.Fixed a bug in the implementation of
DataFrameWriter.save_as_table
where:Append or truncate failed when incoming data had a different schema than the existing table.
Truncate failed when incoming data did not specify columns that are nullable.
Improvements¶
Removed the dependency check for
pyarrow
because it is not used.Improved the target type coverage of
Column.cast
, adding support for casting to boolean and all integral types.Aligned the error experience when calling UDFs and stored procedures.
Added appropriate error messages for the
is_permanent
andanonymous
options in UDFs and stored procedures registration to make it clearer that those features are not yet supported.File read operations with unsupported options and values now raise
NotImplementedError
instead of warnings and unclear error information.
Version 1.17.0 (2024-05-21)¶
Version 1.17.0 of the Snowpark library introduces some new features.
New features¶
Added support to add a comment on tables and views using the functions listed below:
DataFrameWriter.save_as_table
DataFrame.create_or_replace_view
DataFrame.create_or_replace_temp_view
DataFrame.create_or_replace_dynamic_table
Improvements¶
Improved error message to remind users to set
{"infer_schema": True}
when reading CSV file without specifying its schema.
Local testing updates¶
New features¶
Added support for
NumericType
andVariantType
data conversion in the mocked functionto_timestamp_ltz
,to_timestamp_ntz
,to_timestamp_tz
andto_timestamp
.Added support for
DecimalType
,BinaryType
,ArrayType
,MapType
,TimestampType
,DateType
andTimeType
data conversion in the mocked functionto_char
.Added support for the following APIs:
snowflake.snowpark.functions.to_varchar
snowflake.snowpark.DataFrame.pivot
snowflake.snowpark.Session.cancel_all
Introduced a new exception class
snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException
.Added support for casting to
FloatType
.
Bug fixes¶
Fixed a bug that stored procedures and UDFs should not remove imports already in the
sys.path
during the clean-up step.Fixed a bug that when processing
datetime
format, the fractional second part is not handled properly.Fixed a bug where file operations on the Windows platform were unable to properly handle file separators in directory names.
Fixed a bug that on the Windows platform that, when reading a pandas dataframe, an
IntervalType
column with integer data can not be processed.Fixed a bug that prevented users from being able to select multiple columns with the same alias.
Fixed a bug where
Session.get_current_[schema|database|role|user|account|warehouse]
returns uppercased identifiers when identifiers are quoted.Fixed a bug that function
substr
andsubstring
can not handle a zero-basedstart_expr
.
Improvements¶
Standardized the error experience by raising
SnowparkLocalTestingException
in error cases, which is on par with theSnowparkSQLException
raised in non-local execution.Improved the error experience of the
Session.write_pandas
method so thatNotImplementError
will be raised when called.Aligned the error experience with reusing a closed session in non-local execution.
Version 1.16.0 (2024-05-08)¶
Version 1.16.0 of the Snowpark library introduces some new features.
New features¶
Added
snowflake.snowpark.Session.lineage.trace
to explore data lineage of Snowflake objects.Added support for registering stored procedures with packages given as Python modules.
Added support for structured type schema parsing.
Bug fixes¶
Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
Local testing updates¶
New features¶
Added support for
StringType
,TimestampType
andVariantType
data conversion in the mocked functionto_date
.Added support for the following APIs:
snowflake.snowpark.functions
:get
concat
concat_ws
Bug fixes¶
Fixed a bug that caused
NaT
andNaN
values to not be recognized.Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
Fixed a bug where
DataFrameReader.csv
was unable to handle quoted values containing a delimiter.Fixed a bug that when there is a
None
value in an arithmetic calculation, the output should remainNone
instead ofmath.nan
.Fixed a bug in function
sum
andcovar_pop
that when there is amath.nan
value in the data, the output should also bemath.nan
.Fixed a bug where stage operations can not handle directories.
Fixed a bug that
DataFrame.to_pandas
should take Snowflake numeric types with precision 38 asint64
.
Version 1.15.0 (2024-04-24)¶
Version 1.15.0 of the Snowpark library introduces some new features.
New features¶
Added
truncate
save mode inDataFrameWrite
to overwrite existing tables by truncating the underlying table instead of dropping it.Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
Added the functions below to unload data from a
DataFrame
into one or more files in a stage:DataFrame.write.json
DataFrame.write.csv
DataFrame.write.parquet
Added distributed tracing using open telemetry APIs for action functions in
DataFrame
andDataFrameWriter
:snowflake.snowpark.DataFrame
:collect
collect_nowait
to_pandas
count
show
snowflake.snowpark.DataFrameWriter
:save_as_table
Added support for
snow://
URLs tosnowflake.snowpark.Session.file.get
andsnowflake.snowpark.Session.file.get_stream
Added support to register stored procedures and UDFs with a
comment
.UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
Added support for dynamic pivot. This feature is currently in private preview.
Improvements¶
Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature and it is not enabled by default. You can enable it by setting
session.cte_optimization_enabled
toTrue
.
Bug fixes¶
Fixed a bug where
statement_params
is not passed to query executions that register stored procedures and user defined functions.Fixed a bug causing
snowflake.snowpark.Session.file.get_stream
to fail for quoted stage locations.Fixed a bug that an internal type hint in
utils.py
might raiseAttributeError
when the underlying module can not be found.
Local testing updates¶
New features¶
Added support for registering UDFs and stored procedures.
Added support for the following APIs:
snowflake.snowpark.Session
:file.put
file.put_stream
file.get
file.get_stream
read.json
add_import
remove_import
get_imports
clear_imports
add_packages
add_requirements
clear_packages
remove_package
udf.register
udf.register_from_file
sproc.register
sproc.register_from_file
snowflake.snowpark.functions
current_database
current_session
date_trunc
object_construct
object_construct_keep_null
pow
sqrt
udf
sproc
Added support for
StringType
,TimestampType
andVariantType
data conversion in the mocked functionto_time
.
Bug fixes¶
Fixed a bug that null filled columns for constant functions.
Fixed
to_object
,to_array
andto_binary
to better handle null inputs.Fixed a bug that timestamp data comparison can not handle years beyond 2262.
Fixed a bug that
Session.builder.getOrCreate
should return the created mock session.
Version 1.14.0 (2024-03-20)¶
Version 1.14.0 of the Snowpark library introduces some new features.
New features¶
Added support for creating vectorized UDTFs with the
process
method.Added support for dataframe functions:
to_timestamp_ltz
to_timestamp_ntz
to_timestamp_tz
locate
Added support for ASOF JOIN type.
Added support for the following local testing APIs:
snowflake.snowpark.functions:
to_double
to_timestamp
to_timestamp_ltz
to_timestamp_ntz
to_timestamp_tz
greatest
least
convert_timezone
dateadd
date_part
snowflake.snowpark.Session:
get_current_account
get_current_warehouse
get_current_role
use_schema
use_warehouse
use_database
use_role
Improvements¶
Added telemetry to local testing.
Improved the error message of
DataFrameReader
to raiseFileNotFound
error when reading a path that does not exist or when there are no files under the path.
Bug fixes¶
Fixed a bug in
SnowflakePlanBuilder
wheresave_as_table
does not correctly filter columns whose names start with$
and is followed by a number.Fixed a bug where statement parameters might have no effect when resolving imports and packages.
Fixed bugs in local testing:
LEFT ANTI and LEFT SEMI joins drop rows with null values.
DataFrameReader.csv
incorrectly parses data when the optional parameterfield_optionally_enclosed_by
is specified.Column.regexp
only considers the first entry whenpattern
is aColumn
.Table.update
raisesKeyError
when updating null values in the rows.VARIANT columns raise errors at
DataFrame.collect
.count_distinct
does not work correctly when counting.Null values in integer columns raise
TypeError
.
Version 1.13.0 (2024-02-26)¶
Version 1.13.0 of the Snowpark library introduces some new features.
New Features¶
Added support for an optional
date_part
argument in functionlast_day
.SessionBuilder.app_name
will set thequery_tag
after the session is created.Added support for the following local testing functions:
current_timestamp
current_date
current_time
strip_null_value
upper
lower
length
initcap
Improvements¶
Added cleanup logic at interpreter shutdown to close all active sessions.
Bug fixes¶
Fixed a bug in
DataFrame.to_local_iterator
where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level.Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
Fixed a bug that
Session.range
returns empty result when the range is large.
Version 1.12.1 (2024-02-08)¶
Version 1.12.1 of the Snowpark library introduces some new features.
Improvements¶
Use
split_blocks=True
by default, duringto_pandas
conversion, for optimal memory allocation. This parameter is passed topyarrow.Table.to_pandas
, which enablesPyArrow
to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
Bug fixes¶
Fixed a bug in
DataFrame.to_pandas
that caused an error when evaluating on a Dataframe with anIntergerType
column with null values.
Version 1.12.0 (2024-01-29)¶
Version 1.12.0 of the Snowpark library introduces some new features.
Behavior Changes (API Compatible)¶
When parsing data types during a
to_pandas
operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned asint8
gets returned asint64
. Users can fix this by explicitly specifying precision values for their return column.Aligned behavior for
Session.call
in case of table stored procedures where runningSession.call
would not trigger a stored procedure unless acollect()
operation was performed.StoredProcedureRegistration
now automatically addssnowflake-snowpark-python
as a package dependency on the client’s local version of the library. An error is thrown if the server cannot support that version.
New features¶
Exposed
statement_params
inStoredProcedure.__call__
.Added two optional arguments to
Session.add_import
:chunk_size
: The number of bytes to hash per chunk of the uploaded files.whole_file_hash
: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.
Added parameters
external_access_integrations
andsecrets
when creating a UDAF from Snowpark Python to allow integration with external access.Added a new method
Session.append_query_tag
, which allows an additional tag to be added to the current query tag by appending it as a comma separated value.Added a new method
Session.update_query_tag
, which allows updates to a JSON encoded dictionary query tag.SessionBuilder.getOrCreate
will now attempt to replace the singleton it returns when token expiration has been detected.Added the following functions in
snowflake.snowpark.functions
:array_except
create_map
sign
/signum
Added the following functions to
DataFrame.analytics
:Added the
moving_agg
function inDataFrame.analytics
to enable moving aggregations like sums and averages with multiple window sizes.Added the
cummulative_agg
function inDataFrame.analytics
to enable moving aggregations like sums and averages with multiple window sizes.
Bug fixes¶
Fixed a bug in
DataFrame.na.fill
that caused Boolean values to erroneously override integer values.Fixed a bug in
Session.create_dataframe
where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:Earlier timestamp columns without a timezone would be converted to nanosecond epochs and inferred as
LongType()
, but will now be correctly maintained as timestamp values and be inferred asTimestampType(TimestampTimeZone.NTZ)
.Earlier timestamp columns with a timezone would be inferred as
TimestampType(TimestampTimeZone.NTZ)
and loose timezone information but will now be correctly inferred asTimestampType(TimestampTimeZone.LTZ)
and timezone information is retained correctly.Set session parameter
PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME
to revert back to old behavior. Snowflake recommends that you update your code to align with correct behavior because the parameter will be removed in the future.
Fixed a bug that
DataFrame.to_pandas
gets decimal type when scale is not 0, and creates an object dtype inpandas
. Instead, we cast the value to a float64 type.Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
DataFrame.filter()
is called afterDataFrame.sort().limit()
.DataFrame.sort()
orfilter()
is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance,df.select("a", seq1().alias("b")).select("a", "b").sort("a")
won’t flatten the sort clause anymore.A window or sequence-dependent data generator column is used after
DataFrame.limit()
. For instance,df.limit(10).select(row_number().over())
won’t flatten the limit and select in the generated SQL.
Fixed a bug where aliasing a DataFrame column raised an error when the DataFame was copied from another DataFrame with an aliased column. For instance,
df = df.select(col("a").alias("b")) df = copy(df) df.select(col("b").alias("c")) # Threw an error. Now it's fixed.
Fixed a bug in
Session.create_dataframe
that the non-nullable field in a schema is not respected for Boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.Fixed a bug in SQL simplifier where non-select statements in
session.sql
dropped a SQL query when used withlimit()
.Fixed a bug that raised an exception when session parameter
ERROR_ON_NONDETERMINISTIC_UPDATE
is true.