pd.DataFrame supported APIs

The tables below list each pd.DataFrame attribute and method. The first column gives the name. The second column flags whether Snowpark has an implementation for it. The remaining columns list any missing parameters and notes on the current implementation.

Note

Y stands for yes: a distributed implementation is supported. N stands for no: the API simply errors out. P stands for partial: some parameters may not be supported yet. D stands for default: the API falls back to single-node pandas execution via a UDF or stored procedure.
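The legend above can be captured in a small lookup, which is handy when post-processing these tables in a script. This is only a sketch: the names `SupportFlag`, `DESCRIPTIONS`, and `describe` are hypothetical and not part of Snowpark.

```python
from enum import Enum


class SupportFlag(Enum):
    """Hypothetical enum mirroring the Y/N/P/D legend."""

    YES = "Y"
    NO = "N"
    PARTIAL = "P"
    DEFAULT = "D"


# Short descriptions paraphrased from the note above.
DESCRIPTIONS = {
    SupportFlag.YES: "supported with a distributed implementation",
    SupportFlag.NO: "not supported; the API errors out",
    SupportFlag.PARTIAL: "partially supported; some parameters may be missing",
    SupportFlag.DEFAULT: "defaults to single-node pandas execution via UDF/Sproc",
}


def describe(flag: str) -> str:
    """Return the description for a one-letter flag from the tables."""
    return DESCRIPTIONS[SupportFlag(flag)]


print(describe("P"))
```

Passing a letter that is not in the legend raises `ValueError`, which makes typos in a flag column easy to catch.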

Attributes

DataFrame attribute

Snowpark implemented? (Y/N/P/D)

Notes for current implementation

T

P

D if any column name is not str or tuple of str

at

P

N for set with MultiIndex

attrs

Y

axes

Y

columns

Y

dtypes

Y

empty

Y

flags

Y

iat

Y

iloc

Y

index

Y

This operation is not recommended since it pulls the index into local memory.

loc

P

N for set with MultiIndex.

ndim

Y

shape

Y

size

Y

style

Y

Performed locally on the client

values

Y
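As a rough illustration of the attributes flagged Y or P above, the sketch below reads `shape` and `columns`, gets a value with `at`, sets a value with `loc` on a non-MultiIndex frame, and transposes a frame whose column names are all `str`. It uses plain pandas so that it runs without a Snowflake session; the assumption is that Snowpark pandas behaves the same way for these supported cases.

```python
import pandas as pd

# Column names are plain str, which is the case the table marks as supported for T.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

n_rows, n_cols = df.shape      # shape is flagged Y
labels = list(df.columns)      # columns is flagged Y
value = df.at["x", "b"]        # at: get on a single-level index

# loc: set on a single-level index (set with MultiIndex is flagged N)
df.loc["y", "a"] = 10

transposed = df.T              # rows and columns swap places

print(n_rows, n_cols, labels, value)
print(transposed)
```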

Methods

DataFrame method

Snowpark implemented? (Y/N/P/D)

Missing parameters

Notes for current implementation

abs

Y

add

P

level

add_prefix

Y

add_suffix

Y

agg

P

margins, observed, sort

If axis == 0: Y when function is one of count, mean, min, max, sum, median, size; std and var supported with ddof=0 or ddof=1; quantile is supported when q is the default value or a scalar. If axis == 1: Y when function is count, min, max, or sum and the index is not a MultiIndex.

aggregate

P

margins, observed, sort

See agg

align

P

copy, level, fill_value

N for MultiIndex, for the deprecated parameters method, limit, fill_axis, and broadcast_axis, or if fill_value is not the default np.nan

all

P

N for non-integer/boolean types

any

P

N for non-integer/boolean types

apply

P

N if func is not callable, if result_type is given, or if args or kwargs contain a DataFrame or Series. N if func maps to different column labels.

applymap

P

N if na_action == "ignore"

asfreq

P

how, normalize, fill_value

Only DatetimeIndex is supported and its freq will be lost. Only rule frequencies ‘s’, ‘min’, ‘h’, and ‘D’ are supported.

asof

N

assign

Y

astype

P

N if from string to datetime/timedelta or errors == "ignore"

at_time

N

backfill

P

N if param downcast is set.

between_time

N

bfill

P

N if param downcast is set.

bool

N

boxplot

N

clip

N

combine

N

combine_first

N

compare

P

align_axis, keep_shape, keep_equal, result_names

convert_dtypes

N

Not supported since Snowpark pandas is already using nullable datatypes internally.

copy

Y

corr

P

N if method is not ‘pearson’

corrwith

N

count

Y

cov

N

cummax

Y

cummin

Y

cumprod

N

cumsum

P

Y if values are numeric

describe

Y

diff

Y

div

P

level

divide

P

level

dot

N

drop

Y

drop_duplicates

Y

droplevel

N

dropna

P

N if axis == 1

duplicated

Y

eq

P

level

equals

Y

eval

N

ewm

N

expanding

P

method is ignored

N if axis = 1

explode

N

ffill

P

N if parameter downcast is set. The limit parameter is supported only if the method parameter is used.

fillna

P

See ffill

filter

N

first

Y

first_valid_index

Y

floordiv

P

level

from_dict

N

from_records

N

ge

P

level

get

Y

groupby

P

observed is ignored since Categoricals are not implemented yet

Y if axis == 0 and by is a column label or a Series from the current DataFrame; otherwise N. Supported functions are agg, count, cumcount, cummax, cummin, cumsum, first, last, max, mean, median, min, quantile, shift, size, std, sum, and var; all others are N.

gt

P

level

head

Y

hist

N

idxmax

P

N for MultiIndex dataframes

idxmin

P

N for MultiIndex dataframes

infer_objects

N

info

P

Index is different, zero bytes reported for memory

insert

Y

interpolate

N

isetitem

N

isin

Y

isna

Y

isnull

Y

items

Y

iterrows

Y

itertuples

Y

join

P

N if given the validate param.

keys

Y

kurt

N

kurtosis

N

last

Y

last_valid_index

Y

le

P

level

lt

P

level

map

N

mask

P

N if axis is given when other is a DataFrame, or if the level parameter is given; N if cond or other is callable

max

Y

mean

Y

median

Y

melt

P

col_level

N when columns are MultiIndex

memory_usage

N

merge

P

N if param validate is given

min

Y

mod

P

level

mode

N

mul

P

level

multiply

P

level

ne

P

level

nlargest

P

N if keep == "all"

notna

Y

notnull

Y

nsmallest

P

N if keep == "all"

nunique

P

N if axis == 1

pad

P

See ffill

pct_change

P

limit, freq

pipe

N

pivot

P

See pivot_table

pivot_table

P

observed, sort

N if index, columns, or values is not str, list of str, or None; or MultiIndex; or any aggfunc is not “count”, “mean”, “min”, “max”, or “sum”. N if index is None, margins is True and aggfunc is “count” or “mean” or a dictionary. N if index is None and aggfunc is a dictionary containing lists of aggfuncs to apply.

plot

D

Performed locally on the client

pop

N

pow

P

level

prod

N

product

N

quantile

P

Y if axis == 0, and interpolation is "linear" or "nearest", and method is "single".

query

N

radd

P

level

rank

P

N if axis == 1

rdiv

P

level

reindex

P

N if axis is MultiIndex or method is nearest.

reindex_like

N

rename

P

N if mapper is callable or the series has a MultiIndex

rename_axis

Y

reorder_levels

N

replace

P

copy is ignored, method, limit

resample

P

axis, label, convention, kind, level, origin, offset, group_keys

Only DatetimeIndex is supported and its freq will be lost. Rule frequencies ‘s’, ‘min’, ‘h’, and ‘D’ are supported. Rule frequencies ‘W’, ‘ME’, and ‘YE’ are supported with closed = “left”.

reset_index

Y

rfloordiv

P

level

rmod

P

level

rmul

P

level

rolling

P

method is ignored, step, win_type, closed, on

N for non-integer window, axis = 1, or min_periods = 0

round

P

N if decimals is Series

rpow

P

level

rsub

P

level

rtruediv

P

level

sample

P

N if weights or random_state is specified when axis = 0

select_dtypes

Y

sem

N

set_axis

Y

set_flags

N

set_index

Y

shift

P

freq

No support for freq != None.

skew

P

N if axis == 1 or skipna == False or numeric_only=False

sort_index

P

key

N if given the key param. N if axis == 1, or MultiIndex.

sort_values

P

key, kind is ignored

N if given the key param or axis == 1. The kind parameter has no effect. Snowpark pandas always uses a stable sort algorithm, while pandas by default does not.

squeeze

Y

stack

P

level, future_stack is ignored

N for MultiIndex

std

P

N if ddof is not 0 or 1

sub

P

level

subtract

P

level

sum

Y

swapaxes

N

swaplevel

N

tail

Y

take

Y

to_clipboard

N

to_csv

P

Supports writing to both local files and a Snowflake stage. A filepath starting with @ is treated as a Snowflake stage location. Writing to a local file supports all parameters. Writing to a Snowflake stage does not support the float_format, mode, encoding, quoting, quotechar, lineterminator, doublequote, and decimal parameters.

to_dict

Y

to_excel

N

to_feather

N

to_gbq

N

to_hdf

N

to_html

N

to_json

N

to_latex

N

to_markdown

N

to_numpy

Y

to_orc

N

to_parquet

N

to_period

N

to_pickle

N

to_records

N

to_sql

N

to_stata

N

to_string

N

to_timestamp

N

to_xarray

N

to_xml

N

transform

P

Y if func is callable.

transpose

P

See T

truediv

P

level

truncate

N

tz_convert

P

axis, level, copy

N if timezone format is not supported. Only timezones listed in pytz.all_timezones are supported. For example, UTC is supported but UTC+/-<offset>, such as UTC+09:00, is not supported.

tz_localize

P

axis, level, copy, ambiguous, nonexistent

N if timezone format is not supported. Only timezones listed in pytz.all_timezones are supported. For example, UTC is supported but UTC+/-<offset>, such as UTC+09:00, is not supported.

unstack

P

sort

N for non-integer level.

update

Y

value_counts

Y

var

P

See std

where

P

See mask

xs

N
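To make a few of the notes in the methods table concrete, the sketch below exercises calls that fall inside the supported cases listed above: `agg` along axis 0 with `min`, `max`, and `sum`; `std` with `ddof=0`; `groupby` by a column label followed by `sum`; and `sort_values` with a stable sort. It is written against plain pandas so that it runs without a Snowflake session, on the assumption that Snowpark pandas gives the same results for these cases. The data and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["x", "y", "x", "y"],
        "score": [1, 0, 1, 0],
        "tag": ["a", "b", "c", "d"],
    }
)

# agg with axis == 0 and functions from the supported list
summary = df[["score"]].agg(["min", "max", "sum"])

# std is listed as supported for ddof=0 or ddof=1
pop_std = df[["score"]].std(ddof=0)

# groupby with axis == 0 and `by` given as a column label
totals = df.groupby("team")[["score"]].sum()

# The table notes that Snowpark pandas always sorts stably and ignores `kind`.
# Passing kind="stable" makes plain pandas keep tied rows in their original order too.
ordered = df.sort_values("score", kind="stable")

print(summary)
print(totals)
print(ordered["tag"].tolist())
```

Requesting a stable sort explicitly is a cheap way to keep local pandas tests and Snowpark pandas runs comparable when the sort key contains ties.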
