modin.pandas.Series.cache_result

Series.cache_result(inplace: bool = False) → Optional[Series] [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/series_extensions.py#L205-L216)

Persists the current Snowpark pandas Series to a temporary table to reduce the latency of subsequent operations.

Parameters:

inplace – bool, default False. Whether to perform the materialization in place.

Returns:

Snowpark pandas Series or None

The cached Snowpark pandas Series, or None if inplace=True.
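
The two return conventions can be sketched with a small helper. This is only an illustration: `cache_series` is a hypothetical name, not part of Snowpark pandas, and it assumes nothing beyond the `cache_result(inplace=...)` contract described above.

```python
def cache_series(series, inplace=False):
    """Cache `series` and always return the object that is now cached.

    Hypothetical convenience wrapper. It relies only on the documented
    contract: cache_result returns None when inplace=True and a new
    cached Series otherwise.
    """
    result = series.cache_result(inplace=inplace)
    if inplace:
        # cache_result returned None; the original object itself is cached.
        return series
    # A new cached Series was returned; the original is still uncached.
    return result
```

A wrapper like this may be convenient when calling code should not have to branch on `inplace` to find the cached object.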

Note

  • The temporary table produced by this method lasts for the duration of the session.

Examples:

Let’s make a Series using a computationally expensive operation, e.g.:

>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin
>>> series = pd.concat([pd.Series([i]) for i in range(30)])

Because Snowpark pandas evaluates lazily, this Series is recomputed every time it is used, so every subsequent operation re-performs the 30 unions required to produce it. The cache_result API persists the Series to a temporary table for the duration of the session, replacing the 30 nested unions with a single read from a table.

>>> new_series = series.cache_result()
>>> import numpy as np
>>> np.all((new_series == series).values)
True
>>> series.reset_index(drop=True, inplace=True) # Slower
>>> new_series.reset_index(drop=True, inplace=True) # Faster
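
The difference between the "Slower" and "Faster" lines can be pictured with a toy model in plain Python. This is a conceptual sketch only, not how Snowpark pandas is implemented: `LazyValue` and its `evaluations` counter are made-up names, and an in-memory list stands in for the temporary table.

```python
class LazyValue:
    """Toy stand-in for a lazily evaluated Series (not Snowpark's implementation)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that builds the data
        self.evaluations = 0      # how many times this object's plan has run

    def evaluate(self):
        # Every use re-runs the whole plan, like re-running the unions.
        self.evaluations += 1
        return self._compute()

    def cache_result(self):
        # Run the plan once, then return a new object whose plan is just
        # "return the stored result", analogous to reading a temporary table.
        stored = self.evaluate()
        return LazyValue(lambda: stored)


expensive = LazyValue(lambda: [i for i in range(30)])
cached = expensive.cache_result()   # runs the expensive plan once

cached.evaluate()
cached.evaluate()
# The expensive plan has run only once (to fill the cache); later uses of
# `cached` read the stored result instead of recomputing it.
```

In this model, operations on `expensive` would re-run its plan on every use, while operations on `cached` never do, which mirrors the comparison in the example above.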