modin.pandas.DataFrame.cache_result
- DataFrame.cache_result(inplace: bool = False) -> Optional[DataFrame] [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/modin/plugin/extensions/dataframe_extensions.py#L243-L254)
- Persists the current Snowpark pandas DataFrame to a temporary table to improve the latency of subsequent operations.
- Parameters:
- inplace – bool, default False. Whether to perform the materialization in place.
- Returns:
- Snowpark pandas DataFrame or None
- Cached Snowpark pandas DataFrame, or None if inplace=True.
 
 - Note: The temporary table produced by this method lasts for the duration of the session.
 - Examples:
 - Let’s make a DataFrame using a computationally expensive operation, e.g.:
   >>> df = pd.concat([pd.DataFrame([range(i, i+5)]) for i in range(0, 150, 5)])
 - Because Snowpark pandas evaluates lazily, this DataFrame is recomputed every time it is used, so every subsequent operation on it re-performs the 30 unions required to produce it. This makes subsequent operations more expensive. The cache_result API can be used to persist the DataFrame to a temporary table for the duration of the session, replacing the nested 30 unions with a single read from a table.
   >>> new_df = df.cache_result()
   >>> import numpy as np
   >>> np.all((new_df == df).values)
   True
   >>> df.reset_index(drop=True, inplace=True) # Slower
   >>> new_df.reset_index(drop=True, inplace=True) # Faster
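 - The effect described above can be illustrated without a Snowflake session. The following is a minimal toy model in plain Python, not the Snowpark pandas implementation: LazyFrame, make_expensive_plan and the counter dictionary are hypothetical names introduced only for this sketch. It assumes that a lazy frame re-runs its deferred plan on every read, and that caching swaps the plan for a cheap read of stored rows.

```python
from typing import Callable, Dict, List, Optional

Rows = List[List[int]]


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not Snowpark pandas)."""

    def __init__(self, compute: Callable[[], Rows]) -> None:
        self._compute = compute  # deferred "query plan", run on every read

    def collect(self) -> Rows:
        """Execute the deferred plan and return its rows."""
        return self._compute()

    def cache_result(self, inplace: bool = False) -> Optional["LazyFrame"]:
        """Run the plan once and serve later reads from the stored rows."""
        rows = self.collect()  # single execution of the expensive plan

        def read_cached() -> Rows:
            # Plays the role of "a single read from a table".
            return [row[:] for row in rows]

        if inplace:
            self._compute = read_cached
            return None
        return LazyFrame(read_cached)


def make_expensive_plan(counter: Dict[str, int]) -> Callable[[], Rows]:
    """Build a plan that simulates 30 unions and counts each one."""

    def plan() -> Rows:
        rows: Rows = []
        for i in range(0, 150, 5):
            counter["unions"] += 1  # one simulated union per 5-value row
            rows.append(list(range(i, i + 5)))
        return rows

    return plan


counter = {"unions": 0}
df = LazyFrame(make_expensive_plan(counter))

# Two reads of the uncached frame repeat all 30 unions each time.
df.collect()
df.collect()
unions_before_cache = counter["unions"]  # 60

# Caching costs one more execution; reads of new_df then add nothing.
new_df = df.cache_result()
new_df.collect()
new_df.collect()
unions_after_cached_reads = counter["unions"]  # 90

# Same data either way; reading df here re-runs its plan once more.
same_rows = new_df.collect() == df.collect()

# inplace=True returns None and makes df itself read from the stored rows.
inplace_return = df.cache_result(inplace=True)
unions_after_inplace = counter["unions"]
df.collect()
unions_after_inplace_read = counter["unions"]
```

   In this sketch the uncached frame pays for all 30 simulated unions on every read, while the cached frame pays once at caching time. The real method stores the result in a session-scoped temporary table rather than in Python memory.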