snowflake.snowpark.Session.write_pandas

Session.write_pandas(df: Union[pandas.DataFrame, modin.pandas.DataFrame, modin.pandas.Series], table_name: str, *, database: Optional[str] = None, schema: Optional[str] = None, chunk_size: Optional[int] = None, compression: str = 'gzip', on_error: str = 'abort_statement', parallel: int = 4, quote_identifiers: bool = True, auto_create_table: bool = False, create_temp_table: bool = False, overwrite: bool = False, table_type: Literal['', 'temp', 'temporary', 'transient'] = '', use_logical_type: Optional[bool] = None, **kwargs: Dict[str, Any]) → Table [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/src/snowflake/snowpark/session.py#L2758-L3016)

Writes a pandas DataFrame to a table in Snowflake and returns a Snowpark DataFrame object referring to the table to which the pandas DataFrame was written.

Parameters:
  • df – The pandas DataFrame, Snowpark pandas DataFrame, or Snowpark pandas Series to write back.

  • table_name – Name of the table we want to insert into.

  • database – Database that the table is in. If not provided, the default one will be used.

  • schema – Schema that the table is in. If not provided, the default one will be used.

  • chunk_size – Number of rows to be inserted in each batch. If not provided, all rows will be inserted at once. Defaults to None normally, and to 100,000 when running inside a stored procedure. The sketch after this parameter list shows this and the other loading options combined.

  • compression – The compression used for the Parquet files: gzip or snappy. gzip typically gives better compression, while snappy is faster. Use whichever is more appropriate for your workload.

  • on_error – Action to take when COPY INTO statements fail. See details at copy options.

  • parallel – Number of threads to be used when uploading chunks. See details at parallel parameter.

  • quote_identifiers – By default, identifiers (specifically the database, schema, table, and column names from DataFrame.columns) are quoted. If set to False, identifiers are passed to Snowflake without quoting, i.e. they are coerced to uppercase by Snowflake.

  • auto_create_table – When True, automatically creates a table to store the passed-in pandas DataFrame using the passed-in database, schema, and table_name. Note: there are usually multiple table configurations that would allow you to upload a particular pandas DataFrame successfully. If you don’t like the auto-created table, you can always create your own table before calling this function. For example, auto-created tables will store list, tuple, and dict values as strings in a VARCHAR column.

  • create_temp_table – (Deprecated) The to-be-created table will be temporary if this is set to True. Note that to avoid breaking changes, currently when this is set to True, it overrides table_type.

  • overwrite – Defaults to False, in which case the pandas DataFrame data is appended to the existing table. If set to True and auto_create_table is also True, the table is dropped. If set to True and auto_create_table is False, the table is truncated. In both cases, when overwrite is True, the existing contents of the table are replaced with those of the passed-in pandas DataFrame.

  • table_type – The table type of table to be created. The supported values are: temp, temporary, and transient. An empty string means to create a permanent table. Learn more about table types here.

  • use_logical_type – Boolean that specifies whether to use Parquet logical types when reading the Parquet files for the uploaded pandas DataFrame. With this file format option, Snowflake can interpret Parquet logical types during data loading. To enable Parquet logical types, set use_logical_type to True. Set it to None to use Snowflake’s default. For more information, see file format options.
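
For a rough illustration of the loading-related options above (the table name and data are made up, and an active session is assumed), several of them can be combined in a single call:

>>> import pandas as pd
>>> options_df = pd.DataFrame({"id": range(10), "name": [f"user_{i}" for i in range(10)]})
>>> snowpark_options_df = session.write_pandas(
...     options_df,
...     "WRITE_PANDAS_OPTIONS_TABLE",  # illustrative table name
...     auto_create_table=True,
...     table_type="temp",
...     chunk_size=5,                  # insert 5 rows per batch
...     compression="snappy",          # faster than gzip, usually at the cost of larger files
...     parallel=4,                    # threads used to upload the chunks
...     on_error="abort_statement",    # stop the COPY INTO statement on the first error
...     quote_identifiers=False,       # identifiers are passed unquoted and uppercased by Snowflake
... )

Because quote_identifiers is False here, the auto-created columns are named ID and NAME rather than "id" and "name".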

Example:

>>> import pandas as pd
>>> pandas_df = pd.DataFrame([(1, "Steve"), (2, "Bob")], columns=["id", "name"])
>>> snowpark_df = session.write_pandas(pandas_df, "write_pandas_table", auto_create_table=True, table_type="temp")
>>> snowpark_df.sort('"id"').to_pandas()
   id   name
0   1  Steve
1   2    Bob

>>> pandas_df2 = pd.DataFrame([(3, "John")], columns=["id", "name"])
>>> snowpark_df2 = session.write_pandas(pandas_df2, "write_pandas_table", auto_create_table=False)
>>> snowpark_df2.sort('"id"').to_pandas()
   id   name
0   1  Steve
1   2    Bob
2   3   John

>>> pandas_df3 = pd.DataFrame([(1, "Jane")], columns=["id", "name"])
>>> snowpark_df3 = session.write_pandas(pandas_df3, "write_pandas_table", auto_create_table=False, overwrite=True)
>>> snowpark_df3.to_pandas()
   id  name
0   1  Jane

>>> pandas_df4 = pd.DataFrame([(1, "Jane")], columns=["id", "name"])
>>> snowpark_df4 = session.write_pandas(pandas_df4, "write_pandas_transient_table", auto_create_table=True, table_type="transient")
>>> snowpark_df4.to_pandas()
   id  name
0   1  Jane

Note

1. Unless auto_create_table is True, you must first create a table in Snowflake that the passed in pandas DataFrame can be written to. If your pandas DataFrame cannot be written to the specified table, an exception will be raised.

2. If the DataFrame is a Snowpark pandas DataFrame or Series, this method calls modin.pandas.DataFrame.to_snowflake or modin.pandas.Series.to_snowflake internally to write the Snowpark pandas DataFrame into a Snowflake table (see the first sketch after these notes).

3. If the input pandas DataFrame has datetime64[ns, tz] columns and auto_create_table is set to True, they will be converted to TIMESTAMP_LTZ in the output Snowflake table by default. If TIMESTAMP_TZ is needed for those columns instead, please manually create the table before loading data (see the second sketch after these notes).
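
For note 2, a minimal sketch of writing a Snowpark pandas DataFrame; the import path, table name, and data below are for illustration and assume the Snowpark pandas plugin is installed:

>>> import modin.pandas as mpd
>>> import snowflake.snowpark.modin.plugin  # registers the Snowpark pandas backend (assumed to be installed)
>>> snowpark_pandas_df = mpd.DataFrame([(4, "Amy")], columns=["id", "name"])
>>> snowpark_df5 = session.write_pandas(snowpark_pandas_df, "write_pandas_table", auto_create_table=True, table_type="temp", overwrite=True)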
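
For note 3, a minimal sketch of pre-creating a table with a TIMESTAMP_TZ column before loading timezone-aware data; the table name and timestamp value are illustrative:

>>> import pandas as pd
>>> _ = session.sql('CREATE OR REPLACE TEMP TABLE "tz_table" ("id" INT, "ts" TIMESTAMP_TZ)').collect()
>>> tz_df = pd.DataFrame({"id": [1], "ts": [pd.Timestamp("2024-01-01 12:00:00", tz="US/Pacific")]})
>>> snowpark_tz_df = session.write_pandas(tz_df, "tz_table", use_logical_type=True)

Here use_logical_type=True asks Snowflake to interpret the Parquet logical types, which helps preserve the timezone information of the uploaded column.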
