modin.pandas.DataFrame.assign
- DataFrame.assign(**kwargs) -> DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/.tox/docs/lib/python3.9/site-packages/modin/pandas/dataframe.py#L641-L651)
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
- Parameters:
**kwargs (dict of {str: callable or Series}) – The column names are the keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though Snowpark pandas doesn’t check this). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned.
- Returns:
A new DataFrame with the new columns in addition to all the existing columns.
- Return type:
DataFrame
Notes
Assigning multiple columns within the same assign is possible. Later items in **kwargs may refer to newly created or modified columns in df; items are computed and assigned into df in order.
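The in-order evaluation described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not how Snowpark pandas implements assign: the `assign_columns` helper and the dict-of-lists "frame" are hypothetical stand-ins invented for this example, and it relies on keyword arguments keeping their order (Python 3.6+).

```python
def assign_columns(frame, **kwargs):
    """Toy model of assign; `frame` maps column names to lists (hypothetical stand-in)."""
    result = dict(frame)  # shallow copy, so the input mapping is left untouched
    for name, value in kwargs.items():  # keyword order is the assignment order
        # A callable receives the frame built so far, so it can see earlier new columns.
        result[name] = value(result) if callable(value) else value
    return result


frame = {"temp_c": [17.0, 25.0]}
out = assign_columns(
    frame,
    temp_f=lambda f: [c * 9 / 5 + 32 for c in f["temp_c"]],
    temp_k=lambda f: [(t + 459.67) * 5 / 9 for t in f["temp_f"]],
)
print(list(out))    # ['temp_c', 'temp_f', 'temp_k']
print(list(frame))  # ['temp_c']
```

Because `temp_k` is computed after `temp_f` has been stored, its callable can read the freshly created column, which is the same dependency the multi-column example on this page uses.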
If an array of the wrong length is passed to assign, Snowpark pandas will either truncate the array if it is too long, or, if it is too short, broadcast the last element until the array is the correct length. This differs from native pandas, which raises a ValueError if the length of the array does not match the length of df. This is done to preserve Snowpark pandas’ lazy evaluation paradigm.
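The length rule above can be written out as a small pure-Python sketch. The `fit_to_length` helper is hypothetical and is not part of Snowpark pandas or Modin; it only mirrors the documented outcome for a plain list. Raising on an empty input is an assumption of this sketch, since the note does not say what happens in that case.

```python
def fit_to_length(values, target_len):
    """Hypothetical helper mirroring the documented rule; not a Snowpark pandas API."""
    values = list(values)
    if not values:
        # Assumption of this sketch: there is no last element to broadcast.
        raise ValueError("cannot broadcast an empty sequence")
    if len(values) >= target_len:
        return values[:target_len]  # too long: drop the extra trailing items
    # Too short: repeat the last element until the target length is reached.
    return values + [values[-1]] * (target_len - len(values))


print(fit_to_length([10, 11], 3))              # [10, 11, 11]
print(fit_to_length([10, 11, 12, 13, 14], 3))  # [10, 11, 12]
```

The two calls correspond to the `new_col` examples on this page, where a three-row frame receives a two-element and a five-element list.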
Examples
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
>>> df = pd.DataFrame({'col1': [17.0, 25.0, 22.0]})
>>> df
   col1
0  17.0
1  25.0
2  22.0
>>> df.assign(new_col=[10, 11])
   col1  new_col
0  17.0       10
1  25.0       11
2  22.0       11
>>> df.assign(new_col=[10, 11, 12, 13, 14])
   col1  new_col
0  17.0       10
1  25.0       11
2  22.0       12