modin.pandas.DataFrame.replace

DataFrame.replace(to_replace=None, value=_NoDefault.no_default, inplace: bool = False, limit=None, regex: bool = False, method: str | NoDefault = _NoDefault.no_default)[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/dataframe_overrides.py#L1594-L1616)

Replace values given in to_replace with value.

Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.

Parameters:
  • to_replace (str, regex, list, dict, Series, int, float, or None) –

    How to find the values that will be replaced.

    • numeric, str or regex:

      • numeric: numeric values equal to to_replace will be replaced with value

      • str: string exactly matching to_replace will be replaced with value

      • regex: strings matching the regex to_replace will be replaced with value

    • list of str, regex, or numeric:

      • First, if to_replace and value are both lists, they must be the same length.

      • Second, if regex=True, all strings in both lists are interpreted as regexes; otherwise they are matched literally. This matters little for value, since only a few substitution regexes are possible there.

      • str, regex and numeric rules apply as above.

    • dict:

      • Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way, the optional value parameter should not be given.

      • For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.

      • For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.

    • None:

      • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

    See the examples section for examples of each of these.

  • value (scalar, dict, list, str, regex, default None) – Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • limit (int, default None) – Maximum size gap to forward or backward fill. This parameter is not supported; providing it raises NotImplementedError.

  • regex (bool or same types as to_replace, default False) – Whether to interpret to_replace and/or value as regular expressions. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

  • method ({'pad', 'ffill', 'bfill'}) – The method to use for replacement when to_replace is a scalar, list or tuple and value is None. This parameter is not supported; providing it raises NotImplementedError.
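
The per-column dict form of value described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it uses stock pandas as a stand-in so that it runs without a Snowflake session (an assumption; with Snowpark pandas the imports would be `import modin.pandas as pd` and `import snowflake.snowpark.modin.plugin`), and the column names and values are made up for the example.

```python
import pandas as pd  # assumption: stock pandas stand-in for modin.pandas

# Illustrative frame: every column contains the value 0 in its first row.
df = pd.DataFrame({'A': [0, 1], 'B': [0, 2], 'C': [0, 3]})

# Scalar to_replace with a dict value: each key names a column and maps to
# the replacement used in that column. 'C' is not in the dict, so it is
# left as it was.
out = df.replace(0, {'A': 100, 'B': 200})
print(out)
```

Only the columns named in the dict are touched, which is what "columns not in the dict will not be filled" refers to.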

Returns:

The DataFrame after replacement if inplace=False; None otherwise.

Return type:

DataFrame
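
The difference between the two return modes can be sketched as below. The block is a hedged illustration that uses stock pandas as a stand-in (an assumption, so that it runs without a Snowflake session; with Snowpark pandas the import would be `import modin.pandas as pd` plus `import snowflake.snowpark.modin.plugin`). The variable names are illustrative only.

```python
import pandas as pd  # assumption: stock pandas stand-in for modin.pandas

df = pd.DataFrame({'A': [0, 1, 2], 'B': [5, 6, 7]})

# Default (inplace=False): a new DataFrame is returned and df is unchanged.
replaced = df.replace(0, 5)
a_before_inplace = df['A'].tolist()

# inplace=True: df itself is modified and the call returns None.
result = df.replace(0, 5, inplace=True)
a_after_inplace = df['A'].tolist()

print(replaced['A'].tolist(), a_before_inplace, result, a_after_inplace)
```

Because the in-place call returns None, chaining further methods onto it will fail; chain on the default form instead.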

Raises:
  • AssertionError

    • If regex is not a bool and to_replace is not None.

  • TypeError

    • If to_replace is not a scalar, array-like, dict, or None.

    • If to_replace is a dict and value is not a list, dict, ndarray, or Series.

    • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.

    • When replacing multiple bool or datetime64 objects and the arguments to to_replace do not match the type of the value being replaced.

  • ValueError

    • If a list or an ndarray is passed to to_replace and value but they are not the same length.

  • NotImplementedError

    • If method or limit is provided.
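
As a minimal sketch of the ValueError case above, the block below passes lists of different lengths for to_replace and value. It uses stock pandas as a stand-in (an assumption, so that it runs without a Snowflake session); the NotImplementedError for method or limit is specific to Snowpark pandas and is not exercised here. The name `error_message` is introduced only for this illustration.

```python
import pandas as pd  # assumption: stock pandas stand-in for modin.pandas

df = pd.DataFrame({'A': [0, 1, 2], 'B': [5, 6, 7]})

try:
    # Two values to look for but only one replacement: the lengths disagree.
    df.replace([0, 1], [4])
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None  # no error was raised

print(error_message)
```

Passing a scalar value with a list to_replace, as in `df.replace([0, 1], 4)`, avoids the length check because the scalar applies to every listed value.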

Notes

  • Regex substitution is performed under the hood by the Snowflake backend, which supports POSIX ERE syntax for regular expressions. See the usage notes for details: https://docs.snowflake.cn/en/sql-reference/functions-regexp#general-usage-notes

  • Regular expressions only replace string values. If a regular expression is written to match floating point numbers, it will match only string data, not numeric data.

  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
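
The note that regular expressions only replace string values can be illustrated with the sketch below. It uses stock pandas as a stand-in (an assumption, so that it runs without a Snowflake session; regex dialect details on the Snowflake backend may differ), and the column names `num` and `txt` are hypothetical.

```python
import pandas as pd  # assumption: stock pandas stand-in for modin.pandas

# 'num' holds the float 1.5, 'txt' holds the string '1.5'.
df = pd.DataFrame({'num': [1.5, 2.5], 'txt': ['1.5', 'x']})

# The pattern is written to look like a floating point number, but it is
# only tried against string cells; the numeric column is not matched.
out = df.replace(to_replace=r'^1\.5$', value='hit', regex=True)
print(out)
```

To replace the numeric 1.5 as well, pass the number itself as to_replace in a separate, non-regex call.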

Examples

Scalar `to_replace` and `value`

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [5, 6, 7, 8, 9]})
>>> df.replace(0, 5)
   A  B
0  5  5
1  1  6
2  2  7
3  3  8
4  4  9

List-like `to_replace`

>>> df.replace([0, 1, 2, 3], 4)
   A  B
0  4  5
1  4  6
2  4  7
3  4  8
4  4  9
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B
0  4  5
1  3  6
2  2  7
3  1  8
4  4  9

dict-like `to_replace`

>>> df.replace({0: 10, 1: 100})
     A  B
0   10  5
1  100  6
2    2  7
3    3  8
4    4  9
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [5, 6, 7, 8, 9], 'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

Regular expression `to_replace`

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

When regex=True, value is not None and to_replace is a string, the replacement will be applied in all columns of the DataFrame.

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': ['a', 'b', 'c', 'd', 'e'],
...                    'C': ['f', 'g', 'h', 'i', 'j']})
>>> df.replace(to_replace='^[a-g]', value='e', regex=True)
     A  B  C
0  0.0  e  e
1  1.0  e  e
2  2.0  e  h
3  3.0  e  i
4  4.0  e  j

If value is not None and to_replace is a dictionary, the dictionary keys are the DataFrame columns to which the replacement is applied.

>>> df.replace(to_replace={'B': '^[a-c]', 'C': '^[h-j]'}, value='e', regex=True)
   A  B  C
0  0  e  f
1  1  e  g
2  2  e  e
3  3  d  e
4  4  e  e