modin.pandas.merge_asof

modin.pandas.merge_asof(left, right, on: str | None = None, left_on: str | None = None, right_on: str | None = None, left_index: bool = False, right_index: bool = False, by: str | list[str] | None = None, left_by: str | None = None, right_by: str | None = None, suffixes: Suffixes = ('_x', '_y'), tolerance: int | Timedelta | None = None, allow_exact_matches: bool = True, direction: str = 'backward') pd.DataFrame[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/general_overrides.py#L655-L745)

Perform a merge by key distance.

This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key. For each row in the left DataFrame:

A “backward” search selects the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. A “forward” search selects the first row in the right DataFrame whose ‘on’ key is greater than or equal to the left’s key. A “nearest” search selects the row in the right DataFrame whose ‘on’ key is closest in absolute distance to the left’s key.

Optionally match on equivalent keys with ‘by’ before searching with ‘on’.

Parameters:
  • left (DataFrame or named Series.) –

  • right (DataFrame or named Series.) –

  • on (label) – Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore, this must be a numeric column such as datetimelike, integer, or float. On or left_on/right_on must be given.

  • left_on (label) – Field name to join on in left DataFrame.

  • right_on (label) – Field name to join on in right DataFrame.

  • left_index (bool) – Use the index of the left DataFrame as the join key.

  • right_index (bool) – Use the index of the right DataFrame as the join key.

  • by (column name or list of column names) – Match on these columns before performing merge operation.

  • left_by (column name) – Field names to match on in the left DataFrame.

  • right_by (column name) – Field names to match on in the right DataFrame.

  • suffixes (2-length sequence (tuple, list, )) – Suffix to apply to overlapping column names in the left and right side, respectively.

  • tolerance (int or Timedelta, optional, default None) – Select asof tolerance within this range; must be compatible with the merge index.

  • allow_exact_matches (bool, default True) – If True, allow matching with the same ‘on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to) If False, don’t match the same ‘on’ value (i.e., strictly less-than / strictly greater-than).

  • direction (‘backward’ (default), ‘forward’, or ‘nearest’) – Whether to search for prior, subsequent, or closest matches.

Return type:

Snowpark pandas DataFrame

Examples

>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
    a left_val
0   1        a
1   5        b
2  10        c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7
>>> pd.merge_asof(left, right, on="a")
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
    a left_val  right_val
0   1        a        NaN
1   5        b        3.0
2  10        c        7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
    a left_val  right_val
0   1        a        1.0
1   5        b        6.0
2  10        c        NaN
Copy

Here is a real-world times-series example:

>>> quotes = pd.DataFrame(
...    {
...        "time": [
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.030"),
...            pd.Timestamp("2016-05-25 13:30:00.041"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.049"),
...            pd.Timestamp("2016-05-25 13:30:00.072"),
...            pd.Timestamp("2016-05-25 13:30:00.075")
...        ],
...        "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...        "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
...    }
... )
>>> quotes
                     time     bid     ask
0 2016-05-25 13:30:00.023  720.50  720.93
1 2016-05-25 13:30:00.023   51.95   51.96
2 2016-05-25 13:30:00.030   51.97   51.98
3 2016-05-25 13:30:00.041   51.99   52.00
4 2016-05-25 13:30:00.048  720.50  720.93
5 2016-05-25 13:30:00.049   97.99   98.01
6 2016-05-25 13:30:00.072  720.50  720.88
7 2016-05-25 13:30:00.075   52.01   52.03
>>> trades = pd.DataFrame(
...    {
...        "time": [
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.038"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.048")
...        ],
...        "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...        "quantity": [75, 155, 100, 100, 100]
...    }
... )
>>> trades
                     time   price  quantity
0 2016-05-25 13:30:00.023   51.95        75
1 2016-05-25 13:30:00.038   51.95       155
2 2016-05-25 13:30:00.048  720.77       100
3 2016-05-25 13:30:00.048  720.92       100
4 2016-05-25 13:30:00.048   98.00       100
>>> pd.merge_asof(trades, quotes, on="time")
                     time   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   98.00       100  720.50  720.93
Copy
Language: English