modin.pandas.merge_asof¶
- modin.pandas.merge_asof(left, right, on: str | None = None, left_on: str | None = None, right_on: str | None = None, left_index: bool = False, right_index: bool = False, by: str | list[str] | None = None, left_by: str | None = None, right_by: str | None = None, suffixes: Suffixes = ('_x', '_y'), tolerance: int | Timedelta | None = None, allow_exact_matches: bool = True, direction: str = 'backward') pd.DataFrame[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.41.0/src/snowflake/snowpark/modin/plugin/extensions/general_overrides.py#L661-L751)¶
- Perform a merge by key distance. - This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key. For each row in the left DataFrame: - A “backward” search selects the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. A “forward” search selects the first row in the right DataFrame whose ‘on’ key is greater than or equal to the left’s key. A “nearest” search selects the row in the right DataFrame whose ‘on’ key is closest in absolute distance to the left’s key. - Optionally match on equivalent keys with ‘by’ before searching with ‘on’. - Parameters:
- on (label) – Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore, this must be a numeric column such as datetimelike, integer, or float. On or left_on/right_on must be given. 
- left_on (label) – Field name to join on in left DataFrame. 
- right_on (label) – Field name to join on in right DataFrame. 
- left_index (bool) – Use the index of the left DataFrame as the join key. 
- right_index (bool) – Use the index of the right DataFrame as the join key. 
- by (column name or list of column names) – Match on these columns before performing merge operation. 
- left_by (column name) – Field names to match on in the left DataFrame. 
- right_by (column name) – Field names to match on in the right DataFrame. 
- suffixes (2-length sequence (tuple, list, …)) – Suffix to apply to overlapping column names in the left and right side, respectively. 
- tolerance (int or Timedelta, optional, default None) – Select asof tolerance within this range; must be compatible with the merge index. 
- allow_exact_matches (bool, default True) – If True, allow matching with the same ‘on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to) If False, don’t match the same ‘on’ value (i.e., strictly less-than / strictly greater-than). 
- direction (‘backward’ (default), ‘forward’, or ‘nearest’) – Whether to search for prior, subsequent, or closest matches. 
 
- Return type:
- Snowpark pandas - DataFrame
 - Examples - >>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]}) >>> left a left_val 0 1 a 1 5 b 2 10 c >>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]}) >>> right a right_val 0 1 1 1 2 2 2 3 3 3 6 6 4 7 7 >>> pd.merge_asof(left, right, on="a") a left_val right_val 0 1 a 1 1 5 b 3 2 10 c 7 >>> pd.merge_asof(left, right, on="a", allow_exact_matches=False) a left_val right_val 0 1 a NaN 1 5 b 3.0 2 10 c 7.0 >>> pd.merge_asof(left, right, on="a", direction="forward") a left_val right_val 0 1 a 1.0 1 5 b 6.0 2 10 c NaN - Here is a real-world times-series example: - >>> quotes = pd.DataFrame( ... { ... "time": [ ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.030"), ... pd.Timestamp("2016-05-25 13:30:00.041"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.049"), ... pd.Timestamp("2016-05-25 13:30:00.072"), ... pd.Timestamp("2016-05-25 13:30:00.075") ... ], ... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01], ... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03] ... } ... ) >>> quotes time bid ask 0 2016-05-25 13:30:00.023 720.50 720.93 1 2016-05-25 13:30:00.023 51.95 51.96 2 2016-05-25 13:30:00.030 51.97 51.98 3 2016-05-25 13:30:00.041 51.99 52.00 4 2016-05-25 13:30:00.048 720.50 720.93 5 2016-05-25 13:30:00.049 97.99 98.01 6 2016-05-25 13:30:00.072 720.50 720.88 7 2016-05-25 13:30:00.075 52.01 52.03 >>> trades = pd.DataFrame( ... { ... "time": [ ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.038"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.048") ... ], ... "price": [51.95, 51.95, 720.77, 720.92, 98.0], ... "quantity": [75, 155, 100, 100, 100] ... } ... ) >>> trades time price quantity 0 2016-05-25 13:30:00.023 51.95 75 1 2016-05-25 13:30:00.038 51.95 155 2 2016-05-25 13:30:00.048 720.77 100 3 2016-05-25 13:30:00.048 720.92 100 4 2016-05-25 13:30:00.048 98.00 100 >>> pd.merge_asof(trades, quotes, on="time") time price quantity bid ask 0 2016-05-25 13:30:00.023 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 98.00 100 720.50 720.93