modin.pandas.to_datetime

modin.pandas.to_datetime(arg: DatetimeScalarOrArrayConvertible | DictConvertible | pd.DataFrame | Series, errors: DateTimeErrorChoices = 'raise', dayfirst: bool = False, yearfirst: bool = False, utc: bool = False, format: str | None = None, exact: bool | lib.NoDefault = _NoDefault.no_default, unit: str | None = None, infer_datetime_format: lib.NoDefault | bool = _NoDefault.no_default, origin: Any = 'unix', cache: bool = True) → pd.DatetimeIndex | Series | DatetimeScalar | NaTType | None [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/src/snowflake/snowpark/modin/plugin/extensions/general_overrides.py#L1038-L1098)

Convert argument to datetime.

This function converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object.

Parameters:
  • arg (int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like) – The object to convert to a datetime. If a DataFrame is provided, the method expects minimally the following columns: "year", "month", "day".

  • errors ({'ignore', 'raise', 'coerce'}, default 'raise') –

    • If 'raise', then invalid parsing will raise an exception.

    • If 'coerce', then invalid parsing will be set as NaT.

    • If 'ignore', then invalid parsing will return the input.
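
The three errors modes can be pictured with a small plain-Python sketch. The helper parse_with_errors below is hypothetical (it is not part of modin or Snowpark pandas), it relies on the standard library's datetime.strptime rather than the library's own parser, and it uses None as a stand-in for NaT.

```python
from datetime import datetime


def parse_with_errors(value, fmt="%Y-%m-%d", errors="raise"):
    """Hypothetical helper that mimics the three `errors` modes for one string."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        if errors == "raise":
            raise              # invalid parsing raises an exception
        if errors == "coerce":
            return None        # stand-in for NaT
        if errors == "ignore":
            return value       # hand the input back unchanged
        raise ValueError(f"unknown errors mode: {errors!r}")


good = parse_with_errors("2021-12-31")
coerced = parse_with_errors("abc", errors="coerce")
ignored = parse_with_errors("abc", errors="ignore")
```

With this sketch, "abc" raises under the default mode, becomes None under 'coerce', and comes back as the string "abc" under 'ignore'.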

  • dayfirst (bool, default False) –

    Specify a date parse order if arg is str or is list-like. If True, parses dates with the day first, e.g. "10/11/12" is parsed as 2012-11-10.

    Warning

    dayfirst=True is not strict, but will prefer to parse with day first. If a delimited date string cannot be parsed in accordance with the given dayfirst option, e.g. to_datetime(['31-12-2021']), then a warning will be shown.

  • yearfirst (bool, default False) –

    Specify a date parse order if arg is str or is list-like.

    • If True parses dates with the year first, e.g. "10/11/12" is parsed as 2010-11-12.

    • If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).

    Warning

    yearfirst=True is not strict, but will prefer to parse with year first.
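
To make the effect of the parse order concrete, the sketch below reads the ambiguous string "10/11/12" with three explicit standard-library formats. It only illustrates the orderings described for dayfirst and yearfirst; treating month-first as the default order is an assumption here, and the real parser is not strict in the way strptime is.

```python
from datetime import datetime

text = "10/11/12"

# Assumed default order: month first.
month_first = datetime.strptime(text, "%m/%d/%y")  # 2012-10-11
# Order preferred when dayfirst=True.
day_first = datetime.strptime(text, "%d/%m/%y")    # 2012-11-10
# Order preferred when yearfirst=True.
year_first = datetime.strptime(text, "%y/%m/%d")   # 2010-11-12
```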

  • utc (bool, default False) –

    Control timezone-related parsing, localization and conversion.

    • If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.

    • If False (default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings), see Examples section for details.

    See also: pandas general documentation about timezone conversion and localization.
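
The utc=True rule (localize naive inputs, convert aware inputs) can be sketched with the standard library. The helper to_utc is hypothetical and only mirrors the description above; it does not reproduce how the Snowflake backend performs the conversion.

```python
from datetime import datetime, timezone


def to_utc(dt):
    """Hypothetical helper: label naive values as UTC, convert aware values to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


naive = datetime.strptime("2018-10-26 12:00", "%Y-%m-%d %H:%M")
aware = datetime.strptime("2018-10-26 12:00:00 -0530", "%Y-%m-%d %H:%M:%S %z")

# Two readings from a zone with daylight savings carry different offsets.
mixed = [
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S %z")
    for s in ("2020-10-25 02:00:00 +0200", "2020-10-25 04:00:00 +0100")
]
offsets = {dt.utcoffset() for dt in mixed}      # two distinct offsets
mixed_utc = [to_utc(dt) for dt in mixed]        # one common zone after conversion
```

Because the two mixed values do not share a single offset, no single fixed-offset timezone describes both of them; converting everything to UTC removes that ambiguity.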

  • format (str, default None) – The strftime format used to parse time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See strftime documentation (https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) for more information on choices.

  • exact (bool, default True) –

    Control how format is used:

    • If True, require an exact format match.

    • If False, allow the format to match anywhere in the target string.

  • unit (str, default 'ns') – The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. The value is interpreted relative to origin. For example, with unit='ms' and origin='unix', arg is treated as the number of milliseconds since the start of the Unix epoch.

  • infer_datetime_format (bool, default False) – If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.

  • origin (scalar, default 'unix') –

    Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.

    • If 'unix' (or POSIX) time; origin is set to 1970-01-01.

    • If 'julian', unit must be 'D', and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.

    • If Timestamp convertible, origin is set to Timestamp identified by origin.
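
The interplay of unit and origin amounts to "origin plus a number of units". The sketch below expresses that arithmetic with the standard library; from_epoch, from_julian_day and the UNIT_STEP table are hypothetical names used for illustration. The stdlib timedelta has no nanosecond resolution, so 'ns' is left out, and the constant 2440587.5 is the Julian day number of 1970-01-01 00:00.

```python
from datetime import datetime, timedelta

# One step per supported unit (illustrative table, not the library's internals).
UNIT_STEP = {
    "D": timedelta(days=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}

UNIX_ORIGIN = datetime(1970, 1, 1)


def from_epoch(value, unit="s", origin=UNIX_ORIGIN):
    """Hypothetical helper: interpret `value` as a count of `unit` since `origin`."""
    return origin + value * UNIT_STEP[unit]


def from_julian_day(julian_day):
    """Hypothetical helper: convert a Julian day number (unit 'D') to a datetime."""
    return UNIX_ORIGIN + timedelta(days=julian_day - 2440587.5)


unix_seconds = from_epoch(1490195805, unit="s")           # 2017-03-22 15:16:45
days_since_1960 = [from_epoch(n, "D", datetime(1960, 1, 1)) for n in (1, 2, 3)]
julian = from_julian_day(2440588.5)                       # 1970-01-02 00:00:00
```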

  • cache (bool, default True) – This parameter is ignored with the Snowflake backend, i.e., no caching is applied.

Returns:

If parsing succeeded, the return type depends on the input (types in parentheses correspond to the fallback in case of unsuccessful timezone or out-of-range timestamp parsing):

  • scalar: Timestamp (or datetime.datetime)

  • array-like: DatetimeIndex (or Series of object dtype containing datetime.datetime)

  • Series: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)

  • DataFrame: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)

Return type:

datetime

Raises:
  • ParserError – When parsing a date from string fails.

  • ValueError – When another datetime conversion error happens. For example, when one of the ‘year’, ‘month’, ‘day’ columns is missing in a DataFrame, or when a timezone-aware datetime.datetime is found in an array-like of mixed time offsets and utc=False.

See also

DataFrame.astype

Cast argument to a specified dtype.

to_timedelta

Convert argument to timedelta.

convert_dtypes

Convert dtypes.

Notes

Many input types are supported, and lead to different output types:

  • scalars can be int, float, str, datetime object (from stdlib datetime module or numpy). They are converted to Timestamp when possible, otherwise they are converted to datetime.datetime. None/NaN/null scalars are converted to NaT.

  • array-like can contain int, float, str, datetime objects. They are converted to DatetimeIndex when possible, otherwise they are converted to Index with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.

  • Series are converted to Series with datetime64 dtype when possible, otherwise they are converted to Series with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.

  • DataFrame/dict-like are converted to Series with datetime64 dtype. For each row a datetime is created from assembling the various dataframe columns. Column keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.
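
The row-wise assembly described in the last bullet can be pictured with plain Python. This is a minimal sketch over a dict of columns, assuming only the three required keys are present; the variable names are illustrative and the code says nothing about how the backend performs the assembly.

```python
from datetime import datetime

columns = {"year": [2015, 2016], "month": [2, 3], "day": [4, 5]}

# The three required keys must all be present.
missing = [key for key in ("year", "month", "day") if key not in columns]
if missing:
    raise ValueError(f"missing required columns: {missing}")

# Build one datetime per row from the aligned column values.
assembled = [
    datetime(year, month, day)
    for year, month, day in zip(columns["year"], columns["month"], columns["day"])
]
```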

The following causes are responsible for datetime.datetime objects being returned (possibly inside an Index or a Series with object dtype) instead of a proper pandas designated type (Timestamp or Series with datetime64 dtype):

  • when any input element is before Timestamp.min or after Timestamp.max, see timestamp limitations (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits).

  • when utc=False (default) and the input is an array-like or Series containing mixed naive/aware datetime, or aware with mixed time offsets. Note that this happens in the (quite frequent) situation when the timezone has a daylight savings policy. In that case you may wish to use utc=True.
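
The first cause follows from the storage format. Assuming datetime64[ns] holds a signed 64-bit count of nanoseconds since the Unix epoch, the representable range can be approximated with the standard library (truncated to microseconds, because the stdlib datetime has no nanosecond field). The name fits_in_ns is hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest signed 64-bit nanosecond count (assumption stated above)

# Truncate nanoseconds to whole microseconds for the stdlib types.
upper = EPOCH + timedelta(microseconds=MAX_NS // 1000)  # 2262-04-11 23:47:16.854775
lower = EPOCH - timedelta(microseconds=MAX_NS // 1000)  # falls in the year 1677


def fits_in_ns(dt):
    """Hypothetical check: is `dt` inside the approximate nanosecond range?"""
    return lower <= dt <= upper


in_range = fits_in_ns(datetime(2018, 10, 26))
out_of_range = fits_in_ns(datetime(1300, 1, 1))
```

A date in the year 1300 falls outside this window, which is why such a value cannot be held in a datetime64[ns] result.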

Examples

Handling various input formats

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

Passing infer_datetime_format=True can often speed up parsing when the input is not in exact ISO8601 format but does follow a regular format.

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head()
0    3/11/2000
1    3/12/2000
2    3/13/2000
3    3/11/2000
4    3/12/2000
dtype: object

Using a unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

Warning

For float arg, precision rounding might happen. To prevent unexpected behavior use a fixed-width exact type.

Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

Non-convertible date/times

If a date does not meet the timestamp limitations (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits), passing errors='ignore' will return the original input instead of raising any exception.

Passing errors='coerce' will force an out-of-bounds date to NaT, in addition to forcing non-dates (or non-parseable dates) to NaT.

>>> pd.to_datetime(['13000101', 'abc'], format='%Y%m%d', errors='coerce')
DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

Timezones and time offsets

The default behaviour (utc=False) is as follows:

  • Timezone-naive inputs are kept as timezone-naive DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00:00', '2018-10-26 13:00:15'])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'], dtype='datetime64[ns]', freq=None)
>>> pd.to_datetime(['2018-10-26 12:00:00 -0500', '2018-10-26 13:00:00 -0500'])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'], dtype='datetime64[ns, UTC-05:00]', freq=None)
  • Use the right format to convert to a timezone-aware type (note that when calling the Snowpark pandas API to_pandas(), the timezone-aware output will always be converted to the session timezone):

>>> pd.to_datetime(['2018-10-26 12:00:00 -0500', '2018-10-26 13:00:00 -0500'], format="%Y-%m-%d %H:%M:%S %z")
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'], dtype='datetime64[ns, UTC-05:00]', freq=None)
  • Timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight savings, such as Europe/Paris):

>>> pd.to_datetime(['2020-10-25 02:00:00 +0200', '2020-10-25 04:00:00 +0100'])
DatetimeIndex([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00], dtype='object', freq=None)
>>> pd.to_datetime(['2020-10-25 02:00:00 +0200', '2020-10-25 04:00:00 +0100'], format="%Y-%m-%d %H:%M:%S %z")
DatetimeIndex([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00], dtype='object', freq=None)

Setting utc=True ensures that outputs are always timezone-aware:

  • Timezone-naive inputs are localized based on the session timezone

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
  • Timezone-aware inputs are converted to session timezone

>>> pd.to_datetime(['2018-10-26 12:00:00 -0530', '2018-10-26 12:00:00 -0500'],
...                utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)