python - Pandas get date from datetime stamp
I'm working with a pandas DataFrame whose 'date_time' column holds datetime stamps, e.g. 2014-02-21 17:16:42
I can access the column using df['date_time'], but I want to search for rows with a particular date. I've been trying something along the lines of
df[(df['date_time']=='2014-02-21')]
but I don't know how to search for just the date part of a datetime value. Also, I'm not sure if it's relevant, but when I check type(df.date_time[0]) it returns a string instead of a datetime object.
Thanks a lot.
It's more efficient not to use strings here (assuming these are datetime64, which they should be!): the strings have to be calculated before comparing, and string operations are slow.
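The question mentions that type(df.date_time[0]) returns a string. A minimal sketch, assuming the column holds strings in the 'YYYY-MM-DD HH:MM:SS' format shown in the question (the rows below are made up), of converting it once with pd.to_datetime, and of why an equality test against a bare date only matches timestamps at exactly midnight:

```python
import pandas as pd

# Hypothetical data in the same format as the question's 'date_time' column.
df = pd.DataFrame({"date_time": ["2014-02-21 17:16:42",
                                 "2014-02-21 00:00:00",
                                 "2014-02-22 17:16:42"]})

# As read in, each value is a plain Python string.
print(type(df["date_time"][0]))

# Convert once to a datetime64 column; comparisons then work on timestamps.
df["date_time"] = pd.to_datetime(df["date_time"])

# A bare date is midnight, so equality matches only the 00:00:00 row,
# not every row that falls on 2014-02-21.
exact = df[df["date_time"] == pd.Timestamp("2014-02-21")]
print(exact)
```

This is why an equality test alone cannot select "all rows on a date"; the two approaches below compare against a whole day instead.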
In [11]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42']))

In [12]: s
Out[12]:
0   2014-02-21 17:16:42
1   2014-02-22 17:16:42
dtype: datetime64[ns]
You can either do a simple ordering check:
In [13]: (pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))
Out[13]:
0     True
1    False
dtype: bool

In [14]: s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
Out[14]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]
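One detail of the ordering check: the strict < on the lower bound excludes a timestamp at exactly 2014-02-21 00:00:00. A minimal sketch of the same check written as a half-open interval [day, next day), so midnight is included; the helper name rows_on_day is hypothetical:

```python
import pandas as pd


def rows_on_day(s: pd.Series, day: str) -> pd.Series:
    """Return the elements of a datetime64 Series that fall on `day` (hypothetical helper)."""
    start = pd.Timestamp(day)            # midnight at the start of the day
    end = start + pd.Timedelta(days=1)   # midnight at the start of the next day
    # Half-open interval: include start, exclude end.
    return s.loc[(start <= s) & (s < end)]


s = pd.Series(pd.to_datetime(["2014-02-21 00:00:00",
                              "2014-02-21 17:16:42",
                              "2014-02-22 17:16:42"]))
print(rows_on_day(s, "2014-02-21"))
```

Computing the upper bound with pd.Timedelta(days=1) avoids having to spell out the next date by hand.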
However, it's faster to use DatetimeIndex.normalize (which gives the timestamp at midnight of each timestamp):
In [15]: pd.DatetimeIndex(s).normalize()
Out[15]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-02-21, 2014-02-22]
Length: 2, Freq: None, Timezone: None

In [16]: pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')
Out[16]: array([ True, False], dtype=bool)

In [17]: s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
Out[17]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]
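Newer pandas versions also expose the same midnight normalization directly on a Series through the .dt accessor, so no DatetimeIndex needs to be built. A sketch, assuming a pandas version that provides Series.dt.normalize:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2014-02-21 17:16:42", "2014-02-22 17:16:42"]))

# .dt.normalize() sets each time of day to 00:00:00, keeping the datetime dtype.
midnights = s.dt.normalize()

# The target day is itself a midnight timestamp, so plain equality works.
mask = midnights == pd.Timestamp("2014-02-21")
print(s.loc[mask])
```

Unlike the DatetimeIndex version, which returns a NumPy array, this mask is a boolean Series aligned on s's index.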
Here are the timings (with s as above):
In [21]: %timeit s.loc[s.str.startswith('2014-02-21')]
1000 loops, best of 3: 1.16 ms per loop

In [22]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.23 ms per loop

In [23]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 405 µs per loop
With a larger s the results are more telling:
In [31]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42'] * 1000))

In [32]: %timeit s.loc[s.str.startswith('2014-02-21')]
10 loops, best of 3: 105 ms per loop

In [33]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.3 ms per loop

In [34]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 694 µs per loop
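A rough way to rerun this comparison outside IPython is the standard-library timeit module. This sketch assumes the string-based approach runs on a string copy of s (made here with astype(str)), since the .str accessor needs string values. Absolute timings depend on the machine and pandas version, so none are shown; the sketch first checks that all three approaches select the same rows.

```python
import timeit

import pandas as pd

# Same test data as above: 2000 timestamps spread over two days.
s = pd.Series(pd.to_datetime(["2014-02-21 17:16:42", "2014-02-22 17:16:42"] * 1000))
s_str = s.astype(str)  # string copy for the string-based approach

day = pd.Timestamp("2014-02-21")
next_day = pd.Timestamp("2014-02-22")

approaches = {
    "string startswith": lambda: s_str.loc[s_str.str.startswith("2014-02-21")],
    "ordering check": lambda: s.loc[(day < s) & (s < next_day)],
    "normalize": lambda: s.loc[pd.DatetimeIndex(s).normalize() == day],
}

# Sanity check: every approach should select the same index labels.
selected = {name: list(fn().index) for name, fn in approaches.items()}

for name, fn in approaches.items():
    # Best of 3 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{name:>18}: {best * 1e6:.1f} µs per call")
```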
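Note: in your example the column df['date_time'] plays the role of s, so (after converting it with pd.to_datetime, since it currently holds strings) you would be doing df.loc[pd.DatetimeIndex(df['date_time']).normalize() == ...].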
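Putting this together for the question's DataFrame, here is a minimal sketch. It assumes df['date_time'] starts out as strings (as type(df.date_time[0]) suggests); the rows and the 'value' column are hypothetical, included only to show that whole rows are selected.

```python
import pandas as pd

# Hypothetical frame shaped like the question's: timestamps stored as strings.
df = pd.DataFrame({
    "date_time": ["2014-02-21 17:16:42", "2014-02-21 09:05:00", "2014-02-22 17:16:42"],
    "value": [10, 20, 30],
})

# Step 1: convert the string column to datetime64 once, up front.
df["date_time"] = pd.to_datetime(df["date_time"])

# Step 2: keep rows whose timestamp, truncated to midnight, equals the target day.
target = pd.Timestamp("2014-02-21")
on_day = df.loc[pd.DatetimeIndex(df["date_time"]).normalize() == target]
print(on_day)
```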