python - Pandas get date from datetime stamp


I'm working with a pandas DataFrame whose 'date_time' column holds datetime stamps such as 2014-02-21 17:16:42.

I can call the column using df['date_time'], and I want to search for rows with a particular date. I've been trying something along the lines of

df[(df['date_time']=='2014-02-21')] 

but I don't know how to search for just the date within a datetime value. Also, I'm not sure if it's relevant, but when I check type(df.date_time[0]) it returns a string instead of a datetime-type object.

Thanks a lot.

It is more efficient not to use strings here (assuming these are datetime64, which they should be!), since the strings have to be calculated before comparing, and string operations are slow.

In [11]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42']))

In [12]: s
Out[12]:
0   2014-02-21 17:16:42
1   2014-02-22 17:16:42
dtype: datetime64[ns]
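Since the question reports that type(df.date_time[0]) is a string, the column probably needs converting first. A minimal sketch, assuming a hypothetical frame shaped like the one in the question:

```python
import pandas as pd

# Hypothetical frame mirroring the question: 'date_time' holds plain strings.
df = pd.DataFrame({"date_time": ["2014-02-21 17:16:42", "2014-02-22 17:16:42"]})
first_type = type(df["date_time"][0])  # a string type, as in the question

# Parse the strings once; afterwards the column has a datetime64 dtype.
df["date_time"] = pd.to_datetime(df["date_time"])
print(df["date_time"].dtype)
```

After the conversion, the comparisons below operate on timestamps rather than on text.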

You can either do a simple ordering check:

In [13]: (pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))
Out[13]:
0     True
1    False
dtype: bool

In [14]: s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
Out[14]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]
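One detail of the ordering check: with a strict < on the lower bound, a value stamped exactly at midnight (2014-02-21 00:00:00) would be left out. A hedged variant using a half-open interval, with an extra midnight sample added here purely for illustration:

```python
import pandas as pd

# Sample data; the midnight row is an assumption added to show the edge case.
s = pd.Series(pd.to_datetime([
    "2014-02-21 00:00:00",
    "2014-02-21 17:16:42",
    "2014-02-22 17:16:42",
]))

day = pd.Timestamp("2014-02-21")
next_day = day + pd.Timedelta(days=1)

# Half-open interval [day, next_day): includes midnight, excludes the next day.
mask = (s >= day) & (s < next_day)
selected = s.loc[mask]
print(selected)
```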

However, it's faster to use DatetimeIndex.normalize (which gets the Timestamp at midnight of each Timestamp):

In [15]: pd.DatetimeIndex(s).normalize()
Out[15]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-02-21, 2014-02-22]
Length: 2, Freq: None, Timezone: None

In [16]: pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')
Out[16]: array([ True, False], dtype=bool)

In [17]: s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
Out[17]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]
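The same idea can also be written with the Series.dt accessor. This is not what the session above uses, and it assumes a pandas version that provides .dt, so treat it as an alternative sketch:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2014-02-21 17:16:42", "2014-02-22 17:16:42"]))

# Floor every timestamp to midnight, staying in Series form.
midnights = s.dt.normalize()

# Keep only the entries whose calendar day is 2014-02-21.
on_day = s.loc[midnights == pd.Timestamp("2014-02-21")]
print(on_day)
```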

Here's the timing (s as above):

In [21]: %timeit s.loc[s.str.startswith('2014-02-21')]
1000 loops, best of 3: 1.16 ms per loop

In [22]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.23 ms per loop

In [23]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 405 µs per loop

With a larger s the results are more telling:

In [31]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42'] * 1000))

In [32]: %timeit s.loc[s.str.startswith('2014-02-21')]
10 loops, best of 3: 105 ms per loop

In [33]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.3 ms per loop

In [34]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 694 µs per loop

Note: in your example the column df['date_time'] is s, and you would be doing df.loc[pd.DatetimeIndex(df['date_time']) == ...].
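Applied to a whole DataFrame, the normalize approach might look like the sketch below. The helper name rows_on_date and the 'value' column are hypothetical, introduced only for illustration, and the column is assumed to already be datetime64:

```python
import pandas as pd

def rows_on_date(df, column, day):
    """Return the rows of df whose datetime `column` falls on calendar day `day`."""
    # Floor each timestamp in the column to midnight, then compare with the day.
    midnights = pd.DatetimeIndex(df[column]).normalize()
    return df.loc[midnights == pd.Timestamp(day)]

# Hypothetical frame shaped like the one in the question.
df = pd.DataFrame({
    "date_time": pd.to_datetime(["2014-02-21 17:16:42", "2014-02-22 17:16:42"]),
    "value": [1, 2],
})

result = rows_on_date(df, "date_time", "2014-02-21")
print(result)
```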

