python - Date parsing and timezone adjusting in pandas dataframes


I have 800,000 rows of data in a dataframe, and one column, df['date'], is a string of time and date in the format 'yyyy-mm-dd hh:mm:ss.fff', which doesn't have timezone information. I know the times are in the New York timezone, and they need to be converted to CET. I have two methods to get the job done:

Method 1 (very slow, for sure):

    import datetime
    from pytz import timezone

    df['date'].apply(lambda x: timezone('America/New_York')
                     .localize(datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
                     .astimezone(timezone('CET')))

Method 2:

    df.index = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
    df.index = df.index.tz_localize('America/New_York').tz_convert('CET')

I was wondering if there are other, better ways to do it, or any potential pitfalls in the methods I listed? Thanks!

Also, I would like to shift the timestamps by a fixed amount of time, such as 1 ms (timedelta(0, 0, 1000)). How can I implement that using method 2?

Method 2 is definitely the best way of doing this.
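As a minimal sketch of method 2 from start to finish, including the fixed 1 ms shift asked about in the question: the sample rows and the 'price' column below are made up for illustration, and the format string assumes the 'yyyy-mm-dd hh:mm:ss.fff' layout described in the question.

```python
import pandas as pd

# Hypothetical sample data in the question's 'yyyy-mm-dd hh:mm:ss.fff' layout.
df = pd.DataFrame({
    'date': ['2013-01-15 09:30:00.123', '2013-07-15 16:00:00.456'],
    'price': [10.0, 11.0],
})

# Parse all the strings in one vectorized call.
naive = pd.DatetimeIndex(pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f'))

# tz_localize and tz_convert return new objects, so the result must be assigned.
df.index = naive.tz_localize('America/New_York').tz_convert('CET')

# Shift every timestamp by a fixed 1 ms.
df.index = df.index + pd.Timedelta(milliseconds=1)
```

One pitfall to watch for: tz_localize raises an error by default for wall-clock times that are ambiguous or nonexistent around a daylight saving transition. Its ambiguous= and nonexistent= parameters control how those rows are handled.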

However, it occurs to me that you are formatting this date after you have loaded the data.

It would be faster to parse the dates on load of the file than to change them after you have loaded it (not to mention cleaner).

If your data is loaded from a CSV file using the pandas.read_csv() function, for instance, then you can use the parse_dates= option and the date_parser= option.

You can try it out directly with a lambda function as your date_parser= and set parse_dates= to a list of your date columns.

Like this:

    pd.read_csv('myfile.csv', parse_dates=['date'],
                date_parser=lambda x: timezone('America/New_York')
                .localize(datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
                .astimezone(timezone('CET')))

This should work, and may be the fastest.
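A variation that may be worth timing against the lambda approach is to let read_csv parse the column into naive datetimes and then do the timezone work on the whole column at once. This is only a sketch: the in-memory CSV text stands in for 'myfile.csv', and the column names are assumed. Note also that recent pandas releases deprecate date_parser= in favor of date_format=, so this version avoids it.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'myfile.csv'.
csv_text = (
    "date,price\n"
    "2013-01-15 09:30:00.123,10.0\n"
    "2013-07-15 16:00:00.456,11.0\n"
)

# Let read_csv parse the column into naive datetimes while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# Localize and convert the whole column at once through the .dt accessor.
df['date'] = df['date'].dt.tz_localize('America/New_York').dt.tz_convert('CET')
```

Because the localization and conversion run once over the whole column instead of once per row, this tends to scale better on large files, but it is best to measure on your own data.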

