python - pandas pivot tabe for heatmap -


i trying generate heatmap using seaborn, having small problem formatting of data.

currently, data in form:

name     diag   date        1       2006-12-01        1       1994-02-12        2       2001-07-23 b        2       1999-09-12 b        1       2016-10-12 c        3       2010-01-20 c        2       1998-08-20 

i create heatmap (preferentially in python) showing name on 1 axis against diag - if occured. have tried pivot table using pd.pivot, given error

valueerror: index contains duplicate entries, cannot reshape

this came from:

piv = df.pivot_table(index='name',columns='diag')

time irrelevant, show names have had diag, , diag combos cluster together. need create new table or possible have? in cases name not associated diag

edit: have since tried: piv = df.pivot_table(index='name',columns='diag', values='time', aggfunc='mean')

however time in datetime format, end with:
pandas.core.base.dataerror: no numeric types aggregate

you need pivot_table aggregate function, because same index , column have multiple values , pivot need unique values only:

print (df)   name  diag  time 0        1    12 <-duplicates same a, 1 different value 1        1    13 <-duplicates same a, 1 different value 2        2    14 3    b     2    18 4    b     1     1 5    c     3     9 6    c     2     8  df = df.pivot_table(index='name',columns='diag', values='time', aggfunc='mean') print (df) diag     1     2    3 name                      12.5  14.0  nan b      1.0  18.0  nan c      nan   8.0  9.0 

alternative solution:

df = df.groupby(['name','diag'])['time'].mean().unstack() print (df) diag     1     2    3 name                      12.5  14.0  nan b      1.0  18.0  nan c      nan   8.0  9.0 

edit:

you can check duplicates duplicated:

df = df.loc[df.duplicated(['name','diag'], keep=false), ['name','diag']] print (df)   name  diag 0        1 1        1 

edit:

mean of datetimes not easy - need convert dates nanoseconds, mean , last convert datetimes. there problem - need replace nan scalar, e.g. 0 converted 0 datetime - 1970-01-01.

df.date = pd.to_datetime(df.date) df['dates_in_ns'] = pd.series(df.date.values.astype(np.int64), index=df.index) df = df.pivot_table(index='name',                     columns='diag',                      values='dates_in_ns',                      aggfunc='mean',                      fill_value=0) df = df.apply(pd.to_datetime) print (df) diag                   1          2          3 name                                              2000-07-07 12:00:00 2001-07-23 1970-01-01 b    2016-10-12 00:00:00 1999-09-12 1970-01-01 c    1970-01-01 00:00:00 1998-08-20 2010-01-20 

Comments

Popular posts from this blog

ios - Change Storyboard View using Seague -

commonjs - How to write a typescript definition file for a node module that exports a function? -

openid - Okta: Failed to get authorization code through API call -