python - pandas pivot tabe for heatmap -
i trying generate heatmap using seaborn, having small problem formatting of data.
currently, data in form:
name diag date 1 2006-12-01 1 1994-02-12 2 2001-07-23 b 2 1999-09-12 b 1 2016-10-12 c 3 2010-01-20 c 2 1998-08-20
i create heatmap (preferentially in python) showing name
on 1 axis against diag
- if occured. have tried pivot table using pd.pivot
, given error
valueerror: index contains duplicate entries, cannot reshape
this came from:
piv = df.pivot_table(index='name',columns='diag')
time irrelevant, show names
have had diag
, , diag
combos cluster together. need create new table or possible have? in cases name
not associated diag
edit: have since tried: piv = df.pivot_table(index='name',columns='diag', values='time', aggfunc='mean')
however time in datetime format, end with:
pandas.core.base.dataerror: no numeric types aggregate
you need pivot_table
aggregate function, because same index , column have multiple values , pivot
need unique values only:
print (df) name diag time 0 1 12 <-duplicates same a, 1 different value 1 1 13 <-duplicates same a, 1 different value 2 2 14 3 b 2 18 4 b 1 1 5 c 3 9 6 c 2 8 df = df.pivot_table(index='name',columns='diag', values='time', aggfunc='mean') print (df) diag 1 2 3 name 12.5 14.0 nan b 1.0 18.0 nan c nan 8.0 9.0
alternative solution:
df = df.groupby(['name','diag'])['time'].mean().unstack() print (df) diag 1 2 3 name 12.5 14.0 nan b 1.0 18.0 nan c nan 8.0 9.0
edit:
you can check duplicates duplicated
:
df = df.loc[df.duplicated(['name','diag'], keep=false), ['name','diag']] print (df) name diag 0 1 1 1
edit:
mean
of datetimes not easy - need convert dates nanoseconds
, mean , last convert datetimes. there problem - need replace nan
scalar, e.g. 0
converted 0
datetime - 1970-01-01
.
df.date = pd.to_datetime(df.date) df['dates_in_ns'] = pd.series(df.date.values.astype(np.int64), index=df.index) df = df.pivot_table(index='name', columns='diag', values='dates_in_ns', aggfunc='mean', fill_value=0) df = df.apply(pd.to_datetime) print (df) diag 1 2 3 name 2000-07-07 12:00:00 2001-07-23 1970-01-01 b 2016-10-12 00:00:00 1999-09-12 1970-01-01 c 1970-01-01 00:00:00 1998-08-20 2010-01-20
Comments
Post a Comment