Fill in missing values based on available data when merging datasets in R
Here is my question. I want to merge the df1 and df2 datasets.
    > df1
    id sub time number base note
    01 a01  100     20   20    y
    01 a01  110     35   20   NA
    02 a02  100     15   15    y
    02 a02  150     35   15   NA
    03 a04  120     10   10    y
    03 a04  130     25   10   NA
    04 a05   90     19   19    y
    04 a05  130     50   19   NA
    ....

    > df2
    sub time number
    a01  150     55
    a04  200     60
    a05  200     80
    a02  200     55
    ....
The merged dataset should look like this:
    > merged
    id sub time number base note
    01 a01  100     20   20    y
    01 a01  110     35   20   NA
    01 a01  150     55   20   NA
    02 a02  100     15   15    y
    02 a02  150     35   15   NA
    02 a02  200     55   15   NA
    03 a04  120     10   10    y
    03 a04  130     25   10   NA
    03 a04  200     60   10   NA
    04 a05   90     19   19    y
    04 a05  130     50   19   NA
    04 a05  200     80   19   NA
If you have a solution in R, please let me know. Thanks!
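You can rbind the two data frames after adding the appropriate columns to df2. Then, after sorting, na.locf from package zoo will carry the last observation forward.

I cheated here by adding a new level to df1$note to represent the original NA values in that column. na.locf then treats them as observed values: it does not overwrite them with the preceding y, and it carries them forward into the rows that came from df2. These values can be replaced by NA in the result, and the factor level dropped.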
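    library(zoo)

    # Assumes df1$note is a factor. Add a placeholder level 'NA' (a string)
    # and use it in place of the real missing values.
    levels(df1$note) <- c(levels(df1$note), 'NA')
    df1$note[is.na(df1$note)] <- 'NA'

    # Give df2 the columns it lacks so the two data frames can be stacked.
    df2$id <- NA; df2$base <- NA; df2$note <- NA
    df <- rbind(df1, df2)

    # Sort so each df2 row follows the df1 rows of the same sub,
    # then carry the last observation forward in every column.
    df <- df[order(df$sub, df$time, df$number),]
    as.data.frame(lapply(df, na.locf))

    ##    id sub time number base note
    ## 1   1 a01  100     20   20    y
    ## 2   1 a01  110     35   20   NA
    ## 3   1 a01  150     55   20   NA
    ## 4   2 a02  100     15   15    y
    ## 5   2 a02  150     35   15   NA
    ## 6   2 a02  200     55   15   NA
    ## 7   3 a04  120     10   10    y
    ## 8   3 a04  130     25   10   NA
    ## 9   3 a04  200     60   10   NA
    ## 10  4 a05   90     19   19    y
    ## 11  4 a05  130     50   19   NA
    ## 12  4 a05  200     80   19   NA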
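The note column in that result still holds the string level 'NA' rather than real missing values. Here is a minimal sketch of the clean-up step in base R. The vector `note` is a hypothetical stand-in for the note column of the filled result (it is not taken from the output above), built the same way the placeholder was introduced.

```r
# Hypothetical stand-in for the `note` column (a factor with real NAs).
note <- factor(c("y", NA, NA, "y", NA, NA))

# Same trick as in the answer: add a placeholder level and use it
# in place of the real missing values.
levels(note) <- c(levels(note), "NA")
note[is.na(note)] <- "NA"

# Clean-up: turn the placeholder back into a real missing value ...
note[note == "NA"] <- NA
# ... and drop the now-unused "NA" level.
note <- droplevels(note)

levels(note)       # only the genuine level remains
sum(is.na(note))   # number of real missing values
```

On a data frame, the same two lines would be applied to the note column of the result, for example after assigning the output of as.data.frame(lapply(df, na.locf)) to a variable.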