Read excel with two-line headers in R -
i have excel file simple structure on picture below.
b b c b c 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42
how can data.frame in r shown below?
col_name b c 0.42 0.42 0.42 0.42 0.42 0.42 b 0.42 0.42 0.42 b 0.42 0.42 0.42
an important addition. sample structure - part of whole table.
whole excel looks like:
as long excel file small, approach this.
library(readxl) library(zoo) library(tidyr) library(dplyr) library(reshape2) cols <- read_excel("path_to_file.xlsx", col_names = false)[1:2, ] cols[1, ] <- na.locf(as.character(cols[1, ]), na.rm = false) cols <- vapply(x = cols, fun = function(x) if (is.na(x[1])) x[2] else paste0(x, collapse = "_"), fun.value = character(1)) dframe <- read_excel("path_to_file.xlsx", col_names = false, skip = 2) %>% setnames(cols) %>% mutate(row_id = row_number()) dframe dframe %>% select(row_id, a_a:b_c) %>% gather(column, value, a_a:b_c) %>% mutate(ab_group = sub("(^[a-b])_.+$", "\\1", column), column = sub("^[a-b]_", "", column)) %>% dcast(row_id + ab_group ~ column, value.var = "value")
the downside that, since read_excel
doesn't have nrows
argument, code reads file in twice. there other packages can read excel files, such gdata::read.xls
let read in top 2 rows column names. think package has little more overhead.
regardless, when face situation, either import column names separately , manipulate them want before assigning them data, or don't read in names @ , manually reset of variable names using setnames
.
Comments
Post a Comment