r - Group by aggregate dynamic column name matching
Is it possible to group_by using a regex match on column names with dplyr?
    library(dplyr) # dplyr_0.5.0; R version 3.3.2 (2016-10-31)

    # dummy data
    set.seed(1)
    df1 <- sample_n(iris, 20) %>%
      mutate(Sepal.Length = round(Sepal.Length),
             Sepal.Width = round(Sepal.Width))

Group static version (looks and works fine, but imagine having 10-20 columns):
    df1 %>%
      group_by(Sepal.Length, Sepal.Width) %>%
      summarise(mysum = sum(Petal.Length))

Group dynamic, "ugly" version:
    df1 %>%
      group_by_(.dots = colnames(df1)[grepl("^Sepal", colnames(df1))]) %>%
      summarise(mysum = sum(Petal.Length))

Ideally (doesn't work, because starts_with returns indices):
    df1 %>%
      group_by(starts_with("Sepal")) %>%
      summarise(mysum = sum(Petal.Length))

    Error in eval(expr, envir, enclos) :
      wrong result size (0), expected 20 or 1
Expected output:

    # Source: local data frame [6 x 3]
    # Groups: Sepal.Length [?]
    #
    #   Sepal.Length Sepal.Width mysum
    #          <dbl>       <dbl> <dbl>
    # 1            4           3   1.4
    # 2            5           3  10.9
    # 3            6           2   4.0
    # 4            6           3  43.7
    # 5            7           3  15.7
    # 6            8           4   6.4

Note: this sounds like a duplicate post; kindly link relevant posts, if any.
This feature will be implemented in a future release; for reference, see GitHub issue #2619.
The solution is to use the group_by_at function:
    df1 %>%
      group_by_at(vars(starts_with("Sepal"))) %>%
      summarise(mysum = sum(Petal.Length))

Edit: implemented in dplyr_0.7.1
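For readers on a newer dplyr, the same grouping can probably be written with across(), which has been available since dplyr 1.0.0 and can be used inside group_by(). The sketch below is not part of the original answer; it assumes dplyr >= 1.0.0, and the names by_prefix and by_regex are made up for illustration. The second variant uses matches() to select columns by regular expression, which is closer to what the question asks for.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, where across() is available

# Same dummy data as in the question
set.seed(1)
df1 <- sample_n(iris, 20) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width = round(Sepal.Width))

# Group by every column whose name starts with "Sepal"
by_prefix <- df1 %>%
  group_by(across(starts_with("Sepal"))) %>%
  summarise(mysum = sum(Petal.Length), .groups = "drop")

# Same grouping, but selecting the columns with a regular expression
by_regex <- df1 %>%
  group_by(across(matches("^Sepal"))) %>%
  summarise(mysum = sum(Petal.Length), .groups = "drop")

print(by_prefix)
```

The exact rows may differ from the expected output in the question, because the default sampling algorithm behind set.seed() changed in R 3.6.0.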