r - Is there a %in% operator across multiple columns?
Imagine you have two data frames:
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))
Here is what they look like, side by side:
> cbind(df1, df2)
  v1 v2 v1 v2
1  1  a  1  b
2  2  b  2  b
3  3  c  2  c
You want to know which observations are duplicates, across all variables.
This can be done by pasting the columns together and then using %in%:
df1vec <- apply(df1, 1, paste, collapse = "")
df2vec <- apply(df2, 1, paste, collapse = "")
df2vec %in% df1vec
# [1] FALSE  TRUE FALSE
The second observation is the only one in df2 that is also in df1.
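One caveat with the pasting approach: with an empty separator, two different rows can collapse to the same string. The sketch below uses two made-up one-row data frames (`a` and `b` are hypothetical, not part of the question) to show the collision and how a separator that is unlikely to occur in the data avoids it.

```r
# Hypothetical data: the rows differ, but both paste to "abc"
a <- data.frame(v1 = "a",  v2 = "bc")
b <- data.frame(v1 = "ab", v2 = "c")

# Empty separator: the two distinct rows look identical
naive <- apply(b, 1, paste, collapse = "") %in% apply(a, 1, paste, collapse = "")

# A separator assumed not to occur in the data keeps the columns apart
safer <- apply(b, 1, paste, collapse = "\r") %in% apply(a, 1, paste, collapse = "\r")

print(naive)
print(safer)
```

Here `naive` reports a match even though the rows differ, while `safer` does not. The choice of "\r" is only an assumption about the data; any string that cannot appear in a column would do.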
Is there a faster way of generating this output, something like %in% across multiple variables, or should I just be content with the apply(paste) solution?
I would go with
interaction(df2) %in% interaction(df1)
# [1] FALSE  TRUE FALSE
You can wrap it in a binary operator:
# Named %IN% so that it does not mask (and recursively call) base %in%
"%IN%" <- function(x, y) interaction(x) %in% interaction(y)
Then
df2 %IN% df1
# [1] FALSE  TRUE FALSE
rbind(df2, df2) %IN% df1
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE
Disclaimer: I have modified my answer from a previous one that used do.call(paste, ...) instead of interaction(...). Consult the history if you like. I think Arun's claims of "terrible inefficiency" (a bit extreme IMHO) still hold, but if you like a concise solution that uses only base R and is fast-ish for small-ish data, that's it.
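The earlier do.call(paste, ...) version is not reproduced here. As a rough sketch of what such a variant might look like, the following builds one key string per row and compares the keys. The helper name `row_key` and the "\r" separator are assumptions made for illustration, not the original code from the answer's history.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))

# Hypothetical helper: paste the columns element-wise into one key per row.
# unname() keeps a column called "sep" or "collapse" from being taken as an
# argument of paste().
row_key <- function(df, sep = "\r") {
  do.call(paste, c(unname(as.list(df)), sep = sep))
}

res <- row_key(df2) %in% row_key(df1)
print(res)
```

Unlike apply(), which first converts the whole data frame to a character matrix, this pastes the columns as vectors, so it avoids the row-by-row loop. It is still a string comparison, so it assumes the separator never occurs in the data.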