R: Is there a %in% operator across multiple columns?


Imagine you have 2 data frames:

    df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
    df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))

Here's what they look like side by side:

    > cbind(df1, df2)
      v1 v2 v1 v2
    1  1  a  1  b
    2  2  b  2  b
    3  3  c  2  c

You want to know which observations in df2 are duplicates of observations in df1, across all variables.

This can be done by pasting the columns of each row together and then using %in%:

    df1vec <- apply(df1, 1, paste, collapse = "")
    df2vec <- apply(df2, 1, paste, collapse = "")
    df2vec %in% df1vec
    # [1] FALSE  TRUE FALSE
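One caveat with collapse = "": nothing marks where one column ends and the next begins, so two different rows can paste to the same string. The minimal sketch below (the data frames a and b are made up for illustration) shows rows (1, "23") and (12, "3") both becoming "123"; putting a separator between the columns, here "\r", which is assumed never to occur in the data, keeps them apart.

```r
# Two rows that differ, but paste to the same key when collapse = ""
a <- data.frame(v1 = 1,  v2 = "23")
b <- data.frame(v1 = 12, v2 = "3")

# Without a separator both rows become "123": a false match
no_sep <- apply(b, 1, paste, collapse = "") %in% apply(a, 1, paste, collapse = "")

# With a separator (assumed absent from the data) the keys differ
with_sep <- apply(b, 1, paste, collapse = "\r") %in% apply(a, 1, paste, collapse = "\r")

no_sep    # TRUE
with_sep  # FALSE
```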

So the second observation is the only one in df2 that is also in df1.

Is there no faster way of generating this output, i.e. a %in% that works across multiple variables, or should I be content with the apply(paste) solution?

I'd go with

    interaction(df2) %in% interaction(df1)
    # [1] FALSE  TRUE FALSE
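To see why this works: given a single data frame, interaction() treats its columns as the factors to combine and returns one factor whose label for each row joins that row's values with sep = ".". %in% (via match()) compares factors by their labels, so the result is a row-wise membership test. The sketch below just prints those labels for the question's data.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))

k1 <- interaction(df1)  # one factor, one label per row of df1
k2 <- interaction(df2)

as.character(k1)        # "1.a" "2.b" "3.c"
as.character(k2)        # "1.b" "2.b" "2.c"

# Factors are matched on their labels, i.e. row by row
k2 %in% k1              # FALSE  TRUE FALSE
```

Since the labels are built with ".", values that themselves contain "." could give two different rows the same label; interaction() takes a sep argument if that is a concern for your data.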

You can wrap it in a binary operator:

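    # call base's %in% explicitly: an unqualified %in% here would call this function again and recurse forever
    "%in%" <- function(x, y) base::`%in%`(interaction(x), interaction(y))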

Then:

    df2 %in% df1
    # [1] FALSE  TRUE FALSE
    rbind(df2, df2) %in% df1
    # [1] FALSE  TRUE FALSE FALSE  TRUE FALSE
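Redefining %in% itself masks base R's operator in your workspace. If you would rather leave it alone, a sketch under a different, hypothetical operator name (%rowin%) behaves the same on data frames. It also assumes y contains every column of x and uses y[names(x)] to line the columns up by name, so column order does not have to agree.

```r
# Hypothetical operator name, so base R's %in% is not masked
`%rowin%` <- function(x, y) {
  # Assumption: y has every column of x; y[names(x)] reorders y's columns
  # to match x before the per-row labels are built
  interaction(x) %in% interaction(y[names(x)])
}

df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))

df2 %rowin% df1               # FALSE  TRUE FALSE
rbind(df2, df2) %rowin% df1   # FALSE  TRUE FALSE FALSE  TRUE FALSE
df2 %rowin% df1[c("v2", "v1")]  # same as df2 %rowin% df1
```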

Disclaimer: I have modified this answer; the previous version used do.call(paste, ...) instead of interaction(...). Consult the edit history if you like. I think Arun's claims of "terrible inefficiency" (a bit extreme IMHO) still hold, but if you want a concise solution that uses base R and is fast-ish on small-ish data, this is it.
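For reference, the do.call(paste, ...) idea mentioned above can be sketched like this. do.call() passes each column as a separate argument to paste(), which is vectorised, so one key per row is built without apply(). It also avoids apply()'s as.matrix() step, which runs numeric columns through format() and can pad them to a common width that differs between df1 and df2. The helper name row_key and the "\r" separator (assumed absent from the data) are illustrative choices, not part of the original answer.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(v1 = c(1, 2, 2), v2 = c("b", "b", "c"))

# Hypothetical helper: one string key per row.
# c(df, sep = sep) turns the columns into a list of arguments for paste()
row_key <- function(df, sep = "\r") do.call(paste, c(df, sep = sep))

row_key(df2) %in% row_key(df1)  # FALSE  TRUE FALSE
```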

