duplicates - match columns and keep all duplicated elements in a data frame column [R]
I have 2 data frames; df1 has 3 columns and df2 has 1 column. df1 has elements contained in df2, some of them duplicated, as shown below.

df1 =
freetext                  specific         icdcode
jaundice,hepatitisa,b,c   hepatitis a      b15
jaundice,hepatitisa,b,c   hepatitis b      b16
jaundice,hepatitisa,b,c   hepatitis c      b17.1
jaundice,hepatitisa,b,c   jaundice         r17
lobar pneumonia           lobar pneumonia  j18.1
lobar pneumonia ,scabies  lobar pneumonia  j18.1
scabiess                  scabies          g10

df2 =
jaundice,hepatitisa,b,c
scabiess
lobar pneumonia ,scabies
lobar pneumonia
I wish to have a match between the 2 data frames such that whenever a match occurs there is a resultant data frame taking the form of df1. For example, jaundice,hepatitisa,b,c should appear 4 times instead of appearing once in the column. In other words, duplicates should be maintained, as shown below.

The resultant data frame should appear like this:

column1                   column2       column3
jaundice,hepatitisa,b,c   hepatitis a   b15
jaundice,hepatitisa,b,c   hepatitis b   b16
jaundice,hepatitisa,b,c   hepatitis c   b17.1
jaundice,hepatitisa,b,c   jaundice      r17
So, how am I supposed to loop through df2, find the matches in df1 (first column), and produce a data frame of the matches with the other corresponding columns, as shown above?

Here is my script, which doesn't seem to produce the desired results:
newmatches <- data.frame()
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2[, 1])) {
    grep(j, i, ignore.case = F, value = T) -> newmatches
  }
}
# it doesn't produce the other columns of df1
Any help or suggestion would be appreciated. I am a novice in R.

As far as I understand, you want to filter the rows of df1, keeping the ones whose first column exists in df2. Right? The easiest way to achieve that would be
df1[df1[, 1] %in% df2[, 1], ]
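To see why this keeps the duplicates: `%in%` returns one TRUE/FALSE value per row of df1, so every row whose first column occurs anywhere in df2 is kept, no matter how often that value repeats. Here is a minimal sketch on made-up toy data (the names `a`, `b`, `keep` and `matched` are only for illustration and are not from the question):

```r
# Hypothetical toy data, only to illustrate how %in% behaves
a <- data.frame(key = c("x", "x", "y", "z"),
                val = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)
b <- data.frame(key = c("x", "z"), stringsAsFactors = FALSE)

# One logical value per row of a: TRUE where a$key occurs in b$key
keep <- a$key %in% b$key

# Row subsetting with the logical vector keeps all columns of a,
# including both duplicated "x" rows
matched <- a[keep, ]
print(matched)
```

No explicit loop is needed, because `%in%` is vectorised over the whole column.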
Edit

Here is the full code to reproduce your example:
# df1: three factor columns, seven rows
df1 <- structure(list(
  freetext = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L),
    .Label = c("jaundice,hepatitisa,b,c", "lobar pneumonia",
               "lobar pneumonia ,scabies", "scabiess"),
    class = "factor"),
  specific = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L),
    .Label = c("hepatitis a", "hepatitis b", "hepatitis c",
               "jaundice", "lobar pneumonia", "scabies"),
    class = "factor"),
  icdcode = structure(c(1L, 2L, 3L, 6L, 5L, 5L, 4L),
    .Label = c("b15", "b16", "b17.1", "g10", "j18.1", "r17"),
    class = "factor")),
  .Names = c("freetext", "specific", "icdcode"),
  row.names = c(NA, -7L), class = "data.frame")

# df2: one factor column, four rows
df2 <- structure(list(
  freetext = structure(c(1L, 4L, 3L, 2L),
    .Label = c("jaundice,hepatitisa,b,c", "lobar pneumonia",
               "lobar pneumonia ,scabies", "scabiess"),
    class = "factor")),
  .Names = "freetext",
  row.names = c(NA, -4L), class = "data.frame")

# keep the rows of df1 whose first column occurs in df2
result <- df1[df1[, 1] %in% df2[, 1], ]
Printing result gives the following output:

                  freetext        specific icdcode
1  jaundice,hepatitisa,b,c     hepatitis a     b15
2  jaundice,hepatitisa,b,c     hepatitis b     b16
3  jaundice,hepatitisa,b,c     hepatitis c   b17.1
4  jaundice,hepatitisa,b,c        jaundice     r17
5          lobar pneumonia lobar pneumonia   j18.1
6 lobar pneumonia ,scabies lobar pneumonia   j18.1
7                 scabiess         scabies     g10
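All seven rows come back here because df2 in the example contains every freetext value found in df1. If df2 held only the jaundice entry, the same expression should return just the four rows from the question's desired output. A sketch under that assumption, with df1 rebuilt using plain character columns instead of factors (`df2_small` and `result_small` are hypothetical names, not from the question):

```r
# df1 rebuilt with character columns (values taken from the question)
df1 <- data.frame(
  freetext = c(rep("jaundice,hepatitisa,b,c", 4), "lobar pneumonia",
               "lobar pneumonia ,scabies", "scabiess"),
  specific = c("hepatitis a", "hepatitis b", "hepatitis c", "jaundice",
               "lobar pneumonia", "lobar pneumonia", "scabies"),
  icdcode  = c("b15", "b16", "b17.1", "r17", "j18.1", "j18.1", "g10"),
  stringsAsFactors = FALSE
)

# Hypothetical smaller lookup table containing a single free-text entry
df2_small <- data.frame(freetext = "jaundice,hepatitisa,b,c",
                        stringsAsFactors = FALSE)

# Same filter as above: only rows whose freetext occurs in df2_small survive
result_small <- df1[df1$freetext %in% df2_small$freetext, ]
print(result_small)
```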