How to assign identical unique IDs to matching observations between two data frames in R?
I have a practical question: when I have two (or more) data frames, I want to assign unique IDs to each matching observation, both within each data frame and across both datasets, e.g.:
    # 1. Create data frame df1:
    a1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1)
    b1 <- c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2)
    c1 <- c("white", "red", "black", "white", "red",
            "white", "black", "silver", "red", "green")
    df1 <- data.frame(a1, b1, c1)
    df1
    #    a1 b1     c1
    # 1   1  1  white
    # 2   1  5    red
    # 3   1  3  black
    # 4   1  2  white
    # 5   2  3    red
    # 6   2  4  white
    # 7   2  5  black
    # 8   2  1 silver
    # 9   1  5    red
    # 10  1  2  green

    # 2. Create data frame df2:
    a2 <- c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2)
    b2 <- c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5)
    c2 <- c("black", "blue", "black", "white", "silver",
            "green", "green", "red", "blue", "white")
    df2 <- data.frame(a2, b2, c2)
    df2
    #    a2 b2     c2
    # 1   2  3  black
    # 2   2  1   blue
    # 3   1  3  black
    # 4   1  2  white
    # 5   2  1 silver
    # 6   2  3  green
    # 7   2  4  green
    # 8   2  5    red
    # 9   2  3   blue
    # 10  2  5  white

    # 3. Assign unique IDs to each observation in df1:
    library(data.table)
    df1.2 <- data.table(df1, key = "a1,b1,c1")
    df1.2[, id := .GRP, by = key(df1.2)]
    df1.2 <- as.data.frame(df1.2)
    df1.2
    #    a1 b1     c1 id
    # 1   1  1  white  1
    # 2   1  2  green  2
    # 3   1  2  white  3
    # 4   1  3  black  4
    # 5   1  5    red  5
    # 6   1  5    red  5
    # 7   2  1 silver  6
    # 8   2  3    red  7
    # 9   2  4  white  8
    # 10  2  5  black  9

    # 4. The problematic part: assign identical unique IDs to the observations
    # of df2 that match df1.2, and assign other unique IDs to the non-matching
    # observations of df2. Name the resulting data frame df2.2.
    # My expected result is ideally as follows:
    # df2.2
    #    a2 b2     c2 id
    # 1   2  3  black 10
    # 2   2  1   blue 11
    # 3   1  3  black  4
    # 4   1  2  white  3
    # 5   2  1 silver  6
    # 6   2  3  green 12
    # 7   2  4  green 13
    # 8   2  5    red 14
    # 9   2  3   blue 15
    # 10  2  5  white 16
Any help on how to get df2.2 would be appreciated. Thanks.
An easy approach is to make a hash of each row:
    library(dplyr)
    library(digest)

    # Hash the pasted column values of each row, so identical rows
    # get identical IDs regardless of which data frame they are in.
    df1 %>%
      rowwise() %>%
      do(data.frame(., id = digest(paste(.$a1, .$b1, .$c1), algo = "md5"),
                    stringsAsFactors = FALSE)) %>%
      ungroup()

    df2 %>%
      rowwise() %>%
      do(data.frame(., id = digest(paste(.$a2, .$b2, .$c2), algo = "md5"),
                    stringsAsFactors = FALSE)) %>%
      ungroup()
which produces the following for df1:

       a1 b1     c1                               id
    1   1  1  white b86fbb78b27f7db2ee50af2d68cce452
    2   1  5    red 68d47f544832989834517630e4a2764c
    3   1  3  black 724e37192140cb2009cf3d982f2be1e4
    4   1  2  white f731b8b38255b8c312543283f8e1c634
    5   2  3    red 2d50b86902056a51faad04d2c566faf2
    6   2  4  white 9396667cd51d1e1b61b0b22a7767d3d9
    7   2  5  black 9ba1f3e04c61c006d3c5382fcad098e6
    8   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
    9   1  5    red 68d47f544832989834517630e4a2764c
    10  1  2  green 7d9b1aadfd79de142b234b83d7867b9b
and the following for df2:

       a2 b2     c2                               id
    1   2  3  black d285febc8ab08e99b11609b98f077e66
    2   2  1   blue bfa0405276406ac4bc596daf957dfa11
    3   1  3  black 724e37192140cb2009cf3d982f2be1e4
    4   1  2  white f731b8b38255b8c312543283f8e1c634
    5   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
    6   2  3  green 67eefe9ee2d82486ded30a268289296b
    7   2  4  green d773f58cf144eab15ef459e326494a2f
    8   2  5    red 0724318a9f59d3960edfe4e90f9c4eff
    9   2  3   blue 6883420cc137ba45b773f642176e9ce6
    10  2  5  white 5dea9e63b5fbfb31fb81260cb5a5f41c
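The hashes mark matching rows, but they are not the sequential integer IDs shown in the question's expected df2.2. If integers are wanted, one possible base-R sketch (not part of the answer above) is to build a string key per row and use its position in a shared lookup vector as the ID. The names `key1`, `key2`, `df1_keys`, `new_keys` and `all_keys` are made up for illustration, and the sketch assumes the separator `"|"` never occurs in the data.

```r
# Data from the question
df1 <- data.frame(
  a1 = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1),
  b1 = c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2),
  c1 = c("white", "red", "black", "white", "red",
         "white", "black", "silver", "red", "green"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  a2 = c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2),
  b2 = c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5),
  c2 = c("black", "blue", "black", "white", "silver",
         "green", "green", "red", "blue", "white"),
  stringsAsFactors = FALSE
)

# One string key per row (assumption: "|" never occurs in the data)
key1 <- paste(df1$a1, df1$b1, df1$c1, sep = "|")
key2 <- paste(df2$a2, df2$b2, df2$c2, sep = "|")

# Distinct df1 keys, sorted by a1, b1, c1 to mirror the keyed
# data.table in the question (radix sorts strings in the C locale)
df1_keys <- unique(key1[order(df1$a1, df1$b1, df1$c1, method = "radix")])

# Keys that occur only in df2, in order of first appearance
new_keys <- unique(key2[!key2 %in% df1_keys])

# The position of a key in the combined lookup vector is its ID
all_keys <- c(df1_keys, new_keys)
df1.2 <- cbind(df1, id = match(key1, all_keys))
df2.2 <- cbind(df2, id = match(key2, all_keys))

df2.2$id
# 10 11  4  3  6 12 13 14 15 16
```

Unlike the question's df1.2, this keeps df1 in its original row order; only the `id` values follow the sorted grouping. The same lookup vector can be extended with the new keys of a third data frame in the same way.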