How to assign identical unique IDs to matching observations between two data frames in R?



I have a practical question. When I have two (or more) data frames, I want to assign unique IDs to each matching observation, both within each data frame and across both datasets, e.g.:

```r
# 1. Create data frame df1:
a1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1)
b1 <- c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2)
c1 <- c("white", "red", "black", "white", "red",
        "white", "black", "silver", "red", "green")
df1 <- data.frame(a1, b1, c1)
df1
#    a1 b1     c1
# 1   1  1  white
# 2   1  5    red
# 3   1  3  black
# 4   1  2  white
# 5   2  3    red
# 6   2  4  white
# 7   2  5  black
# 8   2  1 silver
# 9   1  5    red
# 10  1  2  green

# 2. Create data frame df2:
a2 <- c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2)
b2 <- c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5)
c2 <- c("black", "blue", "black", "white", "silver",
        "green", "green", "red", "blue", "white")
df2 <- data.frame(a2, b2, c2)
df2
#    a2 b2     c2
# 1   2  3  black
# 2   2  1   blue
# 3   1  3  black
# 4   1  2  white
# 5   2  1 silver
# 6   2  3  green
# 7   2  4  green
# 8   2  5    red
# 9   2  3   blue
# 10  2  5  white

# 3. Assign unique IDs to each observation in df1
#    (setting the key sorts the rows; .GRP numbers the groups):
library(data.table)
df1.2 <- data.table(df1, key = "a1,b1,c1")
df1.2[, id := .GRP, by = key(df1.2)]
df1.2 <- as.data.frame(df1.2)
df1.2
#    a1 b1     c1 id
# 1   1  1  white  1
# 2   1  2  green  2
# 3   1  2  white  3
# 4   1  3  black  4
# 5   1  5    red  5
# 6   1  5    red  5
# 7   2  1 silver  6
# 8   2  3    red  7
# 9   2  4  white  8
# 10  2  5  black  9

# 4. The problematic part! Assign identical unique IDs to the observations
#    of df2 that match observations in df1.2, and assign other unique IDs
#    to the non-matching observations of df2.
#    Name the resulting data frame df2.2.
#    My expected result would ideally be as follows:
# df2.2
#    a2 b2     c2 id
# 1   2  3  black 10
# 2   2  1   blue 11
# 3   1  3  black  4
# 4   1  2  white  3
# 5   2  1 silver  6
# 6   2  3  green 12
# 7   2  4  green 13
# 8   2  5    red 14
# 9   2  3   blue 15
# 10  2  5  white 16
```

Any ideas on how to get df2.2 would be appreciated. Thanks.

An easy way to approach this is to make a hash of each row's values:

```r
library(dplyr)
library(digest)

# Hash the pasted column values of each row; identical rows get identical IDs.
df1 %>%
  rowwise() %>%
  do(data.frame(., id = digest(paste(.$a1, .$b1, .$c1), algo = "md5"),
                stringsAsFactors = FALSE)) %>%
  ungroup()

df2 %>%
  rowwise() %>%
  do(data.frame(., id = digest(paste(.$a2, .$b2, .$c2), algo = "md5"),
                stringsAsFactors = FALSE)) %>%
  ungroup()
```

This produces the following for df1:

```
   a1 b1     c1                               id
1   1  1  white b86fbb78b27f7db2ee50af2d68cce452
2   1  5    red 68d47f544832989834517630e4a2764c
3   1  3  black 724e37192140cb2009cf3d982f2be1e4
4   1  2  white f731b8b38255b8c312543283f8e1c634
5   2  3    red 2d50b86902056a51faad04d2c566faf2
6   2  4  white 9396667cd51d1e1b61b0b22a7767d3d9
7   2  5  black 9ba1f3e04c61c006d3c5382fcad098e6
8   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
9   1  5    red 68d47f544832989834517630e4a2764c
10  1  2  green 7d9b1aadfd79de142b234b83d7867b9b
```

and the following for df2:

```
   a2 b2     c2                               id
1   2  3  black d285febc8ab08e99b11609b98f077e66
2   2  1   blue bfa0405276406ac4bc596daf957dfa11
3   1  3  black 724e37192140cb2009cf3d982f2be1e4
4   1  2  white f731b8b38255b8c312543283f8e1c634
5   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
6   2  3  green 67eefe9ee2d82486ded30a268289296b
7   2  4  green d773f58cf144eab15ef459e326494a2f
8   2  5    red 0724318a9f59d3960edfe4e90f9c4eff
9   2  3   blue 6883420cc137ba45b773f642176e9ce6
10  2  5  white 5dea9e63b5fbfb31fb81260cb5a5f41c
```
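If sequential integer IDs like those in the question's expected df2.2 are preferred over hash strings, one possible base R sketch follows. It is not part of the answer above. The names `key1`, `key2`, `known`, and `is_new` are made up for illustration, and the `"|"` separator is assumed never to occur in the data. Rows of df1 are numbered in sorted key order (mirroring the keyed `.GRP` numbering in step 3), matching rows of df2 reuse those numbers, and unseen rows of df2 continue the sequence in order of first appearance.

```r
# Recreate the example data from the question (strings kept as character).
df1 <- data.frame(
  a1 = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1),
  b1 = c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2),
  c1 = c("white", "red", "black", "white", "red",
         "white", "black", "silver", "red", "green"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  a2 = c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2),
  b2 = c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5),
  c2 = c("black", "blue", "black", "white", "silver",
         "green", "green", "red", "blue", "white"),
  stringsAsFactors = FALSE
)

# One string key per row; "|" is assumed not to appear in any value.
key1 <- paste(df1$a1, df1$b1, df1$c1, sep = "|")
key2 <- paste(df2$a2, df2$b2, df2$c2, sep = "|")

# Distinct df1 keys in sorted (a1, b1, c1) order; the position is the ID.
known <- unique(key1[order(df1$a1, df1$b1, df1$c1)])
df1$id <- match(key1, known)

# Rows of df2 that match a df1 key reuse its ID; the rest come back as NA.
id2 <- match(key2, known)
is_new <- is.na(id2)

# Number the unseen keys after the last df1 ID, in order of first appearance.
id2[is_new] <- length(known) + match(key2[is_new], unique(key2[is_new]))
df2$id <- id2

df2
```

Unlike the keyed data.table in step 3, this leaves df1 in its original row order. Each row still receives the same ID as in df1.2 for this example, and `df2$id` reproduces the IDs listed in the expected df2.2.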
