How to aggregate in R with a custom function that uses two columns
Is it possible to aggregate with a custom function that uses 2 columns and returns 1 column?
Say I have a data frame:
x <- c(2,4,3,1,5,7)
y <- c(3,2,6,3,4,6)
group <- c("a","a","a","a","b","b")
data <- data.frame(group, x, y)
data
#   group x y
# 1     a 2 3
# 2     a 4 2
# 3     a 3 6
# 4     a 1 3
# 5     b 5 4
# 6     b 7 6
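and I have a function I want to use on 2 columns (x and y):
pathlength <- function(xy) {
  # pairwise Euclidean distances between all rows of xy
  out <- as.matrix(dist(xy))
  # entries just below the diagonal are the distances between consecutive rows;
  # their sum is the total length of the path
  sum(out[row(out) - col(out) == 1])
}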
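In words, pathlength() treats each row as a point and adds up the Euclidean distances between consecutive rows. A minimal sketch of an equivalent that only computes those consecutive distances instead of the full n-by-n matrix, assuming the rows are already in path order; `pathlength_diff` is a hypothetical name, not part of the original post:

```r
# Function from the question: full distance matrix, then sum of its first subdiagonal
pathlength <- function(xy) {
  out <- as.matrix(dist(xy))
  sum(out[row(out) - col(out) == 1])
}

# Hypothetical equivalent: only the steps between consecutive rows
pathlength_diff <- function(xy) {
  xy <- as.matrix(xy)
  steps <- diff(xy)             # row i+1 minus row i, one row per step
  sum(sqrt(rowSums(steps^2)))   # Euclidean length of each step, summed
}

pts <- matrix(c(2, 4, 3, 1, 3, 2, 6, 3), ncol = 2)
pathlength(pts)
pathlength_diff(pts)
```

Memory then grows with the number of rows rather than its square, which may matter on a big dataset.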
I tried the following aggregate calls:
out <- aggregate(cbind(x, y) ~ group, data, FUN = pathlength)
out <- aggregate(cbind(x, y) ~ group, data, function(x) pathlength(x))
However, both call pathlength on x and y separately instead of together, giving me:
#   group x y
# 1     a 5 8
# 2     b 2 2
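aggregate() hands FUN one left-hand-side column at a time within each group, so each call sees a plain numeric vector. On a vector, dist() gives absolute differences, so for group a the x column yields |2-4| + |4-3| + |3-1| = 5 and the y column |3-2| + |2-6| + |6-3| = 8. A small sketch reproducing that row by hand:

```r
pathlength <- function(xy) {
  out <- as.matrix(dist(xy))
  sum(out[row(out) - col(out) == 1])
}

data <- data.frame(group = c("a", "a", "a", "a", "b", "b"),
                   x = c(2, 4, 3, 1, 5, 7),
                   y = c(3, 2, 6, 3, 4, 6))

# What aggregate() effectively does for group "a": one call per column
rows_a <- data[data$group == "a", c("x", "y")]
per_column <- sapply(rows_a, pathlength)
per_column
```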
What I want is to call pathlength on x and y together, in an aggregate-like way. Here is what I want aggregate to do:
reala <- matrix(c(2,4,3,1,3,2,6,3), nrow=4, ncol=2)
pathlength(reala)
# [1] 9.964725
realb <- matrix(c(5,7,4,6), nrow=2, ncol=2)
pathlength(realb)
# [1] 2.828427
# desired result (values stored in pl so the pathlength function is not overwritten)
group <- c("a", "b")
pl <- c(9.964725, 2.828427)
real_out <- data.frame(group, pathlength = pl)
real_out
#   group pathlength
# 1     a   9.964725
# 2     b   2.828427
Does anyone have suggestions? Or is there another function, which I couldn't find on Google, that lets me do this? I'd rather not work around it with a loop, since I assume that would be slow on a big dataset.
As you've found out, the base aggregate() function works on one column at a time. Instead, use the by() function:
by(data[,c("x","y")], data$group, pathlength)
data$group: a
[1] 9.964725
-----------------------------------------------------------------------
data$group: b
[1] 2.828427
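by() returns an object of class "by", not a data frame. When FUN returns a single number per group, as here, that object is a one-dimensional array whose dimnames are the group labels, so one way to turn it into the group/pathlength table from the question might look like this (a sketch, not the only option):

```r
pathlength <- function(xy) {
  out <- as.matrix(dist(xy))
  sum(out[row(out) - col(out) == 1])
}

data <- data.frame(group = c("a", "a", "a", "a", "b", "b"),
                   x = c(2, 4, 3, 1, 5, 7),
                   y = c(3, 2, 6, 3, 4, 6))

res <- by(data[, c("x", "y")], data$group, pathlength)

# One value per group: flatten the 1-d "by" array into two columns
real_out <- data.frame(group = dimnames(res)[[1]],
                       pathlength = as.numeric(res),
                       stringsAsFactors = FALSE)
real_out
```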
Or use split() with lapply():
lapply(split(data[,c("x","y")], data$group), pathlength)
$a
[1] 9.964725

$b
[1] 2.828427
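If you'd rather stay with aggregate() and get a data frame back directly, one possible workaround is to aggregate the row numbers instead of the values, and let FUN look up both columns for those rows. This is a sketch assuming data and pathlength as defined in the question; naming the result column pathlength is just a choice made here:

```r
pathlength <- function(xy) {
  out <- as.matrix(dist(xy))
  sum(out[row(out) - col(out) == 1])
}

data <- data.frame(group = c("a", "a", "a", "a", "b", "b"),
                   x = c(2, 4, 3, 1, 5, 7),
                   y = c(3, 2, 6, 3, 4, 6))

# Aggregate the row numbers: FUN receives each group's row indices
# and subsets x and y together for those rows
out <- aggregate(list(pathlength = seq_len(nrow(data))),
                 by = list(group = data$group),
                 FUN = function(i) pathlength(data[i, c("x", "y")]))
out
```

The trick works because FUN only ever sees one column (the indices), while the closure reads both x and y from data.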