R - Filtering for only complete sets of years
I have data on yield organized by state and county. From this data, I want to retain only the counties that provide a complete set of years between 1970 and 2000.
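Here is a minimal base R sketch of the stated requirement. It assumes the real data has a numeric `year` column alongside `state`, `county` and `yield`. The fake data below has no `year` column, so this is an assumption. `keep_complete_years()` is a hypothetical helper name. A county counts as complete only if every year from 1970 to 2000 appears with a non-missing yield.

```r
# Sketch only: assumes numeric columns state, county, year and yield.
# keep_complete_years() is a hypothetical helper, not a library function.
keep_complete_years <- function(d, years = 1970:2000) {
  # A year counts only if it lies in the window and carries a non-missing yield
  valid_year <- ifelse(d$year %in% years & !is.na(d$yield), d$year, NA)
  # For every row, the number of distinct valid years in its state/county group
  n_valid <- ave(valid_year, d$state, d$county,
                 FUN = function(y) length(unique(y[!is.na(y)])))
  # Keep the in-window rows of groups that cover every year of the window
  d[n_valid == length(years) & d$year %in% years, ]
}
```

Because it counts distinct valid years rather than looking for missing values, this also drops counties where a year is absent altogether (no row at all), not just counties where a year is stored as NA or NaN.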
The following code clears away incomplete cases in the fake data, but fails to omit them in the larger data set.

Some fake data:
k <- 5 # number of rows to set to NaN
df <- data.frame(state = c(rep(1, 10), rep(2, 10)), county = rep(1:4, 5), yield = 100)
df[sample(1:20, k), 3] <- NaN
Current code:
df1 <- read.csv("gly2.csv", header = TRUE)
df <- data.frame(df1)

droprows_1 <- function(df, v1, v2, v3, value = 'x'){
  idx <- df[, v3] == value
  todrop <- df[idx, c(v1, v2)]; todrop # should have k rows missing
  todrop <- unique(todrop); todrop # fewer unique values
  nrow <- dim(todrop)[1]
  for(i in 1:nrow){
    idx <- apply(df, 1, function(x) all(x == todrop[i, ]))
    df <- df[!idx, ]
  }
  return(df)
}

qq <- droprows_1(df, 1, 2, 3)
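One likely contributing reason `droprows_1()` misbehaves is that an equality test cannot locate missing values. In R, any comparison with NaN or NA evaluates to NA rather than TRUE. A logical row index that contains NA also returns rows of NAs instead of skipping them. A small illustration with made-up values:

```r
# Any comparison with NaN (or NA) yields NA, so `==` cannot locate missing values
y <- c(100, NaN, 100, NA)
print(y == NaN)        # NA NA NA NA
# is.na() flags both NA and NaN; is.nan() flags only NaN
print(is.na(y))        # FALSE TRUE FALSE TRUE
print(is.nan(y))       # FALSE TRUE FALSE FALSE
# Subsetting rows with NA in the logical index returns all-NA rows
d <- data.frame(county = 1:4, yield = y)
print(d[d$yield == 100, ])   # rows 1 and 3 plus two all-NA rows
```

Testing with `is.na()` instead of `==` avoids both problems.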
Thank you.
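To drop every county that has even a single missing value, group by both state and county (county codes repeat across states) and use:

library(dplyr)

df %>%
  group_by(state, county) %>%
  filter(!any(is.na(yield))) %>%   # is.na() is TRUE for both NA and NaN
  ungroup()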
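The same filter can also be sketched in base R without dplyr. This sketch groups by both `state` and `county`, because the fake data reuses county codes 1 to 4 in each state. It uses `is.na()`, which is TRUE for both `NA` and `NaN`. `drop_incomplete_counties()` is a hypothetical name, and the NaN positions below are fixed rather than sampled, so the result is reproducible.

```r
# Drop every state/county group that has at least one missing yield.
# drop_incomplete_counties() is a hypothetical helper, not a library function.
drop_incomplete_counties <- function(d) {
  # TRUE for each row whose state/county group contains any NA or NaN yield
  has_missing <- ave(is.na(d$yield), d$state, d$county, FUN = any)
  d[!has_missing, ]
}

# Fake data as in the question, with NaN placed at fixed rows
df <- data.frame(state = c(rep(1, 10), rep(2, 10)), county = rep(1:4, 5), yield = 100)
df$yield[c(1, 12)] <- NaN   # state 1 / county 1 and state 2 / county 4
kept <- drop_incomplete_counties(df)
nrow(kept)   # 14
```

Grouping by county alone would drop county 1 and county 4 in both states and keep only 10 of the 20 rows.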