I have been struggling with this code for a few hours, so I hope that someone can help me. :)
I have a dataframe (sub.smi) that contains data on droughts (SMI values). Based on the latitude and longitude values, I matched the SMI values to their respective postal codes. As I was not sure which of the two postal code data sets I found is better, I included both, so there are two postal code columns (zip_1 and zip_2). The match column records whether the two postal codes agree for a given row.
sub.smi <- data.frame(year=c(2015,2015,2015,2015,2015,2015),
month=c(1,1,1,1,1,1),
lat = c(47.26648, 47.26732, 47.26814, 47.30490, 47.30567, 47.37350),
lon = c(10.149783, 10.202620, 10.255458, 10.307151, 10.360030, 10.093070),
SMI = c(0.8630472, 0.8760275, 0.8250171, 0.8341259, 0.6553457, 0.8640428),
zip_1 = c(87561,87561,87561,87561,87561,87538),
zip_2 = c(87561,87561,87561,87541,87541,87561),
match = c("yes","yes","yes","no","no","no"),
date = c("Jan 2015","Jan 2015","Jan 2015","Jan 2015","Jan 2015","Jan 2015"))
sub.smi
# year month lat lon SMI zip_1 zip_2 match date
# 2015 1 47.26648 10.14978 0.8630472 87561 87561 yes Jan 2015
# 2015 1 47.26732 10.20262 0.8760275 87561 87561 yes Jan 2015
# 2015 1 47.26814 10.25546 0.8250171 87561 87561 yes Jan 2015
# 2015 1 47.30490 10.30715 0.8341259 87561 87541 no Jan 2015
# 2015 1 47.30567 10.36003 0.6553457 87561 87541 no Jan 2015
# 2015 1 47.37350 10.09307 0.8640428 87538 87561 no Jan 2015
As you can see from the dataframe, one postal code can have several SMI values within one time period. My next goal is to calculate one average SMI value for each postal code within each period. However, as the two postal code columns do not always match, I would like to include all relevant SMI values in the average. For example, to calculate the average SMI for the postal code 87561 for January 2015, I want to include every row where zip_1 or zip_2 is 87561, without double counting the rows where both postal codes match. I came up with this loop, but there are still some issues:
library(dplyr)
df <- data.frame(matrix(ncol = 3, nrow = 0)) # create a dataframe for final results
plz <- unique(sub.smi$zip_1) # create a list of unique postal codes
time <- unique(sub.smi$date) # create a list of unique time periods
for (i in 1: length(time)){ # looping over all time periods
  time_temp <- time[i] # filter the relevant time period
  for (j in 1: length(plz)){ # looping over all postal codes of zip_1
    plz_temp <- plz[j] # filter the relevant postal code
    sub1 <- list() # create list for first if statement
    sub2 <- list() # create list for second if statement
    sub3 <- list() # create list for third if statement
    if (sub.smi$zip_1 == plz_temp && sub.smi$match == "yes"){
      sub1 <- subset(sub.smi$SMI, sub.smi$zip_1== plz_temp & sub.smi$match =="yes") # filter respective SMI values into a list
    }
    if (sub.smi$zip_1 == plz_temp && sub.smi$match == "no"){
      sub2 <- subset(sub.smi$SMI, sub.smi$zip_1== plz_temp & sub.smi$match =="no")
    }
    if (sub.smi$zip_2 == plz_temp && sub.smi$match == "no"){
      sub3 <- subset(sub.smi$SMI, sub.smi$zip_2== plz_temp & sub.smi$match =="no")
    }
    av <- (do.call(sum, sub1) + do.call(sum, sub2)+do.call(sum, sub3))/(length(sub1)+length(sub2)+length(sub3)) # calculating the average SMI for respective postal code
    df <- rbind(df, c(time_temp,plz_temp,av)) # include final average SMI, postal code, and time period into the dataframe
  }
}
With this code I tried to filter all relevant SMI values without double counting the values where the postal codes matched each other. However, I struggle with two issues:
- It seems the code runs the first if statement and then immediately jumps to the calculation of the average without running the other two if statements. As far as I understand, else if is not an option, because those statements would only be checked if the first condition were false. Does anyone have an idea how all if statements could be checked independently of each other?
- When the code tries to calculate the average SMI (av), I get this error: Error in do.call(sum, sub): second argument must be a list. I already checked: sub1, sub2, and sub3 are created as lists, but the code does not recognize them as lists. Does anyone have an idea what the issue might be?
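Both issues may come from the same place, and the small sketch below tries to illustrate it. It uses three toy rows copied from the data above (the standalone vectors `zip_1`, `match`, `SMI`, `cond`, `vals` and `res` are names introduced only for this illustration). A comparison on a whole column returns one TRUE/FALSE per row, while `if` and `&&` expect a single value; depending on the R version, `&&` on longer vectors either looks only at the first element or raises an error. Also, `subset()` on a numeric vector returns a numeric vector, so once an `if` branch runs, the variable that started as `list()` is no longer a list, which would explain the `do.call()` error.

```r
# Toy vectors mirroring three rows of sub.smi (values copied from the post)
zip_1 <- c(87561, 87561, 87538)
match <- c("yes", "no", "no")
SMI   <- c(0.8630472, 0.8341259, 0.8640428)

# A comparison on a column yields one logical value per row
cond <- zip_1 == 87561 & match == "yes"
length(cond)      # 3

# if() needs a single TRUE/FALSE; any() collapses the row-wise vector
if (any(cond)) message("at least one row matches")

# subset() on a numeric vector returns a numeric vector, not a list
vals <- subset(SMI, cond)
is.list(vals)     # FALSE
is.numeric(vals)  # TRUE

# sum() and mean() accept the vector directly, so do.call() is not needed
sum(vals)

# do.call() insists on a list as its second argument; capture the error text
res <- tryCatch(do.call(sum, vals), error = function(e) conditionMessage(e))
res
```

If this reading is right, the three `if` statements are in fact evaluated independently; the surrounding `if` tests could simply be dropped, because the logical filter inside `subset()` already selects the right rows (and returns an empty vector when nothing matches).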
Sorry for the long post. Thank you very much for your help.
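For reference, the averaging goal described above can probably be expressed without any loop. The following is only a sketch in base R under two assumptions that are not stated in the post: a row should count for a postal code if that code appears in zip_1 or zip_2, and postal codes that occur only in zip_2 should also get an average. The helper names `long` and `avg_smi` are made up for this example.

```r
# Reduced copy of the example data (only the columns needed here)
sub.smi <- data.frame(
  SMI   = c(0.8630472, 0.8760275, 0.8250171, 0.8341259, 0.6553457, 0.8640428),
  zip_1 = c(87561, 87561, 87561, 87561, 87561, 87538),
  zip_2 = c(87561, 87561, 87561, 87541, 87541, 87561),
  date  = rep("Jan 2015", 6)
)

# Stack both postal-code columns into one long table, keeping a row id
long <- data.frame(
  row  = rep(seq_len(nrow(sub.smi)), times = 2),
  date = rep(sub.smi$date, times = 2),
  zip  = c(sub.smi$zip_1, sub.smi$zip_2),
  SMI  = rep(sub.smi$SMI, times = 2)
)

# Rows where zip_1 == zip_2 now appear twice for the same code; keep one copy
long <- long[!duplicated(long[, c("row", "zip")]), ]

# One average SMI per time period and postal code
avg_smi <- aggregate(SMI ~ date + zip, data = long, FUN = mean)
avg_smi
```

Because duplicates are removed per row and postal code, the match column is not needed in this variant. If only the codes from zip_1 should be reported, as in the loop, the result could be filtered afterwards with `avg_smi[avg_smi$zip %in% sub.smi$zip_1, ]`.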