Thursday, 11 March 2021

Running several independent if statements within a for loop in R

I have been struggling with this code for a few hours already, so I hope that someone can help me. :)

I have a data frame (sub.smi) that contains drought data (SMI values). Based on the latitude and longitude values, I matched the SMI values to their respective postal codes. As I was not sure which of the two postal code data sets I found is better, I included both, so there are two postal code columns (zip_1 and zip_2). The match column records whether the two columns agree.

sub.smi <- data.frame(year=c(2015,2015,2015,2015,2015,2015),
                  month=c(1,1,1,1,1,1),
                  lat = c(47.26648, 47.26732,  47.26814, 47.30490, 47.30567, 47.37350),
                  lon = c(10.149783, 10.202620, 10.255458, 10.307151, 10.360030, 10.093070),
                  SMI = c(0.8630472, 0.8760275, 0.8250171, 0.8341259, 0.6553457, 0.8640428),
                  zip_1 = c(87561,87561,87561,87561,87561,87538),
                  zip_2 = c(87561,87561,87561,87541,87541,87561),
                  match = c("yes","yes","yes","no","no","no"),
                  date = c("Jan 2015","Jan 2015","Jan 2015","Jan 2015","Jan 2015","Jan 2015"))
                  
sub.smi
# year month     lat         lon          SMI   zip_1   zip_2   match   date
# 2015  1   47.26648    10.14978    0.8630472   87561   87561   yes Jan 2015
# 2015  1   47.26732    10.20262    0.8760275   87561   87561   yes Jan 2015
# 2015  1   47.26814    10.25546    0.8250171   87561   87561   yes Jan 2015
# 2015  1   47.30490    10.30715    0.8341259   87561   87541   no  Jan 2015
# 2015  1   47.30567    10.36003    0.6553457   87561   87541   no  Jan 2015
# 2015  1   47.37350    10.09307    0.8640428   87538   87561   no  Jan 2015
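
For reference, the match column can be derived from the two postal code columns. This is only a minimal sketch, assuming match simply flags row-wise equality of zip_1 and zip_2; the small zips data frame is a hypothetical copy of those two columns so that the snippet runs on its own.

```r
# Hypothetical stand-alone copy of the two postal code columns of sub.smi
zips <- data.frame(zip_1 = c(87561, 87561, 87561, 87561, 87561, 87538),
                   zip_2 = c(87561, 87561, 87561, 87541, 87541, 87561))

# Assumption: "match" only records whether both columns agree in a row
zips$match <- ifelse(zips$zip_1 == zips$zip_2, "yes", "no")
zips$match
```

The comparison is vectorized, so every row is checked at once without a loop.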

As you can see from the data frame, one postal code can have several SMI values within one time period. My next goal is to calculate one average SMI value for each postal code within each period. However, as the two postal code columns do not always agree, I would like to include all relevant SMI values in the average. For example, to calculate the average SMI for postal code 87561 for January 2015, I want to include all SMI values where zip_1 or zip_2 is 87561, without double counting the rows where the two postal codes match. I came up with this loop, but there are still some issues:

library(dplyr)

df <- data.frame(matrix(ncol = 3, nrow = 0)) # create a dataframe for final results
plz <- unique(sub.smi$zip_1) # create a list of unique postal codes 
time <- unique(sub.smi$date) # create a list of unique time periods

for (i in 1: length(time)){ # looping over all time periods
    time_temp <- time[i] # filter the relevant time period
    for (j in 1: length(plz)){ # looping over all postal codes of zip_1
        plz_temp <- plz[j] # filter the relevant postal code
        sub1 <- list() # create list for first if statement
        sub2 <- list() # create list for second if statement 
        sub3 <- list() # create list for third if statement
        if (sub.smi$zip_1 == plz_temp && sub.smi$match == "yes"){
            sub1 <- subset(sub.smi$SMI, sub.smi$zip_1== plz_temp & sub.smi$match =="yes") # filter respective SMI values into a list
        }
        if (sub.smi$zip_1 == plz_temp && sub.smi$match == "no"){
            sub2 <- subset(sub.smi$SMI, sub.smi$zip_1== plz_temp & sub.smi$match =="no")
        }
        if (sub.smi$zip_2 == plz_temp && sub.smi$match == "no"){
            sub3 <- subset(sub.smi$SMI, sub.smi$zip_2== plz_temp & sub.smi$match =="no")
        }
        av <- (do.call(sum, sub1) + do.call(sum, sub2)+do.call(sum, sub3))/(length(sub1)+length(sub2)+length(sub3)) # calculating the average SMI for respective postal code
        df <- rbind(df, c(time_temp,plz_temp,av)) # include final average SMI, postal code, and time period into the dataframe
    }    
}

With this code I tried to filter all relevant SMI values without double counting the values where the postal codes matched each other. However, I struggle with two issues:

  1. It seems the code runs the first if statement and then immediately jumps to the calculation of the average without running the other two if statements. As far as I understand, else if is not an option, as those branches would only be checked if the first condition were false. Does anyone have an idea how all if statements could be checked independently of each other?

  2. When the code tries to calculate the average SMI (av), I get this error: Error in do.call(sum, sub): second argument must be a list. I already checked: sub1, sub2, and sub3 are lists, but the code does not recognize them as lists. Does anyone have an idea what the issue might be?
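
One possible reading of both issues, offered as a minimal sketch rather than a definitive answer: if () expects a single TRUE or FALSE, while a comparison such as sub.smi$zip_1 == plz_temp yields one value per row. Older R versions looked only at the first element, and recent versions raise an error instead. Separately, subset() applied to a numeric vector returns a numeric vector, so the empty lists are overwritten with non-lists before do.call() sees them. The helper smi_average and the result frame res below are hypothetical names, and the sketch assumes that a row belongs to a postal code whenever either column contains it.

```r
sub.smi <- data.frame(
  SMI   = c(0.8630472, 0.8760275, 0.8250171, 0.8341259, 0.6553457, 0.8640428),
  zip_1 = c(87561, 87561, 87561, 87561, 87561, 87538),
  zip_2 = c(87561, 87561, 87561, 87541, 87541, 87561),
  date  = rep("Jan 2015", 6)
)

# Issue 2: subset() on a numeric vector returns a numeric vector, not a list,
# so do.call(sum, vals) complains; sum(vals) or mean(vals) work directly.
vals <- subset(sub.smi$SMI, sub.smi$zip_1 == 87561)
is.list(vals)  # FALSE

# Issue 1: a logical vector already acts as a row filter, so no if () is needed.
# Each row is selected at most once, even when both columns hold the same code,
# so nothing is double counted.
smi_average <- function(data, plz, period) {  # hypothetical helper
  keep <- data$date == period & (data$zip_1 == plz | data$zip_2 == plz)
  mean(data$SMI[keep])
}

# One result row per period and postal code.
# Assumption: postal codes from both columns should appear in the result.
res <- expand.grid(date = unique(sub.smi$date),
                   zip = unique(c(sub.smi$zip_1, sub.smi$zip_2)),
                   stringsAsFactors = FALSE)
res$SMI_mean <- mapply(smi_average, plz = res$zip, period = res$date,
                       MoreArgs = list(data = sub.smi))
res
```

The three original conditions (zip_1 with match "yes", zip_1 with match "no", zip_2 with match "no") together select the same rows as zip_1 == plz | zip_2 == plz, provided match really reflects equality of the two columns. The sketch also filters by date, which the loop above does not yet do with time_temp.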

Sorry for the long post. Thank you very much for your help.
