Friday, 4 September 2015

R: Where a value in two data frames is the same, apply a set of conditions to one to determine its classification

I have two sets of data. I wish to apply a classification (low, mid.lo, mid.up, high) to the first set (income by year) based on conditions contained in the other (year, and three breakpoints). Below are samples from those data sets - the real sets are much larger and are not of the same length.

income

Country      Year  GNI.caput
Argentina    2000       7470
Argentina    2001       7000
Argentina    2002       4050
Argentina    2003       3670
Argentina    2004       3810
Denmark      2000      32660
Denmark      2001      31440
Denmark      2002      30870
Denmark      2003      34850
Denmark      2004      42760
Kenya        2000        420
Kenya        2001        400
Kenya        2002        390
Kenya        2003        410
Kenya        2004        460
Philippines  2000       1230
Philippines  2001       1230
Philippines  2002       1190
Philippines  2003       1270
Philippines  2004       1400

breaks

Year  Break.1  Break.2  Break.3
2004      825     3225    10065
2003      765     3035     9385
2002      735     2935     9075
2001      745     2975     9205
2000      755     2995     9265

I have tried the following two loops, but neither completes, and each generates several warnings.

Attempt 1

for(i in seq_along(gni.data)){
    while(gni.data$Year == break.pts$Year) {
        if(gni.data$GNI.caput <= break.pts$Break.1) {
            gni.data$Indicator <- "Low"
        } else if(gni.data$GNI.caput <= break.pts$Break.2) {
            gni.data$Indicator <- "Mid.Low"
        } else if(gni.data$GNI.caput <= break.pts$Break.3) {
            gni.data$Indicator <- "Mid.Up"
        } else if(gni.data$GNI.caput > break.pts$Break.3) {
            gni.data$Indicator <- "High"
        } else gni.data$Indicator <- "NA"
    }
}

Warning messages:
1: In gni.data$Year == break.pts$Year :
  longer object length is not a multiple of shorter object length
2: In while (gni.data$Year == break.pts$Year) { :
  the condition has length > 1 and only the first element will be used
...
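
Both warnings point at the same comparison. In R, `==` works element by element and recycles the shorter vector, so it does not search for a matching year, and `while()` expects a single TRUE/FALSE rather than a whole vector. A minimal sketch of that behaviour, using two made-up vectors (`years.income` and `years.breaks` are illustrative names, not part of the real data):

```r
# Illustrative vectors of different lengths (hypothetical, not the real data)
years.income <- c(2000, 2001, 2002, 2003, 2004, 2000, 2001)  # length 7
years.breaks <- c(2004, 2003, 2002, 2001, 2000)              # length 5

# `==` compares position by position and recycles the shorter vector.
# 7 is not a multiple of 5, hence the "longer object length" warning
# (suppressed here so the sketch runs quietly).
cmp <- suppressWarnings(years.income == years.breaks)
length(cmp)  # one TRUE/FALSE per element, not the single value while() needs

# Lookups by value, regardless of position, use %in% or match()
found <- years.income %in% years.breaks
pos   <- match(years.income, years.breaks)  # row of each year in years.breaks
```

Here `cmp` is TRUE only where the two vectors happen to hold the same year at the same position, whereas `pos` gives the position of every year in `years.breaks`.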

Attempt 2

for(i in seq_along(gni.data)){
    while(gni.data$Year == break.pts$Year) {
        ifelse(gni.data$GNI.caput <= break.pts$Break.1, gni.data$Indicator <- "Low", 
                ifelse(gni.data$GNI.caput <= break.pts$Break.2, gni.data$Indicator <- "Mid.Lo",
                       ifelse(gni.data$GNI.caput <= break.pts$Break.3, gni.data$Indicator <- "Mid.Up",
                              ifelse(gni.data$GNI.caput > break.pts$Break.3, gni.data$Indicator <- "High",
                                     gni.data$Indicator <- "NA"))))
    }
}

The warning messages are the same as for Attempt 1.

Where am I going wrong? Thanks!
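
For reference, here is a minimal sketch of one vectorized way to get this kind of classification without loops. It assumes the data frames are named `gni.data` and `break.pts` as in the attempts, uses only a four-row subset of the sample income data, and borrows the labels from Attempt 1. The helper names `idx` and `b` are purely illustrative.

```r
# Four rows taken from the sample income data (subset for brevity)
gni.data <- data.frame(
  Country   = c("Argentina", "Denmark", "Kenya", "Philippines"),
  Year      = c(2002, 2004, 2000, 2003),
  GNI.caput = c(4050, 42760, 420, 1270)
)
# The sample breakpoints
break.pts <- data.frame(
  Year    = c(2004, 2003, 2002, 2001, 2000),
  Break.1 = c(825, 765, 735, 745, 755),
  Break.2 = c(3225, 3035, 2935, 2975, 2995),
  Break.3 = c(10065, 9385, 9075, 9205, 9265)
)

# For every income row, find the row of break.pts with the same Year
# (NA if that year has no breakpoints)
idx <- match(gni.data$Year, break.pts$Year)
b   <- break.pts[idx, ]  # breakpoints lined up row by row with gni.data

# Vectorized classification: each ifelse() tests the whole column at once,
# and rows with missing breakpoints come out as NA
gni.data$Indicator <- ifelse(gni.data$GNI.caput <= b$Break.1, "Low",
                      ifelse(gni.data$GNI.caput <= b$Break.2, "Mid.Low",
                      ifelse(gni.data$GNI.caput <= b$Break.3, "Mid.Up", "High")))
```

Because `match()` keeps the original row order, `Indicator` lines up with the existing rows of `gni.data`. `merge(gni.data, break.pts, by = "Year")` followed by the same `ifelse()` chain should give an equivalent result, but with the rows reordered by year.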
