Tuesday, 18 February 2020

Conditional statement within a loop using multiple datasets in R

I would like to find the most recent previous owner of each location, looking back no more than two years from the current owner's year. Locations are identified by reflo (reference location).

The conditions:

  • the previous owner must have lived at the same location (lifetime_census$reflo == owners$reflo.x[i]), and their record must fall within the two years up to the current owner's year (lifetime_census$census_year no more than 2 years before owners$spr_census[i])
  • if none, then assign NA
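The two conditions above can be sketched as a filter for a single owner. This is only an illustration under stated assumptions: find_previous_owner is a hypothetical helper name, the data frame is rebuilt from the sample rows in this question, and the window is taken to include the current year and the two years before it.

```r
# Sample rows copied from the question
lifetime_census <- data.frame(
  id          = c(16161, 17723, 19345, 16848, 17836, 19439, 21815),
  squirrel_id = c(5587, 5587, 5879, 5101, 6501, 6501, 6057),
  reflo       = c("-310", "-310", "-310", "Q1", "Q1", "Q1", "Q1"),
  census_year = c(2001, 2002, 2003, 2001, 2002, 2003, 2004),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most recent record at `loc` no later than `year`
# and no earlier than `year - window` (assumed inclusive on both ends)
find_previous_owner <- function(census, loc, year, window = 2) {
  hits <- census[census$reflo == loc &
                 census$census_year <= year &
                 census$census_year >= year - window, ]
  if (nrow(hits) == 0) {
    # No one at this location inside the window: assign NA
    return(data.frame(previous_owner = NA_real_, census_year = NA_real_))
  }
  latest <- hits[which.max(hits$census_year), ]
  data.frame(previous_owner = latest$squirrel_id,
             census_year    = latest$census_year)
}

find_previous_owner(lifetime_census, "Q1", 2005)
find_previous_owner(lifetime_census, "A12", 2005)
```

Keeping the window as an argument makes it easy to try a different look-back period without touching the filter itself.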

Previous owners (>20,000) are stored in a dataset called lifetime_census. Here is a sample of the data:

id     squirrel_id  reflo  census_year
16161  5587         -310   2001
17723  5587         -310   2002
19345  5879         -310   2003
16848  5101         Q1     2001
17836  6501         Q1     2002
19439  6501         Q1     2003
21815  6057         Q1     2004

I then have an owners dataset (here is a sample):

squirrel_id  spr_census  reflo.x
6391         2005        Q1
6130         2005        -310
6288         2005        A12

To illustrate what I am trying to achieve:

squirrel_id  spr_census  reflo.x  previous_owner  census_year
6391         2005        Q1       6057            2004
6130         2005        -310     5879            2003
6288         2005        A12      NA              NA

What I have currently tried is this:

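n <- nrow(owners)

for (i in seq_len(n)) {
  # All census records at this owner's reflo, in the current or any earlier year
  last_owner <- subset(lifetime_census,
    reflo == owners$reflo.x[i] &
    census_year <= owners$spr_census[i])  # owners can be in current or past year

  # Put it all together: keep the most recent match, or NA if there is none
  if (nrow(last_owner) > 0) {
    owners[i, "spring_owner"] <- last_owner$squirrel_id[which.max(last_owner$census_year)]
  } else {
    owners[i, "spring_owner"] <- NA
  }
}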

This gives me a new column holding the previous owner from any past year for reflo.x, with NA where the conditions are not met. I cannot figure out how to restrict the search to the last two years.

Any ideas?
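One possible way to restrict the search to two years without a loop is a join followed by a filter. The following is a sketch in base R, not a definitive answer: the data frames are rebuilt from the samples in this question, the names prev, pairs and result are made up for illustration, and the window is assumed to run from spr_census - 2 up to and including spr_census.

```r
# Sample data copied from the question
owners <- data.frame(
  squirrel_id = c(6391, 6130, 6288),
  spr_census  = c(2005, 2005, 2005),
  reflo.x     = c("Q1", "-310", "A12"),
  stringsAsFactors = FALSE
)
lifetime_census <- data.frame(
  id          = c(16161, 17723, 19345, 16848, 17836, 19439, 21815),
  squirrel_id = c(5587, 5587, 5879, 5101, 6501, 6501, 6057),
  reflo       = c("-310", "-310", "-310", "Q1", "Q1", "Q1", "Q1"),
  census_year = c(2001, 2002, 2003, 2001, 2002, 2003, 2004),
  stringsAsFactors = FALSE
)

# Rename the census columns so they do not clash with the owners columns
prev <- data.frame(
  reflo.x        = lifetime_census$reflo,
  previous_owner = lifetime_census$squirrel_id,
  census_year    = lifetime_census$census_year,
  stringsAsFactors = FALSE
)

# Pair every owner with every census record at the same reflo
pairs <- merge(owners, prev, by = "reflo.x")

# Keep only records inside the assumed two-year window
in_window <- pairs$census_year <= pairs$spr_census &
             pairs$census_year >= pairs$spr_census - 2
pairs <- pairs[in_window, ]

# Sort so the most recent record comes first, then keep one row per owner
keys  <- c("squirrel_id", "spr_census", "reflo.x")
pairs <- pairs[order(pairs$squirrel_id, -pairs$census_year), ]
pairs <- pairs[!duplicated(pairs[, keys]), ]

# Left join back onto owners so unmatched owners get NA
result <- merge(owners, pairs, by = keys, all.x = TRUE)
result
```

With more than 20,000 census records the intermediate pairs table can get large, since it holds one row per owner for every record at the same reflo, so it may be worth filtering lifetime_census to the relevant years before the merge.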
