I would like to figure out who lived in a location before the current owner.
Previous owners (>20,000) are stored in a dataset called lifetime_census. Here is a sample of the data:
id squirrel_id reflo locx locy census_date gr year census_year lifespan sex
<chr> <chr> <chr> <dbl> <dbl> <date> <chr> <chr> <dbl> <dbl> <chr>
16161 5587 -310 -3 10 2001-05-15 KL 2001 2001 1810 F
17186 5587 -310 -3 10 2001-08-15 KL 2001 2001 1810 F
17723 5587 -310 -3 10 2002-05-15 KL 2002 2002 1810 F
18473 5587 -310 -3 10 2002-08-15 KL 2002 2002 1810 F
19345 5879 -310 -3 10 2003-05-15 KL 2003 2003 2109 M
20535 5879 -310 -3 10 2003-08-15 KL 2003 2003 2109 M
21812 6055 -310 -3 10 2004-05-15 KL 2004 2004 802 F
22663 6130 -310 -3 10 2004-08-15 KL 2004 2004 374 M
16848 5101 Q1 17.3 0.8 2001-06-21 KL 2001 2001 55 F
17033 5101 Q1 17.3 0.8 2001-08-15 KL 2001 2001 55 F
17836 6501 Q1 17.3 0.8 2002-05-15 KL 2002 2002 1598 F
18626 6501 Q1 17.3 0.8 2002-08-15 KL 2002 2002 1598 F
19439 6501 Q1 17.3 0.8 2003-05-15 KL 2003 2003 1598 F
20675 6501 Q1 17.5 0.8 2003-08-15 KL 2003 2003 1598 F
21815 6057 Q1 17.5 0.8 2004-05-15 KL 2004 2004 479 F
21815 6077 M2 13 1.8 2004-05-15 SU 2004 2004 979 M
I then have an owners dataset (here is another sample):
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid aut_census reflo.y autumn_locx autumn_locy autumn_grid
<chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 6391 2005 Q1 17.5 0.8 KL 2004 M4 13.3 4 SU
1 6130 2005 -310 -3 10 KL 2004 -310 -3.2 10.3 KL
Using a loop I'd like to create a new column for each squirrel_id in the owners dataset, showing who the most recent previous owner was (i.e., the previous owner's squirrel_id, their sex, lifespan, and reflo - which are all in the lifetime_census dataset).
I would like to do this for the spr_census and aut_census locations (reflo.x and reflo.y, respectively) for each squirrel_id in the owners dataset.
The complicating factors include:
- if there is not an owner with the exact same
reflo(i.e.,lifetime_census$reflo==owners$reflo.xandlifetime_census$reflo==owners$reflo.y), then the next closest owner will do (for spring:(30*owners$spring_locx[i]-30*lifetime_census$locx)^2+(30*owners$spring_locy[i]-30*lifetime_census$locy)^2<=(distance)^2and for autumn:(30*owners$autumn_locx[i]-30*lifetime_census$locx)^2+(30*owners$autumn_locy[i]-30*lifetime_census$locy)^2<=(distance)^2) - the previous owner has to be within two years of the
spr_censusandaut_censusyears - if even after these conditions are relaxed there is still no owner, then it should be marked with an
NA
For example, for squirrel_id 6391, the previous spr_census at reflo Q1 would have been squirrel_id 6057, but for the aut_census at reflo M4 there is no previous owner, but there is an owner nearby (at M2).
The goal is to have these columns added to the owners dataset:
squirrel_id spring_owner_id spring_owner_reflo spring_owner_sex spring_owner_lifespan autumn_owner_id autumn_owner_reflo autumn_owner_sex autumn_owner_lifespan
6391 6057 Q1 F 479 6077 M2 M 979
What I have currently tried is this (just for the spr_census):
n <- length(owners$squirrel_id)
distance <- 30
for(i in 1:n) {
print(i)
last_owner <- subset(lifetime_census,
lifetime_census$gr== owners$spring_grid[i] & #owners must be on same grid
lifetime_census$census_year <= owners$spr_census[i] & #owners can be in current or past year
lifetime_census$squirrel_id != owners$squirrel_id[i] & #previous owner cant be current owner
(30*owners$spring_locx[i]-30* lifetime_census$locx)^2+(30* owners$spring_locy[i]-30* lifetime_census$locy)^2<=(distance)^2) #pick neighbour within set distance
#lifetime_census$reflo==owners$reflo.x #not included
#Put it all together
owners[i,"spring_owner"] <- last_owner
else {
owners[i, "spring_owner"] <- NA
}
}
I am unable to figure out how to make this code find the previous spr_census and aut_census owners, nor restrict this to the last two years, and then use the distance filter after no reflo match was found.
Any ideas?
Aucun commentaire:
Enregistrer un commentaire