mercredi 18 décembre 2019

How to check whether all elements of a nested list are a subset of another list in R

I've tried a number of different methods for this including this stack but nothing is working quite correctly.

My dataframe "SiteVisits" (a small subset dput is at bottom) is made up of columns Date(class = date), TagID (class = numeric), SiteVisits (a list of character), and NumSites (class = numeric). Each row lists all sites where an individual organism (TagID) is found for each Date.

I'd like to assign whether a tag spent the entire day "inside", "outside", or "transiting" based on the sites it visited. It can only be "inside" if it never visits an outside site, and it can only be "outside" if it never visits an inside site

First, I'd like to determine whether ALL the sites for a TagID for a Date are included in this list:

inside <- list(c("Release","IC1", "IC2", "IC3","RGD1"))

If TRUE SiteVisit$Location = "INSIDE" ELSE test whether ALL sites for a TagID for a Date are contained within this list:

outside <- list(c("ORS1","WC1","WC2","WC3","RGU1","ORN1","ORN2","ORS3","GL1","CVP1","CLRS"))

If TRUE SiteVisit$Location = "OUTSIDE" ELSE SiteVisit$Location = "TRANSITING"

I've tried a number of different dplyr and base versions to accomplish this, but none seem to get it right. I think it's because I'm not correctly checking through each element of SiteVisit$SiteVisits

My current attempts are:

SiteVisit <- SiteVisit %>%
  mutate(Location = ifelse(all(SiteVisits[[]] %in% inside), "INSIDE",
                           ifelse(all(SiteVisits[[]] %in% outside),"OUTSIDE","TRANSITING")))

which yields all "INSIDE"

and

SiteVisit <- SiteVisit %>%
  mutate(Location = ifelse(all(SiteVisits[[]] %in% inside), "INSIDE",
                           ifelse(all(SiteVisits[[]] %in% outside),"OUTSIDE","TRANSITING")))

which yields all "TRANSITING"

Here's a small subset of my dataframe "SiteVisit" dput:

structure(list(Date = structure(c(15828, 15828, 15847, 15847, 
15847, 15847, 15847, 15847, 15848, 15848, 15848, 15848, 15848, 
15848, 15848, 15848, 15849, 15849, 15849, 15849, 15849, 15849, 
15849, 15850, 15850, 15850, 15850, 15850, 15850, 15850, 15851, 
15851, 15851, 15851, 15851, 15851, 15851, 15851, 15852, 15852, 
15852, 15852, 15852, 15852, 15852, 15853, 15853, 15853, 15853, 
15853, 15853, 15853, 15853, 15853, 15854, 15854, 15854, 15854, 
15854, 15854, 15854, 15854, 15855, 15855, 15855, 15855, 15855, 
15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855, 
15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856, 
15856, 15856, 15856, 15856, 15857, 15857, 15857, 15857, 15857, 
15857, 15857, 15857, 15857, 15857, 15857), class = "Date"), TagID = c(5717.06, 
6277.06, 5073.06, 5717.06, 11121.1, 11191.1, 11387.1, 11415.1, 
5717.06, 6277.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1, 
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1, 
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1, 
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11317.1, 
11387.1, 11415.1, 5717.06, 6277.06, 11191.1, 11219.1, 11289.1, 
11387.1, 11415.1, 5717.06, 6277.06, 9015.01, 9833.06, 11191.1, 
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 9015.01, 
11191.1, 11219.1, 11289.1, 11387.1, 11415.1, 5641.22, 5717.06, 
6221.06, 6277.06, 7909.22, 9015.01, 9833.06, 11121.1, 11191.1, 
11219.1, 11289.1, 11317.1, 11387.1, 11415.1, 5717.06, 6277.06, 
6529.06, 8119.01, 8545.06, 9015.01, 9497.06, 9833.06, 11191.1, 
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 6529.06, 
9015.01, 9497.06, 9833.06, 11191.1, 11219.1, 11289.1, 11387.1, 
11415.1), SiteVisits = list("Release", "Release", c("IC2", "IC1", 
"Release"), "IC3", "WC2", "RGD1", c("WC1", "WC3"), "WC3", "IC3", 
    "IC3", "WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", 
    "WC2", "RGD1", c("IC2", "IC1"), "IC1", "WC1", "WC3", "IC3", 
    "WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "WC2", 
    "RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3", "IC3", 
    "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3", c("IC3", 
    "Release"), c("IC3", "IC2", "IC1", "Release"), "RGD1", "IC2", 
    "IC1", "WC1", "WC3", "IC3", "IC3", c("IC3", "IC2"), "RGD1", 
    "IC2", "IC1", "WC1", "WC3", "Release", "IC3", "Release", 
    "IC3", c("RGD1", "Release"), c("IC3", "IC2"), c("IC3", "IC1"
    ), "WC2", "RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3", 
    "IC3", c("RGD1", "Release"), c("RGD1", "Release"), "Release", 
    c("IC3", "IC2", "IC1"), "Release", c("IC3", "IC2", "IC1", 
    "RGD1"), "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3", 
    "RGD1", c("IC3", "IC2", "IC1"), "RGD1", c("IC3", "IC1", "RGD1"
    ), "RGD1", "IC2", c("IC2", "IC1"), "WC1", "WC3"), NumSites = c(1L, 
1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 
3L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 
2L, 1L, 1L)), row.names = c(NA, -100L), groups = structure(list(
    Date = structure(c(15828, 15847, 15848, 15849, 15850, 15851, 
    15852, 15853, 15854, 15855, 15856, 15857), class = "Date"), 
    .rows = list(1:2, 3:8, 9:16, 17:23, 24:30, 31:38, 39:45, 
        46:54, 55:62, 63:76, 77:89, 90:100)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Aucun commentaire:

Enregistrer un commentaire