I'm trying to separate the list-col values when converting a dataframe from long to wide format. I know that this has been answered here, however I want these columns with values adjacent to one another, and by adding an id this won't be the case, and neither does distinct() work. Although, I believe this can be tackled using this logic.
Given the dataframe looks like this:
# A tibble: 10 x 4
year locality_id landcover pland
<chr> <chr> <int> <dbl>
1 2010 L452817 8 0.0968
2 2010 L452817 9 0.0323
3 2010 L452817 12 0.613
4 2010 L452817 13 0.194
5 2010 L452817 14 0.0645
6 2019 L452817 8 0.0645
7 2019 L452817 9 0.0645
8 2019 L452817 12 0.516
9 2019 L452817 13 0.194
10 2019 L452817 14 0.161
And the wide format looks like this:
#with lists in a single row
Rows: 7
Columns: 4
$ locality_id <chr> "L452817", "L452817", "L452817", "L452817", "L452817",...
$ pland <dbl> 0.09677419, 0.03225806, 0.61290323, 0.19354839, 0.0645...
$ `2010` <list> [8, 9, 12, 13, 14, 16, 16]
$ `2019` <list> [17, 17, 17, 13, <8, 9>, 12, 14]
I'm trying to wrap my head around the logic to get around this for my situation, I've thought of various ways to tackle it and many failed. My current solution (until I find a problem with it), is combing a previous solution to a question of mine.
Given the first table, luckily the landcover value are the same for each year relative to locality_id however, their pland values are different, so when converting it to pivot_wider I'll get lists. My two solutions:
- Minus the pland values for
2019from2010, and store those differences:
However, my dataset is very large and I'll find that some values are covered in 2010 and not in 2019 or vice versa. When such a case happens, fill those NA values by fulfiling the condition provided in one of my previous questions, where if 2010 is NA fill it with 16 otherwise, if 2019 is NA fill it with 17. However, assign the pland value a negative sign if its from 2010 and make it positive if its ``in 2019. To suggest that if the habitat was in 2010 and not 2019, then there's clearly been a decrease in that habitat type.
Although, I may find that I have completely different pland and landcover values for a specific locality_id, also of different length (however, I was only able to find a similar looking table to meet this condition given the amount of time I have available):
# A tibble: 7 x 4
year locality_id landcover pland
<chr> <chr> <int> <dbl>
1 2010 L596267 0 0.194
2 2010 L596267 9 0.0323
3 2010 L596267 11 0.0645
4 2010 L596267 13 0.710
5 2019 L596267 0 0.194
6 2019 L596267 11 0.0968
7 2019 L596267 13 0.710
2010 has length 4 whilst 2019 has length 3. Luckily, in this scenario, some values match so we just proceed by minus the pland values for those that match. Otherwise, replace NA with 17.
Here's a criteria for Pland, for each year relative to locality_id , then pland should sum to 1.
However, I'm trying to calculate the difference, so these values should be either minus, 0, or positive.
Finding the differences has also been mentioned here for a previous question of mine, however, implementing it when transitioning from long to wide format is the tricky part.
reproducible code:
structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L,
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817",
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180",
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L,
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129,
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375,
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581,
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))

Aucun commentaire:
Enregistrer un commentaire