Monday, 10 February 2020

Condition for comparing two columns

I have a data frame with four columns: the first holds county names, the second holds periods, the third holds the actual measured values (IPC class), and the fourth holds the forecasted values (Forecast). Both the actual and the forecasted values range from 1 to 5. These are the first 32 rows of the data frame, sorted by county:

structure(list(County = c("Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo"), `Period of measurement Kenya` = c("2011-01", 
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07", 
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01", 
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07", 
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06", 
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02", 
"2019-06"), `IPC class` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2
), Forecast = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)), row.names = c(1L, 
48L, 95L, 142L, 189L, 236L, 283L, 330L, 377L, 424L, 471L, 518L, 
565L, 612L, 659L, 706L, 753L, 800L, 847L, 894L, 941L, 988L, 1035L, 
1082L, 1129L, 1176L, 1223L, 1270L, 1317L, 1364L, 1411L, 1458L
), class = "data.frame") 

For my report I need to know how many crisis transitions and how many misforecasted crisis transitions occurred during the period I am researching. A crisis transition is when the value in the actual values column goes from 1 or 2 to 3, 4 or 5. In the excerpt above, the county Baringo had 1 crisis transition. I used the following code to count them:

SUB_count_cristrans_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_trans_count = sum(diff(crisis) > 0))
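To make the counting logic concrete, here is a minimal base R sketch that applies the same idea to the Baringo values only. The vector `ipc` is copied by hand from the excerpt above, and the variable names are just for illustration.

```r
# IPC class values for Baringo, in period order, copied from the excerpt above
ipc <- c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2)

# 1 when the period is in crisis (IPC class 3 to 5), 0 otherwise
crisis <- ifelse(ipc %in% 3:5, 1, 0)

# diff() is +1 where the series enters a crisis and -1 where it leaves one,
# so counting the positive differences counts the crisis transitions
transitions <- sum(diff(crisis) > 0)
transitions
# [1] 1

# Row at which the transition happens (the +1 shifts from diff index to row index)
transition_row <- which(diff(crisis) > 0) + 1
transition_row
# [1] 25
```

Row 25 corresponds to period 2017-06, where the IPC class goes from 2 to 3. Note that with this approach a county that is already in crisis in its first period does not get a transition counted for that period.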

A misforecasted crisis transition is a crisis transition that the Forecast column did not predict. In the excerpt above, Baringo's crisis transition was misforecasted, because the value in the Forecast column was not 3, 4 or 5. So my question is: what would be a correct condition in the ifelse function to extract the misforecasted crisis transitions by county? In words: first check whether a period is a crisis transition, i.e. the IPC class went from 1 or 2 to 3, 4 or 5. If so, check whether the value in the Forecast column is 3, 4 or 5. If it is not, it is a misforecasted crisis transition. The code I have right now is:

SUB_count_crismiss_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis_miss = ifelse(`IPC class` %in% 3:5 & (!Forecast %in% 3:5), 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_miss_count_KE = sum(diff(crisis_miss) > 0))
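One possible way to express the condition described in words is to flag the transition first and only then look at the forecast. The sketch below is not a confirmed solution; it assumes the forecast in a row refers to the same period as the IPC class in that row. The data frame `toy` and its column names `Period` and `IPC` are made up for illustration and are not the real data. It uses `dplyr::lag()` inside `group_by()` so that each row is compared with the previous period of the same county.

```r
library(dplyr)

# Made-up example (not the real data): one county, six periods
toy <- data.frame(
  County   = "A",
  Period   = c("2017-02", "2017-06", "2017-10", "2018-02", "2018-06", "2018-10"),
  IPC      = c(2, 3, 3, 2, 4, 1),
  Forecast = c(2, 3, 2, 2, 2, 1)
)

toy_counts <- toy %>%
  arrange(County, Period) %>%
  group_by(County) %>%
  mutate(
    crisis = IPC %in% 3:5,
    # Transition: in crisis now, but not in the previous period of the same county.
    # default = TRUE means the first period of a county is never a transition,
    # which matches the behaviour of diff() in the counting code.
    transition = crisis & !lag(crisis, default = TRUE),
    # Misforecast: a transition whose forecast is not 3, 4 or 5
    missed = transition & !(Forecast %in% 3:5)
  ) %>%
  summarize(trans_count = sum(transition), miss_count = sum(missed))

toy_counts
```

In this toy data there are two transitions (2017-06 and 2018-06) and only the second one is misforecasted, so `miss_count` is 1. Applying `diff()` to a combined "crisis and wrong forecast" flag would give 2 here, because it also picks up 2017-10, where the crisis merely continues while the forecast turns wrong. That is the reason for separating the transition check from the forecast check.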

Let me know if I have to add something or clarify! Thanks in advance.
