Saturday, 15 August 2020

mutate function with nested ifelse statements creating two columns instead of one

I have some cumulative data on COVID-19 cases for countries, and I am trying to calculate the daily difference in a new column called Diff. I can't remove the NA values because the data would then no longer show the dates on which no tests were carried out. So where ConfirmedCases is NA, I set Diff to 0 to indicate that there was no difference, and hence that no tests were conducted that day.

I am also trying to add a second condition: if the computed difference is itself NA, indicating that no tests were conducted the day before, then Diff should be set to the confirmed cases value for that day.

As you can see from my results at the bottom, I am almost there, but I am creating a new column called ifelse(...) instead of getting a single Diff column. I tried to fix this, but I think I am making a simple error somewhere. If anyone could point it out to me, I would really appreciate it. Thank you.

library(tidyverse)

data <- data.frame(
          stringsAsFactors = FALSE,
                        CountryName = c("Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan"),
                     ConfirmedCases = c(NA,7L,NA,NA,NA,10L,16L,21L,
                                        22L,22L,22L,24L,24L,34L,40L,42L,
                                        75L,75L,91L,106L,114L,141L,166L,
                                        192L,235L,235L,270L,299L,337L,367L,
                                        423L),
                               Diff = c(NA,NA,NA,NA,NA,NA,6L,5L,1L,
                                        0L,0L,2L,0L,10L,6L,2L,33L,0L,16L,
                                        15L,8L,27L,25L,26L,43L,0L,35L,
                                        29L,38L,30L,56L)
                 )

data2 <- data %>%
  mutate(Diff = ifelse(is.na(ConfirmedCases) == TRUE, 0, ConfirmedCases - lag(ConfirmedCases)),
                       ifelse(is.na((ConfirmedCases - lag(ConfirmedCases))) == TRUE, ConfirmedCases, ConfirmedCases - lag(ConfirmedCases)))

head(data2, 10)
#>    CountryName ConfirmedCases Diff ifelse(...)
#> 1  Afghanistan             NA    0          NA
#> 2  Afghanistan              7   NA           7
#> 3  Afghanistan             NA    0          NA
#> 4  Afghanistan             NA    0          NA
#> 5  Afghanistan             NA    0          NA
#> 6  Afghanistan             10   NA          10
#> 7  Afghanistan             16    6           6
#> 8  Afghanistan             21    5           5
#> 9  Afghanistan             22    1           1
#> 10 Afghanistan             22    0           0

Created on 2020-08-15 by the reprex package (v0.3.0)
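
For reference, one possible reading of the problem: in the mutate() call above, the first ifelse() is closed before the comma, so the second ifelse() is passed to mutate() as a separate, unnamed argument and becomes its own column. Below is a minimal sketch of nesting the second test inside the first instead. The data frame `cases` (the first ten rows only) and the helper column `Change` are names made up for this illustration, and the rule itself (NA cases give 0, an NA difference gives that day's cases) is taken from the description above rather than from any confirmed answer.

```r
library(dplyr)

# Hypothetical subset: the first ten ConfirmedCases values from the data above
cases <- data.frame(
  ConfirmedCases = c(NA, 7L, NA, NA, NA, 10L, 16L, 21L, 22L, 22L)
)

result <- cases %>%
  mutate(
    # Helper column (illustrative name): day-over-day change, NA when either day is NA
    Change = ConfirmedCases - lag(ConfirmedCases),
    # Nested ifelse(): the second test is the "no" branch of the first,
    # so mutate() receives a single named argument and creates a single column
    Diff = ifelse(is.na(ConfirmedCases), 0L,
                  ifelse(is.na(Change), ConfirmedCases, Change))
  ) %>%
  select(-Change)

print(result)
```

With this nesting, rows where ConfirmedCases is NA get 0, rows that follow an NA get that day's ConfirmedCases, and all other rows get the ordinary difference. dplyr's case_when() might be an alternative way to express the same three branches.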
