Tuesday, 4 February 2020

Implementing Code Conditionally in R Based on Features of Dataset

I'm looking to streamline my code and minimize the manual tweaks needed for each data set I run through it. I receive batches of data by country, but each country differs slightly in its fields and field names, so I have to tweak the code every time I run a new country. I would like to eliminate the tweaks and make the relevant steps conditional. (I handle many of the differences easily with ifelse(), but I haven't been able to do a conditional mutate, for example.)

This is a logic question, so please let me know if I should have uploaded a data set.

In this example, I need to create a month-name field (Calendar_Month_txt) and a year field (Calendar_Year) from a date field (e.g. 2018-03-01) for the USA. Other countries already include these fields, so I don't need to create them; I just rename() them so they align with my common data set.

Keep in mind, this is part of a much larger block of code that all the countries need to run through; this is just the illustrative part.

# Import Data and Align Fields and Column Names
P_Region <- Raw_Data %>%
# This is for USA only...I need to comment this out when not USA
  mutate(Calendar_Month_txt = ifelse(as.character(substr(Date, 6, 7)) == "01", "January",      
                                 ifelse(as.character(substr(Date, 6, 7)) == "02", "February",
                                 ifelse(as.character(substr(Date, 6, 7)) == "03", "March",
                                 ifelse(as.character(substr(Date, 6, 7)) == "04", "April",
                                 ifelse(as.character(substr(Date, 6, 7)) == "05", "May",
                                 ifelse(as.character(substr(Date, 6, 7)) == "06", "June",
                                 ifelse(as.character(substr(Date, 6, 7)) == "07", "July",
                                 ifelse(as.character(substr(Date, 6, 7)) == "08", "August",
                                 ifelse(as.character(substr(Date, 6, 7)) == "09", "September",
                                 ifelse(as.character(substr(Date, 6, 7)) == "10", "October",
                                 ifelse(as.character(substr(Date, 6, 7)) == "11", "November",
                                 ifelse(as.character(substr(Date, 6, 7)) == "12", "December", NA)))))))))))),
         Calendar_Year = as.character((substr(Date, 1,4)))) %>% 
# These I run only for non-USA, as I have created this above, so comment it out for USA
  rename(Calendar_Month_txt = CalendarMonthTextFull,                                         
         Calendar_Year = CalendarYear)
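
As an aside, the twelve nested ifelse() calls above can probably be collapsed into a single lookup. The following is only a minimal sketch: it assumes Date is either a character string or a Date in YYYY-MM-DD form, and the two-row Raw_Data here is a made-up stand-in, not the real batch. It relies on month.name, the built-in base R vector of English month names.

```r
library(dplyr)

# Hypothetical stand-in for a USA batch; only the Date column is assumed.
Raw_Data <- tibble(Date = c("2018-03-01", "2019-12-15"))

P_Region <- Raw_Data %>%
  mutate(
    # Characters 6-7 hold the month number; use it to index month.name,
    # base R's constant vector "January", ..., "December".
    Calendar_Month_txt = month.name[as.integer(substr(Date, 6, 7))],
    # Characters 1-4 hold the year; substr() already returns character.
    Calendar_Year = substr(Date, 1, 4)
  )
```

A missing Date yields NA for both fields, which matches the NA fallback at the end of the original ifelse() chain.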

I tried to use if statements within the dplyr code (I know I can do this as two separate complete blocks, but that seems like a lot of repeated code). Example:

V_USA <- TRUE

P_Region <- Raw_Data %>%
  if(V_USA) {
  mutate(Calendar_Month_txt = ifelse(as.character(substr(Date, 6, 7)) == "01", "January",      
                                 ifelse(as.character(substr(Date, 6, 7)) == "02", "February",
                                 ifelse(as.character(substr(Date, 6, 7)) == "03", "March",
                                 ifelse(as.character(substr(Date, 6, 7)) == "04", "April",
                                 ifelse(as.character(substr(Date, 6, 7)) == "05", "May",
                                 ifelse(as.character(substr(Date, 6, 7)) == "06", "June",
                                 ifelse(as.character(substr(Date, 6, 7)) == "07", "July",
                                 ifelse(as.character(substr(Date, 6, 7)) == "08", "August",
                                 ifelse(as.character(substr(Date, 6, 7)) == "09", "September",
                                 ifelse(as.character(substr(Date, 6, 7)) == "10", "October",
                                 ifelse(as.character(substr(Date, 6, 7)) == "11", "November",
                                 ifelse(as.character(substr(Date, 6, 7)) == "12", "December", NA)))))))))))),
         Calendar_Year = as.character((substr(Date, 1,4)))) 
    } else {                                   ##### END U.S.
  rename(Calendar_Month_txt = CalendarMonthTextFull,                                         
         Calendar_Year = CalendarYear) } 

I tried several variations of this, and this version was the most promising, but it produced the following error:

Error in if (.) V_USA else { : argument is not interpretable as logical
In addition: Warning message:
In if (.) V_USA else { :
  the condition has length > 1 and only the first element will be used

I suspect the error occurs because each data set contains only one country, not all countries, so there is nothing for the else branch to apply to.

Does anyone know a solution for this that keeps my code reasonably simple? Again, I know I can do this as two separate blocks, with the if and else outside the dplyr pipes. Any thoughts are greatly appreciated. (I've tried to work out whether mutate_if would work here, but I haven't found much that illustrates it, so forgive me if I missed something.)
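
For reference, the error text itself (if (.) V_USA else ...) suggests what is going on: the %>% pipe inserts the data frame as the first argument of if, so the data frame becomes the condition and V_USA becomes the body. One possible way around this, sketched below, is to wrap the if/else in braces and pass the piped data explicitly as the dot. The function name align_calendar_fields, its is_usa argument, and the two tiny example tibbles are hypothetical and only for illustration; the column names are the ones from the question.

```r
library(dplyr)

# Hypothetical helper: align one country's batch to the common field names.
align_calendar_fields <- function(raw, is_usa) {
  raw %>%
    {
      # Inside braces the pipe does not insert the data as a first argument;
      # the piped data frame is available as `.` and must be passed explicitly.
      if (is_usa) {
        mutate(.,
               Calendar_Month_txt = month.name[as.integer(substr(Date, 6, 7))],
               Calendar_Year = substr(Date, 1, 4))
      } else {
        rename(.,
               Calendar_Month_txt = CalendarMonthTextFull,
               Calendar_Year = CalendarYear)
      }
    } %>%
    # The pipeline simply continues after the conditional step.
    select(Calendar_Month_txt, Calendar_Year, everything())
}

# Made-up one-row batches standing in for a USA and a non-USA data set.
usa_batch   <- tibble(Date = "2018-03-01")
other_batch <- tibble(CalendarMonthTextFull = "March", CalendarYear = "2018")

P_USA   <- align_calendar_fields(usa_batch, is_usa = TRUE)
P_Other <- align_calendar_fields(other_batch, is_usa = FALSE)
```

Both results expose Calendar_Month_txt and Calendar_Year, so the rest of a larger pipeline can stay the same for every country. As far as I understand it, mutate_if is not the right tool here: it picks which columns to transform using a predicate, and does not decide whether a whole step should run.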
