vendredi 13 décembre 2019

R: Replace NA with date value. If date value exists in both columns choose the earliest date

I am working with two data frames and have merged them thanks to the advice on other StackOverflow threads. Now I'm struggling to get the values from one column into the other column IF they meet certain conditions.

The date_first_followup column is almost null but does have some values in it due to historical usage. The Activity.Date.x column always has a date in it but it isn't always the same date value as the date_first_followup column.

What I need to do is to have R replace all of the NA's in the followup column with the dates in the Activity column. Whenever a row has values in both columns take the earliest date.

I don't know if how the fields are formatted affect the results but they start of as factors and I've tried various formatting options under as.Date() or even just leaving it as factor.

#Trying out a coalesce based solution.
#Use Coalesce to include follow up dates from activities whenever there isn't data already
#present in the followup field.
#
testVlookup_df$Activity.Date.x <- as.Date(testVlookup_df$Activity.Date.x, format = "%/m%/d%/Y")
testVlookup_df$date_first_followup <- as.Date(testVlookup_df$date_first_followup, format = "%m/%d/%Y")
#
testVlookup_df$date_first_followup <- dplyr::coalesce(testVlookup_df$date_first_followup, testVlookup_df$ctivity.Date.x)

Other attempts

#12.10.19 attempting to ONLY pull activity dates in when the column is NULL OR when the column has a value that has a later date.
testVlookup_df$date_first_followup <- ifelse(is.null(testVlookup_df$date_first_followup) == TRUE,
                                             as.Date(testVlookup_df$Activity.Date.x, format = "%m/%d/%Y"),
                                             ifelse(isTRUE(as.Date(testVlookup_df$date_first_followup, format = "%m/%d/%Y") > as.Date(testVlookup_df$Activity.Date.x, format = "%m/%d/%Y")) == TRUE,
                                                    as.Date(testVlookup_df$Activity.Date.x, format = "%/d%/m%/Y"), NA)
)

Another attempt

#12.12.19 58th attempt. Invert the ifelses


testVlookup_df$date_first_followup <-
  ifelse(
    isTRUE(
      as.integer(gsub("-", "", testVlookup_df$date_first_followup)) > as.integer(testVlookup_df$Activity.Date.x, "%Y/%m/%d")
    ) == TRUE,
    as.Date(testVlookup_df$date_first_followup, format = "%m/%d/%Y"),
    ifelse(
      is.null(testVlookup_df$date_first_followup) == TRUE,
      as.Date(testVlookup_df$Activity.Date.x, format = "%m/%d/%Y"),
      NA
    )
  )
class(testVlookup_df$date_first_followup)

Yet another attempt

  #Attempts for if command  using elses. This seems to work but does not loop through the entire vector.                                            
    # testVlookup_df$date_first_followup <- if(is.null(testVlookup_df$date_first_followup)){ as.Date(testVlookup_df$Activity.Date.x, format = "%m/%d/%Y")
    # } else {if(as.integer(testVlookup_df$date_first_followup) > as.integer(testVlookup_df$Activity.Date.x)) {as.Date(testVlookup_df$Activity.Date.x, format = "%m/%d/%Y")
    # } else { NA }
    # }
    # Got this message 'Warning message:In if (as.integer(testVlookup_df$date_first_followup) > as.integer(testVlookup_df$Activity.Date.x)) 
    # { : the condition has length > 1 and only the first element will be used' Looked it up. Tells me if and else can only be used for 1 argument will not loop. Need sapply or ifelse

Aucun commentaire:

Enregistrer un commentaire