Friday, 8 July 2016

Converting a FOR loop into nested ifelse statements containing is.na

I have written a set of if statements inside a FOR loop. The loop takes more than 10 minutes to run, so I have been trying to speed it up after reading an article that describes how to use ifelse in place of a FOR loop.

The head of the data set is as follows:

Destination.City.Name Booking.ID Creation.Date Cancellation.Date Arrival.Date Status.Name Nights Room.nights DI.flag Star.rating
1             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-15   Cancelled     90          90       N           4
2             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-14   Cancelled     90          90       N           4
3             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-06   Cancelled     90          90       N           4
4             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-02   Cancelled     90          90       N           4
5             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-03-29   Cancelled     90          90       N           4
6             Abu Dhabi    9634541    2013-06-11        2013-06-13   2013-09-13   Cancelled     90          90       N           5
  Future.Arrival.Flag Future.Creation.Flag Future.Arrival.Day Status.On.Model.Date
1                   1                    1                469                   NA
2                   1                    1                468                   NA
3                   1                    1                460                   NA
4                   1                    1                456                   NA
5                   1                    1                452                   NA
6                  NA                   NA                 NA                   NA
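The comparisons in the loop only behave chronologically if Creation.Date and Cancellation.Date are R Date objects. A minimal sketch, assuming the date columns were read in as text or factors (as read.csv did by default before R 4.0) and that a missing cancellation is stored as an empty string; the two-row data frame is hypothetical:

```r
# Hypothetical toy data mimicking how the date columns might arrive from a CSV.
bookingdata <- data.frame(
  Creation.Date     = c("2015-02-16", "2013-06-11"),
  Cancellation.Date = c("2015-02-16", ""),   # assumption: blank means "not cancelled"
  Arrival.Date      = c("2015-04-15", "2013-09-13"),
  stringsAsFactors  = TRUE                   # mimics the pre-R 4.0 read.csv default
)

# Convert each date column to Date; an explicit format turns "" into NA
# instead of raising a parsing error.
date_cols <- c("Creation.Date", "Cancellation.Date", "Arrival.Date")
bookingdata[date_cols] <- lapply(bookingdata[date_cols], function(x) {
  as.Date(as.character(x), format = "%Y-%m-%d")
})

str(bookingdata)
```

After this conversion, `is.na(bookingdata$Cancellation.Date)` identifies the bookings that were never cancelled.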

The FOR loop populates the last column, Status.On.Model.Date, using the following logic:

If the Creation Date is after the Model Date, the status is NA.

If the Cancellation Date is NA, the status is Confirmed.

If the Cancellation Date is >= the Model Date, the status is Confirmed; otherwise it is Cancelled.
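These three rules can also be applied to whole columns at once with logical indexing, which is one common alternative to both the loop and nested ifelse(). The sketch below is illustrative only: status_on_date() is a hypothetical helper name, and the dates are made-up sample values.

```r
# Hypothetical helper status_on_date(): applies the three rules to whole
# vectors at once using logical indexing instead of a row-by-row loop.
status_on_date <- function(creation, cancellation, model_date) {
  status <- rep(NA_character_, length(creation))
  # Rules 2 and 3: never cancelled, or cancelled on/after the model date
  confirmed <- is.na(cancellation) | cancellation >= model_date
  status[which(confirmed)]  <- "Confirmed"
  status[which(!confirmed)] <- "Cancelled"
  # Rule 1: bookings created after the model date get NA (overrides the above)
  status[which(creation > model_date)] <- NA
  status
}

# Illustrative data (hypothetical values)
model_date   <- as.Date("2015-03-01")
creation     <- as.Date(c("2015-02-16", "2015-02-16", "2015-04-01", "2015-01-10"))
cancellation <- as.Date(c("2015-02-16", NA, NA, "2015-03-05"))
status_on_date(creation, cancellation, model_date)
# returns c("Cancelled", "Confirmed", NA, "Confirmed")
```

On the real data this would be called as `bookingdata$Status.On.Model.Date <- status_on_date(bookingdata$Creation.Date, bookingdata$Cancellation.Date, Model.Date)`. Using which() keeps NA comparisons from being used as indices.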

The original FOR loop is below. As mentioned, it works, but it takes more than 10 minutes to run (the data set has 600K+ rows):

for (i in seq_along(bookingdata$Status.On.Model.Date)) {
  if (bookingdata$Creation.Date[i] > Model.Date) {
    # booking did not exist yet on the model date
    bookingdata$Status.On.Model.Date[i] = NA
  } else {
    if (is.na(bookingdata$Cancellation.Date[i])) {
      # never cancelled
      bookingdata$Status.On.Model.Date[i] = 'Confirmed'
    } else {
      if (bookingdata$Cancellation.Date[i] >= Model.Date) {
        # cancelled only on or after the model date
        bookingdata$Status.On.Model.Date[i] = 'Confirmed'
      } else {
        if (bookingdata$Cancellation.Date[i] < Model.Date) {
          # cancelled before the model date
          bookingdata$Status.On.Model.Date[i] = 'Cancelled'
        }
      }
    }
  }
}
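A loop like the one above is often slow mainly because every iteration modifies the data frame element by element, rather than because of the if statements themselves. A sketch of the same logic, assuming that is the bottleneck: extract the columns once, fill a preallocated character vector, and write it back in a single assignment. loop_status() is a hypothetical name and the sample dates are made up.

```r
# Hypothetical helper loop_status(): the same nested rules as the original loop,
# but reading from and writing to plain vectors instead of the data frame.
loop_status <- function(creation, cancellation, model_date) {
  status <- rep(NA_character_, length(creation))  # preallocated result vector
  for (i in seq_along(creation)) {
    if (creation[i] > model_date) {
      status[i] <- NA                                # created after the model date
    } else if (is.na(cancellation[i]) || cancellation[i] >= model_date) {
      status[i] <- "Confirmed"                       # not cancelled by the model date
    } else {
      status[i] <- "Cancelled"                       # cancelled before the model date
    }
  }
  status
}

# Illustrative data (hypothetical values)
model_date   <- as.Date("2015-03-01")
creation     <- as.Date(c("2015-02-16", "2015-02-16", "2015-04-01", "2015-01-10"))
cancellation <- as.Date(c("2015-02-16", NA, NA, "2015-03-05"))
loop_status(creation, cancellation, model_date)
# returns c("Cancelled", "Confirmed", NA, "Confirmed")
```

The `||` short-circuits, so `cancellation[i] >= model_date` is never evaluated when the cancellation date is NA. The result is written back once with `bookingdata$Status.On.Model.Date <- loop_status(bookingdata$Creation.Date, bookingdata$Cancellation.Date, Model.Date)`.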

The nested ifelse code I wrote to replace it is below:

bookingdata$Status.On.Model.Date = ifelse(bookingdata$Creation.Date > Model.Date, NA,
                                    ifelse(is.na(bookingdata$Cancellation.Date, 'Confirmed',
                                      ifelse(bookingdata$Cancellation.Date >= Model.Date, 'Confirmed', 'Cancelled'))))

However, it fails with the following error:

Error in is.na(bookingdata$Cancellation.Date, "Confirmed", ifelse(bookingdata$Cancellation.Date >=  : 
  3 arguments passed to 'is.na' which requires 1

I'm not sure how to correct the error, as I don't see how else the statements can be arranged.
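The error message points to a misplaced parenthesis: is.na() is never closed after `bookingdata$Cancellation.Date`, so 'Confirmed' and the inner ifelse() are passed to is.na() as extra arguments. A sketch of the same nested ifelse with is.na() closed immediately (and one fewer closing parenthesis at the end), run here on a small hypothetical data frame and an assumed Model.Date:

```r
# Hypothetical sample data and model date, for illustration only.
Model.Date <- as.Date("2015-03-01")
bookingdata <- data.frame(
  Creation.Date     = as.Date(c("2015-02-16", "2015-02-16", "2015-04-01", "2015-01-10")),
  Cancellation.Date = as.Date(c("2015-02-16", NA, NA, "2015-03-05"))
)

# is.na() now takes a single argument; each ifelse() closes in turn.
bookingdata$Status.On.Model.Date <- ifelse(
  bookingdata$Creation.Date > Model.Date, NA,
  ifelse(is.na(bookingdata$Cancellation.Date), "Confirmed",
         ifelse(bookingdata$Cancellation.Date >= Model.Date, "Confirmed", "Cancelled")))

bookingdata$Status.On.Model.Date
# returns c("Cancelled", "Confirmed", NA, "Confirmed")
```

ifelse() evaluates every branch for all rows and then picks per row, so the NA produced by `Cancellation.Date >= Model.Date` for uncancelled bookings is harmless: the middle ifelse() has already selected "Confirmed" for those rows.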

Thanks!
