Thursday, 10 January 2019

Nested ifelse statement in a for loop

I am trying to use a nested ifelse statement within a for loop to create a new variable whose values depend on how often each level of a factor variable (a list of postcodes) occurs.

The new variable should contain a predefined series of numbers determined by the frequency of each postcode (frequencies range from 1 to 4). Each series must end at 800 and increase in steps of 200, and its starting value depends on the postcode's frequency: the higher the frequency, the lower the starting value (200 for a frequency of 4, 400 for 3, 600 for 2 and 800 for 1).
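The rule above can be written as a small helper, sketched here for illustration only (the name series_for_frequency is hypothetical): a postcode that occurs n times gets the values from 1000 - 200 * n up to 800 in steps of 200, which is exactly "end at 800, step 200, start lower for higher frequencies".

```r
# Hypothetical helper: the series of values for a postcode that occurs n times.
# The series always ends at 800 and rises in steps of 200, so it starts at
# 800 - 200 * (n - 1), i.e. 1000 - 200 * n.
series_for_frequency <- function(n) {
  stopifnot(n >= 1, n <= 4)  # frequencies in the data range from 1 to 4
  seq(from = 1000 - 200 * n, to = 800, by = 200)
}

series_for_frequency(4)  # 200 400 600 800
series_for_frequency(2)  # 600 800
series_for_frequency(1)  # 800
```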

To do this, I wrote a for loop that first measures the frequency of each postcode and then uses a nested ifelse statement to assign the corresponding series of numbers to NewVar.
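For comparison, here is a minimal sketch of a loop that does produce the series, assuming DF has a factor column Postcode as in the example below. It swaps the nested ifelse() for a direct seq() call: it loops over the postcodes themselves, finds each postcode's rows with which(), and assigns the whole series to those rows in one step. It also assumes that each postcode's rows are already in the order in which the series should be assigned.

```r
DF <- data.frame(
  Postcode = factor(c("AA", "AA", "BB", "BB", "BB", "CC", "DD", "DD", "DD", "DD"))
)
DF$NewVar <- 0

# Loop over the postcodes actually present (not over row indices).
for (pc in unique(as.character(DF$Postcode))) {
  rows <- which(DF$Postcode == pc)  # row numbers of this postcode
  n <- length(rows)                 # its frequency (1 to 4)
  # Assign the whole series at once: it ends at 800 and rises in steps of 200.
  DF$NewVar[rows] <- seq(from = 1000 - 200 * n, to = 800, by = 200)
}

DF$NewVar  # 600 800 400 600 800 800 200 400 600 800
```

Looping over unique(as.character(...)) rather than levels() skips unused factor levels, for which n would be 0 and seq() would fail.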

Below is a small example of what I want to achieve; I want to apply this to a data frame containing millions of postcodes.

DESIRED RESULT:

Postcode  NewVar
AA        600
AA        800
BB        400
BB        600
BB        800
CC        800
DD        200
DD        400
DD        600
DD        800
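With millions of rows, a loop over postcodes may be slow. One vectorised alternative (a different technique from the loop in the question) uses base R's ave(): for every row it computes the postcode's frequency n and the row's position k within that postcode, and then NewVar = 800 - 200 * (n - k). Like the desired result above, this sketch assumes each postcode's rows are already in the intended order; they do not need to be contiguous. The names idx, n and k are illustrative.

```r
DF <- data.frame(
  Postcode = factor(c("AA", "AA", "BB", "BB", "BB", "CC", "DD", "DD", "DD", "DD"))
)

idx <- seq_len(nrow(DF))
# Frequency of each row's postcode (n), repeated on every row of that postcode.
n <- ave(idx, DF$Postcode, FUN = length)
# Position of the row within its postcode (k = 1, 2, ...), in row order.
k <- ave(idx, DF$Postcode, FUN = seq_along)
# The last row of a postcode (k == n) gets 800; each earlier row is 200 lower.
DF$NewVar <- 800 - 200 * (n - k)

DF
```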

CODE:

DF$NewVar <- 0

DF$NewVar <- for (i in levels(DF$Postcode[i]))
ifelse((table(DF$Postcode[i]) == 4), DF$NewVar[i] <- c(200,400,600,800),
  (ifelse ((table(DF$Postcode[i]) == 3), DF$NewVar[i] <- c(400,600,800),
    (ifelse ((table(DF$Postcode[i]) == 2), DF$NewVar[i] <- c(600,800), 
      DF$NewVar[i] <- c(800))))))
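Independently of the ifelse() calls, the assignment DF$NewVar <- for (...) cannot work as intended: a for loop is run for its side effects and its value is NULL, and assigning NULL to a data frame column deletes that column. The loop header also uses i inside levels(DF$Postcode[i]) before the loop has defined it; iterating over levels(DF$Postcode) is presumably what was meant. A small check on a hypothetical two-row data frame:

```r
DF <- data.frame(Postcode = factor(c("AA", "AA")), NewVar = 0)

# The value of a for loop is (invisible) NULL ...
result <- for (pc in levels(DF$Postcode)) pc
is.null(result)  # TRUE

# ... so assigning a loop to a column removes the column.
DF$NewVar <- for (pc in levels(DF$Postcode)) pc
names(DF)  # "Postcode"
```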

PROBLEM 1:

First, when I run the entire code, I get an error saying that the number of rows in the replacement does not match the number of rows in the data, although a manual check shows no such mismatch (the difference is always exactly one row):

Error in `$<-.data.frame`(`*tmp*`, NewVar, value = c("0", "0", "0",  : 
replacement has 11 rows, data has 10.
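A likely explanation for the one-row difference, sketched on a small hypothetical data frame: inside the loop, i takes postcode names such as "AA", so DF$NewVar[i] <- ... indexes the numeric column by name. Because the column has no element named "AA", R appends a new element, the column becomes one element longer than the data frame has rows, and `$<-.data.frame` rejects the assignment.

```r
# A plain numeric vector: assigning to a name that does not exist appends it.
x <- c(0, 0, 0)
x["AA"] <- 200
length(x)  # 4

# The same on a data frame column: the column would grow to 4 elements
# while the data frame has 3 rows, so `$<-.data.frame` raises an error
# of the same kind as in the question.
DF <- data.frame(Postcode = factor(c("AA", "AA", "BB")), NewVar = 0)
msg <- tryCatch({
  DF$NewVar["AA"] <- 200
  "no error"
}, error = function(e) conditionMessage(e))
msg
```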

PROBLEM 2:

TESTING WHETHER A SINGLE IFELSE WORKS ON ITS OWN (OUTSIDE THE LOOP):

When I check whether the ifelse clause works on its own (outside the loop), only the starting value of 200 is copied to every row of NewVar, so the values never increase to 800. This is not what I want either:

CODE TESTING ONE IFELSE:

DF$NewVar[1:2] <- ifelse((sum(table(DF$Postcode)) == 2),                       
  DF$NewVar[1:2] <- c(600,800), "NA")

RESULT (not desired):

Postcode  NewVar
AA        200
AA        200

DESIRED RESULT:

Postcode  NewVar
AA        200
AA        400
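The behaviour in Problem 2 follows from how ifelse() works: its result has the same length as its test argument, so with a single TRUE/FALSE test only the first element of the yes vector is returned, and the assignment then recycles that one value over both rows. For a single condition, a plain if/else returns the whole vector. Note also that sum(table(DF$Postcode)) counts all rows, not the rows of one postcode. A minimal sketch on a hypothetical three-row data frame (freq_AA is an illustrative name):

```r
DF <- data.frame(Postcode = factor(c("AA", "AA", "BB")), NewVar = 0)

# One test value, so ifelse() keeps only the first element of `yes` (600),
# and the assignment recycles it over both rows.
DF$NewVar[1:2] <- ifelse(TRUE, c(600, 800), NA)
DF$NewVar[1:2]  # 600 600

# sum(table()) counts every row, not the rows of a single postcode.
sum(table(DF$Postcode))                # 3
freq_AA <- table(DF$Postcode)[["AA"]]  # 2

# For a single condition, if/else returns the whole vector.
DF$NewVar[1:2] <- if (freq_AA == 2) c(600, 800) else NA
DF$NewVar[1:2]  # 600 800
```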

Note: I created the NewVar column before trying to assign values to it, and I have already checked for NAs.

Thank you in advance for your time.
