Thursday, October 25, 2018

For loop identifying nonexistent NAs in R

I have a large data frame called z with 107310 rows and 8 columns. It contains no NAs, because I already ran z <- z[complete.cases(z), ] to drop every row containing an NA.

I wrote a for loop to remove every row in which the value in one column (cj, column 7) is higher than the value in another (ci, column 6).

Firstly, I tried the following code:

for (row in 1:nrow(z)) {
  i <- z[row, 1]
  j <- z[row, 2]
  ci <- z[row, 6]
  cj <- z[row, 7]
  year <- z[row, 8]
  if (cj > ci) {
    z <- z[-row, ]
  }
}

The loop runs for a while, then stops at some row with the following error:

"error: missing value where TRUE/FALSE needed"

Even though there were no NAs, I adapted the loop to avoid the error by making sure the value tested in the if statement is never NA:

for (row in 1:nrow(z)) {
  i <- z[row, 1]
  j <- z[row, 2]
  ci <- z[row, 6]
  cj <- z[row, 7]
  year <- z[row, 8]
  temp <- ci - cj
  temp <- ifelse(!is.na(temp), temp, 0)
  if (temp <= 0) {
    z <- z[-row, ]
  }
}

However, the loop still stops before it eliminates all rows in which cj>ci. The last values it generates for i and j are NA, and for ci and cj they are NA_real_, even though those values are not NAs in the data set.

Does anyone know what is happening? Thanks
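
A likely explanation, sketched on a toy data frame (the five-row z below is made up for illustration and has only the ci and cj columns): 1:nrow(z) is evaluated once, before the loop starts, so the loop keeps counting up to the original row count while z[-row, ] keeps shrinking z. Once row exceeds the current number of rows, z[row, 7] returns NA instead of raising an error, which would account for the NA values seen at the end. Deleting a row also shifts the following row into the current position, so that row is never checked. A vectorized filter should avoid both problems:

```r
# Toy stand-in for z (made-up values; only the two compared columns)
z <- data.frame(ci = c(5, 1, 2, 7, 3),
                cj = c(4, 3, 6, 2, 9))

# 1:nrow(z) is fixed when the loop starts, so it still runs to 5
n_before <- nrow(z)

# Dropping one row leaves a 4-row data frame
shrunk <- z[-2, ]

# Asking for row 5 of a 4-row data frame gives NA, not an error
past_end <- shrunk[n_before, "cj"]
print(is.na(past_end))

# Vectorized alternative: keep only the rows where cj is not above ci
kept <- z[z$cj <= z$ci, ]
print(kept)
```

On the real data the equivalent would be z <- z[z[, 7] <= z[, 6], ], assuming columns 6 and 7 are ci and cj as in the loop above. If a loop is really needed, collecting the row numbers to drop and removing them once after the loop avoids changing z while iterating over it.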
