Thursday, February 13, 2020

Loop a custom ifelse function in R using a string vector

I wrote a function that sets the values in a group of columns to NA when the value in the associated sample-size column is below a threshold. The function works when applied to one variable at a time.

# Clear everything
rm(list = ls(all.names = TRUE))

# Create dataframe
DF <- data.frame(VehicleType = c("Car","Car","LuxeryCar","Car","Car","LuxeryCar","LuxeryCar"),
                 Brand = c("Honda","Audi","Bentley","Chevrolet","Hyundai","Maserati","Porsche"),
                 VarA_Low=c(15000, 30000, 50000, 40000, 15000, 100000, 100000),
                 VarA_Medium=c(40000, 70000, 100000, 90000, 25000, 200000, 180000),
                 VarA_High=c(20000, 150000, 500000, 190000, 80000, 1000000, 500000),
                 VarA_SampleSize=c(39,44,51,35,45,65,53),
                 VarB_Low=c(15000, 30000, 50000, 40000, 15000, 100000, 100000),
                 VarB_Medium=c(40000, 70000, 100000, 90000, 25000, 200000, 180000),
                 VarB_High=c(20000, 150000, 500000, 190000, 80000, 1000000, 500000),
                 VarB_SampleSize=c(2,40,92,47,51,39,40))

# Set values to NA if the associated SampleSize is below 40
NA_values <- function(m) {
  m <- deparse(substitute(m))
  Var_L <- paste0(as.character(m), "_Low")
  Var_M <- paste0(as.character(m), "_Medium")
  Var_H <- paste0(as.character(m), "_High")
  Count <- paste0(as.character(m), "_SampleSize")
  DF[,Var_L] <- ifelse(DF[,Count] < 40, NA, DF[,Var_L])
  DF[,Var_M] <- ifelse(DF[,Count] < 40, NA, DF[,Var_M])
  DF[,Var_H] <- ifelse(DF[,Count] < 40, NA, DF[,Var_H])
  return(DF)
}

# Apply function to one variable at a time
DF <- NA_values(VarA)
DF <- NA_values(VarB)

This works, but my solution is impractical because I usually have hundreds of variables, and both the column names and the number of variables change. I would like to declare all variables as a string vector and apply the function to each of them.

# Declare variables as a string vector
Vars <- c("VarA", "VarB")

# Create dataframe to store results
DF_NA <- DF

# Loop over DF and store results in DF_NA
for (item in Vars) 
{
  DF_NA[, c(item)] <- NA_values(item)
}

This results in the error message "undefined columns selected".
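
For what it's worth, the error seems to come from deparse(substitute(m)): inside the loop the function receives the symbol item rather than its value, so it builds names such as "item_SampleSize", which do not exist in DF. The assignment DF_NA[, c(item)] <- ... would also fail on its own, since "VarA" is a prefix, not a column. Below is a minimal sketch of one possible workaround, assuming the function can be changed to take the data frame and the prefix as a character string. The names NA_values_chr, prefix, threshold, and DF_small are hypothetical, and DF_small is just the first three rows of the example data.

```r
# Hypothetical variant of NA_values(): takes the data frame and the variable
# prefix as a string, so it can be called from a loop over a character vector.
NA_values_chr <- function(df, prefix, threshold = 40) {
  value_cols <- paste0(prefix, c("_Low", "_Medium", "_High"))
  count_col  <- paste0(prefix, "_SampleSize")
  # which() drops NA comparisons, so rows with a missing sample size are left alone
  rows <- which(df[[count_col]] < threshold)
  df[rows, value_cols] <- NA
  df
}

# Abbreviated example data (first three rows of the DF in the question)
DF_small <- data.frame(
  Brand = c("Honda", "Audi", "Bentley"),
  VarA_Low = c(15000, 30000, 50000),
  VarA_Medium = c(40000, 70000, 100000),
  VarA_High = c(20000, 150000, 500000),
  VarA_SampleSize = c(39, 44, 51),
  VarB_Low = c(15000, 30000, 50000),
  VarB_Medium = c(40000, 70000, 100000),
  VarB_High = c(20000, 150000, 500000),
  VarB_SampleSize = c(2, 40, 92)
)

Vars <- c("VarA", "VarB")

# Each pass returns the whole data frame, so reassign the whole object
DF_NA <- DF_small
for (item in Vars) {
  DF_NA <- NA_values_chr(DF_NA, item)
}
```

With this sketch, the first row (sample sizes 39 and 2) should end up with NA in all six value columns, while the other rows and the SampleSize columns stay unchanged. Passing the data frame as an argument also avoids relying on a global DF.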
