dimanche 13 août 2017

Conditional Lookup in R

I am trying to replace the blank (missing) zipcodes in the df table with the zipcodes in another table called zipless, based on names. What would be the best approach? A for loop is probably very slow.

I was trying with something like this, but it does not work.

df$zip_new <- ifelse(df, is.na(zip_new),
                     left_join(df,zipless, by = c("contbr_nm" = "contbr_nm")),
                     zip_new)

I was able to make it work using this approach, but I am sure it is not the best one. I first added a new column from the lookup table and in the next step selectively used it, where necessary.

#temporarly renaming the lookup column in the lookup table
zipless <- plyr::rename(zipless, c("zip_new"="zip_new_temp"))
#adding the lookup column to the main table
df <- left_join(df, zipless, by = c("contbr_nm" = "contbr_nm"))
str(test_dff)
#taking over the value from the lookup column zip_new_temp if the condition is met, else, do nothing.
test_dff$zip_new  <- ifelse((test_dff$zip_new == "") &
                              (test_dff$contbr_nm %in% zipless$contbr_nm), 
                            test_dff$zip_new_temp,
                            test_dff$zip_new)

What would be a proper way to do this?

Thank you very much!

Aucun commentaire:

Enregistrer un commentaire