Sunday, 17 May 2020

How to match incorrect strings and replace them with correct strings

I have two dataframes.

One contains place names both in correct form and incorrect form:

place <- data.frame(
  place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"), stringsAsFactors = FALSE)

The other contains a column with a mix of correct and incorrect place names:

set.seed(123)
df <- data.frame(town = sample(c("London", "Birmingham", "Newcastle", "Brighton", 
                                 "Lundn", "Birmgham", "Nexcassle", "Briton"), 20, replace = TRUE), stringsAsFactors = FALSE)

What I'd like to do is match the place names in df against the incorrect place names in place and replace each match with the corresponding correct place name. An ifelse statement handles the replacements fine, but it fails to keep the names that were already correct and returns <NA> for them instead:

df$town_correct <- ifelse(match(df$town, place$place_incorrect),                       # condition
                          place$place_correct[match(df$town, place$place_incorrect)],  # TRUE
                          df$town)                                                     # FALSE

 df
         town town_correct
1      Briton     Brighton
2    Birmgham   Birmingham
3    Birmgham   Birmingham
4      Briton     Brighton
5    Birmgham   Birmingham
6    Birmgham   Birmingham
7       Lundn       London
8       Lundn       London
9   Newcastle         <NA>
10 Birmingham         <NA>
11     Briton     Brighton
12     Briton     Brighton
13   Birmgham   Birmingham
14  Nexcassle    Newcastle
15     London         <NA>
16   Brighton         <NA>
17  Nexcassle    Newcastle
18 Birmingham         <NA>
19  Newcastle         <NA>
20 Birmingham         <NA>

Where is the mistake in the ifelse statement? Alternatively, how could the transformation be done in dplyr?
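
For reference, here is a minimal base R sketch of what the condition seems to be doing. It assumes the lookup table is the place data frame defined above; towns, idx and fixed are hypothetical names used only for illustration, and towns is a small made-up sample rather than the df from the question. match() returns a row index for known misspellings and NA for everything else, so using it directly as the ifelse condition yields NA rather than FALSE for names that are already correct. Testing for NA explicitly avoids that:

```r
# Lookup table as defined in the question
place <- data.frame(
  place_correct   = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"),
  stringsAsFactors = FALSE
)

# Small hypothetical sample: two misspelled names, two already correct
towns <- c("Briton", "London", "Nexcassle", "Newcastle")

# Row index in `place` for each known misspelling, NA for any other name
idx <- match(towns, place$place_incorrect)

# is.na() gives a condition that is always TRUE or FALSE, never NA:
# keep the original name when there is no match, otherwise look up the correction
fixed <- ifelse(is.na(idx), towns, place$place_correct[idx])
fixed
```

In dplyr the same idea could presumably be written inside mutate() with coalesce(place$place_correct[match(town, place$place_incorrect)], town), which falls back to the original name wherever the lookup returns NA.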
