Wednesday, 20 June 2018

R: If a substring of 8 characters in colA is equal to a substring of 8 characters in colB, add the value of colA to a new colC

In R, I need to compare the first 8 characters of one column, colA (Longitude.x), with the first 8 characters of a second column, colB (X.x). If the 8 characters are identical, I want to write the value of colA (Longitude.x) to a new column, colC (XCoord). In other words, if colA contains a longitude value of -122.23538 and colB contains an X value of -122.235873, I want colC to take the value of colA, -122.23538, because the first 8 characters (-122.235) match.

colA (Longitude.x) and colB (X.x) are both of type double when first read into R, so I have converted them to character with the following code:

schools_merge$Longitude.x[] <- lapply(schools_merge$Longitude.x[], as.character)
schools_merge$X.x[] <- lapply(schools_merge$X.x[], as.character)

The class and type of both colA and B become "list."
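
For comparison, here is a minimal sketch of why the result is a list. It uses a hypothetical toy vector `x` rather than the real schools_merge data: lapply() always returns a list, whereas as.character() applied directly to a numeric vector returns a plain character vector.

```r
# Hypothetical toy values standing in for schools_merge$Longitude.x
x <- c(-120.76288, -122.23538)

# lapply() returns a list with one length-1 character vector per element
as_list <- lapply(x, as.character)

# as.character() is already vectorized and returns a character vector
as_chr <- as.character(x)

class(as_list)  # "list"
class(as_chr)   # "character"
```

Assuming the same holds for the real columns, `schools_merge$Longitude.x <- as.character(schools_merge$Longitude.x)`, applied to the original double column, should give a character column instead of a list.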

I have tried the following code to write a new colC (XCoord):

schools_merge$XCoord <- if(substr(schools_merge$X.x,1,8) == substr(schools_merge$Longitude.x,1,8)) "yes" else "no"

While this code runs, it returns a warning--

Warning message:
In if (substr(schools_merge$X.x, 1, 8) == substr(schools_merge$Longitude.x,  
: the condition has length > 1 and only the first element will be used

--and not the desired outcome (for example, the second element in each list should result in a "yes" for colC (XCoord) because characters 1-8 of the number -122.23538 are equal to characters 1-8 of -122.235873).

head(schools_merge$XCoord)
head(schools_merge$Longitude.x)
head(schools_merge$X.x)

> head(schools_merge$XCoord)
[1] "no" "no" "no" "no" "no" "no"
> head(schools_merge$Longitude.x)
[[1]]
[1] "-120.76288"

[[2]]
[1] "-122.23538"

[[3]]
[1] "-122.19604"

[[4]]
[1] "-122.09222"

[[5]]
[1] "-121.77057"

[[6]]
[1] "-122.21629"

> head(schools_merge$X.x)
[[1]]
[1] "-120.763628"

[[2]]
[1] "-122.235873"

[[3]]
[1] "-122.197942"

[[4]]
[1] "-122.092998"

[[5]]
[1] "-121.770702"

[[6]]
[1] "-122.216899"

The possibilities I can think of are: 1) my assumption about what counts as a character (i.e. '-', '.', and every digit) is incorrect, although I have tried several different numbers of characters to compare and still get the same result, with head() showing either all "yes" or all "no"; or 2) I may need to convert the columns to vectors rather than to character. Any help is much appreciated!

Thank you, Anna
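
The warning arises because if() evaluates a single condition and is not vectorized, so only the first row's comparison is used for the whole column (in R 4.2.0 and later, a condition of length greater than one is an error rather than a warning). A possible approach, sketched here on a hypothetical toy data frame `df` built from the first three rows shown above, is to compare the substrings for all rows at once and use ifelse(). Filling non-matching rows with NA is an assumption, since the question does not say what colC should hold in that case.

```r
# Hypothetical toy data frame with the first three rows from the question,
# kept as doubles (not converted to lists)
df <- data.frame(
  Longitude.x = c(-120.76288, -122.23538, -122.19604),
  X.x         = c(-120.763628, -122.235873, -122.197942)
)

# Convert to character vectors; as.character() is vectorized
lon_chr <- as.character(df$Longitude.x)
x_chr   <- as.character(df$X.x)

# Element-wise comparison of the first 8 characters ("-", digits, and ".")
same_prefix <- substr(lon_chr, 1, 8) == substr(x_chr, 1, 8)

# ifelse() is vectorized: keep Longitude.x where the prefixes match,
# otherwise NA (assumed placeholder for non-matching rows)
df$XCoord <- ifelse(same_prefix, df$Longitude.x, NA_real_)

df
```

With these three rows, only the second one matches on "-122.235", so XCoord is -122.23538 for that row and NA for the others. Note that the first row does not match, which would explain why the original if() call produced "no" for every row.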
