Monday, February 4, 2019

Replacing Actual Data Points With NA

I'm working with a large dataset of measurements taken from wild animals, stored in an Excel file that I read into R. Since thousands of individual animals were measured, the data contain numerous mistakes that make no logical sense. For example, an animal that usually weighs 22-32 grams was recorded as weighing 610 grams. In addition to being illogical, this throws off the scale of every graph I make.

I have tried numerous approaches to get these datapoints replaced with an NA. I created a new column from the original column of weights using the following code:

original.dataset[, weight_clean:= ifelse(Weight=="610.0", NA, Weight)]

I repeated this for every variant from the original Excel file I could think of (" 610.0", "610.0 ", "610", " 610", "610 ") for each of the errant datapoints. This hasn't worked: when I run unique() on the new "clean weight" column, all of the datapoints I tried to remove are still there.
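One thing worth checking, offered as a guess rather than a diagnosis: if the line above is rerun once per variant exactly as written, each run rebuilds weight_clean from the untouched Weight column, so only the last replacement would survive. A single pass over a numeric version of the column avoids having to enumerate text variants at all. The sketch below uses base R on made-up data. The data frame name `animals`, the toy values, and the `lower`/`upper` cutoffs are all assumptions for illustration, not taken from the real dataset.

```r
# Hypothetical toy data; the real column may be character, factor, or numeric.
animals <- data.frame(
  Weight = c("24.5", "610.0", " 31", "28", "610 "),
  stringsAsFactors = FALSE
)

# Check how R actually stored the column before comparing it to strings.
print(class(animals$Weight))

# trimws() drops stray spaces; as.numeric() turns the text into numbers.
weight_num <- as.numeric(trimws(animals$Weight))

# Assumed plausibility limits (the post mentions 22-32 g as typical).
lower <- 10
upper <- 60

# One pass: values outside the limits become NA, all others are kept.
animals$weight_clean <- ifelse(
  weight_num < lower | weight_num > upper,
  NA,
  weight_num
)

print(unique(animals$weight_clean))
```

Filtering by range rather than by exact value also catches errant entries that were never spotted by eye, though the limits need to be chosen with the species in mind.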

I installed naniar, and tried the code again this way:

original.dataset %>% replace_with_na(replace = list(weight_clean = c("610.0"," 610.0", "610.0 ", "610", "610 ", " 610")))

The full code has all permutations of all the errant datapoints.

This also has not worked. When I run unique() on the "clean weight" column after running this code, all the errant datapoints still appear.
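A possible factor here, stated as an assumption since the full script isn't shown: a piped call returns a modified copy of the data and leaves the original object unchanged unless the result is assigned to a name. The base R sketch below illustrates that behaviour with a hypothetical helper, `blank_out()`, that stands in for any function returning a modified copy. It is not part of naniar, and the toy data are invented.

```r
# Hypothetical toy data.
animals <- data.frame(weight_clean = c(24.5, 610, 31))

# Hypothetical helper: returns a copy of df with the listed values set to NA.
blank_out <- function(df, bad) {
  df$weight_clean[df$weight_clean %in% bad] <- NA
  df
}

# Called without assignment: the modified copy is printed, then discarded.
blank_out(animals, 610)
still_there <- 610 %in% animals$weight_clean

# Assigning the result back is what makes the change stick.
animals <- blank_out(animals, 610)
gone <- !(610 %in% animals$weight_clean)

print(c(still_there = still_there, gone = gone))
```

If the same applies to the piped call above, writing the result back to a name (for example `original.dataset <- original.dataset %>% ...`) before running unique() would be the thing to try.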

I feel like the answer is right under my nose, but my research and coding attempts haven't helped. What am I missing?
