My question is similar to other replace_na posts but I can't find the right combination of answers.
I have a dataframe with inflation rates for all countries over 8 years (wide format - countries as rows and years as columns).
- Some countries have NAs for all 8 years (columns 3:10), and in that case I want to replace all NAs with the column mean
library(tidyverse)
sample %>%
mutate_if((rowSums(is.na[,3:10]))!=8, replace_na = colMeans(na.rm=T))
This is close but something is wrong.
- Other countries only have NAs in some columns, in which case I want to replace NA with the previous year's value
library(zoo)
sample %>%
mutate_if((rowSums(is.na[,3:10]))!=8, replace_na = colMeans(na.rm=T)),
is.na[,4:10], na.locf(fromLast = TRUE)))
Tried using na.locf from the zoo package but can't get it right with the other conditions
- The final condition is that, if the NA is in the first year (2007), I want to replace it with the 2007 column mean instead of the next year (2008 was the financial crisis so all the inflation rates went nuts).
mutate_if((rowSums(is.na[,3:10]))!=8, replace_na = colMeans(na.rm=T)),
is.na[,4:10], na.locf(fromLast = TRUE)),
is.na("2007"), replace = colMeans("2007", na.rm = TRUE))
But this is full of errors and I'm stuck trying to link all these conditions together - pretty new to ifelse statements. I'm trying to find a dplyr solution as that's the syntax I'm most familiar with, but maybe it's easier in base R or data.table
running R 3.6.1
sample <- structure(list(`Country Name` = c("Aruba", "Afghanistan", "Angola",
"Albania", "Andorra", "Arab World", "United Arab Emirates", "Argentina",
"Armenia", "American Samoa", "Antigua and Barbuda", "Australia"
), `Country Code` = c("ABW", "AFG", "AGO", "ALB", "AND", "ARB",
"ARE", "ARG", "ARM", "ASM", "ATG", "AUS"), `2007` = c(5.39162036843645,
8.68057078513406, 12.2514974459487, 2.93268248162318, NA, 4.74356585295154,
NA, NA, 4.40736089644519, NA, 1.41605259409743, 2.32761128891476
), `2008` = c(8.95722105296535, 26.4186641547444, 12.4758291326398,
3.36313757366391, NA, 11.2706652380848, 12.2504202448139, NA,
8.94995335353386, NA, 5.33380639820232, 4.35029854990047), `2009` = c(-2.13630037272305,
-6.81116108898995, 13.7302839288409, 2.23139683475865, NA, 2.92089711805365,
1.55980098148558, NA, 3.40676682683799, NA, -0.550159995508869,
1.77111716621252), `2010` = c(2.07773902027782, 2.1785375238942,
14.4696564932574, 3.61538461538463, NA, 3.91106195534027, 0.879216764156813,
NA, 8.17636138473956, NA, 3.3700254022015, 2.91834002677376),
`2011` = c(4.31633194082721, 11.8041858089129, 13.4824679218511,
3.44283593170005, NA, 4.75316388885632, 0.877346595685083,
NA, 7.6500080785929, NA, 3.45674967234599, 3.30385015608744
), `2012` = c(0.627927921638161, 6.44121280934118, 10.2779049218839,
2.03642235579081, NA, 4.61184432206646, 0.662268900269082,
NA, 2.55802007757907, NA, 3.37688044338879, 1.76278015613193
), `2013` = c(-2.37226328015073, 7.38577178397857, 8.77781429332619,
1.92544399507649, NA, 3.23423783752364, 1.10111836375706,
NA, 5.78966778544654, NA, 1.05949782356168, 2.44988864142539
), `2014` = c(0.421637771012246, 4.67399603536339, 7.28038730361125,
1.61304235314414, NA, 2.77261158414198, 2.34626865671643,
NA, 2.98130868933673, NA, 1.08944157435363, 2.48792270531403
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))
Aucun commentaire:
Enregistrer un commentaire