Tuesday, May 1, 2018

r - categorizing continuous data by existing groups

I'm a relative novice in R. I have socioeconomic (SES) scores for a series of Census tracts over a 5-year period, and I'm trying to categorize each year's SES scores into three categories ("High", "Medium", and "Low") without having to subset the data.

     CT_ID_10 year SESindex SESindex_z SEStercile
1 42101009400 2012 11269.54 -1.0445502         NA
2 42101009400 2013 11633.63 -1.0256920         NA
3 42101009400 2014 15773.60 -0.8112616         NA
4 42101009400 2015 15177.28 -0.8421481         NA
5 42101009400 2016 21402.55 -0.5197089         NA
6 42101014000 2012 21448.06 -0.5173519         NA

I want to use the mean and standard deviation as my cutoff points, i.e. anything above mean(x[per year]) + sd(x[per year]) is "High", anything below mean(x[per year]) - sd(x[per year]) is "Low", and everything in between is "Medium". I tried the following code:

for (year in 2012:2016) {
  df$SEStercile <- ifelse(df$SESindex_z[which(df$year==year)] > (mean(df$SESindex_z[which(df$year==year)])+sd(df$SESindex_z[which(df$year==year)])), "HIGH",
  ifelse(df$SESindex_z[which(df$year==year)] < (mean(df$SESindex_z[which(df$year==year)])-sd(df$SESindex_z[which(df$year==year)])), "LOW","MEDIUM"))
}

However, I received the following error:

Error in `$<-.data.frame`(`*tmp*`, "SEStercile", value = c("LOW", "LOW", :  
replacement has 367 rows, data has 1839

Any advice or simple functions would be greatly appreciated!
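
For reference, the error message says the right-hand side of the assignment has only one year's worth of rows (367) while `df$SEStercile` refers to the whole column (1839 rows). One possible way around this, sketched here and not tested on the real data, is to compute the per-year mean and standard deviation with base R's `ave()`, which returns a vector as long as its input, and then classify every row in a single vectorized step. The toy data frame below is made up for illustration; only the column names `year`, `SESindex_z`, and `SEStercile` come from the question, and `yr_mean` and `yr_sd` are hypothetical helper names.

```r
# Toy data standing in for the real data frame (values are invented for illustration)
df <- data.frame(
  year       = rep(2012:2013, each = 5),
  SESindex_z = c(-2, -1, 0, 1, 2, -4, -1, 0, 1, 4)
)

# ave() applies FUN within each year and returns one value per row of df,
# so the results line up with the full column and no loop or subsetting is needed
yr_mean <- ave(df$SESindex_z, df$year, FUN = mean)
yr_sd   <- ave(df$SESindex_z, df$year, FUN = sd)

# Classify each row against its own year's cutoffs
df$SEStercile <- ifelse(df$SESindex_z > yr_mean + yr_sd, "HIGH",
                 ifelse(df$SESindex_z < yr_mean - yr_sd, "LOW", "MEDIUM"))

print(df)
```

If the loop is preferred, the assignment would presumably need to target only the matching rows, e.g. `df$SEStercile[df$year == year] <- ...`, so that both sides have the same length.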
