Friday, September 25, 2020

R: Using case_when to track changes in a column by group

I have a course enrollment dataset in which I am trying to track whether students dropped, added, or retained a course during the semester, and to identify each student's enrollment 'path'. For example, I want to record that a student was enrolled in BIOL101 and dropped it to take BIOL202. My data frame looks like this:

YRTR    TECH_ID COU_ID  SUBJ    COU_NBR GENDER  RACE    sub_cou     status     path
20173   108      217    MUSC    2231    Male    White   MUSC 2231   retained
20173   108      218    MUSC    2281    Male    White   MUSC 2281   retained
20173   8429     574    ECON    2201    Male    White   ECON 2201   retained
20173   8429     720    BUSN    2120    Male    White   BUSN 2120   retained
20173   9883     60     ECON    2202    Male    White   ECON 2202   added
20173   15515    95     PHIL    1102    Female  White   PHIL 1102   retained
20183   8207     478    ART     1102    Female  White   ART 1102    retained
20183   8207     1306   ART     1130    Female  White   ART 1130    added
20183   8207     403    ART     1125    Female  White   ART 1125    dropped


I am trying to fill in the far-right column, "path". If a student is retained in a course, as in the first row, the path should read 2231->2231. I am specifically interested in course transfers WITHIN subjects. So, at the end of the dataset, ID 8207 would have one path of 1102->1102 and another of 1125->1130.

I initially tried splitting the dataframe into two dataframes (one before, and one after the drop period) and then rejoining them like so:

data5 <- merge(x=post_drop, y=pre_drop, by=c("TECH_ID", "YRTR", "SUBJ"), all=TRUE)

And then using case_when to assign the path:

data5$path <- case_when(
  data5$status.x == "retained" ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.x),
  (data5$status.x == "added") & (data5$status.y == "dropped") ~ paste0(data5$COU_NBR.y, "->", data5$COU_NBR.x),
  (data5$status.x == "dropped") & (data5$status.y == "added") ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.y)
)

But this doesn't get me what I want. It leaves a lot of NAs in path, and it doesn't tell me when a student dropped a course within a subject without registering for another (e.g. dropping BIOL101 and not taking another BIOL class), in which case I would want something like 101->NA. It also doesn't handle a class that is simply added (e.g. the student wasn't registered in a BIOL class initially but decided to register for BIOL101), which should be formatted as NA->101.
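To make the target format concrete, here is a minimal base R sketch of one possible way to build such paths. It rests on several assumptions that are not in my real data: the toy data frame `enroll` reuses the three ART rows for ID 8207 from the table above, but students 1 and 2 are hypothetical and exist only to show the drop-only and add-only cases; the helper `number_within` and the column `pair_id` are made-up names; and pairing the n-th dropped course with the n-th added course in the same subject is an arbitrary rule, not something the data dictates. It relies on `paste0()` rendering a missing value as the text "NA".

```r
# Toy data: ID 8207 comes from the table above; IDs 1 and 2 are hypothetical.
enroll <- data.frame(
  TECH_ID = c(8207, 8207, 8207, 1, 2),
  YRTR    = c(20183, 20183, 20183, 20183, 20183),
  SUBJ    = c("ART", "ART", "ART", "BIOL", "BIOL"),
  COU_NBR = c(1102, 1130, 1125, 101, 101),
  status  = c("retained", "added", "dropped", "dropped", "added"),
  stringsAsFactors = FALSE
)

keys <- c("TECH_ID", "YRTR", "SUBJ")

# Hypothetical helper: number the rows within each student/term/subject group
# so the n-th dropped course can be paired with the n-th added course.
number_within <- function(df) {
  df$pair_id <- ave(seq_len(nrow(df)), df$TECH_ID, df$YRTR, df$SUBJ,
                    FUN = seq_along)
  df
}

dropped  <- number_within(enroll[enroll$status == "dropped", ])
added    <- number_within(enroll[enroll$status == "added", ])
retained <- enroll[enroll$status == "retained", ]

# Full outer join on the group keys plus pair_id: a drop with no matching add
# gets NA on the "to" side, an add with no matching drop gets NA on the "from" side.
moves <- merge(
  dropped[, c(keys, "pair_id", "COU_NBR")],
  added[, c(keys, "pair_id", "COU_NBR")],
  by = c(keys, "pair_id"), all = TRUE, suffixes = c("_from", "_to")
)

# paste0() turns NA into the string "NA", which yields 101->NA and NA->101.
moves$path    <- paste0(moves$COU_NBR_from, "->", moves$COU_NBR_to)
retained$path <- paste0(retained$COU_NBR, "->", retained$COU_NBR)

paths <- rbind(retained[, c(keys, "path")], moves[, c(keys, "path")])
print(paths)
```

Joining on a within-group index rather than on TECH_ID, YRTR and SUBJ alone keeps a student with several courses in one subject from being expanded into every before/after combination, which may be part of what goes wrong with the merge above.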
