Thursday, 12 July 2018

Filter or ifelse across multiple columns

I'm researching the lines of communication a patient goes through when they get sick. For example: a person gets sick and goes to the doctor (A), then goes to the hospital (B), then gets in contact with the insurance (C), etc. The order is different for each patient: one patient will go directly to the hospital, while another will first check with the insurance, and so on. We followed patients through the whole process, and each time they came into contact with a different authority, we had them fill out another survey. So after each authority ("step") we have a survey score. This gives me the following dataset setup (in reality it is a very large dataset):

Patient<-c(1,1,1,1,1,1,1,2,2,2,2)
sample6<-c("A","A","A","A","A","A","A","A","A","A","A")
sample5<-c("Stop","B","B","B","B","B","B","Stop","C","C","C")
sample4<-c(NA,"Stop","C","C","C","C","C",NA, "Stop","F","F")
sample3<-c(NA,NA,"Stop","D","D","D","D",NA, NA,"Stop","G")
sample2<-c(NA,NA,NA,"Stop","E","E","E",NA, NA,NA,"Stop")
sample1<-c(NA,NA,NA,NA, "Stop","F","F",NA,NA,NA, NA)
sample0<-c(NA,NA,NA,NA, NA,"Stop","G",NA,NA,NA, NA)
sample00<-c(NA,NA,NA,NA, NA,NA,"Stop",NA,NA,NA, NA)
Score<-c(90,88,65,44,78,98,66,38,93,88,80)
Time<-c("01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018", "06-01-2018", "07-01-2018","01-02-2018", "02-02-2018", "05-02-2018", "06-02-2018")

df<-data.frame("Patient"=Patient, "step0"=sample6, "step1"=sample5, "step2"=sample4, "step3"=sample3, "step4"=sample2, 
               "step5"=sample1,"step6"= sample0, "step7"=sample00, "Score"=Score, "Time"=Time)

> df
   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time
1        1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018
2        1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018
3        1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018
4        1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018
5        1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018
6        1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018
7        1     A     B     C     D     E     F     G  Stop    66 07-01-2018
8        2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018
9        2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018

So for example: row 1 has the survey score after authority A, row 2 is for the same patient and has the survey score after authority B, etc. Now I want to add a column that flags every row that has "F" as the final authority (the last step before "Stop") AND the row before it, so that I can compare them.

So I want to create this dataset:

   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time Indicator
1        1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018         0
2        1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018         0
3        1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018         0
4        1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018         0
5        1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018         Before
6        1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018         After
7        1     A     B     C     D     E     F     G  Stop    66 07-01-2018         0
8        2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018         0
9        2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018         Before
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018         After
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018         0

I did manage to flag the rows that contain "F" plus the previous row:

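ProcessColumns <- 2:9                       # the step0..step7 columns
d <- df[, ProcessColumns] == "F"            # TRUE wherever a step equals "F"
df$Indicator <- rowSums(d, na.rm = TRUE)    # 1 if the row contains "F" anywhere
df$Indicator[which(df$Indicator %in% 1) - 1] <- "Before"
df$Indicator[which(df$Indicator %in% 1)] <- "After"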

But this flags ALL the rows containing "F", not just the ones where "F" is the final authority. Can anyone help me?
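
One possible approach, offered as a minimal base R sketch rather than a definitive answer: find the position of "Stop" in each row, take the value in the column just before it as the final authority, and flag rows where that value is "F" together with the preceding row of the same patient. This assumes the rows are already sorted by patient and time, that every row has at least one authority before "Stop", and that "Stop" appears at most once per row. The helper names (`step_cols`, `steps`, `stop_pos`, `final`, `after_idx`, `prev_idx`, `same_patient`) are made up for illustration, and Score and Time are left out of the rebuilt data for brevity.

```r
# Rebuild the step columns from the question (Score and Time omitted).
df <- data.frame(
  Patient = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  step0 = rep("A", 11),
  step1 = c("Stop", "B", "B", "B", "B", "B", "B", "Stop", "C", "C", "C"),
  step2 = c(NA, "Stop", "C", "C", "C", "C", "C", NA, "Stop", "F", "F"),
  step3 = c(NA, NA, "Stop", "D", "D", "D", "D", NA, NA, "Stop", "G"),
  step4 = c(NA, NA, NA, "Stop", "E", "E", "E", NA, NA, NA, "Stop"),
  step5 = c(NA, NA, NA, NA, "Stop", "F", "F", NA, NA, NA, NA),
  step6 = c(NA, NA, NA, NA, NA, "Stop", "G", NA, NA, NA, NA),
  step7 = c(NA, NA, NA, NA, NA, NA, "Stop", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Work on the step columns as a character matrix.
step_cols <- grep("^step", names(df))
steps <- as.matrix(df[, step_cols])

# Column position of "Stop" within each row (NA if a row has no "Stop").
stop_pos <- apply(steps, 1, function(r) match("Stop", r))

# Final authority = the value in the column just before "Stop".
# Assumption: "Stop" is never in the first step column.
final <- steps[cbind(seq_len(nrow(df)), stop_pos - 1)]

# Rows whose final authority is "F".
after_idx <- which(final == "F")

# The row directly above each of those, kept only if it exists and
# belongs to the same patient (assumes rows are sorted by patient and time).
prev_idx <- after_idx - 1
same_patient <- prev_idx >= 1 &
  df$Patient[pmax(prev_idx, 1)] == df$Patient[after_idx]

df$Indicator <- "0"
df$Indicator[prev_idx[same_patient]] <- "Before"
df$Indicator[after_idx] <- "After"
```

On the example data this flags rows 5 and 9 as "Before" and rows 6 and 10 as "After". Rows 7 and 11 also contain "F", but their final authority is "G", so they stay "0".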
