Thursday, 5 October 2017

R: How to loop over paired columns to create new columns

I am trying to loop over specific pairs of columns (the columns in each pair have similar names) and create new columns based on a conditional statement.

Example dataset:

    set.seed(2)
    df <- data.frame (id=rep(1:5),
                      s1=rnorm(5, 0, 3),
                      s2=rnorm(5, 0, 3),
                      s2a=rnorm(5, 0, 3),
                      st1=rnorm(5, 3, 3),
                      st2=rnorm(5, 3, 3),
                      st2a=rnorm(5, 3, 3))


    > df
      id         s1         s2       s2a       st1        st2      st2a
    1  1 -2.6907436  0.3972609  1.252952 -3.933207  9.2724576 -4.355119
    2  2  0.5545476  2.1238642  2.945258  5.635814 -0.5997775  4.431712
    3  3  4.7635360 -0.7190941 -1.178086  3.107420  7.7689146  1.210325
    4  4 -3.3911270  5.9534218 -3.119007  6.038486  8.8639549  5.376610
    5  5 -0.2407553 -0.4163610  5.346687  4.296795  3.0148133  3.868910

Column s1 is paired with column st1, s2 with st2, and s2a with st2a. For each pair I want a 1/0 indicator that is 1 when the s column is at most -3 and the paired st column is at least 0, e.g. df$ys1<-ifelse(df$s1<=-3 & df$st1>=0, 1, 0). The ultimate aim is to create the final variable yes_no (1/0) to indicate whether any of the pair indicators is 1, e.g. df$yes_no<-ifelse(df$ys1==1 | df$ys2==1 | df$ys2a==1, 1, 0)

The new dataset should look like this:

    > df
      id         s1         s2       s2a       st1        st2      st2a ys1 ys2 ys2a yes_no
    1  1 -2.6907436  0.3972609  1.252952 -3.933207  9.2724576 -4.355119   0   0    0      0
    2  2  0.5545476  2.1238642  2.945258  5.635814 -0.5997775  4.431712   0   0    0      0
    3  3  4.7635360 -0.7190941 -1.178086  3.107420  7.7689146  1.210325   0   0    0      0
    4  4 -3.3911270  5.9534218 -3.119007  6.038486  8.8639549  5.376610   1   0    1      1
    5  5 -0.2407553 -0.4163610  5.346687  4.296795  3.0148133  3.868910   0   0    0      0

I'm sure there is a way of doing this without actually creating all the additional columns (i.e. just creating the final column, yes_no), but I would be interested in how to create them as well, just to know how to do it, in addition to a neater method. I think one way of doing it would be to split the dataset into two sets based on the pairs and then use them in a loop:

    firstt<-(df[,c(2:4)])
    final<-(df[,c(5:7)])

or skip that and try it directly in a loop:

    for(i in names(df[,c(2:4)])){
    r<-(df[,c(5:7)])
    df[i] <-ifelse(df$[i]<=-3 & df$[r]>=0, 1, 0)
    }

Obviously that won't work, but that is the idea of what I was trying. Any help would be appreciated.
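
For reference, here is a minimal sketch of one way the idea above could be made to run. It assumes that every s column has a partner whose name swaps the leading "s" for "st" (true for this example, but an assumption in general). The helper names s_cols, st_cols, y_cols and yes_no_direct are made up for illustration. The loop uses df[[name]] rather than df$[name], because $ cannot take a bracketed index.

```r
# Rebuild the example data from the question
set.seed(2)
df <- data.frame(id   = 1:5,
                 s1   = rnorm(5, 0, 3),
                 s2   = rnorm(5, 0, 3),
                 s2a  = rnorm(5, 0, 3),
                 st1  = rnorm(5, 3, 3),
                 st2  = rnorm(5, 3, 3),
                 st2a = rnorm(5, 3, 3))

# Assumption: each "s" column is paired with the column whose name
# replaces the leading "s" with "st" (s1 -> st1, s2a -> st2a, ...)
s_cols  <- c("s1", "s2", "s2a")
st_cols <- sub("^s", "st", s_cols)

# Loop over the pairs by position, creating one indicator per pair
# (ys1, ys2, ys2a): 1 when s <= -3 and the paired st >= 0, else 0
for (k in seq_along(s_cols)) {
  new_name <- paste0("y", s_cols[k])
  df[[new_name]] <- ifelse(df[[s_cols[k]]] <= -3 & df[[st_cols[k]]] >= 0, 1, 0)
}

# yes_no is 1 if any of the pair indicators is 1
y_cols <- paste0("y", s_cols)
df$yes_no <- as.integer(rowSums(df[y_cols]) > 0)

# Alternative without the intermediate columns: compare the two blocks
# of columns element-wise, then check each row for at least one TRUE
yes_no_direct <- as.integer(rowSums(df[s_cols] <= -3 & df[st_cols] >= 0) > 0)
```

The alternative relies on the two column blocks being in the same order, so that position k in s_cols lines up with position k in st_cols.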
