Monday, 8 May 2017

apply over ifelse using an index from another data.frame

I would like to conditionally replace the values in one data frame with the values in another using a nested ifelse() statement. But I'm having trouble extending this to the whole data frame using apply. I want to avoid loops and non-base packages if possible.

The first, snp_test, is a data frame with six observations of 10 character variables:

> snp_test
  L1 L2 L3 L4 L5 L6 L7 L8 L9 L10
1  1  2  -  0  2  0  0  0  0   2
2  1  0  -  0  -  1  0  -  -   2
3  -  -  -  0  -  -  0  -  -   1
4  2  0  0  0  0  -  0  0  0   0
5  2  0  -  0  2  -  0  0  0   1
6  1  0  -  0  0  0  0  0  -   0

The second, locus_test, has one row per variable in snp_test and three columns of replacement values (gt0, gt1 and gt2; each is a character string of two letters separated by a space):

> locus_test
   locus gt0 gt1 gt2
1     L1 G G A A G A
2     L2 T T G G T G
3     L3 A A C C A C
4     L4 T T A A T A
5     L5 G G C C G C
6     L6 C C A A C A
7     L7 T T C C T C
8     L8 A A G G A G
9     L9 A A G G A G
10   L10 G G A A G A

I would like to replace the values in snp_test with the values in locus_test. For example, when L1==1, the 1 is replaced with the corresponding value in locus_test$gt1 ("A A"). When L1==2, the value in the gt2 column is used ("G A").

I can do this for each variable separately:

ifelse(snp_test[,1]==1,locus_test$gt1[locus_test$locus =="L1"],snp_test[,1])

Then I would nest the ifelse, so that the three different values are replaced with their corresponding values in locus_test, e.g.:

ifelse(ifelse(snp_test[,1]==1, locus_test$gt1[locus_test$locus=="L1"], snp_test[,1])==2,
       locus_test$gt2[locus_test$locus=="L1"],
       ifelse(snp_test[,1]==1, locus_test$gt1[locus_test$locus=="L1"], snp_test[,1]))

And so on...

But when I apply this over all of the variables in snp_test, i.e.

apply(snp_test,2,function(x)ifelse(x==1,locus_test$gt1,x))

the first six values of locus_test$gt1 are used as the replacement values, matched by row position, rather than the single value that relates to each column. So I would like to know how to add the necessary index so that a value replaced in, for example, the L1 column of snp_test can only ever be one of the three values corresponding to L1 in locus_test.
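The positional matching can be reproduced on a single column. The following is a minimal sketch, with the L1 column and the gt1 column copied by hand from the printed data above; the names x, gt1, wrong and right are only for illustration:

```r
# L1 column of snp_test and the full gt1 column of locus_test (copied from above)
x   <- c("1", "1", "-", "2", "2", "1")
gt1 <- c("A A", "G G", "C C", "A A", "C C", "A A", "C C", "G G", "G G", "A A")

# ifelse() takes yes[i] wherever test[i] is TRUE, so rows 1, 2 and 6
# receive gt1[1], gt1[2] and gt1[6] rather than the single L1 value
wrong <- ifelse(x == "1", gt1, x)

# Subsetting to one value first lets that value be recycled down the column
right <- ifelse(x == "1", gt1[1], x)
```

Here wrong picks up "G G" in row 2, which belongs to L2, while right uses "A A" for every row where x is "1".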

In other words, how can I specify the subset part of the ifelse:

locus_test$gt1[locus_test$locus =="L1"]

in apply?
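One possible way to carry the column name into the function is to iterate over names(snp_test) with sapply() instead of passing the columns themselves to apply(). This is only a sketch in base R: the data are rebuilt by hand from the printed output above, and recode_column is a hypothetical helper name, not something from the original code.

```r
# Data rebuilt from the printed output above (all values kept as character)
snp_test <- read.table(text = "
L1 L2 L3 L4 L5 L6 L7 L8 L9 L10
1  2  -  0  2  0  0  0  0  2
1  0  -  0  -  1  0  -  -  2
-  -  -  0  -  -  0  -  -  1
2  0  0  0  0  -  0  0  0  0
2  0  -  0  2  -  0  0  0  1
1  0  -  0  0  0  0  0  -  0
", header = TRUE, colClasses = "character")

locus_test <- data.frame(
  locus = paste0("L", 1:10),
  gt0 = c("G G", "T T", "A A", "T T", "G G", "C C", "T T", "A A", "A A", "G G"),
  gt1 = c("A A", "G G", "C C", "A A", "C C", "A A", "C C", "G G", "G G", "A A"),
  gt2 = c("G A", "T G", "A C", "T A", "G C", "C A", "T C", "A G", "A G", "G A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode one column, looked up by its name
recode_column <- function(nm) {
  x   <- snp_test[[nm]]
  row <- locus_test[locus_test$locus == nm, ]  # the single row for this locus
  ifelse(x == "0", row$gt0,
  ifelse(x == "1", row$gt1,
  ifelse(x == "2", row$gt2, x)))               # "-" is left unchanged
}

# Iterating over the names gives each call access to its own locus
recoded    <- sapply(names(snp_test), recode_column)
recoded_df <- as.data.frame(recoded, stringsAsFactors = FALSE)
```

Because each call receives a column name rather than the column values, the lookup in locus_test returns exactly one row, and its single gt0, gt1 or gt2 value is recycled down the column. sapply() returns a character matrix here, so the last line converts it back to a data frame.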
