Thursday, 15 February 2018

IF statement for filtering down a data set

I am writing this post as a follow-up to an earlier question that got no answers, probably because I was not specific enough (I hope this is OK!). I think the best way to do what I am trying to do is with an IF statement, but I am stuck on how to write it, because the rules I need are quite specific.

I have a data matrix with samples as columns and genes as rows. Each set of five columns belongs to one sample type, for example one time point measured in five replicates; the next five columns are the second time point, and so on.
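
Treating each block of five columns as one time point amounts to slicing every row into consecutive chunks of five and averaging each chunk. Below is a minimal Python sketch of that step; `group_means` and its `group_size` parameter are hypothetical names introduced here, and the counts are the ENSMUSG00000029054 row from the matrix in this post.

```python
from statistics import mean


def group_means(row, group_size=5):
    """Split one gene's counts into consecutive blocks of `group_size`
    replicate columns and return the mean of each block."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [mean(row[i:i + group_size]) for i in range(0, len(row), group_size)]


# Counts for ENSMUSG00000029054 (20 columns = 4 time points x 5 replicates).
counts = [9, 1, 13, 2, 4, 27, 39, 49, 13, 35,
          1834, 2277, 1054, 1744, 2449, 3905, 4228, 3240, 2941, 3489]

print(group_means(counts))  # [5.8, 32.6, 1871.6, 3560.6]
```

The first two means match the 5.8 and 32.6 worked out by hand later in the post.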

I would like to find genes that change from one time point to the next, but only if the difference is at least 50 counts. For example, if a gene changed by 45 counts between two time points, it would be rejected. Is there a way to do this, and if so, would somebody be kind enough to share some code? A vector of TRUE/FALSE values would be a great start, but I would then like to build a data matrix containing only the TRUE rows, so that I end up with a list of genes that change by at least 50 counts in either direction (up or down).
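
A minimal sketch of the two steps described here (first a TRUE/FALSE flag per gene, then a matrix holding only the TRUE rows), written in Python since the post contains no code. It assumes, as the question explains further down, that "change" means the difference between the mean of five replicate columns and the mean of the next five; only the first two time points are compared, the matrix is a plain dict of gene ID to counts, and `MIN_DELTA`, `keep` and `filtered` are illustrative names.

```python
from statistics import mean

# Three rows from the matrix in this post (gene ID -> 20 counts).
matrix = {
    "ENSMUSG00000042096": [0, 2, 0, 1, 3, 2, 13, 5, 3, 6,
                           238, 211, 149, 182, 214, 843, 831, 1072, 815, 971],
    "ENSMUSG00000033208": [91, 47, 100, 41, 79, 764, 848, 744, 491, 671,
                           2361, 2888, 2323, 2297, 2778, 4613, 6634, 6603, 5477, 4924],
    "ENSMUSG00000029054": [9, 1, 13, 2, 4, 27, 39, 49, 13, 35,
                           1834, 2277, 1054, 1744, 2449, 3905, 4228, 3240, 2941, 3489],
}

MIN_DELTA = 50  # minimum absolute change in mean counts

# Step 1: one True/False per gene, comparing time point 1 (columns 1-5)
# with time point 2 (columns 6-10); abs() accepts both increases and decreases.
keep = {gene: abs(mean(row[5:10]) - mean(row[0:5])) >= MIN_DELTA
        for gene, row in matrix.items()}

# Step 2: keep only the rows whose flag is True.
filtered = {gene: row for gene, row in matrix.items() if keep[gene]}

print(keep)
print(list(filtered))
```

Here only ENSMUSG00000033208 is kept: its mean rises from 71.6 to 703.6, while the other two genes change by 4.6 and 26.8.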

Please see the example data matrix below (columns 1-5 are the first time point, 6-10 the second, 11-15 the third and 16-20 the fourth). Many thanks for your time!

                       X51378P3 X51378P4 X48275P5 X48277P1 X48277P2 X28046 X23154 X23156 X23157 X23241  X8657 X10459  X8302  X8726  X8727  X8309  X5260 X47471 X51394    X18
    ENSMUSG00000042096        0        2        0        1        3      2     13      5      3      6    238    211    149    182    214    843    831   1072    815    971
    ENSMUSG00000033208       91       47      100       41       79    764    848    744    491    671   2361   2888   2323   2297   2778   4613   6634   6603   5477   4924
    ENSMUSG00000021750       46       51       28       28       34     89     90     81     88     73   9083   6238   3876   6754   7066  11727  10135  16857  10669  12581
    ENSMUSG00000041205      290      141      156      122      146    431    432    377    310    388   1514   1714   1363   1428   1677   1492   2036   1465   1573   1585
    ENSMUSG00000026556     4260     3486     3545     2315     3090   2818   2039   2204   2139   2241    807    973    689    787   1094    466    660    460    457    579
    ENSMUSG00000032908      112       77       78       76       98    399    286    359    218    282   1451   1266    897   1183   1416   1881   2243   2281   1862   2144
    ENSMUSG00000045246        7        4       11        7       11     13     29     36     19     14    762    958    810    905    720   2950   2390   2916   2684   2878
    ENSMUSG00000023019      159      108      104       96      116     68     94     94     62    132    878   1039    774    941    829   3164   3191   3405   2671   3019
    ENSMUSG00000029054        9        1       13        2        4     27     39     49     13     35   1834   2277   1054   1744   2449   3905   4228   3240   2941   3489
    ENSMUSG00000010476     9380     8541     8906     5609     7406   4478   4422   4865   3739   4003    886   1473    979    956   1199    247    380    434    297    375
    ENSMUSG00000020788       79      109       93       53       91    124    163    212    128    135   3561   3396   1944   3128   3754   6632   6844   5198   5595   6646
    ENSMUSG00000047945    18196    14417    16349    10746    14262  19114  13732  13902  12339  13406   4224   7321   4514   5056   6271    702    899    630    883    741
    ENSMUSG00000022096      183      120      156       76      159    384    205    160    225    189   2466   2488   1958   2504   2921   2955   3255   3218   2442   2928
    ENSMUSG00000020734      233       85      157      150      108    183    204    253    187    182   5854   4614   2719   4949   6563  12011  14573  10291   9136  12527
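
To reproduce the example, the printout above can be read back into Python. This is a sketch assuming the pasted text is whitespace-separated, with the 20 column names on the first line and a gene ID at the start of every following line; `parse_printed_matrix` is a hypothetical helper, not part of any library.

```python
def parse_printed_matrix(text):
    """Read a whitespace-separated table: the first non-empty line holds the
    column names, each later line is a row name followed by integer counts."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    columns = lines[0]
    rows = {}
    for fields in lines[1:]:
        gene, values = fields[0], [int(v) for v in fields[1:]]
        if len(values) != len(columns):
            raise ValueError(f"{gene}: expected {len(columns)} values, got {len(values)}")
        rows[gene] = values
    return columns, rows


# Header plus two rows copied from the matrix; paste the full printout the same way.
sample = """
X51378P3 X51378P4 X48275P5 X48277P1 X48277P2 X28046 X23154 X23156 X23157 X23241 X8657 X10459 X8302 X8726 X8727 X8309 X5260 X47471 X51394 X18
ENSMUSG00000042096 0 2 0 1 3 2 13 5 3 6 238 211 149 182 214 843 831 1072 815 971
ENSMUSG00000029054 9 1 13 2 4 27 39 49 13 35 1834 2277 1054 1744 2449 3905 4228 3240 2941 3489
"""

columns, rows = parse_printed_matrix(sample)
print(len(columns), list(rows))
```

The length check catches rows that were split across lines when the matrix was pasted, which is exactly what happened to the original printout.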

If you look at ENSMUSG00000029054, the average of the first five columns is 5.8 and the average of the second five (the second time point) is 32.6, so the difference between the two is 26.8. What I would like to do is filter this matrix so that only genes whose average changes by at least 50 between time points are kept.
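
The arithmetic in this paragraph, written out, with $\bar{x}_t$ denoting the mean of the five replicates at time point $t$ (notation introduced here):

$$\bar{x}_1 = \frac{9 + 1 + 13 + 2 + 4}{5} = \frac{29}{5} = 5.8, \qquad \bar{x}_2 = \frac{27 + 39 + 49 + 13 + 35}{5} = \frac{163}{5} = 32.6$$

$$\lvert \bar{x}_2 - \bar{x}_1 \rvert = 26.8 < 50$$

so this gene would be rejected for the change from the first to the second time point. In general, a gene passes for the step from time point $t$ to $t+1$ when

$$\lvert \bar{x}_{t+1} - \bar{x}_t \rvert \ge 50.$$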

What I am really stuck on is how to express this: I want to define a specific minimum change (delta) between samples, and also specify that the first five columns are the same condition, so that their mean is compared with the mean of the next five columns, and so on.
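
Both requirements here, a configurable delta and treating each block of five columns as one condition, can be sketched as parameters of a single function. `filter_genes`, `group_size`, `min_delta` and `require` are hypothetical names introduced for illustration. The sketch assumes the comparison is between consecutive time points and lets the caller choose whether any or all transitions must reach the threshold, since the question does not say which.

```python
from statistics import mean


def filter_genes(matrix, group_size=5, min_delta=50, require="any"):
    """Return the genes whose mean count changes by at least `min_delta`
    (up or down) between consecutive groups of `group_size` columns.

    require="any": keep a gene if at least one consecutive pair passes.
    require="all": keep a gene only if every consecutive pair passes.
    """
    if require not in ("any", "all"):
        raise ValueError('require must be "any" or "all"')
    check = any if require == "any" else all
    kept = {}
    for gene, row in matrix.items():
        if len(row) % group_size != 0:
            raise ValueError(f"{gene}: row length is not a multiple of {group_size}")
        # Mean of each block of replicate columns (one block per time point).
        means = [mean(row[i:i + group_size]) for i in range(0, len(row), group_size)]
        # Absolute change between each time point and the next.
        deltas = [abs(b - a) for a, b in zip(means, means[1:])]
        if check(d >= min_delta for d in deltas):
            kept[gene] = row
    return kept


matrix = {
    "ENSMUSG00000042096": [0, 2, 0, 1, 3, 2, 13, 5, 3, 6,
                           238, 211, 149, 182, 214, 843, 831, 1072, 815, 971],
    "ENSMUSG00000033208": [91, 47, 100, 41, 79, 764, 848, 744, 491, 671,
                           2361, 2888, 2323, 2297, 2778, 4613, 6634, 6603, 5477, 4924],
    "ENSMUSG00000029054": [9, 1, 13, 2, 4, 27, 39, 49, 13, 35,
                           1834, 2277, 1054, 1744, 2449, 3905, 4228, 3240, 2941, 3489],
}

print(list(filter_genes(matrix, require="all")))
```

With `require="all"` only ENSMUSG00000033208 survives, because the other two genes change by less than 50 between the first two time points. With `require="any"` all three are kept, since every gene in this data changes by far more than 50 at later time points, so "all" (or checking one specific transition) is probably closer to what is wanted here.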

Many thanks again, all!
