Friday, 16 July 2021

How to check whether a value is above a threshold for each row, when the column to check changes from row to row

I have a df with 7 columns: the first 6 are log2FC values for gene expression, and the 7th contains the name of the column with the highest log2FC value for each gene.

I'm trying to create a second df with a single column containing "yes" if, for that gene, the value in the column named in df$max is >= 0.58, and "no" otherwise.

                   BA1_BA2_10.5.log2FC BA1_BA2_11.5.log2FC BA1_PBA_10.5.log2FC BA1_PBA_11.5.log2FC BA2_PBA_10.5.log2FC
ENSMUSG00000000001          0.43381231          0.37756860          0.33088193           0.2408001        -0.102930373
ENSMUSG00000000028          0.15446012          0.19279549          0.36452020           0.4408701         0.210060088
ENSMUSG00000000031          0.63865957          0.33683563         -0.54218709          -0.6777467        -1.180846658
ENSMUSG00000000037         -0.02683548          0.01858643         -0.02372156           0.3757510         0.003113918
ENSMUSG00000000056         -0.06259867          0.12463577         -0.01061768           0.1271518         0.051980989
                   BA2_PBA_11.5.log2FC                 max
ENSMUSG00000000001         -0.13676846 BA1_BA2_10.5.log2FC
ENSMUSG00000000028          0.24807462 BA1_PBA_11.5.log2FC
ENSMUSG00000000031         -1.01458229 BA1_BA2_10.5.log2FC
ENSMUSG00000000037          0.35716453 BA1_PBA_11.5.log2FC
ENSMUSG00000000056          0.00251603 BA1_PBA_11.5.log2FC
signif <- function(x,y){
  ifelse(y[[x]] >= 0.58, "yes", "no")
}
df_sig <- as.data.frame(apply(df[,7,drop=FALSE], 1, signif, y=df))
  ENSMUSG00000000001 ENSMUSG00000000028 ENSMUSG00000000031 ENSMUSG00000000037 ENSMUSG00000000056
1                 no                 no                 no                 no                 no
2                 no                 no                 no                 no                 no
3                yes                 no                yes                 no                 no
4                 no                 no                 no                 no                 no
5                 no                 no                 no                 no                 no

The code I've tried returns a yes/no value for every gene in the df for each row, rather than a single value for that row's gene in the column specified in df$max, so I end up with a 5x5 df. Instead I want it to look like this:

ENSMUSG00000000001  no
ENSMUSG00000000028  no
ENSMUSG00000000031  yes
ENSMUSG00000000037  no
ENSMUSG00000000056  no

Any help appreciated :)
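
One possible way to get the one-value-per-gene result, sketched here rather than offered as the definitive answer: in the attempt above, y[[x]] extracts the whole column named by x, so every row gets back all 5 genes. Indexing a numeric matrix with a two-column (row, column) matrix picks exactly one cell per row instead. The sketch below rebuilds df from the printed values; the names value_cols, m, idx, picked and significant are illustrative, and it assumes every entry in df$max matches one of the log2FC column names.

```r
# Rebuild the example data from the printed output in the question.
df <- data.frame(
  BA1_BA2_10.5.log2FC = c(0.43381231, 0.15446012, 0.63865957, -0.02683548, -0.06259867),
  BA1_BA2_11.5.log2FC = c(0.37756860, 0.19279549, 0.33683563, 0.01858643, 0.12463577),
  BA1_PBA_10.5.log2FC = c(0.33088193, 0.36452020, -0.54218709, -0.02372156, -0.01061768),
  BA1_PBA_11.5.log2FC = c(0.2408001, 0.4408701, -0.6777467, 0.3757510, 0.1271518),
  BA2_PBA_10.5.log2FC = c(-0.102930373, 0.210060088, -1.180846658, 0.003113918, 0.051980989),
  BA2_PBA_11.5.log2FC = c(-0.13676846, 0.24807462, -1.01458229, 0.35716453, 0.00251603),
  max = c("BA1_BA2_10.5.log2FC", "BA1_PBA_11.5.log2FC", "BA1_BA2_10.5.log2FC",
          "BA1_PBA_11.5.log2FC", "BA1_PBA_11.5.log2FC"),
  row.names = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000031",
                "ENSMUSG00000000037", "ENSMUSG00000000056"),
  stringsAsFactors = FALSE
)

# Numeric columns only (everything except the "max" name column).
value_cols <- setdiff(names(df), "max")
m <- as.matrix(df[value_cols])

# Two-column index: row i paired with the position of the column named in df$max[i].
idx <- cbind(seq_len(nrow(df)), match(df$max, value_cols))

# Matrix indexing returns one value per row: the cell at (row, named column).
picked <- m[idx]

# One yes/no per gene, keeping the gene IDs as row names.
df_sig <- data.frame(
  significant = ifelse(picked >= 0.58, "yes", "no"),
  row.names = rownames(df),
  stringsAsFactors = FALSE
)
print(df_sig)
```

Since df$max names the column holding each row's highest value, comparing the row-wise maximum itself against 0.58 should give the same answer for this data; the matrix-indexing form is shown because it also works when the named column is not the maximum.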
