Friday, April 12, 2019

Creating new, variable-name dependent columns in a function (to indicate levels of significance in expression data)

In data frames (dfs) containing differential-expression results for proteins, I would like to mark which proteins pass certain significance thresholds (e.g. logFC > 1 & p < 0.05 as up_0.05, or p < 0.01 as up_0.01). Using ifelse I can do this for each df individually, but I have many dfs to process this way, so a function would be much cleaner.
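For a single data frame, the thresholding does not actually need `ifelse()`. Comparisons such as `adj.P.Val <= 0.05` already return a logical vector, and `&` combines two of them element by element. Here is a minimal sketch on a small toy data frame. The values are made up purely for illustration, and the operators follow the `<=`/`>=` used in the function further down.

```r
# Toy data frame with made-up values, one row per protein
toy <- data.frame(protein   = c("P1", "P2", "P3", "P4"),
                  adj.P.Val = c(0.001, 0.03, 0.20, 0.004),
                  logFC     = c(2.5, 1.2, 3.0, -1.8))

# Comparisons are vectorized and already return TRUE/FALSE,
# so ifelse(cond, TRUE, FALSE) is redundant
toy$up_0.05   <- toy$adj.P.Val <= 0.05 & toy$logFC >= 1
toy$down_0.05 <- toy$adj.P.Val <= 0.05 & toy$logFC <= -1
toy$up_0.01   <- toy$adj.P.Val <= 0.01 & toy$logFC >= 1
toy$down_0.01 <- toy$adj.P.Val <= 0.01 & toy$logFC <= -1

print(toy)
```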

A similar question has been asked (dplyr - mutate: use dynamic variable names), but I was not able to translate it to my problem, so I would very much appreciate it if you could correct my function's code so that it works (example data provided).

Thanks a lot!

sample data

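set.seed(42)  # make the random sample reproducible

p.vals <- seq(from=0, to=1, by=.0001)
# negative values included so that the "down" columns can also be TRUE
logFCs <- seq(from=-4, to=4, by=.1)

# LETTERS has only 26 elements, so LETTERS[1:1000] would give NA protein names
diffEx_proteins <- data.frame(protein=paste0("P", 1:1000),
                              adj.P.Val=sample(p.vals, size=1000, replace=TRUE),
                              logFC=sample(logFCs, size=1000, replace=TRUE))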

function

mark_significants <- function(comparison){
  comparison$paste0(comparison, "up_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1, TRUE, FALSE)
  comparison$paste0(comparison, "down_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1, TRUE, FALSE)
  comparison$paste0(comparison, "up_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1, TRUE, FALSE)
  comparison$paste0(comparison, "down_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1, TRUE, FALSE)
}

usage

mark_significants(diffEx_proteins)

I get the error "Error in mark_significants(diffEx_proteins) : invalid function in complex assignment"
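The error comes from the left-hand side `comparison$paste0(...)`. R reads it as a call to a function named `comparison$paste0`. Because that is not a plain function name, R cannot find a matching replacement function for the assignment, and it stops with "invalid function in complex assignment". `$` only ever takes a literal column name. For a name computed at run time, `[[` works instead. The minimal base-R sketch below first reproduces the error and then shows the `[[` form. The data frame `tbl` and the prefix `"diffEx"` are hypothetical.

```r
tbl <- data.frame(adj.P.Val = c(0.01, 0.20), logFC = c(2.0, 0.5))

# Same pattern as in the question: the function part of the left-hand side
# is tbl$paste0, which R cannot turn into a replacement function
msg <- tryCatch({
  tbl$paste0(tbl, "up_0.05") <- TRUE
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Column name computed at run time (hypothetical prefix "diffEx")
new_col <- paste0("diffEx", "_up_0.05")

# [[ evaluates its argument, so the column gets the computed name;
# tbl$new_col <- ... would create a column literally called "new_col"
tbl[[new_col]] <- tbl$adj.P.Val <= 0.05 & tbl$logFC >= 1
print(names(tbl))
```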

I would like to get back the df with 4 added logical columns indicating whether each protein reaches the defined threshold levels.
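One possible working `mark_significants()` in base R is sketched below. It builds the column names with `paste0()`, assigns them with `[[` instead of `$`, and returns the data frame. It assumes the new columns should be named `<data frame name>_up_0.05` and so on. The `_` separator and the optional `prefix` argument are assumptions, not part of the original code. The variable name is taken with `deparse(substitute(comparison))`, because `paste0(comparison, ...)` would paste the data frame's contents rather than its name. Returning the data frame matters because R copies it when it is modified inside a function, so the caller only sees the new columns if the result is returned and assigned.

```r
# Sketch of a working mark_significants(). The "_" separator and the optional
# `prefix` argument are assumptions, not part of the original code.
mark_significants <- function(comparison,
                              prefix = deparse(substitute(comparison))) {
  force(prefix)  # capture the caller's variable name before `comparison` is modified

  up   <- comparison$logFC >= 1
  down <- comparison$logFC <= -1

  # [[ accepts a name computed at run time; $ only accepts a literal name
  comparison[[paste0(prefix, "_up_0.05")]]   <- comparison$adj.P.Val <= 0.05 & up
  comparison[[paste0(prefix, "_down_0.05")]] <- comparison$adj.P.Val <= 0.05 & down
  comparison[[paste0(prefix, "_up_0.01")]]   <- comparison$adj.P.Val <= 0.01 & up
  comparison[[paste0(prefix, "_down_0.01")]] <- comparison$adj.P.Val <= 0.01 & down

  # Modifying `comparison` changed only the function's local copy, so return it
  comparison
}

# Example data in the same shape as the question's
set.seed(1)
diffEx_proteins <- data.frame(
  protein   = paste0("P", 1:1000),
  adj.P.Val = sample(seq(0, 1, by = 0.0001), 1000, replace = TRUE),
  logFC     = sample(seq(-4, 4, by = 0.1), 1000, replace = TRUE)
)

# Assign the result back; the original data frame is not changed in place
diffEx_proteins <- mark_significants(diffEx_proteins)
str(diffEx_proteins)

# Many data frames: keep them in a named list and use each name as the prefix
# (comparison_A and comparison_B are hypothetical)
comparisons <- list(comparison_A = diffEx_proteins[1:10, 1:3],
                    comparison_B = diffEx_proteins[11:20, 1:3])
marked <- Map(mark_significants, comparisons, names(comparisons))
```

Each element of `marked` is then a data frame with four extra logical columns whose names start with that list element's name.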
