threshold <- .3
l <- list()
for (i in 1:length(mitochondrial_genes)) {
  if (i < length(mitochondrial_genes)) {
    for (j in (i + 1):length(mitochondrial_genes)) {
      a <- cor.test(mitochondrial_genes[, i], mitochondrial_genes[, j])
      if (grepl("*", names(mitochondrial_genes)[i]) | grepl("*", names(mitochondrial_genes)[j])) {
        if (a$estimate > threshold) {
          l <- c(l, list(c(names(mitochondrial_genes)[i], names(mitochondrial_genes)[j], a$estimate)))
        }
      }
    }
  }
}
mitochondrial_genes is a big dataset whose columns represent genes. Some of the gene names (column names) have the * symbol next to them. I essentially want to return a list of gene pairs whose correlation (from a correlation test) exceeds some minimum threshold.
The code runs successfully, but it generates all possible pairs instead of just the ones that have at least one gene with a * next to its name. Basically, this part of the code seems to be the issue:
if(grepl("*",names(mitochondrial_genes)[i])|grepl("*",names(mitochondrial_genes)[j]))
Am I doing anything wrong? The if statement works in isolation when I test it in the terminal, yet the full code seems to generate all the pairs instead of filtering them based on the if statement, which is confusing.
This is an example of what mitochondrial_genes looks like. https://i.stack.imgur.com/elMka.png
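One likely cause, offered as a sketch rather than a confirmed diagnosis: grepl() treats its pattern as a regular expression by default, and in a regular expression * is a quantifier rather than a literal asterisk, so "*" would not test for the character itself. Passing fixed = TRUE (or escaping it as "\\*") asks for a literal match. The data frame below is made up for illustration and only mimics the layout described above; the names has_star and pairs are also hypothetical.

```r
# Hypothetical stand-in for mitochondrial_genes; check.names = FALSE keeps
# the "*" in the column names instead of letting R rewrite them.
mitochondrial_genes <- data.frame(
  "GENE1*" = c(1, 2, 3, 4, 5, 6, 7, 8),
  "GENE2"  = c(2, 1, 4, 3, 6, 5, 8, 7),
  "GENE3"  = c(1, 2, 3, 4, 5, 6, 7, 9),
  "GENE4*" = c(1, -1, -1, 1, 1, -1, -1, 1),
  check.names = FALSE
)

threshold <- 0.3
gene_names <- names(mitochondrial_genes)

# fixed = TRUE matches the literal character "*" instead of a regex.
has_star <- grepl("*", gene_names, fixed = TRUE)

pairs <- list()
for (i in seq_len(ncol(mitochondrial_genes) - 1)) {
  for (j in (i + 1):ncol(mitochondrial_genes)) {
    # Skip pairs where neither gene is starred, before running the test.
    if (!(has_star[i] || has_star[j])) next
    r <- cor.test(mitochondrial_genes[, i], mitochondrial_genes[, j])$estimate
    if (r > threshold) {
      # Separate fields keep the correlation numeric; c(name, name, r)
      # would coerce it to a character string.
      pairs[[length(pairs) + 1]] <- list(
        gene1 = gene_names[i],
        gene2 = gene_names[j],
        cor   = unname(r)
      )
    }
  }
}

for (p in pairs) cat(p$gene1, p$gene2, round(p$cor, 3), "\n")
```

On this toy data GENE2 and GENE3 are strongly correlated with each other, but that pair is skipped because neither name carries a *, and the pairs involving GENE4* are starred but fall below the threshold. Computing has_star once outside the loops also avoids calling grepl() for every pair.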