Friday, 22 November 2019

How to calculate correlation between 2 data frames and output by condition using R?

I want to calculate correlations between two gene expression data frames, one holding protein data and one holding RNA data. Row names are genes and column names are samples. Let's say gene1 is a gene in the first data frame and gene2 is a gene in the second data frame.

A. I need to calculate the correlation of gene1 and gene2.
B. I want to keep the pairs where |correlation(gene1, gene1)| < 0.3 & |correlation(gene1, gene2)| > 0.6, meaning the correlation of the same gene across the two data frames is smaller than 0.3 and the correlation of different genes across the two data frames is higher than 0.6 (both in absolute value).
C. Return a table whose columns are 'gene, gene, correlation'.

This is my code, which does not do what I need. Each data frame has more than 10,000 rows and 77 columns, and the job was killed on the GPU after running for more than 2 hours. Please keep the code simple and use as little memory as possible.

library(stringr)  # provides str_c()

func.cor3 <- function(x, y) {
  # Drop the positions where either vector is NA
  nas <- union(which(is.na(x)), which(is.na(y)))
  if (length(nas) != 0) {
    x <- x[-nas]
    y <- y[-nas]
  }

  nn <- round(cor(x, y), 5)  # keep this numeric for the comparisons below
  if (is.na(nn)) return(NA_character_)
  lab <- format(nn, nsmall = 5)

  if (nn > 0.6 || nn < -0.6) {  # |r| > 0.6 needs "or", not "and"
    return(str_c(lab, "_r1p2>0.6"))
  } else if (nn < 0.3 && nn > -0.3) {
    return(str_c(lab, "_r1p1<0.3"))
  } else {
    return(NA_character_)  # neither condition holds
  }
}

# P is the protein data frame; it is assumed to have been read earlier, the same way as R here
R <- read.csv(file = "RNA_Breast_2.csv", header = TRUE)
R2 <- R[, -1]
rownames(R2) <- R[, 1]
R2 <- R2[, order(colnames(R2))]  # sort samples by name

P2 <- P[, -1]
rownames(P2) <- P[, 1]
# Keep only the samples that also appear in the RNA data, then sort them by name
P2 <- P2[, colnames(P2) %in% colnames(R2)]
P2 <- P2[, order(colnames(P2))]

P2_1000 <- P2[1:1000, 1:77]  # try the first 1000 rows of data
R2_1000 <- R2[1:1000, ]

# One row per (RNA gene, protein gene) pair
D_M <- matrix(NA, nrow = nrow(R2_1000) * nrow(P2_1000), ncol = 3)
colnames(D_M) <- c("gene1_RNA", "gene_Protein", "correlation")

n <- 1
for (i in 1:nrow(R2_1000)) {
  for (j in 1:nrow(P2_1000)) {
    D_M[n, 1] <- rownames(R2_1000)[i]
    D_M[n, 2] <- rownames(P2_1000)[j]
    D_M[n, 3] <- func.cor3(as.numeric(R2_1000[i, ]), as.numeric(P2_1000[j, ]))
    n <- n + 1
  }
}

D_M
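
For comparison, here is a minimal sketch of how steps A to C might be done without the double loop, by letting cor() work on whole matrices, one block of RNA genes at a time. It rests on assumptions that are not in the question: the function name cross_cor_filter and its arguments are made up for illustration, both tables are assumed to have the same sample columns in the same order, and condition B is read as "keep (gene1, gene2) when gene1 differs from gene2, |cor(RNA gene1, protein gene1)| < 0.3 and |cor(RNA gene1, protein gene2)| > 0.6". NAs are handled with use = "pairwise.complete.obs" rather than the manual removal above.

```r
# Sketch only: block-wise cross-correlation with filtering.
# rna, prot: data frames with genes as row names and samples as columns (assumed layout).
cross_cor_filter <- function(rna, prot, same_max = 0.3, diff_min = 0.6, block = 500) {
  stopifnot(identical(colnames(rna), colnames(prot)))
  rna_t  <- t(as.matrix(rna))   # samples x genes, because cor() correlates columns
  prot_t <- t(as.matrix(prot))
  empty <- data.frame(gene_RNA = character(0), gene_Protein = character(0),
                      correlation = numeric(0), stringsAsFactors = FALSE)

  # Correlation of each gene with itself across the two tables
  shared <- intersect(colnames(rna_t), colnames(prot_t))
  same_cor <- vapply(shared, function(g) {
    cor(rna_t[, g], prot_t[, g], use = "pairwise.complete.obs")
  }, numeric(1))
  keep_genes <- shared[!is.na(same_cor) & abs(same_cor) < same_max]

  # Process the remaining RNA genes in blocks so that only a
  # (block x number of protein genes) matrix is held in memory at once
  blocks <- split(keep_genes, ceiling(seq_along(keep_genes) / block))
  out <- list()
  for (genes in blocks) {
    cm <- cor(rna_t[, genes, drop = FALSE], prot_t, use = "pairwise.complete.obs")
    hit <- which(!is.na(cm) & abs(cm) > diff_min, arr.ind = TRUE)
    g1 <- rownames(cm)[hit[, 1]]
    g2 <- colnames(cm)[hit[, 2]]
    differ <- g1 != g2            # drop same-gene pairs
    out[[length(out) + 1]] <- data.frame(
      gene_RNA = g1[differ], gene_Protein = g2[differ],
      correlation = cm[hit][differ], stringsAsFactors = FALSE)
  }
  if (length(out) == 0) return(empty)
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Toy data (made up): 2 genes x 5 samples
rna  <- data.frame(s1 = c(1, 5), s2 = c(2, 3), s3 = c(3, 4), s4 = c(4, 1), s5 = c(5, 2),
                   row.names = c("geneA", "geneB"))
prot <- data.frame(s1 = c(3, 2), s2 = c(0, 4), s3 = c(4, 6), s4 = c(0, 8), s5 = c(3, 10),
                   row.names = c("geneA", "geneB"))
res <- cross_cor_filter(rna, prot)
print(res)
```

With the real data this would be called as cross_cor_filter(R2, P2). With block = 500 and about 10,000 protein genes, each intermediate matrix holds roughly 5 million numbers (about 40 MB), instead of a character matrix with one row per gene pair.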
