Wednesday, 27 February 2019

Combining ks.test, var.test, t.test and wilcox.test into a decision-tree-like function or if/else function in R

My data look like this:

df1 <- read.table(text = "A1 A2 A3 A4 B1 B2 B3 B4
1 2 4 12 33 17 77 69
34 20 59 21 90 20 43 44
11 16 23 24 19 12 55 98
29 111 335 34 61 88 110 320
51 58 45 39 55 87 55 89", stringsAsFactors = FALSE, header = TRUE, row.names=c("N1","N2","N3","N4","N5"))

I want to compare the values between A and B, row by row. First, I want to use ks.test to check whether A and B are normally distributed. Second, I want to use var.test to check whether the variances of A and B differ. For non-normally distributed rows (ks.test p < 0.05), I will run the Wilcoxon test with wilcox.test. For normally distributed rows, I will run a t-test with t.test, using the equal-variance or the unequal-variance version depending on the var.test result. Finally, I combine all the results.

What I have done so far: first, I set up five functions, one each for ks.test, var.test and wilcox.test, and two for t.test:

kstest<-function(df, grp1, grp2) {
  x = df[grp1]
  y = df[grp2]
  x = as.numeric(x)
  y = as.numeric(y)  
  results = ks.test(x,y,alternative = c("two.sided"))
  results$p.value
}
vartest<-function(df, grp1, grp2) {
  x = df[grp1]
  y = df[grp2]
  x = as.numeric(x)
  y = as.numeric(y)  
  results = var.test(x,y,alternative = c("two.sided"))
  results$p.value
}
wilcox<-function(df, grp1, grp2) {
  x = df[grp1]
  y = df[grp2]
  x = as.numeric(x)
  y = as.numeric(y)  
  results = wilcox.test(x,y,alternative = c("two.sided"))
  results$p.value
}
ttest_equal<-function(df, grp1, grp2) {
  x = df[grp1]
  y = df[grp2]
  x = as.numeric(x)
  y = as.numeric(y)  
  results = t.test(x,y,alternative = c("two.sided"),var.equal = TRUE)
  results$p.value
}

ttest_unequal<-function(df, grp1, grp2) {
  x = df[grp1]
  y = df[grp2]
  x = as.numeric(x)
  y = as.numeric(y)  
  results = t.test(x,y,alternative = c("two.sided"),var.equal = FALSE)
  results$p.value
}
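
The five wrappers above differ only in which test they call, so they could probably be collapsed into one. The following is a minimal sketch, not a definitive implementation: `row_pvalue`, `is_A` and `is_B` are hypothetical names introduced here, and the test function is passed in as an argument.

```r
# Hypothetical generic helper: run any two-sample test on one row
# and return its p-value. Extra arguments (e.g. var.equal) go through `...`.
row_pvalue <- function(row, grp1, grp2, test_fun, ...) {
  x <- as.numeric(row[grp1])
  y <- as.numeric(row[grp2])
  test_fun(x, y, alternative = "two.sided", ...)$p.value
}

# Same example data as in the question
df1 <- read.table(text = "A1 A2 A3 A4 B1 B2 B3 B4
1 2 4 12 33 17 77 69
34 20 59 21 90 20 43 44
11 16 23 24 19 12 55 98
29 111 335 34 61 88 110 320
51 58 45 39 55 87 55 89", stringsAsFactors = FALSE, header = TRUE,
  row.names = c("N1", "N2", "N3", "N4", "N5"))

# Logical column selectors for the two groups
is_A <- grepl("^A", colnames(df1))
is_B <- grepl("^B", colnames(df1))

# One call per test; rows with tied values may trigger warnings
# from ks.test / wilcox.test about exact p-values.
ks_AB  <- apply(df1, 1, row_pvalue, grp1 = is_A, grp2 = is_B, test_fun = ks.test)
var_AB <- apply(df1, 1, row_pvalue, grp1 = is_A, grp2 = is_B, test_fun = var.test)
tt_eq  <- apply(df1, 1, row_pvalue, grp1 = is_A, grp2 = is_B,
                test_fun = t.test, var.equal = TRUE)
```

Passing the test as a function keeps the row-handling logic in one place, so a change such as handling missing values only has to be made once.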

Then I calculated the p-values of ks.test and var.test, which I use to subset the data:

ks_AB<-apply(df1,1,kstest,grp1=grepl("^A",colnames(df1)),grp2=grepl("^B",colnames(df1)))

ks_AB
[1] 0.02857143 0.69937420 0.77142857 0.77142857 0.21055163

var_AB<-apply(df1,1,vartest,grp1=grepl("^A",colnames(df1)),grp2=grepl("^B",colnames(df1)))

var_AB
[1] 0.01700168 0.45132827 0.01224175 0.76109048 0.19561742

df1$ks_AB<-ks_AB
df1$var_AB<-var_AB

Then I subset the data according to the rules described above:

df_wilcox<-df1[df1$ks_AB<0.05,]
df_ttest_equal<-df1[df1$ks_AB>=0.05 & df1$var_AB>=0.05,]
df_ttest_unequal<-df1[df1$ks_AB>=0.05 & df1$var_AB<0.05,]
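
Instead of three separate data frames, the same rules could be expressed as one label per row. This is only a sketch: `alpha` and `test_used` are hypothetical names, and the p-values are copied from the output shown in the question rather than recomputed.

```r
# p-values copied from the ks_AB and var_AB output in the question (rows N1..N5)
ks_AB  <- c(N1 = 0.02857143, N2 = 0.69937420, N3 = 0.77142857,
            N4 = 0.77142857, N5 = 0.21055163)
var_AB <- c(N1 = 0.01700168, N2 = 0.45132827, N3 = 0.01224175,
            N4 = 0.76109048, N5 = 0.19561742)

alpha <- 0.05  # significance threshold used for both decisions

# Nested ifelse mirrors the three subsetting conditions:
#   ks < alpha                  -> wilcox
#   ks >= alpha, var >= alpha   -> ttest_equal
#   ks >= alpha, var <  alpha   -> ttest_unequal
test_used <- ifelse(ks_AB < alpha, "wilcox",
               ifelse(var_AB >= alpha, "ttest_equal", "ttest_unequal"))

test_used
```

With these values, N1 is labelled for the Wilcoxon test, N3 for the unequal-variance t-test, and N2, N4 and N5 for the equal-variance t-test, which matches the three subsets above.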

Finally, I run the corresponding test on each of the new data frames and merge the results:

wilcox_AB<-as.matrix(apply(df_wilcox,1,wilcox,grp1=grepl("^A",colnames(df_wilcox)),grp2=grepl("^B",colnames(df_wilcox))))

ttest_equal_AB<-as.matrix(apply(df_ttest_equal,1,ttest_equal,grp1=grepl("^A",colnames(df_ttest_equal)),grp2=grepl("^B",colnames(df_ttest_equal))))

ttest_unequal_AB<-as.matrix(apply(df_ttest_unequal,1,ttest_unequal,grp1=grepl("^A",colnames(df_ttest_unequal)),grp2=grepl("^B",colnames(df_ttest_unequal))))

p_value<-rbind(wilcox_AB,ttest_equal_AB,ttest_unequal_AB)
colnames(p_value)<-c("pvalue")

df<-merge(df1,p_value,by="row.names")

df
  Row.names A1  A2  A3 A4 B1 B2  B3  B4      ks_AB     var_AB     pvalue
1        N1  1   2   4 12 33 17  77  69 0.02857143 0.01700168 0.02857143
2        N2 34  20  59 21 90 20  43  44 0.69937420 0.45132827 0.39648631
3        N3 11  16  23 24 19 12  55  98 0.77142857 0.01224175 0.25822839
4        N4 29 111 335 34 61 88 110 320 0.77142857 0.76109048 0.85703939
5        N5 51  58  45 39 55 87  55  89 0.21055163 0.19561742 0.06610608

I know my code is tedious and clumsy, but it works very well for my data. What I want to know now is how to combine the code above into a single decision-tree-like function, or an if/else function, that follows the logic below:

[image: decision tree diagram]
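
One possible shape for such a function is sketched below, under the assumption that the decision rules are exactly the ones described in the question. `decide_test`, `alpha`, `is_A`, `is_B`, `res` and `result` are hypothetical names introduced here. The sketch keeps the original logic unchanged; note that the two-sample call `ks.test(x, y)` tests whether the two samples come from the same distribution, which is not the same as testing each group for normality.

```r
# Same example data as in the question
df1 <- read.table(text = "A1 A2 A3 A4 B1 B2 B3 B4
1 2 4 12 33 17 77 69
34 20 59 21 90 20 43 44
11 16 23 24 19 12 55 98
29 111 335 34 61 88 110 320
51 58 45 39 55 87 55 89", stringsAsFactors = FALSE, header = TRUE,
  row.names = c("N1", "N2", "N3", "N4", "N5"))

# Hypothetical helper: apply the whole decision tree to one row and
# return the intermediate p-values, the chosen test and its p-value.
decide_test <- function(row, grp1, grp2, alpha = 0.05) {
  x <- as.numeric(unlist(row[grp1]))
  y <- as.numeric(unlist(row[grp2]))
  ks_p  <- ks.test(x, y)$p.value   # may warn when values are tied
  var_p <- var.test(x, y)$p.value
  if (ks_p < alpha) {
    test <- "wilcox"
    p <- wilcox.test(x, y)$p.value
  } else if (var_p >= alpha) {
    test <- "ttest_equal"
    p <- t.test(x, y, var.equal = TRUE)$p.value
  } else {
    test <- "ttest_unequal"
    p <- t.test(x, y, var.equal = FALSE)$p.value
  }
  data.frame(ks_AB = ks_p, var_AB = var_p, test = test, pvalue = p,
             stringsAsFactors = FALSE)
}

# Logical column selectors, computed once on the original columns
is_A <- grepl("^A", colnames(df1))
is_B <- grepl("^B", colnames(df1))

# Run the tree row by row and bind the one-row results together
res <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  decide_test(df1[i, ], is_A, is_B)
}))
rownames(res) <- rownames(df1)

result <- cbind(df1, res)
result
```

Returning a one-row data frame per input row keeps the chosen test next to its p-value, so no subsetting or merging by row name is needed afterwards.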
