My data frame contains expression values for 13 genes, recorded at two time points for subjects in one of three groups (column X: control = 0, ulcerative colitis = 1, Crohn's = 2). Rows that share an ID belong to the same subject: the first row is the first time point and the second row is the second time point.
Here is my data frame, as output by dput(data):
structure(list(X = c(0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 0, 0, 2, 2,
1, 1, 0, 0, 0, 0, 2, 2, 1, 1, 2, 2), ID = c(44, 44, 68, 68, 77,
77, 119, 119, 453, 453, 654, 654, 776, 776, 888, 888, 876, 876,
899, 899, 901, 901, 987, 987, 990, 990), Gene1 = c(5.54e-05,
5.58e-06, 9.74e-05, 1.33e-06, 1.29e-05, 7.22e-06, 0.000215899,
3.6e-06, 0.000146724, 1.53e-05, 0.000913187, 1.9e-06, 0.007421464,
0.000648006, 5.1e-06, 6.15e-06, 4.73e-06, 0.000119899, 0.000884487,
0.000850632, 0.000236607, 7.36e-06, 8.48e-06, 2.63e-05, 0.001368493,
1.12e-05), Gene2 = c(0.006338532, 0.006162866, 0.040695132, 0.013255055,
0.033086619, 0.074158811, 0.004967497, 0.01247423, 0.043201417,
0.011470285, 0.038447751, 0.018825124, 0.027701807, 0.063373762,
0.005374513, 0.048876252, 0.009959848, 0.004434078, 0.004176856,
0.015288913, 0.060226053, 0.05128922, 0.006557554, 0.017460326,
0.007684784, 0.002107577), Gene3 = c(0.076186393, 0.037631043,
0.052159393, 0.012179365, 0.047199766, 0.022458838, 0.030261613,
0.00626629, 0.028664896, 0.02285845, 0.02801855, 0.017681676,
0.040563592, 0.029791175, 0.034778056, 0.019318473, 0.011847912,
0.009614177, 0.064027542, 0.035334149, 0.041638955, 0.056015014,
0.03304865, 0.017660205, 0.030187166, 0.057919531), Gene4 = c(0.000112884,
0.000920886, 0.001081748, 0.000195159, 0.001678445, 0.000171612,
0.000191702, 0.000560035, 0.000384056, 0.000454783, 0.000723385,
0.000203897, 0.000973337, 0.000822171, 0.000620526, 0.000260769,
0.000214607, 0.002077443, 0.00065843, 0.000403672, 0.000378651,
0.000409306, 0.001722587, 0.000213785, 0.000176643, 0.002022878
), Gene5 = c(0.053029236, 0.022594965, 0.011967636, 0.026851113,
0.03773798, 0.031356268, 0.10410326, 0.063265216, 0.018028454,
0.116038001, 0.00572817, 0.053635968, 0.059126941, 0.011835241,
0.004639624, 0.014302911, 0.082948853, 0.015202238, 0.021295431,
0.043342, 0.008153675, 0.015613747, 0.043289609, 0.048834321,
0.019144763, 0.059809871), Gene6 = c(0.04082966, 0.02986135,
0.061405171, 0.006142619, 0.009767602, 0.035427993, 0.03729329,
0.01309739, 0.00221718, 0.040211393, 0.006303841, 0.030146612,
0.032033879, 0.024590398, 0.077991721, 0.017215666, 0.014731147,
0.04802582, 0.03168714, 0.03244771, 0.032278613, 0.017301885,
0.013450667, 0.040207755, 0.042669615, 0.03456749), Gene7 = c(1.93e-05,
4.72e-06, 5.41e-05, 0, 1.91e-05, 9.33e-07, 5.98e-06, 0, 1.05e-06,
4.1e-07, 7.72e-05, 4.07e-07, 0.000585154, 0.000246992, 7.86e-06,
3.13e-06, 2.14e-06, 7.56e-06, 9.29e-05, 0.000116024, 5.51e-05,
7.79e-06, 6.65e-06, 2.06e-06, 0.000104342, 4.16e-06), Gene8 = c(0.000197502,
0.00015135, 0.000107306, 6.54e-05, 0.000225564, 0.000142631,
0.000168873, 3.5e-05, 0.000365242, 0.000174254, 0.000339327,
8.7e-05, 0.000136679, 0.000156634, 0.000224181, 0.000205305,
8.87e-05, 0.000305774, 0.000133615, 0.00015118, 0.000107229,
0.000162579, 0.000152249, 6.88e-05, 0.000113864, 0.000249258),
Gene9 = c(0.00079296, 0.007640951, 0.004937327, 0.000422361,
0.000953513, 0.000951187, 0.000671306, 0.001106406, 0.002606568,
0.003006867, 0.001911646, 0.00135411, 0.012461738, 0.000434917,
0.00237646, 0.007857561, 0.000436889, 0.00048816, 0.000348146,
0.000931449, 0.000323974, 0.004945321, 0.000693845, 0.000479572,
0.000843415, 0.001419675), Gene10 = c(8.16e-05, 6.63e-05,
0.000101583, 3.08e-05, 0.000147039, 5.13e-05, 0.000109479,
2.39e-05, 0.000225475, 4.28e-05, 0.000230785, 2.1e-05, 0.0001356,
0.000124173, 0.000245128, 0.000275446, 3.18e-05, 0.00017516,
0.000180192, 0.000246669, 0.000378708, 4.35e-05, 0.000267824,
7.2e-05, 7.65e-05, 8.79e-05), Gene11 = c(0.000111462, 3.17e-05,
0.000200096, 3.12e-06, 8.75e-05, 3.11e-06, 6.89e-06, 0.000165936,
5.98e-05, 0.000201355, 5.92e-06, 2.57e-05, 2.53e-05, 3.27e-05,
0.000137446, 0.000134402, 5.86e-07, 3.9e-05, 0.018886909,
0.050343466, 4.15e-05, 1.67e-05, 0.000172614, 4.95e-05, 1.27e-05,
9.85e-05), Gene12 = c(0.002708402, 0.003215586, 0.00457116,
0.001713549, 0.024353184, 0.006660748, 0.003198887, 0.003094386,
0.004789163, 0.002816955, 0.021587313, 0.002084725, 0.00378062,
0.021751495, 0.009097143, 0.012216225, 0.001125765, 0.013043534,
0.005514773, 0.008323962, 0.026898764, 0.002149135, 0.008021623,
0.006673567, 0.005391139, 0.018578559), Gene13 = c(0.00080595,
0.001289505, 0.002451416, 0.000234107, 0.001694733, 0.000288175,
0.002357478, 0.000856129, 0.00159752, 0.000117538, 0.000166581,
0.000367288, 0.001039841, 0.001779528, 0.000438092, 0.001012515,
0.000529936, 0.003193086, 0.002562702, 0.00277401, 0.003013136,
0.001349197, 0.001646296, 0.001114222, 0.001207882, 0.002804949
)), class = "data.frame", row.names = c(NA, 26L))
I have calculated the Euclidean distances between all samples using vegdist() from the vegan package:
library(vegan)
distances <- as.matrix(vegdist(data[, 3:15], method = "euclidean", diag = FALSE))
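To illustrate where the paired values sit in such a matrix, here is a minimal sketch on a made-up toy data frame. The name toy and all of its values are hypothetical, and base R dist() stands in for vegdist(), which should give the same result for Euclidean distance. It assumes, as in my data, that the two rows for each ID are adjacent and in time order.

```r
# Toy stand-in for `data` (hypothetical values): X = group, ID = subject,
# two adjacent rows per ID (time point 1, then time point 2).
toy <- data.frame(
  X     = c(0, 0, 1, 1, 2, 2),
  ID    = c(44, 44, 68, 68, 77, 77),
  Gene1 = c(1, 4, 2, 2, 0, 3),
  Gene2 = c(1, 5, 5, 6, 0, 4)
)

# Full sample-by-sample Euclidean distance matrix (base R, no extra packages)
d <- as.matrix(dist(toy[, 3:4], method = "euclidean"))

# The within-subject distances are the entries [1, 2], [3, 4], [5, 6], ...
first  <- seq(1, nrow(toy), by = 2)   # rows holding time point 1
second <- first + 1                   # rows holding time point 2
paired <- d[cbind(first, second)]     # matrix indexing picks one cell per pair
names(paired) <- toy$ID[first]
paired
```

So only one entry per subject is needed from the full matrix; everything else in it is a between-subject distance.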
I now need to build a table with three columns (control, ulcerative colitis and Crohn's), where each column contains, for every subject in that group, the distance between that subject's first and second time points. The control column will therefore have 5 values, and the UC and Crohn's columns will have 4 each.
I understand that I will probably need a for loop combined with if statements, but I am struggling to see where to start. Could anybody advise?
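For reference, this is roughly the shape of loop I have in mind, as a minimal sketch on a made-up toy data frame rather than a working solution for the real data. The names toy, groups and result and all the values are hypothetical, and base R dist() is used in place of vegdist() on the assumption that the two agree for Euclidean distance. Because the groups have different numbers of subjects, the shorter columns are padded with NA.

```r
# Toy stand-in for `data` (hypothetical values), same layout as the real one
toy <- data.frame(
  X     = c(0, 0, 1, 1, 2, 2, 0, 0),
  ID    = c(44, 44, 68, 68, 77, 77, 119, 119),
  Gene1 = c(1, 4, 2, 2, 0, 3, 0, 6),
  Gene2 = c(1, 5, 5, 6, 0, 4, 0, 8)
)
gene_cols <- 3:ncol(toy)
d_mat <- as.matrix(dist(toy[, gene_cols], method = "euclidean"))

# One growing vector of distances per group
groups <- list(Control = numeric(0), UC = numeric(0), Crohns = numeric(0))

for (id in unique(toy$ID)) {
  rows <- which(toy$ID == id)        # the two rows for this subject
  d    <- d_mat[rows[1], rows[2]]    # time point 1 vs time point 2
  g    <- toy$X[rows[1]]             # group code: 0, 1 or 2
  if (g == 0) {
    groups$Control <- c(groups$Control, d)
  } else if (g == 1) {
    groups$UC <- c(groups$UC, d)
  } else {
    groups$Crohns <- c(groups$Crohns, d)
  }
}

# Pad the shorter groups with NA so the columns have equal length
n <- max(lengths(groups))
result <- as.data.frame(lapply(groups, function(v) c(v, rep(NA, n - length(v)))))
result
```

Looking rows up by ID rather than by position means this should not depend on the two rows of a subject being adjacent, only on the first one being the earlier time point.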