In this code, I am trying to append the gene descriptions for the gene IDs with the top 50 r-values. I am given two lists, top50 and geneanno.
Each entry of top50 looks like [0.5218951111466706, 'AHY40589'], and each entry of geneanno looks like ['AHY39293', 'Disulfide bond formation protein DsbB\n'].
My code below shows how I build the list, but I can't seem to append the correct gene description for each gene ID.
r_sorted = sorted(r_value)
unsorted = []
for s in r_sorted:
    if s not in unsorted:
        unsorted.append(s)
top50 = unsorted[::-1][:50]  # top 50 with highest correlation
print(top50)
annotation = []
path = '/content/drive/abc_gene_anno.txt'
with open(path, 'r') as f:
    geneanno = [l.split('\t') for l in f]  # obtain gene ids
for d in geneanno:
    for c in top50:
        if c[1] in d[0]:
            annotation.append(d[1][:-1])
print(d)
print(c)
print(annotation)
The output is:
[[0.9999999999999999, 'AHY39286'], [0.939173984187146, 'AHY39284']]
['AHY39293', 'Disulfide bond formation protein DsbB\n']
[0.5218951111466706, 'AHY40589']
['Cysteine-rich domain', 'Cysteine-rich domain', 'FdhD/NarQ family', 'Protein of unknown function (DUF979)']
Using the Gene ID (AHY39286) with the highest r value for example, 
However, my code prints this instead 
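
For reference, here is a minimal sketch of the lookup I am after, assuming every line of the annotation file splits into exactly two fields (gene ID, description). The sample data is hypothetical: the r-values are rounded, the "placeholder description" strings are made up for illustration, and desc_by_id is a name introduced only for this sketch.

```python
# Hypothetical sample data in the same shape as top50 and geneanno.
top50 = [[0.99, 'AHY39286'], [0.93, 'AHY39284'], [0.52, 'AHY40589']]
geneanno = [
    ['AHY39293', 'Disulfide bond formation protein DsbB\n'],
    ['AHY40589', 'placeholder description C\n'],
    ['AHY39284', 'placeholder description B\n'],
    ['AHY39286', 'placeholder description A\n'],
]

# Map each gene ID to its description, stripping the trailing newline.
desc_by_id = {gene_id: desc.strip() for gene_id, desc in geneanno}

# Walk top50 in rank order and look up each ID by exact match.
# IDs missing from the annotation file get 'NA' (an arbitrary choice).
annotation = [[r, gene_id, desc_by_id.get(gene_id, 'NA')] for r, gene_id in top50]

for row in annotation:
    print(row)
```

Looping over top50 rather than over the file keeps the descriptions in r-value order, and the dictionary lookup matches IDs exactly, whereas `c[1] in d[0]` is a substring test between two strings.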