Sunday, 24 October 2021

Make a function with nested for loops to get all overlaps between DNA sequences (bioinformatics)

I have written a function get_overlap to evaluate the overlap between pairs of reads in both left-right and right-left orientations. Now I have to use it to write a function get_all_overlaps that must return:

A dictionary of dictionaries, specifying the number of overlapping bases for a pair of reads in a specific left-right orientation. Computing the overlap of a read to itself is meaningless and must not be included. Assuming the resulting dictionary of dictionaries is called d, then d['Read2'] will be a dictionary where keys are the names of reads that have an overlap with read 'Read2' when 'Read2' is put in the left position, and the values for these keys are the number of overlapping bases for those reads.

Example usage: assuming that reads is a dictionary returned by read_data then:

get_all_overlaps(reads)

should return the following dictionary of dictionaries (but not necessarily with the same ordering of the key-value pairs):

{'Read1': {'Read3': 0, 'Read2': 1, 'Read5': 1, 'Read4': 0, 'Read6': 29},
'Read3': {'Read1': 0, 'Read2': 0, 'Read5': 0, 'Read4': 1, 'Read6': 1},
'Read2': {'Read1': 13, 'Read3': 1, 'Read5': 21, 'Read4': 0, 'Read6': 0},
'Read5': {'Read1': 39, 'Read3': 0, 'Read2': 1, 'Read4': 0, 'Read6': 14},
'Read4': {'Read1': 1, 'Read3': 1, 'Read2': 17, 'Read5': 2, 'Read6': 0},
'Read6': {'Read1': 0, 'Read3': 43, 'Read2': 0, 'Read5': 0, 'Read4': 1}}
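
To make the shape of this result concrete, here is a small sketch of how the dictionary of dictionaries is indexed. It only copies the 'Read2' row from the expected output above; the variable name left_first is just an illustrative choice.

```python
# One row of the expected result, copied from the output above.
d = {
    'Read2': {'Read1': 13, 'Read3': 1, 'Read5': 21, 'Read4': 0, 'Read6': 0},
}

# The outer key is the read in the left position.
left_first = d['Read2']

# The inner key is the read in the right position; the value is the overlap length.
print(left_first['Read5'])     # 21

# A read is never compared with itself, so 'Read2' is not an inner key.
print('Read2' in left_first)   # False
```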

Below is a dictionary where the keys are the names of the reads and the values are the associated read sequences, followed by my code for get_overlap:

    read_map = {'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
                'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
                'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
                'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
                'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
                'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}


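    def get_overlap(left, right):
        # The overlap cannot be longer than the shorter of the two reads.
        max_overlap = min(len(left), len(right))
        # Try the longest candidate first, so the first match is the longest overlap.
        for i in range(max_overlap):
            ovl = max_overlap - i
            # Compare the last ovl bases of left with the first ovl bases of right.
            if left[-ovl:] == right[:ovl]:
                return left[-ovl:]
        # No suffix of left matches a prefix of right.
        return ''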
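
Note that get_overlap returns the overlapping bases as a string, while get_all_overlaps is supposed to report the number of overlapping bases, so len() is needed to bridge the two. A quick check on Read1 and Read6 from the dictionary above (the block repeats the function so that it runs on its own):

```python
def get_overlap(left, right):
    """Return the longest suffix of left that is also a prefix of right."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''


read1 = 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC'
read6 = 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'

overlap = get_overlap(read1, read6)
print(overlap)        # TGACAGTAGATCTCGTCCAGACCCCTAGC
print(len(overlap))   # 29
```

The value 29 agrees with the 'Read1' -> 'Read6' entry in the expected output.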

The hints I got from the book: I have to use the get_overlap function I just wrote to find the overlap between a pair of reads. To generate all combinations of reads I need two for loops: one looping over the reads in the left position and another (inside the first one) looping over the reads in the right position. Since we do not want the overlap of a read with itself, there should be an if statement checking whether the left and right reads are the same.
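
One way to follow these hints, as a minimal sketch rather than the book's official solution: loop over the reads twice, skip the case where both loops point at the same read, and store the length of the string returned by get_overlap. The local names overlaps, left_name, left_seq, right_name and right_seq are illustrative choices, and the sketch assumes reads has the same shape as read_map above.

```python
def get_overlap(left, right):
    """Return the longest suffix of left that is also a prefix of right."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''


def get_all_overlaps(reads):
    """Map each left read name to a dict of {right read name: overlap length}."""
    overlaps = {}
    # Outer loop: the read placed in the left position.
    for left_name, left_seq in reads.items():
        overlaps[left_name] = {}
        # Inner loop: the read placed in the right position.
        for right_name, right_seq in reads.items():
            # Skip the meaningless overlap of a read with itself.
            if left_name == right_name:
                continue
            # get_overlap returns a string, so store its length.
            overlaps[left_name][right_name] = len(get_overlap(left_seq, right_seq))
    return overlaps


read_map = {
    'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
    'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
    'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
    'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
    'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
    'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT',
}

d = get_all_overlaps(read_map)
print(d['Read1']['Read6'])   # 29
print(d['Read2']['Read5'])   # 21
```

Comparing names rather than sequences in the if statement means two different reads that happen to share the same sequence would still be compared with each other.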

Even with those hints, I have to admit that I'm still confused and lost. I hope someone can help me :D
