I have the following sequence:
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]
Here is a dictionary that maps each codon (a base triplet such as ATG or GCT) to its amino acid:
aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}
As one can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine, G). Such codons are synonymous (PSyn); codons that code for different amino acids are non-synonymous (PNonsyn).
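The distinction can be sketched with a set comparison: translate every codon at a site and check how many distinct amino acids come out. This is only a minimal sketch; `classify_site` is a hypothetical helper name, and the dictionary here is a trimmed subset covering just the codons in `seq`.

```python
# Subset of the codon table, limited to the codons used in this example.
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}

def classify_site(codons, table):
    """Return 'identical', 'syn', or 'nonsyn' for one list of codons."""
    if len(set(codons)) == 1:
        # Every codon is the same, so there is no change in bases.
        return 'identical'
    # Look each codon up in the table and collect the distinct amino acids.
    residues = {table[codon] for codon in codons}
    return 'syn' if len(residues) == 1 else 'nonsyn'

print(classify_site(['GCC', 'GCG', 'GCA', 'GCT'], aminoacid))  # syn
print(classify_site(['GAC', 'GAT', 'GAA', 'CCT'], aminoacid))  # nonsyn
```

Using a set avoids comparing neighbouring elements pairwise, which matters when a site has more than two codons.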
In this code, I need to do the following:
- For each element (site) in the list of lists, if there is a change in the bases AND the codons all code for the same amino acid, increase the PSyn count by 1; if they code for different amino acids, increase the PNonsyn count by 1.
Here,
ATG all code for M #However, all are ATG, so there is no change in bases and no increment in either count
GAC, GAT for D; GAA for E; and CCT for P #Codes for three different amino acids, increment PNonsyn by 1
GCC, GCG, GCA, GCT for A #Different bases but all code for the same amino acid, increment PSyn by 1
Output: CountPsyn = 1
CountPNonsyn = 1
- Generate a list of amino acids that corresponds to the above seq, such that:
Output: ['ATG','nonsyn','A'] #For sites with different amino acids the list should say nonsyn, for sites with identical bases it should list the codon, and for synonymous sites it should list the shared amino acid
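Both tasks can be handled in a single pass over `seq`. The following is a rough sketch rather than a definitive solution: `summarize_sites` is a hypothetical name, and the table is again cut down to the codons that appear in `seq`. Note that the third site in `seq` is GCC/GCG/GCA/GCT, which all translate to A.

```python
# Subset of the codon table, limited to the codons used in this example.
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}

seq = [['ATG', 'ATG', 'ATG', 'ATG'],
       ['GAC', 'GAT', 'GAA', 'CCT'],
       ['GCC', 'GCG', 'GCA', 'GCT']]

def summarize_sites(sites, table):
    """Return (count_psyn, count_pnonsyn, labels) for a list of codon lists."""
    count_psyn = 0
    count_pnonsyn = 0
    labels = []
    for codons in sites:
        residues = {table[codon] for codon in codons}
        if len(set(codons)) == 1:
            # No change in bases: record the codon, touch neither counter.
            labels.append(codons[0])
        elif len(residues) == 1:
            # Bases differ but the amino acid is the same: synonymous.
            count_psyn += 1
            labels.append(residues.pop())
        else:
            # More than one amino acid at this site: non-synonymous.
            count_pnonsyn += 1
            labels.append('nonsyn')
    return count_psyn, count_pnonsyn, labels

count_psyn, count_pnonsyn, listofaa = summarize_sites(seq, aminoacid)
print(count_psyn, count_pnonsyn, listofaa)  # 1 1 ['ATG', 'nonsyn', 'A']
```

A codon missing from the table raises `KeyError` here, which is probably preferable to silently miscounting a site.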
I need help modifying the following code to get the program to work. I am not sure how to look up values in the dictionary and compare them across all the elements of a site. Code attempted:
countPsyn = 0
countPnonsyn = 0
listofaa =[]
for i in seq:
    for base, value in enumerate(i):
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG']
            listofaa.append(value)
        if value[i] != value[i+1]:
            if aminoacid[value][i] == aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')
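Two Python details in the attempt above are worth isolating, since they behave differently from what the code seems to expect. The snippet below is only an illustration; the variable names are made up for the demonstration.

```python
site = ['GAC', 'GAT', 'GAA', 'CCT']

# enumerate() yields (index, item) pairs, so the second name is a codon string.
pairs = list(enumerate(site))
print(pairs[0])  # (0, 'GAC')

# Indexing a codon string gives a single base, not a neighbouring codon.
first_base = site[0][0]
print(first_base)  # G

# "x =+ 1" parses as "x = (+1)": it resets the counter instead of adding to it.
reset_counter = 5
reset_counter =+ 1
print(reset_counter)  # 1

# "x += 1" is the in-place increment.
counter = 5
counter += 1
print(counter)  # 6
```

In the attempt, `i` is itself a list, so `value[i]` would raise a `TypeError` before either counter is reached.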
File Output can be found [here][1]