The full problem is a bit more complicated than this, but in short: I am trying to create a consensus sequence for each family of sequences (strings made only of the characters A, C, G and T), and I can't identify where the function I wrote fails. Here it is:
SEQUENCE calculate_cons(FAMILY FAM)
{
    SEQUENCE cons;
    int maxlenght=seq_with_max_char(FAM);
    for(int i=0; i<maxlenght; i++)
    {
        int nA=0;
        int nC=0;
        int nG=0;
        int nT=0;
        for(int j=0; j<FAM.size; j++)
        {
            if(FAM.seq[j].c[i]=='A')
            {
                nA++;
            }
            if(FAM.seq[j].c[i]=='C')
            {
                nC++;
            }
            if(FAM.seq[j].c[i]=='G')
            {
                nG++;
            }
            if(FAM.seq[j].c[i]=='T')
            {
                nT++;
            }
        }
        if((nA==nC) || (nA==nG) || (nA==nT))
        {
            cons.c[i]='.';
        }
        else
        {
            if((nA>nC) && (nA>nG) && (nA>nT))
            {
                cons.c[i]='A';
            }
            if((nC>nA) && (nC>nG) && (nC>nT))
            {
                cons.c[i]='C';
            }
            if((nG>nA) && (nG>nC) && (nG>nT))
            {
                cons.c[i]='G';
            }
            if((nT>nA) && (nT>nC) && (nT>nG))
            {
                cons.c[i]='T';
            }
        }
    }
    cons.lenght=maxlenght;
    cons.ispartfam=true;
    return cons;
}
The issue: with this code, consensus sequences are only ever made of 'A' and '.'. For example, if a family contains:
TCCTATGGAATCTTTTTA
TTCTATGGAATCTTTTTA
The consensus sequence will be:
....A...AA.......A
The function writes '.' at every position where the two sequences do not both have an A, and writes 'A' otherwise. The line where it fails is probably if((nA==nC) || (nA==nG) || (nA==nT)), since if I compare against nC instead, the consensus sequence only contains 'C' and '.'.
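To make the suspicion about that line concrete: the equality test also fires when nA ties with another count at zero (for example a column of two T's gives nA==nC==0), so every column without a majority of A ends up as '.'. Below is a minimal sketch of one possible alternative, assuming that a tie should only matter when it is between the highest counts. The names pick_consensus and build_consensus are hypothetical, and plain NUL-terminated strings of equal length stand in for the SEQUENCE and FAMILY types, whose definitions are not shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the base with the strictly highest count,
   or '.' when the highest count is shared by two or more bases. */
char pick_consensus(int nA, int nC, int nG, int nT)
{
    const char bases[4] = {'A', 'C', 'G', 'T'};
    const int counts[4] = {nA, nC, nG, nT};
    int best = 0; /* index of the highest count seen so far */
    int tied = 0; /* non-zero if that highest count is shared */

    for (int k = 1; k < 4; k++) {
        if (counts[k] > counts[best]) {
            best = k;
            tied = 0; /* a new strict maximum cancels earlier ties */
        } else if (counts[k] == counts[best]) {
            tied = 1;
        }
    }
    return tied ? '.' : bases[best];
}

/* Hypothetical driver. Assumption: all n strings have the same length,
   and out has room for strlen(seqs[0]) + 1 characters. */
void build_consensus(const char *const seqs[], int n, char *out)
{
    assert(n > 0);
    size_t len = strlen(seqs[0]);

    for (size_t i = 0; i < len; i++) {
        int nA = 0, nC = 0, nG = 0, nT = 0;
        for (int j = 0; j < n; j++) {
            switch (seqs[j][i]) {
            case 'A': nA++; break;
            case 'C': nC++; break;
            case 'G': nG++; break;
            case 'T': nT++; break;
            default: break; /* any other character is ignored */
            }
        }
        out[i] = pick_consensus(nA, nC, nG, nT);
    }
    out[len] = '\0';
}
```

With the two sequences from the example, this sketch produces T.CTATGGAATCTTTTTA: only the second column, where C and T each appear once, is a tie. Sequences of different lengths would need an extra bounds check before reading seqs[j][i], which is left out here.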