Sunday, November 27, 2016

How to use break/continue in a for loop (with if-else) when a value changes in a row/column of a data file?

In the data set below I need to multiply values separately for each block. The original data only had the first five columns. I was able to add the last two columns using a for loop (with if-elif inside). But I also need to multiply the values of the two groups (X and Y) based on their set. The problem is that the values I have to multiply are in different rows.

So, for X(set) I have to multiply 0.25*0.1*0.083, since these values belong to the same block, the same side of the set (i.e. before the pipe (|) sign) and the same group (X).
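As a quick check of that arithmetic, here is a minimal Python sketch (Python 3.8+ for `math.prod`) that takes the three X(set) entries of block 2480 from the table below, splits each one at the pipe, and multiplies each side. The variable names are only illustrative, and the part after the pipe (haplotype B) is assumed to be handled the same way.

```python
import math

# X(set) values of block 2480, copied from the table:
# the part before the pipe is haplotype A, the part after it is haplotype B
block_2480_X = ["0.25|0.15", "0.1|0.15", "0.083|0.6"]

hap_A = [float(v.split("|")[0]) for v in block_2480_X]
hap_B = [float(v.split("|")[1]) for v in block_2480_X]

freq_A_X = math.prod(hap_A)  # 0.25 * 0.1 * 0.083
freq_B_X = math.prod(hap_B)  # 0.15 * 0.15 * 0.6
print(round(freq_A_X, 6), round(freq_B_X, 6))
```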

X        value_X             Y        value_Y               block  set   X(set)       Y(set)
A,T,C    0.25,0.6,0.15       A,C,G    0.3,0.3,0.4           2480   A|C   0.25|0.15    0.3|0.3
G,T,C,A  0.25,0.15,0.1,0.5   G,T,C,A  0.21,0.3,0.19,0.3     2480   C|T   0.1|0.15     0.19|0.3
A,T,G    0.3,0.6,0.1         A,C,T    0.4,0.5,0.1           2480   C|T   0.083|0.6    0.5|0.1
C,A,T    0.26,0.31,0.43      C,T,G    0.39,0.43,0.18        2651   T|C   0.43|0.26    0.43|0.39
T,A,G    0.55,0.11,0.34      T,A,C,G  0.21,0.17,0.26,0.36   2651   A|C   0.11|0.083   0.17|0.26
G,A,C,T  0.23,0.23,0.34,0.2  A,T,C    0.21,0.36,0.43        2651   G|A   0.23|0.23    0.083|0.21
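The X(set) and Y(set) columns follow a simple rule: look up each allele of `set` in the group's allele list and take the value at the same index, falling back to 1/12 rounded to three decimals (0.083) when the allele is missing. A minimal sketch of that rule, still using plain if-else; `lookup_value`, `set_values` and `DEFAULT_VALUE` are hypothetical names, and numbers are used here instead of the strings the question's code works with:

```python
# Default used when an allele of the set is missing from a group,
# as in the question: round(1/12, 3) == 0.083
DEFAULT_VALUE = round(1 / 12, 3)


def lookup_value(allele, alleles, values, default=DEFAULT_VALUE):
    """Return the value at the allele's index, or `default` if it is absent."""
    if allele in alleles:
        return values[alleles.index(allele)]
    else:
        return default


def set_values(genotype, alleles, values):
    """Build an X(set)-style string such as '0.25|0.15' for one row."""
    allele01, allele02 = genotype.split("|")
    value01 = lookup_value(allele01, alleles, values)
    value02 = lookup_value(allele02, alleles, values)
    return f"{value01}|{value02}"


# Third row of the table: set C|T, group X = A,T,G with values 0.3,0.6,0.1
X_list = ["A", "T", "G"]
X_values = [0.3, 0.6, 0.1]
print(set_values("C|T", X_list, X_values))
```

Because C is not in A,T,G, the first half falls back to 0.083, which matches the X(set) entry 0.083|0.6 in the table.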

My code was as follows:

haplotype_freq = 1.0

haplotype_set_A = []
haplotype_set_B = []
hapA_freq_X = []
hapB_freq_X = []
hapA_freq_Y = []
hapB_freq_Y = []

# This value (1/12) is the default if the value for the matching set isn't found in the group (X or Y)
default_value = str(round(1/12, 3))

test_data02 = open("multiply_test_level02.txt", "r")
# this separates the header (which is the very first line in the file)
header = test_data02.readline()

# load the rest of the data into a variable called 'data'
# rstrip removes the very last "\n" of the file
data = test_data02.read().rstrip("\n")
test_data02.close()

lines = data.split("\n")

for each_line in lines:
    column = each_line.split("\t")

    # Read the allele variables and frequencies for both populations
    X_list = column[0].split(","); X_values = column[1].split(",")
    Y_list = column[2].split(","); Y_values = column[3].split(",")

    # Read the genotype of the phased haplotype in that line (position)
    genotype = column[5]

    # Read the two alleles from the genotype (e.g. "A|C")
    allele01 = genotype[0]
    allele02 = genotype[2]

    # Analyzing allele01 from X_list:
    # check the index of the first allele in X_list
    # and take the value at that index position in value_X
    if allele01 in X_list:
        allele01_index = X_list.index(allele01)
        allele01_value_X = X_values[allele01_index]
    else:
        allele01_value_X = default_value

    # Analyzing allele02 from X_list
    if allele02 in X_list:
        allele02_index = X_list.index(allele02)
        allele02_value_X = X_values[allele02_index]
    else:
        allele02_value_X = default_value

    ## Repeat the above procedure for allele and frequency for group Y
    if allele01 in Y_list:
        allele01_index = Y_list.index(allele01)
        allele01_value_Y = Y_values[allele01_index]
    else:
        allele01_value_Y = default_value

    # Analyzing allele02 from Y_list
    if allele02 in Y_list:
        allele02_index = Y_list.index(allele02)
        allele02_value_Y = Y_values[allele02_index]
    else:
        allele02_value_Y = default_value

    # Generating the haplotype (i.e. columns 6 and 7)
    # But, I am not showing the code on how to generate 6 and 7

    # Collect the alleles and their values for every line
    haplotype_set_A.append(allele01)
    haplotype_set_B.append(allele02)
    hapA_freq_X.append(allele01_value_X)
    hapB_freq_X.append(allele02_value_X)
    hapA_freq_Y.append(allele01_value_Y)
    hapB_freq_Y.append(allele02_value_Y)

print(haplotype_set_A)
print(haplotype_set_B)

## and so on for frequency........
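The reading part of the script (readline for the header, then read, rstrip and split) can also be done with the standard csv module. A minimal sketch, assuming the file is tab-separated and its header uses the column names shown in the table; `io.StringIO` stands in for the real file so the example runs on its own, and `sample_file` and `parsed` are illustrative names:

```python
import csv
import io

# Stand-in for the real file; assumes tab-separated columns and this header line
sample_file = io.StringIO(
    "X\tvalue_X\tY\tvalue_Y\tblock\tset\n"
    "A,T,C\t0.25,0.6,0.15\tA,C,G\t0.3,0.3,0.4\t2480\tA|C\n"
    "G,T,C,A\t0.25,0.15,0.1,0.5\tG,T,C,A\t0.21,0.3,0.19,0.3\t2480\tC|T\n"
)

# DictReader consumes the header line and maps each field to its column name,
# so no manual readline() or rstrip("\n") is needed
reader = csv.DictReader(sample_file, delimiter="\t")

parsed = []
for record in reader:
    X_list = record["X"].split(",")
    X_values = [float(v) for v in record["value_X"].split(",")]
    allele01, allele02 = record["set"].split("|")
    parsed.append((record["block"], allele01, allele02, X_list, X_values))

for entry in parsed:
    print(*entry)
```

With the real file, `with open("multiply_test_level02.txt", newline="") as f:` and `csv.DictReader(f, delimiter="\t")` would replace the `StringIO` object, and the file is closed automatically when the `with` block ends.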

So, how do I use break and continue to generate separate haplotypes and/or frequencies whenever the block value changes in the for loop? I tried looking through examples, but they didn't help. Also, I don't need a pandas solution, since finding the matching allele, its index, and then the frequency at that index is best done with if-else.
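One way to handle the block change, sketched here as an alternative to break/continue rather than a use of them, is to remember the block of the previous row and, whenever the current row's block differs, finish the products for the old block before starting a new one. This assumes rows of the same block are adjacent, as they are in the table; the hard-coded `rows` list stands in for the X(set) values the loop above would collect, and `results` and `current_block` are hypothetical names. `math.prod` needs Python 3.8+.

```python
import math

# (block, value before the pipe, value after the pipe) for group X,
# taken from the X(set) column of the table
rows = [
    ("2480", 0.25, 0.15),
    ("2480", 0.1, 0.15),
    ("2480", 0.083, 0.6),
    ("2651", 0.43, 0.26),
    ("2651", 0.11, 0.083),
    ("2651", 0.23, 0.23),
]

results = {}          # block -> (product for haplotype A, product for haplotype B)
current_block = None  # block of the previous row
hap_A, hap_B = [], []

for block, value_A, value_B in rows:
    if block != current_block:
        # The block changed: store the finished block, then start a new one
        if current_block is not None:
            results[current_block] = (math.prod(hap_A), math.prod(hap_B))
        current_block = block
        hap_A, hap_B = [], []
    hap_A.append(value_A)
    hap_B.append(value_B)

# No later row signals the end of the last block, so store it after the loop
if current_block is not None:
    results[current_block] = (math.prod(hap_A), math.prod(hap_B))

for block, (freq_A, freq_B) in results.items():
    print(block, round(freq_A, 6), round(freq_B, 6))
```

`itertools.groupby(rows, key=lambda row: row[0])` groups adjacent rows by block in the same way, if you prefer not to track the previous block by hand. A `break` would stop the loop at the first block change, which is why the loop here keeps running and only resets its lists.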

Thanks very much in advance!
