In the given data set I need to multiply values from different blocks. The original data only had the first 5 columns. I was able to add the last two columns, X(set) and Y(set), using a for loop (with if-elif inside). Now I also need to multiply the values in X(set) and in Y(set) within each block. The problem is that this means multiplying values across rows rather than within a row.
So, for X(set) in block 2480 I have to multiply 0.25*0.1*0.083, since these values belong to the same block, the same position in the set (i.e. the index before the pipe (|) sign), and the same group (X).
X        value_X             Y        value_Y              block  set  X(set)      Y(set)
A,T,C    0.25,0.6,0.15       A,C,G    0.3,0.3,0.4          2480   A|C  0.25|0.15   0.3|0.3
G,T,C,A  0.25,0.15,0.1,0.5   G,T,C,A  0.21,0.3,0.19,0.3    2480   C|T  0.1|0.15    0.19|0.3
A,T,G    0.3,0.6,0.1         A,C,T    0.4,0.5,0.1          2480   C|T  0.083|0.6   0.5|0.1
C,A,T    0.26,0.31,0.43      C,T,G    0.39,0.43,0.18       2651   T|C  0.43|0.26   0.43|0.39
T,A,G    0.55,0.11,0.34      T,A,C,G  0.21,0.17,0.26,0.36  2651   A|C  0.11|0.083  0.17|0.26
G,A,C,T  0.23,0.23,0.34,0.2  A,T,C    0.21,0.36,0.43       2651   G|A  0.23|0.23   0.083|0.21
My code is as follows:
haplotype_freq = 1.0
haplotype_set_A = []
haplotype_set_B = []
hapA_freq_X = []
hapB_freq_X = []
hapA_freq_Y = []
hapB_freq_Y = []
test_data02 = open("multiply_test_level02.txt", "r+")
# Read the header separately (it is the very first line in the file)
header = test_data02.readline()
# Load the rest of the data into a variable called 'data';
# rstrip removes the trailing newline (\n) at the end of the file
data = test_data02.read().rstrip("\n")
lines = data.split("\n")
for each_line in lines:
    column = each_line.split("\t")

    # Read the alleles and their frequencies for both groups (X and Y)
    X_list = column[0].split(",")
    X_values = column[1].split(",")
    Y_list = column[2].split(",")
    Y_values = column[3].split(",")

    # Read the genotype of the phased haplotype in that line (position)
    genotype = column[5]

    # Read the two alleles from the genotype (either side of the pipe)
    allele01 = genotype[0]
    allele02 = genotype[2]

    # Analyzing allele01 from X_list:
    # find the index of the first allele in X_list
    # and take the value at the same index in X_values
    if allele01 in X_list:
        allele01_index = X_list.index(allele01)
        allele01_freq_X = X_values[allele01_index]
    else:
        # 1/12 (rounded to 0.083) is the default when the allele
        # isn't found in the group (X or Y)
        allele01_freq_X = str(round(1/12, 3))

    # Analyzing allele02 from X_list
    if allele02 in X_list:
        allele02_index = X_list.index(allele02)
        allele02_freq_X = X_values[allele02_index]
    else:
        allele02_freq_X = str(round(1/12, 3))

    ## Repeat the above procedure for allele and frequency for group Y
    if allele01 in Y_list:
        allele01_index = Y_list.index(allele01)
        allele01_freq_Y = Y_values[allele01_index]
    else:
        allele01_freq_Y = str(round(1/12, 3))

    # Analyzing allele02 from Y_list
    if allele02 in Y_list:
        allele02_index = Y_list.index(allele02)
        allele02_freq_Y = Y_values[allele02_index]
    else:
        allele02_freq_Y = str(round(1/12, 3))

    # Collecting the haplotypes and frequencies (i.e. columns 6 and 7).
    # The code that writes columns 6 and 7 is not shown here.
    haplotype_set_A.append(allele01)
    haplotype_set_B.append(allele02)
    hapA_freq_X.append(allele01_freq_X)
    hapB_freq_X.append(allele02_freq_X)
    hapA_freq_Y.append(allele01_freq_Y)
    hapB_freq_Y.append(allele02_freq_Y)

print(haplotype_set_A)
print(haplotype_set_B)
## and so on for frequency........
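The same if/else lookup is repeated four times in the loop above, so it could probably be folded into one small helper. The following is only a sketch: `lookup_freq` and `DEFAULT_FREQ` are hypothetical names introduced here for illustration, and the sample lists are taken from the first data row of the table.

```python
# Hypothetical constant: the same 1/12 fallback used in the loop, kept as a string
DEFAULT_FREQ = str(round(1 / 12, 3))


def lookup_freq(allele, alleles, freqs):
    """Return the frequency at the allele's index, or the default if absent."""
    if allele in alleles:
        return freqs[alleles.index(allele)]
    else:
        return DEFAULT_FREQ


# First data row of the table: X alleles and their frequencies
X_list = "A,T,C".split(",")
X_values = "0.25,0.6,0.15".split(",")

print(lookup_freq("A", X_list, X_values))  # 0.25
print(lookup_freq("G", X_list, X_values))  # 0.083 (G is not in X_list)
```

With a helper like this, each of the four lookups in the loop would become a single call, while the matching itself is still done with plain if/else.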
So, how can I use break and continue (or something similar) in the for loop to generate a separate haplotype and/or frequency product whenever the block value changes? I looked through examples, but they didn't help. Also, I don't need a pandas solution, since finding and matching the allele, its index, and then the frequency at that index is best done using if-else.
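One possible direction, which may not need break or continue at all, is to keep a running product per block in a dictionary keyed on the block value. The sketch below is illustrative only: the `rows` list is a hypothetical stand-in for the parsed lines (block and X(set) copied from the table), and `hapA_product_X` / `hapB_product_X` are names made up for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for the parsed file: (block, X(set)) pairs from the table
rows = [
    ("2480", "0.25|0.15"),
    ("2480", "0.1|0.15"),
    ("2480", "0.083|0.6"),
    ("2651", "0.43|0.26"),
    ("2651", "0.11|0.083"),
    ("2651", "0.23|0.23"),
]

# One running product per block for each side of the pipe; every block starts at 1.0
hapA_product_X = defaultdict(lambda: 1.0)
hapB_product_X = defaultdict(lambda: 1.0)

for block, x_set in rows:
    freq_a, freq_b = x_set.split("|")
    # Multiply into the product belonging to this row's block
    hapA_product_X[block] *= float(freq_a)
    hapB_product_X[block] *= float(freq_b)

for block in hapA_product_X:
    print(block, hapA_product_X[block], hapB_product_X[block])
```

Because the products are keyed on the block, the loop does not have to detect the point where the block changes, and it would also work if rows of the same block were not adjacent. The same pattern could be repeated for Y(set).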
Thanks in advance!