I have the following tab-separated file:
A1 A1 0 0 1 1 0 0 0 0 2 2 1 2
A2 A2 0 0 1 1 1 1 1 1 0 0 1 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 1 2
The idea is to modify the data from column 7 (inclusive) to the end of each row so that, for every row, if columns 7 and 8:
- equal “0 0”: don’t modify
- equal “1 1”: don’t modify
- equal “1 2” or “2 1”: change to “2 2”
- equal “2 2”: don’t modify
The same rule applies to each following pair of columns (9 and 10, then 11 and 12, 13 and 14, and so on).
I started by extracting the columns I want to work on with this command:
awk '{for (i = 7; i <= NF; i++) printf "%s ", $i; print ""}' test.ped > tmp_test.txt
Then I was thinking of using a for loop with if statements, in this general form:
for i between 7 and the end (for (i = 7; i <= NF)):
    if columns i and i+1 == “1 2”:
        replace by “2 2”
    elif columns i and i+1 == “2 1”:
        replace by “2 2”
    else:
        pass
    i=i+2 (increase i to do the same for the next pair of columns)
But I am stuck here. Is the general format logical or is there a faster way to do the same? Am I going in the right direction?
The expected output (after merging the first 6 columns from the initial file and the ones that I subsetted and modified) is:
A1 A1 0 0 1 1 0 0 0 0 2 2 2 2
A2 A2 0 0 1 1 1 1 1 1 0 0 2 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 2 2
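For reference, the loop described above could probably be written directly in awk, without extracting the columns into an intermediate file first. This is only a minimal sketch, assuming the file really is tab-separated and that the columns from 7 onward always come in complete pairs; the function name `fix_pairs` is hypothetical and just wraps the awk call.

```shell
# Hypothetical helper: rewrite "1 2" / "2 1" pairs as "2 2" from column 7 onward.
# Assumes tab-separated input with complete pairs starting at column 7.
fix_pairs() {
    awk 'BEGIN { FS = OFS = "\t" }   # read and write tab-separated fields
    {
        # Step through the pairs (7,8), (9,10), ...; i < NF guarantees i+1 exists.
        for (i = 7; i < NF; i += 2) {
            pair = $i " " $(i + 1)
            if (pair == "1 2" || pair == "2 1") {
                $i = 2
                $(i + 1) = 2
            }
        }
        print   # columns 1-6 and all other pairs pass through unchanged
    }' "$@"
}
```

Called as `fix_pairs test.ped`, this should print every row with the “1 2” and “2 1” pairs from column 7 onward replaced by “2 2”, while the first six columns (including the “1 2” in columns 5 and 6 of row A3) are left untouched.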
Thank you for your help!