Tuesday, 30 August 2016

BASH - Change information in columns two by two using a for loop and if statements

I have the following tab-separated file:

A1    A1    0       0       1       1       0 0     0 0     2 2     1 2
A2    A2    0       0       1       1       1 1     1 1     0 0     1 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     1 2

The goal is to modify the columns from column 7 (inclusive) to the end of each row so that, if columns 7 and 8:

  • equal “0 0”: don’t modify

  • equal “1 1”: don’t modify

  • equal “1 2” or “2 1”: change to “2 2”

  • equal “2 2”: don’t modify

The same rule applies to each following pair of columns (9 and 10, then 11 and 12, 13 and 14, and so on).

I started to extract the columns I want to work on using the command:

awk '{for (i = 7; i <= NF; i++) printf "%s ", $i; print ""}' test.ped > tmp_test.txt

Then I was thinking of using a for loop with if statements, in this general form:

for i from 7 to the end (for (i = 7; i <= NF)):
    if columns i and i+1 == “1 2”:
        replace with “2 2”
    elif columns i and i+1 == “2 1”:
        replace with “2 2”
    else:
        pass
    i = i + 2 (move on to the next pair of columns)

But I am stuck here. Is the general format logical or is there a faster way to do the same? Am I going in the right direction?
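
One possible way to do the pairwise loop, as a minimal sketch rather than a definitive answer: awk can step through the columns two at a time starting at field 7 and rewrite a pair in place, so the intermediate file is not strictly needed. This assumes the fields can be split on any whitespace (awk's default) and that it is acceptable for the output to be re-joined with tabs. The function name fix_pairs is hypothetical.

```shell
# Hypothetical helper: rewrite "1 2" / "2 1" pairs as "2 2" from column 7 onward.
# Assumption: fields are separated by any whitespace; output is tab-joined.
fix_pairs() {
    awk 'BEGIN { OFS = "\t" }
    {
        # Visit the column pairs (7,8), (9,10), (11,12), ...
        for (i = 7; i < NF; i += 2) {
            if (($i == 1 && $(i+1) == 2) || ($i == 2 && $(i+1) == 1)) {
                $i = 2
                $(i+1) = 2
            }
        }
        $1 = $1   # rebuild the record with OFS even when no pair changed
        print
    }' "$@"
}
```

It could be used as `fix_pairs test.ped > test_fixed.ped`. Note that every field, including the two halves of each pair, ends up separated by a tab; if the original mix of tabs and spaces must be kept, the separators would need to be handled explicitly.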

The expected output (after merging the first 6 columns from the initial file and the ones that I subsetted and modified) is:

A1    A1    0       0       1       1       0 0     0 0     2 2     2 2
A2    A2    0       0       1       1       1 1     1 1     0 0     2 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     2 2
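
If the two-file route is kept, the merge step described above might look like the following sketch. It assumes that both files list the rows in the same order and that the first six columns are whitespace-separated. The function name merge_columns and its two arguments are hypothetical.

```shell
# Hypothetical helper: glue columns 1-6 of the original file onto the
# already-modified columns, line by line.
# Assumption: both files contain the same rows in the same order.
merge_columns() {
    orig=$1       # original file, e.g. test.ped
    modified=$2   # modified columns 7..NF, e.g. tmp_test.txt
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, $5, $6 }' "$orig" |
        paste - "$modified"
}
```

For example, `merge_columns test.ped tmp_test.txt > merged.ped` would write the first six columns, a tab, and then the corresponding line of the modified file.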

Thank you for your help!
