I have the following tab-separated file:
A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1
A2 A2 0 0 2 1 1 1 1 1 1 1 1 1 1 1
A3 A3 0 0 2 2 1 1 2 2 1 1 1 1 1 1
A5 A5 0 0 2 2 1 1 1 1 1 1 1 2 1 1
The idea is to summarise the information from column 7 (inclusive) to the end of each row in a new column appended to the end of that row.
To do so, these are the rules:
- If the total number of “2”s in the row (between column 7 and the end) is 0: add “1 1” to the new last column
- If the total number of “2”s in the row (between column 7 and the end) is 1: add “1 2” to the new last column
- If the total number of “2”s in the row (between column 7 and the end) is 2 or more: add “2 2” to the new last column
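The three rules above can be sketched as a small shell function. This is only an illustration: the name summarise_count is hypothetical, and it assumes its argument is already a non-negative integer count of “2”s.

```shell
# Hypothetical helper: map the number of "2"s in a row to the summary value.
# Assumes $1 is a non-negative integer; anything other than 0 or 1 is
# treated as "2 or more".
summarise_count() {
    case "$1" in
        0) echo "1 1" ;;
        1) echo "1 2" ;;
        *) echo "2 2" ;;
    esac
}

summarise_count 0
summarise_count 1
summarise_count 3
```

Called with 0, 1 and 3, it prints the three summary values in the order of the rules.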
I started by extracting the columns I want to work on with the command:
awk '{for (i = 7; i <= NF; i++) printf $i " "; print ""}' myfile.ped > tmp_myfile.txt
Then I count the number of occurrences of “2” in each row using:
sed 's/[^2]//g' tmp_myfile.txt | awk '{print NR, length }' > tmp_occurences.txt
Which outputs:
1 1
2 0
3 2
4 1
Then my idea was to write a loop that goes through the lines and adds the new summary column. I was thinking of this kind of structure, based on what I found here: http://ift.tt/2c16YTR:
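while read -r row count
do
    # Map the count of "2"s (second field) to the summary value
    if [ "$count" -eq 0 ]
    then
        summary="1 1"
    elif [ "$count" -eq 1 ]
    then
        summary="1 2"
    elif [ "$count" -ge 2 ]
    then
        summary="2 2"
    else
        echo "error" >&2
        continue
    fi
    # Print the row number followed by the new summary column
    echo "$row $summary"
done < tmp_occurences.txt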
But I am stuck here. Do I have to create the new column before starting the loop? Am I going in the right direction?
Ideally, the final output (after merging the first 6 columns from the initial file and the summary column) would be:
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
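For reference, the whole transformation could also be done in a single awk pass, without temporary files. The following is a minimal sketch, not a definitive solution: the function name summarise_ped is hypothetical, and it assumes whitespace-separated input with at least six leading columns.

```shell
# Hypothetical helper: reads rows on stdin and prints the first six columns
# followed by the summary of columns 7..NF.
summarise_ped() {
    awk '{
        n = 0
        # Count the fields equal to 2 from column 7 to the end of the row
        for (i = 7; i <= NF; i++) if ($i == 2) n++
        # Apply the three rules: 0 -> "1 1", 1 -> "1 2", 2 or more -> "2 2"
        s = (n == 0) ? "1 1" : ((n == 1) ? "1 2" : "2 2")
        print $1, $2, $3, $4, $5, $6, s
    }'
}

# Example with the first row of the file shown above
printf 'A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1\n' | summarise_ped
```

On the real data this would be run as summarise_ped < myfile.ped, with the output redirected to a new file.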
Thank you for your help!