Wednesday, March 28, 2018

awk to create new variables based on conditions of other columns

I have a file input.txt (~100,000 lines) with this structure:

Z0        Z1        Z3
0.9746    0.0254    0.0000     
0.0032    0.0000    0.9433  
0.2464    0.5603    0.9008 
0.4034    0.4982    0.0069 
0.0072    0.9996    0.0472 
...       ...       ...

I want to create a new file, output.txt, with an additional column named SCORE, assigned according to the following conditions:

  • SCORE = 1 if: 0.17 ≤ Z0 ≤ 0.33 and 0.40 ≤ Z1 ≤ 0.60
  • SCORE = 2 if: 0.40 ≤ Z0 ≤ 0.60 and 0.40 ≤ Z1 ≤ 0.60
  • SCORE = 3 if: Z0 ≤ 0.05 and Z1 ≥ 0.95 and Z3 ≤ 0.05
  • SCORE = 4 if: Z0 ≤ 0.05 and Z1 ≤ 0.05 and Z3 ≥ 0.95
  • SCORE = 5 if none of the other four conditions apply.

output.txt would look like this:

Z0        Z1        Z3         SCORE
0.9746    0.0254    0.0000     5
0.0032    0.0000    0.9433     4
0.2464    0.5603    0.9008     1
0.4034    0.4982    0.0069     2
0.0072    0.9996    0.0472     3           
...       ...       ...

Here is what I tried:

awk 'NR==1{$4="SCORE";print;next} \
  0.17<$1 && $1<0.33 && 0.40<$2 && $2<0.60 {$4="1"} \
  0.40<$1 && $1<0.60 && 0.40<$2 && $2<0.60 {$4="2"} \
  $1<=0.05 && $2>=0.95 && $3<=0.05 {$4="3"} \
  $1<=0.05 && $2<=0.05 && $3>=0.95 {$4="4"} \
  *other* 1' input.txt > output.txt

However, something is wrong in the first five lines of the command, and I don't know how to write the final condition (for score 5) on the last line.
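For reference, here is a minimal sketch of one way the rules could be written. It assumes whitespace-separated columns and a single header line. The file names sample_input.txt and sample_output.txt are hypothetical, used only so the example runs on its own. Two things in the attempt above stand out: it uses strict `<` where the stated conditions are inclusive (≤), and `*other*` is not awk syntax. One way to get the fallback is to set a default of 5 and let an if/else chain overwrite it. Note that under the conditions as stated, the second sample row (third column 0.9433) falls short of the 0.95 threshold, so this sketch assigns it 5 rather than 4.

```shell
# Small sample input (hypothetical file name) so the sketch is self-contained.
cat > sample_input.txt <<'EOF'
Z0        Z1        Z3
0.9746    0.0254    0.0000
0.0032    0.0000    0.9433
0.2464    0.5603    0.9008
0.4034    0.4982    0.0069
0.0072    0.9996    0.0472
EOF

awk '
NR == 1 { print $0, "SCORE"; next }    # header line: append the new column name
{
    score = 5                          # default when none of the four rules match
    if      ($1 >= 0.17 && $1 <= 0.33 && $2 >= 0.40 && $2 <= 0.60) score = 1
    else if ($1 >= 0.40 && $1 <= 0.60 && $2 >= 0.40 && $2 <= 0.60) score = 2
    else if ($1 <= 0.05 && $2 >= 0.95 && $3 <= 0.05)               score = 3
    else if ($1 <= 0.05 && $2 <= 0.05 && $3 >= 0.95)               score = 4
    print $0, score                    # original line followed by the score
}' sample_input.txt > sample_output.txt
```

The if/else chain means the first matching rule wins, which only matters if the ranges were ever changed to overlap. The score is appended after a single space, so the column alignment of the original file is not preserved.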
