I have a file input.txt (~100'000 lines) with this structure:
Z0 Z1 Z3
0.9746 0.0254 0.0000
0.0032 0.0000 0.9433
0.2464 0.5603 0.9008
0.4034 0.4982 0.0069
0.0072 0.9996 0.0472
... ... ...
I want to create a new file, output.txt, with an additional column named SCORE, based on the following conditions:
- SCORE = 1 if: 0.17 ≤ Z0 ≤ 0.33 and 0.40 ≤ Z1 ≤ 0.60
- SCORE = 2 if: 0.40 ≤ Z0 ≤ 0.60 and 0.40 ≤ Z1 ≤ 0.60
- SCORE = 3 if: Z0 ≤ 0.05 and Z1 ≥ 0.95 and Z2 ≤ 0.05
- SCORE = 4 if: Z0 ≤ 0.05 and Z1 ≤ 0.05 and Z2 ≥ 0.95
- SCORE = 5 if none of the other four conditions applies.
output.txt would look like this:
Z0 Z1 Z3 SCORE
0.9746 0.0254 0.0000 5
0.0032 0.0000 0.9433 4
0.2464 0.5603 0.9008 1
0.4034 0.4982 0.0069 2
0.0072 0.9996 0.0472 3
... ... ...
Here is what I tried:
awk 'NR==1{$4="SCORE";print;next} \
0.17<$1 && $1<0.33 && 0.40<$2 && $2<0.60 {$4="1"} \
0.40<$1 && $1<0.60 && 0.40<$2 && $2<0.60 {$4="2"} \
$1<=0.05 && $2>=0.95 && $3<=0.05 {$4="3"} \
$1<=0.05 && $2<=0.05 && $3>=0.95 {$4="4"} \
*other* 1' input.txt > output.txt
However, something is wrong in the first five lines of the command, and I don't know how to write the final condition (for score 5) on the last line.
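One possible way to write this, as a minimal sketch rather than a definitive answer: start each data row with a default score of 5 and overwrite it when one of the four rules matches, which removes the need for a separate "other" pattern. The sketch uses `<=` and `>=` because the conditions are stated as ≤ and ≥, whereas the attempt uses strict `<` for scores 1 and 2; the `*other* 1` placeholder is not valid awk. It assumes whitespace-separated columns, that Z2 refers to the third column (headed Z3 in the sample), and that the file names `sample_input.txt` and `sample_output.txt` are hypothetical stand-ins used only so the example can be run on its own.

```shell
# Hypothetical sample file holding the five rows shown in the question.
cat > sample_input.txt <<'EOF'
Z0 Z1 Z3
0.9746 0.0254 0.0000
0.0032 0.0000 0.9433
0.2464 0.5603 0.9008
0.4034 0.4982 0.0069
0.0072 0.9996 0.0472
EOF

awk '
NR == 1 { print $0, "SCORE"; next }   # header line: append the new column name
{
    score = 5                          # default when none of the rules match
    if      (0.17 <= $1 && $1 <= 0.33 && 0.40 <= $2 && $2 <= 0.60) score = 1
    else if (0.40 <= $1 && $1 <= 0.60 && 0.40 <= $2 && $2 <= 0.60) score = 2
    else if ($1 <= 0.05 && $2 >= 0.95 && $3 <= 0.05)               score = 3
    else if ($1 <= 0.05 && $2 <= 0.05 && $3 >= 0.95)               score = 4
    print $0, score                    # original row plus the score
}' sample_input.txt > sample_output.txt

cat sample_output.txt
```

For the real data, the same awk program would be run with input.txt and output.txt in place of the sample names. Note that with the thresholds exactly as written, the row `0.0032 0.0000 0.9433` gets SCORE 5 rather than the 4 shown in the desired output, because 0.9433 is below 0.95; either the threshold or that sample row would need adjusting to get a 4 there.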