I am trying to correct data errors in several files. Two things are unknown in advance: the number of fields in each file, and which fields and records contain the string "---".
An example of the data is:
1 2 1 39.6406 1 38.8512 1 38.3479 1 37.9744
2 1 4 39.1527 3 38.7329 2 38.3479 2 37.9744
3 3 3 39.5186 2 38.8512 3 38.2079 3 37.6385
4 4 2 39.6406 4 38.4964 --- 37.7414 --- 36.7149
5 5 --- 40.2504 --- 39.0286 --- 38.4879 --- 38.1004
The desired output is:
1 2 1 39.6406 1 38.8512 1 38.3479 1 37.9744
2 1 4 39.1527 3 38.7329 2 38.3479 2 37.9744
3 3 3 39.5186 2 38.8512 3 38.2079 3 37.6385
4 4 2 39.6406 4 38.4964 --- --- --- ---
5 5 --- --- --- --- --- --- --- ---
I have tried using for-loops, such as:
awk '{for (i = NF; i >= 1; i--){if ($i=="---")$(i-1)="---"}{print $0}}' file
which resulted in:
1 2 1 39.6406 1 38.8512 1 38.3479 1 37.9744
2 1 4 39.1527 3 38.7329 2 38.3479 2 37.9744
3 3 3 39.5186 2 38.8512 3 38.2079 3 37.6385
---
---
and I also tried:
awk '{for (i=1;i<=NF;i++){if ($i=="---")$(i+1)="---"}{print $0}}' file
which resulted in the error:
"awk: program limit exceeded: maximum number of fields size=32767"
FILENAME="file" FNR=4 NR=4
1 2 1 39.6406 1 38.8512 1 38.3479 1 37.9744
2 1 4 39.1527 3 38.7329 2 38.3479 2 37.9744
3 3 3 39.5186 2 38.8512 3 38.2079 3 37.6385
In my first attempt, the replacement cascaded backward all the way to the first field; at i = 1 the assignment targets $0, so the whole record became "---". In my second attempt, each assignment to $(i+1) at the last field created a new field, so NF kept growing and the loop never ended on records containing the string.
My gut feeling is that I need a break statement, yet after many hours of searching I can't find an example that has helped me. I know there is more than one way to skin a cat, so if you know a better way to accomplish my goal (keeping in mind that there are multiple files with different field counts), or if you can provide an example of a break statement with one of my for-loops, I, and others looking for an example, will be extremely grateful.
Thank you
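For reference, here is a minimal sketch of how a break statement could fit the forward loop. It assumes, based only on the sample above, that every field after the first "---" in a record should also become "---". The function name fill_after_dash is hypothetical, chosen for illustration. The first loop stops at the first "---"; the second loop is bounded by NF, so it never assigns past the last field and NF cannot grow.

```shell
# Hypothetical helper; wraps the awk program so it can be reused across files.
fill_after_dash() {
  awk '{
    # Find the first field equal to "---"; break stops the scan there.
    for (i = 1; i <= NF; i++)
      if ($i == "---")
        break
    # Overwrite every later field. If no "---" was found, i == NF + 1
    # and this loop does not run, so the record is printed unchanged.
    for (j = i + 1; j <= NF; j++)
      $j = "---"
    print
  }' "$@"
}

# Three records taken from the sample data above.
printf '%s\n' \
  '3 3 3 39.5186 2 38.8512 3 38.2079 3 37.6385' \
  '4 4 2 39.6406 4 38.4964 --- 37.7414 --- 36.7149' \
  '5 5 --- 40.2504 --- 39.0286 --- 38.4879 --- 38.1004' |
  fill_after_dash
```

Because the loop bounds come from NF on each record, the same call should work on files with different field counts, for example fill_after_dash file1 file2. Note that assigning to a field makes awk rebuild the record with single spaces between fields, which matches the sample data but would collapse other spacing.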