Thursday, April 5, 2018

Number every row in a file based on the string found in one column in UNIX

I would like to number the rows of a tab-delimited file depending on the string in the third column. If the third column is, say, "X", the rows should be numbered according to one set of columns; if it is "Y", they should be numbered according to a different set of columns, and so on. Rows that share the same values in the relevant columns should get the same number. I tried to do this with the script below, introducing if conditions in different ways, but it does not work correctly. Is it possible to do this with the script below, or is there a simpler way to do it in a UNIX environment? Thanks in advance.

The input

rs868289783      355364  frameshift_variant      *       1004    S       del=1   dbSNP
rs868289783      355364  frameshift_variant      *       1004    S       del=1   dbSNP
                 180595  chemical-modification   R       18     D-R              PMD
rs747393379      264033  deletion_inframe                108             del=12  dbSNP
                 296037  inframe_deletion     NQMTGQISM  1405            del=9   ExAC
                 296037  inframe_deletion     NQMTGQISM  348             del=9   ExAC

The desired output would be

1    rs868289783      355364  frameshift_variant      *       1004    S       del=1   dbSNP
1    rs868289783      355364  frameshift_variant      *       1004    S       del=1   dbSNP
2                     180595  chemical-modification   R       18     D-R              PMD
3    rs747393379      264033  deletion_inframe                108             del=12  dbSNP
4                     296037  inframe_deletion     NQMTGQISM  1405            del=9   ExAC
4                     296037  inframe_deletion     NQMTGQISM  348             del=9   ExAC

The script I have used is something like

awk 'function intern(sym) { if (sym in table && $3 ~/frameshift_variant/)
                          return table[sym]
                        return table[sym] = ++counter
 { print intern($2"\t"$3"\t"$4"\t"$5"\t"$6), $0 };
                         if (sym in table && $3 ~/inframe_deletion/)
                          return table[sym]
                        return table[sym] = ++counter
 { print intern($2"\t"$3"\t"$4"\t"$7), $0 }' "input" > "output"
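One way this could be approached, as a minimal sketch rather than a definitive answer: pick the key columns first, based on column 3, and only then look the key up in the table. The sketch assumes the file is strictly tab-delimited (so empty fields are kept), that "inframe_deletion" rows are grouped on columns 2, 3, 4 and 7, and that every other type falls back to columns 2 to 6, as in the frameshift_variant case; the fallback is an assumption, since the question only says "and so on". The names number_rows and key_for are made up for illustration.

```shell
# Hypothetical helper: reads a tab-delimited file on stdin and prints each
# row prefixed with a group number.
number_rows() {
    awk -F'\t' -v OFS='\t' '
    # Build the grouping key from a column set chosen by the type in column 3.
    function key_for(    k) {
        if ($3 == "inframe_deletion")
            k = $2 FS $3 FS $4 FS $7          # position in column 5 is ignored
        else
            k = $2 FS $3 FS $4 FS $5 FS $6    # assumed default column set
        return k
    }
    # Return the number already assigned to a key, or assign the next one.
    function intern(sym) {
        if (!(sym in table))
            table[sym] = ++counter
        return table[sym]
    }
    { print intern(key_for()), $0 }
    '
}
```

It could then be called as: number_rows < "input" > "output". Further types can be handled by adding more branches to key_for, each with its own column set.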
