I would like to number the rows of a tab-separated file depending on the string in the third column. If the third column is "X", rows should be numbered according to one set of columns (rows with identical values in those columns share the same number); if it is "Y", they should be numbered according to a different set of columns, and so on. I have tried to do this with the script below, adding if conditions in different ways, but it does not work correctly. Is it possible to do this with the script below, or is there a simpler way to do it in a UNIX environment? Thanks in advance.
The input:
rs868289783 355364 frameshift_variant * 1004 S del=1 dbSNP
rs868289783 355364 frameshift_variant * 1004 S del=1 dbSNP
180595 chemical-modification R 18 D-R PMD
rs747393379 264033 deletion_inframe 108 del=12 dbSNP
296037 inframe_deletion NQMTGQISM 1405 del=9 ExAC
296037 inframe_deletion NQMTGQISM 348 del=9 ExAC
The expected output would be:
1 rs868289783 355364 frameshift_variant * 1004 S del=1 dbSNP
1 rs868289783 355364 frameshift_variant * 1004 S del=1 dbSNP
2 180595 chemical-modification R 18 D-R PMD
3 rs747393379 264033 deletion_inframe 108 del=12 dbSNP
4 296037 inframe_deletion NQMTGQISM 1405 del=9 ExAC
4 296037 inframe_deletion NQMTGQISM 348 del=9 ExAC
The script I have used is something like:
awk 'function intern(sym) { if (sym in table && $3 ~/frameshift_variant/)
return table[sym]
return table[sym] = ++counter
{ print intern($2"\t"$3"\t"$4"\t"$5"\t"$6), $0 };
if (sym in table && $3 ~/inframe_deletion/)
return table[sym]
return table[sym] = ++counter
{ print intern($2"\t"$3"\t"$4"\t"$7), $0 }' "input" > "output"
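One possible way to restructure the attempt above is to keep intern() as a plain lookup-or-assign function and move the test on the third column into the main rule, where it only decides which columns make up the key. This is a minimal sketch, not a verified solution for the real file. It assumes that the input is truly tab-separated with empty fields preserved, that rows matching inframe_deletion are keyed on columns 2, 3, 4 and 7, and that every other row (including frameshift_variant) is keyed on columns 2 to 6. The function name number_rows is hypothetical and only wraps the awk call.

```shell
#!/bin/sh
# number_rows: hypothetical helper name, not part of the original script.
# Reads tab-separated rows on stdin and prefixes each row with a group number.
number_rows() {
    awk -F'\t' -v OFS='\t' '
        # Return the number already assigned to this key, or assign the next one.
        function intern(sym) {
            if (!(sym in table))
                table[sym] = ++counter
            return table[sym]
        }
        {
            # Assumption: the variant type in column 3 selects the key columns.
            if ($3 ~ /inframe_deletion/)
                key = $2 FS $3 FS $4 FS $7
            else
                key = $2 FS $3 FS $4 FS $5 FS $6
            print intern(key), $0
        }
    '
}

# Usage: number_rows < input > output
```

Because the key for inframe_deletion rows leaves out column 5, two such rows that differ only in that column receive the same number, which matches the last two rows of the expected output. Further types can be handled by adding more else-if branches with their own column sets.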