I have the following two files:
BC.txt
"PB.50262.10"; UMI=AGCGGCCT; BC=TTTCAGCGCCGA;
"PB.50262.10"; UMI=AAGCGGCC; BC=TTTCAGCGCCGA;
"PB.50262.10"; UMI=ATGGGCCC; BC=GTGTAAGGGGCT;
"PB.50262.10"; UMI=AAAAGACG; BC=ACCTGTAGGAAC;
"PB.50262.10"; UMI=TTGTATTG; BC=TTTCAAGCGCCA;
PB.txt
c4 PB tr 41258945 41270445 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41258945 41259026 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41259626 41259754 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41262664 41262814 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41263732 41263817 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41263893 41263940 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41265242 41265308 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41266120 41266178 . + . g_i "PB.50262"; t_i "PB.50262.10";
c4 PB Ex 41270004 41270445 . + . g_i "PB.50262"; t_i "PB.50262.10";
I am trying to compare column 1 of BC.txt with column 12 of PB.txt and print the matching lines next to each other. The same value in column 1 of BC.txt occurs on several lines, each with different values in columns 2 and 3. When I run the comparison, I get output for only one of those BC.txt entries, but I want output for all of them. This is what I tried:
awk 'BEGIN {OFS=FS} NR==FNR {a[$1]=($2" "$3);next} $12 in a {print $0,a[$12]}' BC.txt PB.txt
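One possible fix, as a minimal sketch: in the command above, `a[$1]` is overwritten every time the same ID recurs in BC.txt, so only the last UMI/BC pair is still in the array when PB.txt is read. Keeping a per-ID counter and storing each pair under (ID, index) preserves all of them. The function name `join_all` and the array names `count` and `val` are illustrative, and the sketch assumes the desired output is one line per combination of PB.txt row and matching BC.txt entry.

```shell
# Hypothetical helper: join_all BC_FILE PB_FILE
# Prints every PB_FILE line once for each BC_FILE entry whose
# column 1 equals column 12 of that line.
join_all() {
    awk '
    NR == FNR {                     # first file (BC.txt)
        n = ++count[$1]             # how many entries this ID has so far
        val[$1, n] = $2 " " $3      # keep every UMI/BC pair, not only the last
        next
    }
    $12 in count {                  # second file (PB.txt)
        for (i = 1; i <= count[$12]; i++)
            print $0, val[$12, i]   # one output line per stored pair
    }
    ' "$1" "$2"
}
```

Called as `join_all BC.txt PB.txt`, this prints each PB.txt line once per matching BC.txt entry, which is five times per line for the sample data above. If all pairs should instead be appended to a single output line, the loop could build up one string and print it once after the loop.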