Sunday, 8 September 2019

Compare Two Files with duplicate input value

I have the following two files:

BC.txt

    "PB.50262.10"; UMI=AGCGGCCT; BC=TTTCAGCGCCGA;
    "PB.50262.10"; UMI=AAGCGGCC; BC=TTTCAGCGCCGA;
    "PB.50262.10"; UMI=ATGGGCCC; BC=GTGTAAGGGGCT;
    "PB.50262.10"; UMI=AAAAGACG; BC=ACCTGTAGGAAC;
    "PB.50262.10"; UMI=TTGTATTG; BC=TTTCAAGCGCCA;

PB.txt

    c4 PB tr 41258945 41270445 . + . g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41258945    41259026    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41259626    41259754    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41262664    41262814    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41263732    41263817    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41263893    41263940    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41265242    41265308    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41266120    41266178    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";
    c4  PB  Ex  41270004    41270445    .   +   .   g_i "PB.50262"; t_i "PB.50262.10";

I am trying to compare column 1 of BC.txt with column 12 of PB.txt and print the matching lines next to each other. The same value in column 1 of BC.txt appears on several lines, each with different values in columns 2 and 3. With the command below, I get output for only one of those BC.txt entries, but I want output for all of them.

    awk 'BEGIN {OFS=FS} NR==FNR {a[$1]=($2" "$3);next} $12 in a {print $0,a[$12]}' BC.txt PB.txt
