Thursday, 25 February 2021

Unix: replacing values based on 2 conditions from 2 files

I have an issue that I have almost solved thanks to this post. I am using a dataset in the same format:

File 1

     32074_32077     1        0.008348          834830 G A
     32082_32085     1        0.008349          834928 A G
     32085_32088     2        0.008350          834928 G A
     32903_32906     5        0.008468          846808 C T

File 2

       rs3094315     1        0.020130          752566 G A
      rs12124819     1        0.020242          834928 A G
      rs28765502     2        0.022137          834928 T C
       rs7419119     3        0.022518          846808 T G

I would like to replace the 1st column of file 1 with the 1st column of file 2, but only if both $4 and $2 match on the same line of file 2. If they do not, I would like to keep the line as it is.
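A minimal sketch of the matching rule, assuming a POSIX-compatible awk. The values are hypothetical: one line of file 2 holds the pair (chromosome 1, position 100), another holds (2, 200), and the query (1, 200) appears on neither line. The sketch contrasts looking the two fields up in separate arrays with using one array whose subscript combines both fields (awk joins multiple subscripts with SUBSEP).

```shell
#!/bin/sh
# Hypothetical mini example: file 2 contains the pairs (1, 100) and
# (2, 200); the query (1, 200) is not on any single line.
check=$(awk 'BEGIN {
    # Two separate arrays: each field is looked up on its own,
    # so values taken from different lines can combine.
    chr["1"] = 1; chr["2"] = 1
    pos["100"] = 1; pos["200"] = 1
    sep = ("1" in chr && "200" in pos) ? "match" : "no match"

    # One array keyed on both fields: awk joins the subscripts
    # with SUBSEP, so both values must come from the same line.
    pair["1", "100"] = 1; pair["2", "200"] = 1
    both = (("1", "200") in pair) ? "match" : "no match"

    print "separate arrays: " sep
    print "combined key: " both
}')
printf '%s\n' "$check"
```

The first test reports a match because "1" and "200" each exist somewhere; the combined key reports no match because no single line holds both.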

Expected output:

     32074_32077     1        0.008348          834830 G A
     rs12124819      1        0.008349          834928 A G
     rs28765502      2        0.008350          834928 G A
     32903_32906     5        0.008468          846808 C T

Using the answer from the linked post, I cannot get the expected output. I tried this:

     awk 'FNR==NR{a[$4]=$1; b[$2]=$1; next} ($4 in a && $2 in b){$1=a[$4]} 1' file1 file2

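It doesn't work as expected because the condition $2 in b is checked independently of $4 in a. Since there are only a few distinct values in $2, $2 in b is almost always true, so a line can match on a $4 from one line of the other file and a $2 from a different line. I understand why, but I don't know how to work around this.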
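One possible workaround, sketched here rather than taken from the linked post: key a single array on both fields, and read file 2 first so that the lines being printed come from file 1 (the attempt above passes the files the other way round). The script recreates the sample data with single spaces in a temporary directory; the temporary directory and the variable names are illustrative assumptions.

```shell
#!/bin/sh
# Recreate the two example files (whitespace simplified to single spaces).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
32074_32077 1 0.008348 834830 G A
32082_32085 1 0.008349 834928 A G
32085_32088 2 0.008350 834928 G A
32903_32906 5 0.008468 846808 C T
EOF
cat > "$tmp/file2" <<'EOF'
rs3094315 1 0.020130 752566 G A
rs12124819 1 0.020242 834928 A G
rs28765502 2 0.022137 834928 T C
rs7419119 3 0.022518 846808 T G
EOF

# While reading file2 (FNR==NR), store its ID under the combined key ($2, $4).
# For each line of file1, replace $1 only if the same ($2, $4) pair was seen
# in file2; the final "1" prints every line, changed or not.
result=$(awk 'FNR==NR { id[$2, $4] = $1; next }
              ($2, $4) in id { $1 = id[$2, $4] }
              1' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
rm -r "$tmp"
```

Assigning to $1 makes awk rebuild the record with OFS (a single space by default), so replaced lines lose the original column alignment; piping the result through `column -t` is one way to realign it if that matters.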

Thank you.
