Monday, December 26, 2016

How to read two files in a for-loop and update values in one file based on matching values in another file?

I want to update the values in a column by reading two files simultaneously.

main_file has the following data:

contig  pos GT  PGT_phase   PID PG_phase    PI
2   1657    ./. .   .   ./. .
2   1738    0/1 .   .   0|1 935
2   1764    0/1 .   .   1|0 935
2   1782    0/1 .   .   0|1 935
2   1850    0/0 .   .   0/0 .
2   1860    0/1 .   .   1|0 935
2   1863    0/1 .   .   0|1 935
2   2969    0/1 .   .   1|0 3352
2   2971    0/0 .   .   0/0 .
2   5207    0/1 0|1 5185    1|0 1311
2   5238    0/1 .   .   0|1 1311
2   5241    0/0 .   .   0/0 .
2   5258    0/1 .   .   1|0 1311
2   5260    0/0 .   .   0/0 .
2   5319    0/0 .   .   0/0 .
2   5398    0/1 0|1 5398    1|0 1311
2   5403    0/1 0|1 5398    1|0 1311
2   5426    0/1 0|1 5398    1|0 1311
2   5427    0/1 0|1 5398    0/1 .
2   5434    0/1 0|1 5398    1|0 1311
2   5454    0/1 0|1 5398    0/1 .
2   5457    0/0 .   .   0/0 .
2   5467    0/1 0|1 5467    0|1 1311
2   5480    0/1 0|1 5467    0|1 1311
2   5483    0/0 0|1 5482    0/0 .
2   6414    0/1 .   .   0|1 1667
2   6446    0/1 0|1 6446    0|1 1667
2   6448    0/1 0|1 6446    0|1 1667
2   6465    0/1 0|1 6446    0|1 1667
2   6636    0/1 .   .   1|0 1667
2   6740    0/1 .   6740    0|1 1667
2   6748    0/1 .    6740   0|1 .
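To refer to columns by name instead of by index (main_column[4], main_column[6]), the table can be parsed into dicts keyed by its header. This is a minimal sketch: read_table is a hypothetical helper, and it splits on any whitespace, assuming (as in the sample above) that no field is empty or contains spaces, since missing values are written as '.'.

```python
import io


def read_table(handle):
    """Parse a whitespace-separated table into (header, rows).

    Each row is a dict keyed by the header names. Assumes no field is empty
    or contains spaces; main_file writes missing values as '.'.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines, e.g. a trailing newline
            continue
        rows.append(dict(zip(header, fields)))
    return header, rows


# Two rows copied from main_file (tab-separated here; runs of spaces also work).
sample = io.StringIO(
    "contig\tpos\tGT\tPGT_phase\tPID\tPG_phase\tPI\n"
    "2\t1657\t./.\t.\t.\t./.\t.\n"
    "2\t5427\t0/1\t0|1\t5398\t0/1\t.\n"
)
header, rows = read_table(sample)
print(rows[1]["PID"], rows[1]["PI"])  # prints: 5398 .
```

With the real file, the same helper would be called on an open handle, e.g. inside `with open("2ms01e_chr2_table.txt") as main_file:`.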

The other file, match_file, has the following kind of information:

PI      PID
1309    3617741,3617753,3617788,3618156,3618187,3618289
131     11793586
1310    
1311    5185,5398,5467,5576
1312    340692,340728
1313    18503498
1667    6740,12237,12298
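Because each PI in match_file lists several PIDs, looking up a PID is simpler if the file is inverted once into a dict from PID to PI, instead of rescanning match_file for every line of main_file. A minimal sketch, assuming whitespace-separated columns with comma-separated PIDs as shown above; build_pid_to_pi is a hypothetical name.

```python
import io


def build_pid_to_pi(handle):
    """Map every PID listed in match_file to the PI value on its line.

    Assumes a header line followed by lines of the form 'PI<whitespace>PID1,PID2,...'.
    A PI with no PIDs (like 1310) contributes no entries.
    """
    handle.readline()  # skip the 'PI  PID' header
    pid_to_pi = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:  # blank line or a PI without any PID
            continue
        pi, pid_list = fields[0], fields[1]
        for pid in pid_list.split(","):
            pid_to_pi[pid] = pi
    return pid_to_pi


sample = io.StringIO(
    "PI\tPID\n"
    "1310\t\n"
    "1311\t5185,5398,5467,5576\n"
    "1667\t6740,12237,12298\n"
)
lookup = build_pid_to_pi(sample)
print(lookup["5398"], lookup["6740"])  # prints: 1311 1667
```

If the same PID were listed under more than one PI, the last one read would win; the sample above does not show that case.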

What I am trying to do:

  • I want to create a new column (new_PI) with updated PI values.

How the updating works:

  • If a line in main_file has a PI value, it's simple: new_PI = main_PI, then continue.
  • If both main_PI and main_PID are '.', new_PI = '.', then continue.
  • But if the PI value is '.' and the PID value is an integer, we look in match_file for the PI whose list of PIDs contains that value. If a matching PID is found, new_PI = the PI from match_file, then continue.
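The three rules can be written as one small function. This is a sketch assuming the match_file data is already available as a dict mapping each PID string to its PI (pid_to_pi, a hypothetical name). The rules above do not say what to do when no PID matches, so returning '.' in that case is an assumption.

```python
def new_pi_for(pi, pid, pid_to_pi):
    """Apply the update rules to one main_file line.

    pi, pid   -- the PI and PID fields of the line ('.' means missing)
    pid_to_pi -- dict mapping each PID string to its PI from match_file
    """
    if pi != ".":  # rule 1: keep the existing PI
        return pi
    if pid == ".":  # rule 2: both PI and PID are missing
        return "."
    # Rule 3: PI missing, PID present; look it up ('.' if absent is an assumption).
    return pid_to_pi.get(pid, ".")


lookup = {"5398": "1311", "6740": "1667"}
print(new_pi_for("1311", "5398", lookup))  # prints: 1311
print(new_pi_for(".", ".", lookup))        # prints: .
print(new_pi_for(".", "6740", lookup))     # prints: 1667
print(new_pi_for(".", "5482", lookup))     # prints: .
```

Checking PI first makes the rule order explicit: a line that already has a PI never touches match_file.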

I have written the following code:

main_file = open("2ms01e_chr2_table.txt", 'r+')
match_file = open('updated_df_table.txt', 'r+')

main_header = main_file.readline()
match_header = match_file.readline()

main_data = main_file.read().rstrip('\n').split('\n')
match_data = match_file.read().rstrip('\n').split('\n')

file_update = open('PI_updates.txt', 'w')
file_update.write('contig   pos GT  PGT_phase   PID PG_phase    PI  new_PI\n')
file_update.close()

for line in main_data:
    main_column = line.split('\t')
    PID_main = main_column[4]
    PI_main = main_column[6]
    if PID_main == '.' and PI_main == '.':
        new_PI = '.'
        continue

    if PI_main != '.':
        new_PI = PI_main
        continue

    if PI_main == '.' and PID_main != '.':
        for line in match_data:
            match_column = line.split('\t')
            PI_match = match_column[0]
            PID_match = match_column[1].split(',')
            if PID_main in PID_match:
                new_PI = PI_match
                continue

    file_update = open('PI_updates.txt', 'a')
    file_update.write(line + '\t' + str(new_PI)+ '\n')
    file_update.close()

I am not getting any errors, but it looks like my code does not read the two files correctly.
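Reading the code above, a few things would explain this behavior. The `continue` statements in the first two branches jump to the next line before the write at the bottom, so those lines are never written. The inner `for line in match_data` reuses the name `line`, so the write outputs the last match_file line instead of the main_file line. And `new_PI` is not reset, so a line with no matching PID keeps whatever value it held from an earlier line. Below is a minimal sketch that keeps the same approach (tab-separated files, a linear scan of match_file) but restructures the branches; `update_pi` and `write_updates` are hypothetical names, and writing '.' when no PID matches is an assumption the rules above do not cover.

```python
def update_pi(main_lines, match_lines):
    """Return each main_file data line with a new_PI column appended.

    main_lines and match_lines are data lines without headers, tab-separated
    as the original code assumes.
    """
    # Split match_file once up front instead of once per main_file line.
    matches = []
    for match_line in match_lines:
        match_column = match_line.rstrip("\r\n").split("\t")
        PID_match = match_column[1].split(",") if len(match_column) > 1 else []
        matches.append((match_column[0], PID_match))

    output = []
    for line in main_lines:
        line = line.rstrip("\r\n")
        if not line:  # skip blank lines
            continue
        main_column = line.split("\t")
        PID_main = main_column[4]
        PI_main = main_column[6]

        # if/elif/else instead of 'continue', so every line reaches the append below.
        if PI_main != ".":
            new_PI = PI_main
        elif PID_main == ".":
            new_PI = "."
        else:
            new_PI = "."  # assumed default when no PID matches
            # Different loop variable names, so 'line' still holds the main_file line.
            for PI_match, PID_match in matches:
                if PID_main in PID_match:
                    new_PI = PI_match
                    break  # stop at the first match; 'continue' kept scanning
        output.append(line + "\t" + new_PI)
    return output


def write_updates(main_path, match_path, out_path):
    """Read both inputs, then write main_file plus a new_PI column to out_path."""
    with open(main_path) as main_file, open(match_path) as match_file:
        main_header = main_file.readline().rstrip("\r\n")
        match_file.readline()  # skip the 'PI  PID' header
        out_lines = update_pi(main_file, match_file.readlines())
    with open(out_path, "w") as file_update:
        file_update.write(main_header + "\tnew_PI\n")
        for out_line in out_lines:
            file_update.write(out_line + "\n")
```

With the file names from the question, the call would be `write_updates("2ms01e_chr2_table.txt", "updated_df_table.txt", "PI_updates.txt")`. The `with` blocks also close every file, and the output file is opened once rather than once per line.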

My output should look something like this:

contig  pos   GT   PGT  PID   PG   PI    new_PI
2       5426  0/1  0|1  5398  1|0  1311  1311
2       5427  0/1  0|1  5398  0/1  .     1311
2       5434  0/1  0|1  5398  1|0  1311  1311
2       5454  0/1  0|1  5398  0/1  .     1311
2       5457  0/0  .    .     0/0  .     .
2       5467  0/1  0|1  5467  0|1  1311  1311
2       5480  0/1  0|1  5467  0|1  1311  1311
2       5483  0/0  0|1  5482  0/0  1667  1667
2       5518  1/1  1|1  5467  1/1  .     1311
2       5519  0/0  .    .     0/0  .     .
2       5547  1/1  1|1  5467  1/1  .     1311
2       5550  ./.  .    .     ./.  .     .
2       5559  1/1  1|1  5467  1/1  .     1311
2       5561  0/0  .    .     0/0  .     .
2       5576  0/1  0|1  5576  1|0  1311  1311
2       5599  0/1  0|1  5576  1|0  1311  1311
2       5602  0/0  .    .     0/0  .     .
2       5657  0/1  .    .     1|0  1311  1311
2       5723  0/1  .    .     1|0  1311  1311
2       6414  0/1  .    .     0|1  1667  1667
2       6446  0/1  0|1  6446  0|1  1667  1667
2       6448  0/1  0|1  6446  0|1  1667  1667
2       6465  0/1  0|1  6446  0|1  1667  1667
2       6636  0/1  .    .     1|0  1667  1667
2       6740  0/1  .    6740  0|1  1667  1667
2       6748  0/1  .    6740  0|1  .     1667

Thanks in advance!
