vendredi 5 juin 2020

Looping through an array to compare two values with Perl

I have a large dataset (29 columns by 19000 rows) and I want to be able to compare values on each line and print a descriptive output.

Specifically, I want to query the value in column A (@WTcall) which is effectively a pass/fail statement. If the data fails I want to print the 'fail statement' and move on to the next line, but if the data passes I want to continue describing the data.

The next questions are to determine which classification the data in columns X (@positive) and Y (@negative) fall into:

(Ex:

If column X and column Y >= 0.6 then print "ABC"

If column X and column Y < 0.6 then print "CBA"

If column X >= 0.6 but column Y is < 0.6 print "DEF"

If column X < 0.6 but column Y is >= 0.6 print "FED"

else print "missing data". )

I have included the code that I wrote below as well as a subset of sample data.

The tests I ran before posting are commented out in the code. Briefly, if I comment out the list of 'if and elsif statements', print "@WTcall\t@positive\t@negative\n" and pipe it through a head command - my variables appear to be pulling out the correct information.

The trouble arises in the actual comparisons as every line gets classified with the "Methylated\tMethylated\n" description. It's not clear to me why this is. I know I have entries where the @WTcall column should match $BadPosition (the pass/fail check). Further, if I comment out the 'if statements' again, print "@WTcall\n$BadPosition" and pipe it through sort and uniq - I only get one value for "No_WT_Concensus" and so there should be no typo there or problems matching these values.

I'm sure the issue is obvious and staring me right in the face, so I really appreciate any help.

Thank you.

Code:

#!/usr/bin/perl

use strict;
use warnings;
use autodie;
die "Usage: $0 Filename\n" if @ARGV != 1;

my $file = shift;
my @line;
my @positive; 
my @negative; 
my @WTcall;
my $BadPosition = 'No_WT_Concensus';
my $i;

open my $infh, '<', $file;

while (<$infh>) {
    chomp;
    @line = split(/\t/,$_);
    $WTcall[0]=$line[0];
    $positive[0]=$line[14];
    $negative[0]=$line[29];

    #foreach $1 (<$infh>) {
    foreach $1 (@WTcall) {
        if (@WTcall eq $BadPosition){
            print "No_WT_Concensus\tNo_WT_Concensus\n";
        } elsif (@positive >= 0.6 && @negative >= 0.6){
            print "Methylated\tMethylated\n";
        } elsif (@positive <= 0.6 && @negative <= 0.6){
            print "Under-methylated\tUnder-methylated\n";
        } elsif(@positive >= 0.6 && @negative <=0.6){
            print "Hemimethylated (m6A)\tHemimethylated (A)\n";
        } elsif(@positive <= 0.6 && @negative >= 0.6){
            print "Hemimethylated (A)\tHemimethylated (m6A)\n";
        } else{
            print "Missing_Site\tMissing_Site\n";
        }
        #print "@WTcall\n$BadPosition\n";
        #print "@WTcall\t@positive\t@negative\n"
        #print "@negative\n";
    }
}

close $infh;

Sample Data:

Methylated  coding gene 619 thrA    NC_000913.3 pos 3   1   0.9535  1   NC_000913.3 619 +   18  0.8889  Methylated  coding gene 620 thrA    NC_000913.3 neg 3   0.9429  0.9756  0.9714  NC_000913.3 620 -   14  1
No_WT_Concensus coding gene 195410  ispU    NC_000913.3 pos 2   0.5789  0.766   0.6071  NC_000913.3 195410  +   39  0.5897  Methylated  coding gene 195411  ispU    NC_000913.3 neg 3   0.75    0.9074  0.9306  NC_000913.3 195411  -   21  0.8095
Under-methylated    pseudogene  3632965 yhiL    NC_000913.3 pos 0   0.0323  0.1429  0.0962  NC_000913.3 3632965 +   22  0.0909  Under-methylated    pseudogene  3632966 yhiL    NC_000913.3 neg 0   0.1463  0.175   0.1429  NC_000913.3 3632966 -   23  0
Methylated  intergenic  164636  hrpB-mrcB   NC_000913.3 pos 3   0.7381  0.7647  0.7273  NC_000913.3 164636  +   25  0.8 Methylated  intergenic  164637  hrpB-mrcB   NC_000913.3 neg 3   0.7 0.7931  0.7213  NC_000913.3 164637  -   25  0.4
Methylated  intergenic  269287  ykfA-perR   NC_000913.3 pos 3   0.875   0.8833  0.931   NC_000913.3 269287  +   22  0.8182  Methylated  intergenic  269288  ykfA-perR   NC_000913.3 neg 3   0.8077  0.6866  0.6491  NC_000913.3 269288  -   17  0.5294
Methylated  coding gene 4397856 mutL    NC_000913.3 pos 3   0.9245  0.9831  0.9661  blank   blank   blank   blank   blank   Methylated  coding gene 4397857 mutL    NC_000913.3 neg 3   0.9783  0.9808  0.9683  NC_000913.3 4397857 -   1   0
Methylated  coding gene 4397969 mutL    NC_000913.3 pos 3   0.9643  0.9524  1   blank   blank   blank   blank   blank   Methylated  coding gene 4397970 mutL    NC_000913.3 neg 3   1   1   1   blank   blank   blank   blank   blank
Methylated  coding gene 2761    thrA    NC_000913.3 pos 3   0.9259  0.8654  0.9242  NC_000913.3 2761    +   31  1   Methylated  coding gene 2762    thrA    NC_000913.3 neg 3   0.913   0.9636  0.9767  NC_000913.3 2762    -   29  0.9655
Methylated  coding gene 3073    thrB    NC_000913.3 pos 3   0.9677  0.8983  1   NC_000913.3 3073    +   29  1   Methylated  coding gene 3074    thrB    NC_000913.3 neg 3   1   0.9038  0.9778  NC_000913.3 3074    -   31  1

Aucun commentaire:

Enregistrer un commentaire