I have a large dataset (29 columns by 19000 rows) and I want to be able to compare values on each line and print a descriptive output.
Specifically, I want to query the value in column A (@WTcall) which is effectively a pass/fail statement. If the data fails I want to print the 'fail statement' and move on to the next line, but if the data passes I want to continue describing the data.
The next questions are to determine which classification the data in columns X (@positive) and Y (@negative) fall into:
(Ex:
If column X and column Y >= 0.6 then print "ABC"
If column X and column Y < 0.6 then print "CBA"
If column X >= 0.6 but column Y is < 0.6 print "DEF"
If column X < 0.6 but column Y is >= 0.6 print "FED"
else print "missing data". )
I have included the code that I wrote below as well as a subset of sample data.
The tests I ran before posting are commented out in the code. Briefly, if I comment out the list of 'if and elsif statements', print "@WTcall\t@positive\t@negative\n" and pipe it through a head command - my variables appear to be pulling out the correct information.
The trouble arises in the actual comparisons as every line gets classified with the "Methylated\tMethylated\n" description. It's not clear to me why this is. I know I have entries where the @WTcall column should match $BadPosition (the pass/fail check). Further, if I comment out the 'if statements' again, print "@WTcall\n$BadPosition" and pipe it through sort and uniq - I only get one value for "No_WT_Concensus" and so there should be no typo there or problems matching these values.
I'm sure the issue is obvious and staring me right in the face, so I really appreciate any help.
Thank you.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
die "Usage: $0 Filename\n" if @ARGV != 1;
my $file = shift;
my @line;
my @positive;
my @negative;
my @WTcall;
my $BadPosition = 'No_WT_Concensus';
my $i;
open my $infh, '<', $file;
while (<$infh>) {
chomp;
@line = split(/\t/,$_);
$WTcall[0]=$line[0];
$positive[0]=$line[14];
$negative[0]=$line[29];
#foreach $1 (<$infh>) {
foreach $1 (@WTcall) {
if (@WTcall eq $BadPosition){
print "No_WT_Concensus\tNo_WT_Concensus\n";
} elsif (@positive >= 0.6 && @negative >= 0.6){
print "Methylated\tMethylated\n";
} elsif (@positive <= 0.6 && @negative <= 0.6){
print "Under-methylated\tUnder-methylated\n";
} elsif(@positive >= 0.6 && @negative <=0.6){
print "Hemimethylated (m6A)\tHemimethylated (A)\n";
} elsif(@positive <= 0.6 && @negative >= 0.6){
print "Hemimethylated (A)\tHemimethylated (m6A)\n";
} else{
print "Missing_Site\tMissing_Site\n";
}
#print "@WTcall\n$BadPosition\n";
#print "@WTcall\t@positive\t@negative\n"
#print "@negative\n";
}
}
close $infh;
Sample Data:
Methylated coding gene 619 thrA NC_000913.3 pos 3 1 0.9535 1 NC_000913.3 619 + 18 0.8889 Methylated coding gene 620 thrA NC_000913.3 neg 3 0.9429 0.9756 0.9714 NC_000913.3 620 - 14 1
No_WT_Concensus coding gene 195410 ispU NC_000913.3 pos 2 0.5789 0.766 0.6071 NC_000913.3 195410 + 39 0.5897 Methylated coding gene 195411 ispU NC_000913.3 neg 3 0.75 0.9074 0.9306 NC_000913.3 195411 - 21 0.8095
Under-methylated pseudogene 3632965 yhiL NC_000913.3 pos 0 0.0323 0.1429 0.0962 NC_000913.3 3632965 + 22 0.0909 Under-methylated pseudogene 3632966 yhiL NC_000913.3 neg 0 0.1463 0.175 0.1429 NC_000913.3 3632966 - 23 0
Methylated intergenic 164636 hrpB-mrcB NC_000913.3 pos 3 0.7381 0.7647 0.7273 NC_000913.3 164636 + 25 0.8 Methylated intergenic 164637 hrpB-mrcB NC_000913.3 neg 3 0.7 0.7931 0.7213 NC_000913.3 164637 - 25 0.4
Methylated intergenic 269287 ykfA-perR NC_000913.3 pos 3 0.875 0.8833 0.931 NC_000913.3 269287 + 22 0.8182 Methylated intergenic 269288 ykfA-perR NC_000913.3 neg 3 0.8077 0.6866 0.6491 NC_000913.3 269288 - 17 0.5294
Methylated coding gene 4397856 mutL NC_000913.3 pos 3 0.9245 0.9831 0.9661 blank blank blank blank blank Methylated coding gene 4397857 mutL NC_000913.3 neg 3 0.9783 0.9808 0.9683 NC_000913.3 4397857 - 1 0
Methylated coding gene 4397969 mutL NC_000913.3 pos 3 0.9643 0.9524 1 blank blank blank blank blank Methylated coding gene 4397970 mutL NC_000913.3 neg 3 1 1 1 blank blank blank blank blank
Methylated coding gene 2761 thrA NC_000913.3 pos 3 0.9259 0.8654 0.9242 NC_000913.3 2761 + 31 1 Methylated coding gene 2762 thrA NC_000913.3 neg 3 0.913 0.9636 0.9767 NC_000913.3 2762 - 29 0.9655
Methylated coding gene 3073 thrB NC_000913.3 pos 3 0.9677 0.8983 1 NC_000913.3 3073 + 29 1 Methylated coding gene 3074 thrB NC_000913.3 neg 3 1 0.9038 0.9778 NC_000913.3 3074 - 31 1
Aucun commentaire:
Enregistrer un commentaire