I think I have the same issue as the one posted here: "R Programming: Condition giving always TRUE", but I do not know how to apply that answer to my situation.

The ifelse statement below checks whether REF matches major_allele (in either direction) and whether ALT matches minor_allele (in either direction). If both are true, I create three new columns. If either is false, I check the other possibility in a nested ifelse statement (major_allele matches ALT and minor_allele matches REF, again in either direction), and the three new columns get different values.

The problem is that the condition always evaluates to TRUE, as shown by the rows marked with asterisks. The answer at the link above stated that ifelse always produces a vector, so I am guessing that it only looks at the first observation, evaluates to TRUE, and then applies the same result to all rows. However, I would like it to go row by row. I also tried two separate if statements and I get the same problem, except that only the second if statement takes effect (see below; the last column shows the difference most clearly).
for (i in files){
  rsid.tmp <- read.table(i, header = TRUE, sep=" ", stringsAsFactors = FALSE, fill = TRUE)
  # Case 1: REF matches major_allele and ALT matches minor_allele
  ifelse(((sapply(lapply(rsid.tmp$REF, grepl, rsid.tmp$major_allele),any)) ||
          (sapply(lapply(rsid.tmp$major_allele, grepl, rsid.tmp$REF),any)))
         && ((sapply(lapply(rsid.tmp$ALT, grepl, rsid.tmp$minor_allele),any)) ||
             (sapply(lapply(rsid.tmp$minor_allele, grepl, rsid.tmp$ALT),any))),
         {rsid.tmp$new_gwas_a1 <- rsid.tmp$REF;
          rsid.tmp$new_gwas_a2 <- rsid.tmp$ALT;
          rsid.tmp$new_minor_allele_frequency <- rsid.tmp$minor_allele_frequency},
         # Case 2: REF matches minor_allele and ALT matches major_allele
         ifelse(((sapply(lapply(rsid.tmp$minor_allele, grepl, rsid.tmp$REF),any)) ||
                 (sapply(lapply(rsid.tmp$REF, grepl, rsid.tmp$minor_allele),any)))
                && ((sapply(lapply(rsid.tmp$major_allele, grepl, rsid.tmp$ALT),any)) ||
                    (sapply(lapply(rsid.tmp$ALT, grepl, rsid.tmp$major_allele),any))),
                {rsid.tmp$new_gwas_a2 <- rsid.tmp$REF;
                 rsid.tmp$new_gwas_a1 <- rsid.tmp$ALT;
                 rsid.tmp$new_minor_allele_frequency <- (1 - rsid.tmp$minor_allele_frequency)},
         ))
  new_file <- sub("(\\.txt)$", "_updated\\1", i)
  write.table(rsid.tmp, new_file, quote = FALSE, row.names = FALSE)
}
REF ALT minor_allele_frequency minor_allele major_allele new_gwas_a1 new_gwas_a2 new_minor_allele_frequency
A G 0.000219914 G A A G 0.000219914
C T 0.0144844 T C C T 0.0144844
C T 0.0445486 T C C T 0.0445486
C T 0.00647968 T C C T 0.00647968
C T 0.222656 T C C T 0.222656
**A G,T 0.12189 A G A G,T 0.12189
A G,T 0.305252 A G A G,T 0.305252**
C T 0.00210762 T C C T 0.00210762
C A 0.00139373 A C C A 0.00139373
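
One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: `sapply(lapply(x, grepl, y), any)` tests each pattern in `x` against every row of `y`, so the result says whether the pattern occurs anywhere in the column, not whether it occurs in the same row. The data frame `toy` below is hypothetical (three rows copied from the table above), and `mapply` is swapped in as one way to compare row by row.

```r
# Hypothetical toy data: three rows copied from the table above
toy <- data.frame(
  REF          = c("A", "A", "C"),
  ALT          = c("G", "G,T", "T"),
  minor_allele = c("G", "A", "T"),
  major_allele = c("A", "G", "C"),
  stringsAsFactors = FALSE
)

# Pattern from the question: each REF value is tested against the
# whole major_allele column, then collapsed with any()
whole_column <- sapply(lapply(toy$REF, grepl, toy$major_allele), any)

# Pairwise alternative: REF[i] is tested only against major_allele[i]
same_row <- unname(
  mapply(grepl, toy$REF, toy$major_allele, MoreArgs = list(fixed = TRUE))
)

print(whole_column)  # TRUE TRUE TRUE
print(same_row)      # TRUE FALSE TRUE
```

Separately, `||` and `&&` are meant for single values: depending on the R version they either use only the first element of a longer vector or signal an error, whereas `|` and `&` work element by element. That would be consistent with the whole data frame being treated as one case.
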
Below is the example with two if statements:
for (i in files){
  rsid.tmp <- read.table(i, header = TRUE, sep=" ", stringsAsFactors = FALSE, fill = TRUE)
  # Case 1: REF matches major_allele and ALT matches minor_allele
  if(((sapply(lapply(rsid.tmp$REF, grepl, rsid.tmp$major_allele),any)) ||
      (sapply(lapply(rsid.tmp$major_allele, grepl, rsid.tmp$REF),any)))
     && ((sapply(lapply(rsid.tmp$ALT, grepl, rsid.tmp$minor_allele),any)) ||
         (sapply(lapply(rsid.tmp$minor_allele, grepl, rsid.tmp$ALT),any))))
  {rsid.tmp$new_gwas_a1 <- rsid.tmp$REF;
   rsid.tmp$new_gwas_a2 <- rsid.tmp$ALT;
   rsid.tmp$new_minor_allele_frequency <- rsid.tmp$minor_allele_frequency}
  # Case 2: REF matches minor_allele and ALT matches major_allele
  if(((sapply(lapply(rsid.tmp$minor_allele, grepl, rsid.tmp$REF),any)) ||
      (sapply(lapply(rsid.tmp$REF, grepl, rsid.tmp$minor_allele),any)))
     && ((sapply(lapply(rsid.tmp$major_allele, grepl, rsid.tmp$ALT),any)) ||
         (sapply(lapply(rsid.tmp$ALT, grepl, rsid.tmp$major_allele),any))))
  {rsid.tmp$new_gwas_a2 <- rsid.tmp$REF;
   rsid.tmp$new_gwas_a1 <- rsid.tmp$ALT;
   rsid.tmp$new_minor_allele_frequency <- (1 - rsid.tmp$minor_allele_frequency)}
  new_file <- sub("(\\.txt)$", "_updated\\1", i)
  write.table(rsid.tmp, new_file, quote = FALSE, row.names = FALSE)
}
REF ALT minor_allele_frequency minor_allele major_allele new_gwas_a1 new_gwas_a2 new_minor_allele_frequency
A G 0.000219914 G A G A 0.999780086
C T 0.0144844 T C T C 0.9855156
C T 0.0445486 T C T C 0.9554514
C T 0.00647968 T C T C 0.99352032
C T 0.222656 T C T C 0.777344
A G,T 0.12189 A G G,T A 0.87811
A G,T 0.305252 A G G,T A 0.694748
C T 0.00210762 T C T C 0.99789238
C A 0.00139373 A C A C 0.99860627
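
For reference, a minimal row-by-row sketch of the logic described above, under stated assumptions: the helper names `match_either` and `update_alleles` are hypothetical, matching is treated as fixed-string containment in either direction (mirroring the paired `grepl` calls), and rows that satisfy neither case are left as `NA`, which the description above does not specify.

```r
# Hypothetical helper: TRUE for row i when a[i] contains b[i] or b[i] contains a[i]
match_either <- function(a, b) {
  a_in_b <- mapply(grepl, a, b, MoreArgs = list(fixed = TRUE))
  b_in_a <- mapply(grepl, b, a, MoreArgs = list(fixed = TRUE))
  unname(a_in_b | b_in_a)  # element-wise OR, one result per row
}

# Hypothetical helper: adds the three new columns, decided row by row
update_alleles <- function(df) {
  same    <- match_either(df$REF, df$major_allele) &
             match_either(df$ALT, df$minor_allele)
  flipped <- match_either(df$REF, df$minor_allele) &
             match_either(df$ALT, df$major_allele)

  # ifelse() returns a vector, so assign its result to the column
  # instead of performing assignments inside its arguments
  df$new_gwas_a1 <- ifelse(same, df$REF, ifelse(flipped, df$ALT, NA_character_))
  df$new_gwas_a2 <- ifelse(same, df$ALT, ifelse(flipped, df$REF, NA_character_))
  df$new_minor_allele_frequency <- ifelse(
    same, df$minor_allele_frequency,
    ifelse(flipped, 1 - df$minor_allele_frequency, NA_real_)
  )
  df
}

# Two rows copied from the table above
toy <- data.frame(
  REF = c("C", "A"), ALT = c("T", "G,T"),
  minor_allele_frequency = c(0.0144844, 0.12189),
  minor_allele = c("T", "A"), major_allele = c("C", "G"),
  stringsAsFactors = FALSE
)
print(update_alleles(toy))
```

Inside the loop, this would be called as `rsid.tmp <- update_alleles(rsid.tmp)` just before `write.table`. With the toy rows, the first row keeps REF/ALT and its frequency, while the second row is flipped and its frequency becomes 1 minus the original.
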