Monday, October 18, 2021

How can I get all the rows that pass a filter that I specified from a VCF file in R?

I am very new to R, so sorry if I make some mistakes.

I want to obtain all the rows from a VCF file (a spreadsheet-like text file with some metadata) that satisfy a filter condition I set, and store that data in a vector. I want to do this iteratively, with a for loop, because I have 5 samples.

I'm doing this:

library('vcfR')
library('tidyr')
library("writexl")
library("readxl")
library('stringr')

samples <- c('21','22','50','65','79')
results <- c()
for (mysample in samples){
  reference_vcf_file <- paste0('/Volumes/WD_de_Abel/PCT/results/PCT',mysample,'P_Cleaned.vcf')
  reference_vcf <- read.vcfR( reference_vcf_file, verbose = FALSE )

  if (reference_vcf@fix[,7] == "PASS"){
    variants_positive <- cbind(reference_vcf@fix[,c(1,2,4,5)])
    variants_positive <- data.frame(variants_positive)
    mydf_positive <- separate_rows(variants_positive, ALT, sep=',')
  } else {
    variants_negative <- cbind(reference_vcf@fix[,c(1,2,4,5)])
    variants_negative <- data.frame(variants_negative)
    mydf_negative <- separate_rows(variants_negative, ALT, sep=',')
  }

  reference_vector_Positive <- paste(mydf_positive$CHROM, mydf_positive$POS, mydf_positive$REF, mydf_positive$ALT, sep=',')
  reference_vector_Negative <- paste(mydf_negative$CHROM, mydf_negative$POS, mydf_negative$REF, mydf_negative$ALT, sep=',')

I cannot get past this loop because I get this error:

Error in paste(mydf_positive$CHROM, mydf_positive$POS, mydf_positive$REF, : object 'mydf_positive' not found
In addition: Warning message:
In if (reference_vcf@fix[, c(7)] == "PASS") { : the condition has length > 1 and only the first element will be used
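For context on the two messages: `if` expects a single TRUE/FALSE, while comparing a whole column to "PASS" gives one logical value per row. R then uses only the first row's result (and from R 4.2.0 onward stops with an error instead of a warning), so presumably only the else branch ran and `mydf_positive` was never created. A minimal sketch of row-wise filtering with logical indexing follows. The `fix` matrix here is made-up toy data standing in for `reference_vcf@fix`, and the variable names are only illustrative.

```r
# Toy stand-in for reference_vcf@fix (hypothetical data, VCF column order).
fix <- matrix(
  c("chr1", "100", NA, "A", "G",   NA, "PASS",    NA,
    "chr1", "200", NA, "C", "T,G", NA, "LowQual", NA,
    "chr2", "300", NA, "G", "A",   NA, "PASS",    NA),
  ncol = 8, byrow = TRUE,
  dimnames = list(NULL, c("CHROM", "POS", "ID", "REF", "ALT",
                          "QUAL", "FILTER", "INFO"))
)

# One TRUE/FALSE per row instead of a single value for if().
is_pass <- fix[, "FILTER"] == "PASS"

# which() turns the logical vector into row numbers and skips NA entries;
# drop = FALSE keeps a matrix even when only one row matches.
cols <- c("CHROM", "POS", "REF", "ALT")
variants_positive <- data.frame(fix[which(is_pass), cols, drop = FALSE])
variants_negative <- data.frame(fix[which(!is_pass), cols, drop = FALSE])
```

With this approach both data frames always exist, even if one of them ends up with zero rows.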

Can someone help me obtain the rows of my VCF file that have "PASS" in the 7th column?

If you need more info, please tell me!

Thx!!
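One possible shape for the per-sample loop, as a rough sketch rather than a definitive solution: filter rows with a logical vector, split multi-allelic ALT values, and collect the "CHROM,POS,REF,ALT" strings in a named list. The helper `variant_keys` and the `toy_fix` matrices are hypothetical; with real data each matrix would presumably come from `read.vcfR(reference_vcf_file, verbose = FALSE)@fix`.

```r
# Hypothetical helper: build "CHROM,POS,REF,ALT" strings from a VCF-style
# fix matrix, one string per ALT allele, for the rows where keep is TRUE.
variant_keys <- function(fix, keep) {
  rows <- fix[which(keep), , drop = FALSE]
  alts <- strsplit(rows[, "ALT"], ",", fixed = TRUE)
  n <- lengths(alts)  # number of ALT alleles in each row
  paste(rep(rows[, "CHROM"], n), rep(rows[, "POS"], n),
        rep(rows[, "REF"], n), unlist(alts), sep = ",")
}

# Toy per-sample matrices (made-up data) standing in for the real VCF files.
vcf_cols <- c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")
toy_fix <- list(
  "21" = matrix(c("chr1", "100", NA, "A", "G,T", NA, "PASS",    NA,
                  "chr1", "200", NA, "C", "T",   NA, "LowQual", NA),
                ncol = 8, byrow = TRUE, dimnames = list(NULL, vcf_cols)),
  "22" = matrix(c("chr2", "300", NA, "G", "A",   NA, "LowQual", NA),
                ncol = 8, byrow = TRUE, dimnames = list(NULL, vcf_cols))
)

# Collect the results for every sample in a named list.
results <- list()
for (mysample in names(toy_fix)) {
  fix <- toy_fix[[mysample]]
  is_pass <- fix[, "FILTER"] == "PASS"
  results[[mysample]] <- list(
    positive = variant_keys(fix, is_pass),
    negative = variant_keys(fix, !is_pass)
  )
}
```

A list is used instead of growing a plain vector with c() so that each sample's PASS and non-PASS variants stay separate and can be looked up by sample name, for example results[["21"]]$positive.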
