Friday, March 25, 2016

PCA with Python pandas on a file with many columns

I have a .vcf file, where

column1 = chrom
column2 = pos
column3 = ID
column4 = reference
column5 = Alt
column6 = qual
column7 = filter
column8 = info
column9 = format
columns 10-99 = 100 sample columns, each holding either a zero or a one

I read in the file:

#!/usr/bin/env python
import pandas as pd
vcf=open('/Users/cmdb/Desktop/Lab6_GWAS/variants.vcf', 'r')
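One way to avoid handling the columns one by one is to let pandas read the whole table. The sketch below rests on a few assumptions. The file is assumed to be tab-separated. Every header line is assumed to start with `#`. The INFO field is assumed to contain no `#` characters. The sample names `A01`, `A02`, ... are hypothetical: a real VCF lists the actual sample names on its `#CHROM` line. A small in-memory example stands in for the real file.

```python
import io

import pandas as pd

# Names for the nine fixed VCF fields, following the column layout above.
FIXED = ["chrom", "pos", "ID", "reference", "Alt", "qual", "filter", "info", "format"]


def read_vcf(source, n_samples):
    """Read a VCF-like table into a DataFrame.

    Assumes tab-separated fields, header lines that start with '#', and
    n_samples genotype columns (0 or 1) after the nine fixed fields.
    """
    # Hypothetical sample names A01, A02, ...; a real VCF names them on its #CHROM line.
    sample_names = [f"A{i:02d}" for i in range(1, n_samples + 1)]
    return pd.read_csv(
        source,
        sep="\t",
        comment="#",  # lines starting with '#' ('##' meta lines, '#CHROM' header) are skipped
        header=None,
        names=FIXED + sample_names,
    )


example = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA01\tA02\tA03\n"
    "1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0\t1\t1\n"
    "1\t200\trs2\tC\tT\t60\tPASS\t.\tGT\t1\t0\t0\n"
)
df = read_vcf(example, n_samples=3)
genotypes = df.iloc[:, 9:]  # every sample column at once, no per-column variables
print(genotypes.shape)  # (2, 3)
```

For the real file, the call would be `read_vcf('/Users/cmdb/Desktop/Lab6_GWAS/variants.vcf', n_samples=100)`, assuming the layout described above.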

I also have the following loop, which I don't think I should use:

for line in vcf:
    if line.startswith('#'):  # skip the VCF header lines
        continue
    fields = line.strip().split('\t')
    A01 = fields[9]
    A02 = fields[10]
    A03 = fields[11]
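The loop does not need one variable per sample, because a slice can take all genotype fields of a line at once. The sketch below uses `fields[9:]` for this. It collects the rows into a list and then builds a DataFrame. Two assumptions apply: the file is tab-separated, and every sample field is a plain 0 or 1. The helper name `sample_rows` is hypothetical, and the small in-memory example stands in for the real file.

```python
import io

import pandas as pd


def sample_rows(lines):
    """Collect the genotype fields (column 10 onward) of every data line as ints."""
    rows = []
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        fields = line.strip().split("\t")
        rows.append([int(x) for x in fields[9:]])  # all samples in one slice
    return rows


vcf = io.StringIO(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA01\tA02\n"
    "1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0\t1\n"
    "1\t200\trs2\tC\tT\t60\tPASS\t.\tGT\t1\t1\n"
)
genotypes = pd.DataFrame(sample_rows(vcf))
print(genotypes.values.tolist())  # [[0, 1], [1, 1]]
```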

However, writing one variable per column would take far too long. I want to keep all of those zeros and ones so that I can run a principal component analysis (PCA) in Python later on. I would like to use pandas, but I'm not sure how to handle so many columns.
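Once the zeros and ones are in a DataFrame, PCA can be computed directly with NumPy. The sketch below assumes a layout of one row per variant and one column per sample, matching the VCF. It treats samples as the observations, which is the usual choice when the goal is to cluster individuals. It first centers each variant and then takes a singular value decomposition. The function name `pca` and the toy genotype table are illustrative. Another option is `sklearn.decomposition.PCA(n_components=2).fit_transform(X)` from scikit-learn, applied to the same samples-by-variants matrix.

```python
import numpy as np
import pandas as pd


def pca(genotypes, n_components=2):
    """PCA of a variants-by-samples 0/1 table via SVD.

    Samples (columns of the input) are the observations; variants are features.
    Returns the sample coordinates on the first n_components axes and the
    fraction of total variance each of those components explains.
    """
    x = genotypes.to_numpy(dtype=float).T  # samples x variants
    x = x - x.mean(axis=0)  # center each variant across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # projections onto the PCs
    explained = s**2 / np.sum(s**2)
    columns = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, index=genotypes.columns, columns=columns), explained[:n_components]


# Toy data: 4 variants (rows) x 4 samples (columns), values 0 or 1.
genotypes = pd.DataFrame({
    "A01": [0, 0, 1, 1],
    "A02": [0, 1, 1, 1],
    "A03": [1, 1, 0, 0],
    "A04": [1, 0, 0, 0],
})
scores, explained = pca(genotypes)
print(scores)
print(explained)
```

Plotting `scores["PC1"]` against `scores["PC2"]` then gives the usual two-dimensional view of how the samples group.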
