Friday, 12 November 2021

Why is 'expand' in rule all not working, and why does Snakemake delete my files?

I have just started learning Snakemake, so I am testing a very simple job: renaming the original FASTQ files to new names using 'cp'.

First, I made a CSV file containing the original file name ("FileName" column) and the new file name ("reFileName2" column).

I then build a dictionary, FILEMAP, that maps each new file name to its original file (key = new file name, value = original file name).
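The mapping can be sketched in plain Python. This is only an illustration: the two rows are copied from the printout further down, and the names `join_paths`, `in_files`, `out_files` and `file_map` are hypothetical stand-ins for `_get_fileFullPath` and the CSV columns.

```python
from operator import add

def join_paths(dirs, names):
    """Join each directory with its file name, in the style of _get_fileFullPath."""
    with_slash = list(map(add, dirs, ["/"] * len(dirs)))
    return list(map(add, with_slash, names))

BASE = "/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT"

# Original files (values) and new names (keys), two rows taken from the printout.
in_files = join_paths([BASE + "/bam"] * 2,
                      ["test20w-PFC-H1.transcript_r1.fastq.gz",
                       "test20w-PFC-H1.transcript_r2.fastq.gz"])
out_files = join_paths([BASE + "/fastq"] * 2,
                       ["20w-PFC-HT1_r1.fastq.gz",
                        "20w-PFC-HT1_r2.fastq.gz"])

# key = new (target) file, value = original file
file_map = dict(zip(out_files, in_files))
for new_name, original in file_map.items():
    print(new_name, "<-", original)
```

Looking up a target name in `file_map` returns the original file it should be copied from.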

In rule all, I use the expand function to generate the list of final output files from the FILEMAP keys (i.e. the list of new file names).
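To check what this call should return, here is a plain-Python stand-in for expand with a single wildcard. The helper `expand_single` is hypothetical and is not Snakemake's own implementation; it only mimics the substitution of each value into the pattern.

```python
def expand_single(pattern, **values):
    """Plain-Python stand-in for expand() with exactly one wildcard."""
    (name, items), = values.items()
    return [pattern.format(**{name: item}) for item in items]

BASE = "/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT"
keys = [
    BASE + "/fastq/20w-PFC-HT1_r1.fastq.gz",
    BASE + "/fastq/20w-PFC-HT1_r2.fastq.gz",
    BASE + "/fastq/20w-PFC-HT2_r1.fastq.gz",
    BASE + "/fastq/20w-PFC-HT2_r2.fastq.gz",
]

targets = expand_single("{file2}", file2=keys)
print(len(targets))      # one target per key
print(targets == keys)   # the pattern "{file2}" leaves each key unchanged
```

Under this reading, the target list contains only the four keys, so any additional jobs would have to come from somewhere other than the expand call itself.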

However, my Snakefile used all items in FILEMAP (both keys and values) and then stopped with many errors.
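One possible reading of this symptom: an unconstrained wildcard such as `{file2}` behaves like the regular expression `.+`, so an output pattern consisting only of that wildcard matches every path, not just the keys. The sketch below imitates this with Python's `re` module. It is an approximation of the matching step, not Snakemake code, and the labels in `candidates` are illustrative.

```python
import re

# Assumption: an unconstrained wildcard is matched like the regex ".+".
pattern = re.compile(r"(?P<file2>.+)")

BASE = "/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT"
DEFAULT = "/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch"

candidates = {
    "new file (key)": BASE + "/fastq/20w-PFC-HT1_r1.fastq.gz",
    "original file (value)": BASE + "/bam/test20w-PFC-H1.transcript_r1.fastq.gz",
    "default placeholder": DEFAULT,
}

for label, path in candidates.items():
    match = pattern.fullmatch(path)
    print(label, "->", "matches" if match else "no match")
```

All three kinds of path match, which would let rule rename be considered as a producer for the original files and the placeholder as well.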

I can't figure out what's wrong. I would appreciate your help.

Below I paste my code and the output it printed.


'''

import pandas as pd
from operator import add 
from pathlib import Path
#MYDF=pd.read_csv("fileList2_bamDir.csv")
MYDF=pd.read_csv("testfilelist.csv")
FILES=MYDF["FileName"].tolist()
PATHS=MYDF["Path"].tolist()
#print(FILES)
REFILES2=MYDF["reFileName2"].tolist()
PATHS2=MYDF["Path2"].tolist()
def _get_fileFullPath(p, f): 
    p=list(map(add, p, ["/"]*len(p)))
    return list(map(add, p, f)) 
DEFAULT="/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch"
#Path(DEFAULT).touch()
print(DEFAULT)
# make dictionary: target output file (key) -> original file (value)
myInFiles= _get_fileFullPath(PATHS, FILES)
myOutFiles= _get_fileFullPath(PATHS2, REFILES2)
FILEMAP=dict(zip(myOutFiles, myInFiles)) #zip(key, values)
print(FILEMAP)
print(FILEMAP.keys())
print(len(FILEMAP.keys()))
keys = list(FILEMAP.keys())
print(keys)
#get original file(value) based on targeted out file(key)
def _get_originalfile(wildcards):
    print("origin call from outfile:::"+wildcards.file2)
    infile=""
    if (wildcards.file2 in keys):
        print("filekey correct"+":::"+wildcards.file2+":::"+FILEMAP[wildcards.file2])
        infile=FILEMAP[wildcards.file2]
    else:
        print("DEFAULTreturn"+":::"+wildcards.file2+":::")
        infile=DEFAULT
    
    print("return value"+infile)  # observed problem: the if/else seems not to work and infile ends up as DEFAULT in every case
    return(infile)
rule all:
    input:
        expand("{file2}", file2=keys)  # observed problem: all items of FILEMAP (keys and values) seem to be used
        
rule rename:
    input:
        _get_originalfile
        #FILEMAP[""]
    output:
        "{file2}"
    run:
        if str(input) != DEFAULT:  # compare the actual input path, not the literal string "{input}"
            shell("""ll {input}""")
        else:
            shell("""ll {input}""")

<Here is my printout on screen>
home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
{'/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r1.fastq.gz': '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r2.fastq.gz': '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r2.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz': '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r2.fastq.gz': '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r2.fastq.gz'}
dict_keys(['/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r2.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r2.fastq.gz'])
4
['/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r2.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz', '/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r2.fastq.gz']
Building DAG of jobs...
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r1.fastq.gz
filekey correct:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r1.fastq.gz:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r1.fastq.gz
return value/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r1.fastq.gz
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r1.fastq.gz
DEFAULTreturn:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r1.fastq.gz:::
return value/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
origin call from outfile:::/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
DEFAULTreturn:::/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch:::
return value/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r2.fastq.gz
filekey correct:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT1_r2.fastq.gz:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r2.fastq.gz
return value/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r2.fastq.gz
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r2.fastq.gz
DEFAULTreturn:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H1.transcript_r2.fastq.gz:::
return value/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz
filekey correct:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz
return value/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz
DEFAULTreturn:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz:::
return value/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r2.fastq.gz
filekey correct:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r2.fastq.gz:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r2.fastq.gz
return value/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r2.fastq.gz
origin call from outfile:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r2.fastq.gz
DEFAULTreturn:::/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r2.fastq.gz:::
return value/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch
Using shell: /usr/bin/bash
Provided cluster nodes: 1
Job stats:
job       count    min threads    max threads
------  -------  -------------  -------------
all           1              1              1
rename        8              1              1 
--> My CSV file lists only 4 files, but there are 8 rename jobs (keys + values = 8), so something is wrong here.
total         9              1              1

Select jobs to execute...

[Fri Nov 12 18:57:45 2021]
rule rename:
    input: /blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/bam/test20w-PFC-H2.transcript_r1.fastq.gz
    output: /blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz
    jobid: 5
    wildcards: file2=/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT/fastq/20w-PFC-HT2_r1.fastq.gz
    resources: tmpdir=/tmp

Submitted job 5 with external jobid 'Submitted batch job 129204'.
[Fri Nov 12 18:57:55 2021]

'''
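The job count of 8 can be reproduced with a toy model, assuming that rule rename is considered for every path its output pattern matches and that `_get_originalfile` falls back to DEFAULT for unknown targets. This is a simplified sketch, not Snakemake's actual DAG algorithm; `original_for`, `count_rename_jobs` and `only_keys` are hypothetical names.

```python
import re

BASE = "/blues/ngs/data/RNAseq/EJKim/GRIN2B_MIT"
DEFAULT = "/home/hyojung/binR/bin_GRIN2BMITsnake/empty_fileRename.touch"

# Same four key/value pairs as in the printed FILEMAP.
FILEMAP = {
    f"{BASE}/fastq/20w-PFC-HT{s}_r{r}.fastq.gz":
        f"{BASE}/bam/test20w-PFC-H{s}.transcript_r{r}.fastq.gz"
    for s in (1, 2) for r in (1, 2)
}

def original_for(target):
    """Same lookup as _get_originalfile: DEFAULT for targets that are not keys."""
    return FILEMAP.get(target, DEFAULT)

def count_rename_jobs(targets, output_regex):
    """Toy model: a path gets a rename job if the output pattern matches it."""
    jobs, stack = set(), list(targets)
    while stack:
        path = stack.pop()
        if path in jobs or not re.fullmatch(output_regex, path):
            continue
        source = original_for(path)
        if source == path:  # a file cannot be produced from itself
            continue
        jobs.add(path)
        stack.append(source)  # the input may itself be matched by the rule
    return len(jobs)

unconstrained = ".+"  # assumed behaviour of a bare {file2} wildcard
only_keys = "|".join(re.escape(k) for k in FILEMAP)  # accept the keys only

print(count_rename_jobs(FILEMAP, unconstrained))  # 8
print(count_rename_jobs(FILEMAP, only_keys))      # 4
```

With the unconstrained pattern, each original file also becomes an output of rename (with DEFAULT as its input), which doubles the count. In a Snakefile, a restriction like `only_keys` would typically be expressed with `wildcard_constraints`. If Snakemake clears a job's existing output before running it, the same effect could also explain the deleted originals, though the log above does not prove that.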
