I am developing what should be a simple script that reads a file line by line, checks the contents of each line, and processes the line depending on its line number. For some reason, I cannot get a regex that matches whitespace. [:space:], [[:space:]], [:blank:], \s, \ (an escaped space), a bare space, and " " have all failed.
My data is formatted as follows (FASTQ format):
@SRR573708.2 2 length=100
AAAACGTTAATATTTATTGAAATTGTT
+SRR573708.2 2 length=100
HHHHHHHHHHHHHHHHHHHHHHHHHHH
I would like to reformat it to:
@SRR573708.2/2
AAAACGTTAATATTTATTGAAATTGTT
+SRR573708.2/2
HHHHHHHHHHHHHHHHHHHHHHHHHHH
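As a rough illustration of the header rewrite shown above, here is a minimal bash sketch. The function name reformat_header is hypothetical, and it assumes the "/2" in the desired output comes from the second field of the header (the script further down takes it from a SUFFIX variable instead). It is a sketch under those assumptions, not a tested converter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "@SRR573708.2 2 length=100" into "@SRR573708.2/2".
# Assumption: the number after "/" is the second whitespace-separated field.
reformat_header() {
    local line=$1
    # Group 1: "@" or "+" followed by the read ID; group 2: the second field.
    local re='^([@+]SRR[0-9]+\.[0-9]+)[[:blank:]]+([0-9]+)[[:blank:]]+length=[0-9]+$'
    if [[ $line =~ $re ]]; then
        printf '%s/%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        # Report malformed headers on stderr and signal failure to the caller.
        printf 'malformed header: %s\n' "$line" >&2
        return 1
    fi
}

reformat_header "@SRR573708.2 2 length=100"   # prints @SRR573708.2/2
reformat_header "+SRR573708.2 2 length=100"   # prints +SRR573708.2/2
```

The sequence and quality lines (lines 2 and 4 of each record) would pass through unchanged.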
It is important, however, that I check each line to make sure it is formatted correctly before printing it to a new file. My last attempt at generating a reformatted file produced some really bizarre results at the end of the file. My code is:
    i=1
    while read -r LINE; do
        # Debugging limit: stop after the first 4-line record.
        if (( i > 4 )); then break; fi
        if (( i % 4 == 1 )); then
            # Header line: validate it, then print the first field plus SUFFIX.
            if [[ $LINE =~ ^@SRR[0-9]{6}[[:blank:]] ]]; then
                awk -v FS=" " -v OFS="" -v ORS="" -v SUFFIX="$SUFFIX" -v OUTPUT_FILE="$OUTPUT_FILE" ' {print $1,SUFFIX,"\n" } ' <<< "$LINE"
                i=$(( i + 1 ))
            else
                echo -e "error at line ${i}"; echo "${LINE}"; exit 1
            fi
        elif (( i % 4 == 2 )); then
            # Sequence line: print unchanged.
            echo -e "$LINE"
            i=$(( i + 1 ))
        elif (( i % 4 == 3 )); then
            # "+" header line: print the first field plus SUFFIX.
            echo "$LINE"
            awk -v FS=" " -v OFS="" -v ORS="" -v SUFFIX="$SUFFIX" -v OUTPUT_FILE="$OUTPUT_FILE" ' {print $1,SUFFIX,"\n" } ' <<< "$LINE"
            i=$(( i + 1 ))
        elif (( i % 4 == 0 )); then
            # Quality line: print unchanged.
            echo -e "$LINE"
            i=$(( i + 1 ))
        else
            echo -e "number of lines is not divisible by 4. Program Terminated.\nProblem encountered at line ${i}."
            exit 1
        fi
    done < "$INPUT_FILE"
I get the error message:
error at line 1
@SRR573708.2 2 length=100
Any suggestions on how to match whitespace in a regex inside an if statement, preferably matching only space and tab characters and not newline characters?
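For what it is worth, [[:blank:]] is the POSIX class that matches only space and tab, and bash accepts it on the right-hand side of =~. One thing that may explain the failure: the pattern ^@SRR[0-9]{6}[[:blank:]] expects the blank immediately after the six digits, whereas the sample header has ".2" there, so the whitespace class itself may not be the culprit. The sketch below is a minimal illustration; the function name is_fastq_header and the exact header layout it accepts are assumptions based on the sample line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the line looks like the sample header
# "@SRR573708.2 2 length=100" (fields separated by spaces or tabs).
is_fastq_header() {
    local line=$1
    # Store the pattern in a variable and use it unquoted after =~ ;
    # quoting the right-hand side would make bash match it literally.
    local re='^@SRR[0-9]+\.[0-9]+[[:blank:]]+[0-9]+[[:blank:]]+length=[0-9]+$'
    [[ $line =~ $re ]]
}

if is_fastq_header "@SRR573708.2 2 length=100"; then
    echo "header matches"
fi
```

Keeping the regex in a variable also sidesteps shell quoting problems with backslashes and brackets inside [[ ... ]].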