Monday, 13 July 2020

awk compare adjacent lines and print based on if statements

I have a file with multiple lines (reads from a genome), sorted by location. I want to loop over these lines and, when several lines share the same ID (column 4), keep only the first one if column 3 is a plus, or only the last one if column 3 is a minus. This is my code, but it seems that my variable (lastID) is not properly updated after each line. Tips are much appreciated.

awk 'BEGIN {lastline=""; lastID=""}
{if ($lastline != "" && $4 != $lastID)
        {print $lastline; lastline=""};
if ($3 == "+" && $4 != $lastID)
        {print $0; lastline=""}
else if ($3 == "+" && $4 == $lastID)
        {lastli=""}
else if ($3 == "-")
        {lastline=$0}; 
lastID=$4
}' file
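
A likely cause of the symptom: in awk, `$` is the field operator, so `$lastID` and `$lastline` are field references rather than the variables themselves. With a non-numeric ID the variable converts to 0, so `$lastID` is `$0` (the whole current line) and the comparison with `$4` never matches. The variable `lastli` is also a misspelling of `lastline`, and a minus-strand line still pending at the end of the file is never printed. Below is a minimal sketch of a corrected version, assuming that lines with the same ID are adjacent and that the strand does not change within one ID. The function name `dedup_reads` and the sample input are made up for illustration.

```shell
# dedup_reads is a hypothetical helper name; it reads the files given as
# arguments, or standard input when none are given.
dedup_reads() {
  awk '
    # ID changed: flush the minus-strand line held back for the previous ID.
    $4 != lastID && lastline != "" { print lastline; lastline = "" }
    # Plus strand: print only the first line of each ID group.
    $3 == "+" && $4 != lastID { print }
    # Minus strand: hold the line back; the last one per ID is kept.
    $3 == "-" { lastline = $0 }
    # Remember this ID for the comparison on the next line.
    { lastID = $4 }
    # End of input: print a minus-strand line that is still held back.
    END { if (lastline != "") print lastline }
  ' "$@"
}

# Made-up sample input (columns: chromosome, position, strand, read ID).
printf 'chr1\t100\t+\tread1\nchr1\t150\t+\tread1\nchr1\t200\t-\tread2\nchr1\t250\t-\tread2\nchr1\t300\t+\tread3\n' | dedup_reads
# Prints the lines at positions 100, 250 and 300.
```

On the real data this would be called as `dedup_reads file`. Uninitialized awk variables compare equal to the empty string, so the BEGIN block from the original is not needed.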
