I'm trying to extract only the lines of a file that contain a string from a list. For example, if MSTRG.2 is in my list, I want the line containing it written to my outfile. I've used the code below, but for some reason the extracted lines don't necessarily contain a string from the list.
id_list = []
for line in gff_compare:
    split_line = line.strip().split('\t')
    class_code = split_line[2]
    if class_code == 'u':
        if split_line[3] not in id_list:
            id_list.append(split_line[3])
for line in feature_counts:
    split_line_2 = line.strip().split('\t')
    string_ids = split_line_2[0]
    if any(s in string_ids for s in id_list):
        outfile.write(line)
outfile.close()
id_list contains only 1511 elements, whereas outfile has over 30,000 lines: some contain a string from the list and some don't. I can't work out why it isn't pulling out only the lines I want based on the strings in the list.
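One possible explanation, sketched here under assumptions rather than confirmed against the real files: `s in string_ids` is a substring test on strings, so an ID such as MSTRG.2 would also match MSTRG.20, MSTRG.21 and so on. The sample lines and the helper `filter_lines` below are made up for illustration. The sketch compares the substring test with an exact comparison of the first column against a set of IDs.

```python
def filter_lines(lines, id_list, exact=True):
    """Return the lines whose first tab-separated field matches an ID.

    exact=True  -> the whole field must equal one of the IDs
    exact=False -> substring test, as in the code above
    """
    ids = set(id_list)  # a set also removes duplicates and makes lookups fast
    kept = []
    for line in lines:
        first_field = line.strip().split('\t')[0]
        if exact:
            matched = first_field in ids
        else:
            matched = any(s in first_field for s in ids)
        if matched:
            kept.append(line)
    return kept


# Hypothetical sample data, not taken from the real feature_counts file.
sample = ["MSTRG.2\t10\n", "MSTRG.20\t5\n", "MSTRG.21\t7\n", "MSTRG.3\t1\n"]

substring_hits = filter_lines(sample, ["MSTRG.2"], exact=False)
exact_hits = filter_lines(sample, ["MSTRG.2"], exact=True)

print(len(substring_hits))  # substring test keeps MSTRG.2, MSTRG.20 and MSTRG.21
print(len(exact_hits))      # exact test keeps only MSTRG.2
```

If the first column of feature_counts holds exactly one ID per line, the exact comparison should keep only the listed IDs. If it holds something longer (for example several IDs joined together), the field would need splitting before comparing.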
Any help appreciated! Thanks!