Wednesday, 4 September 2019

Python list comprehension with if else conditions

I am trying to get my head around list comprehensions and, mainly for the purposes of learning, I am trying to use them wherever I can.

From a dataframe of detected chemicals, I have extracted a series as a list (because I couldn't work out how to iterate over a pandas Series within a list comprehension, but that's a separate topic). The list is called detected_chems; to build it, I first turn NaN results into "none_detected" strings and then use pd.Series.to_list().

df.loc[df.chem_name.isnull(), 'chem_name'] = "none_detected"

detected_chems = df['chem_name'].to_list()

The second, much larger iterable is a dictionary, chem_db, which has the chemical names as keys and a dictionary of chemical properties as each value. Like this:

{'chemicalx':{'property1':'smells','property2':'poisonous'},'chemicaly':{'property1':'stinks','property2':'toxic'}}
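To make the structure concrete, here is a small sketch of an exact-name lookup into chem_db, using the two toy entries above. The name 'chemicalz' is made up, only to show what a miss looks like.

```python
# Toy version of chem_db, copied from the example above.
chem_db = {
    'chemicalx': {'property1': 'smells', 'property2': 'poisonous'},
    'chemicaly': {'property1': 'stinks', 'property2': 'toxic'},
}

# An exact-name lookup returns the inner dictionary of properties.
props = chem_db['chemicalx']
smell = props['property1']

# dict.get returns None instead of raising KeyError for an unknown name.
# 'chemicalz' is a made-up name used only to show a miss.
missing = chem_db.get('chemicalz')
```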

I am trying to match all the detected chemicals with those in the database and pull their properties.

I have studied these questions/answers but can't seem to apply them to my case (sorry): "Is it possible to use 'else' in a list comprehension?", "if/else in a list comprehension?" and "Python Nested List Comprehension with If Else".
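As far as I understand those answers, the single-loop case distinguishes two different uses of if. The sketch below uses toy data that has nothing to do with my chemicals; it is only the part I think I follow.

```python
nums = [1, 2, 3, 4]  # toy data, not from my real problem

# A trailing `if` is a filter: it only decides whether an item is kept,
# and it cannot take an `else`.
evens = [n for n in nums if n % 2 == 0]

# `x if cond else y` is a conditional expression: it goes before the
# `for` and produces one output item for every input item.
labels = ['even' if n % 2 == 0 else 'odd' for n in nums]
```

What I can't work out is how this carries over to two nested for clauses.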

So I am building a list of results, res, but instead of nested for loops with an if x in y condition, I've written this:

res = [{chem:chem_db[chem]} for det_chem in detected_chems for chem in 
        chem_db.keys() if det_chem in chem]

This works to an extent!

What I (think I) am doing here is creating a list of dictionaries, each holding one key:value pair: the chemical name as the key and the information about that chemical (itself a dictionary) as the value. An entry is added whenever the detected chemical's name is found within a name in the chemical database (chem_db).

An example would be:

[{'cyprodinil': {'adi':'0.01','arfd':'0.1', 'carcinogen':'known_not_to_cause_a_problem', 'cas name':'o,o-dimethyl_o-(3,5,6-trichloro-2-pyridinyl)_phosphorothioate',
  'cas rn': '5598-13-0'}}]
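For comparison, the nested-loop version that the comprehension replaces would look roughly like this. It is a sketch with made-up stand-in data ('chemical_unknown' is hypothetical), not my real lists.

```python
# Toy stand-ins for my real data (names are made up).
detected_chems = ['chemicalx', 'chemical_unknown']
chem_db = {
    'chemicalx': {'property1': 'smells', 'property2': 'poisonous'},
    'chemicaly': {'property1': 'stinks', 'property2': 'toxic'},
}

# Nested-loop form of the comprehension.
res_loop = []
for det_chem in detected_chems:
    for chem in chem_db.keys():
        # Substring test: true when det_chem appears anywhere in the key.
        if det_chem in chem:
            res_loop.append({chem: chem_db[chem]})

# The comprehension itself, for comparison.
res = [{chem: chem_db[chem]} for det_chem in detected_chems
       for chem in chem_db.keys() if det_chem in chem]
```

Both forms silently skip 'chemical_unknown', because no database key contains it.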

The problem is that not all of the detected chemicals are found in the database. This is probably because of misspellings or name variations (e.g. names that include numbers) or something similar.

So to solve the problem I need to identify which detected chemicals are not being matched. I thought this might be a solution:

not_matched=[]
res = [{chem:chem_db[chem]} for det_chem in detected_chems for chem in chem_db.keys() if det_chem in chem else not_matched.append(det_chem)]

I am getting a syntax error.

I have two questions:

1) Where should I put the else condition to avoid the syntax error?

2) Can the not_matched list be built within the list comprehension, so that I don't have to create that empty list first?

A variant that prints the unmatched name instead gives the same syntax error:

res = [{chem:chem_db[chem]} for det_chem in detected_chems for chem in chem_db.keys() if det_chem in chem else print(det_chem)]

What I'd like to achieve is something like:

in: len(detected_chems)
out: 20
in: len(res)
out: 18
in: len(not_matched)
out: 2

in: print(not_matched)
out: ['chemical_strange_character$$','chemical___WeirdSPELLING']

That will help me troubleshoot the matching.
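In plain-loop form, the behaviour I am after is roughly the following sketch. The data is made up, and I am assuming one result entry per detected chemical (the first database key that contains it), which is what the counts above imply.

```python
# Made-up sample data; the real lists are much larger.
detected_chems = ['chemicalx', 'chemical_strange_character$$']
chem_db = {
    'chemicalx': {'property1': 'smells', 'property2': 'poisonous'},
    'chemicaly': {'property1': 'stinks', 'property2': 'toxic'},
}

res = []
not_matched = []
for det_chem in detected_chems:
    # Assumption: keep only the first database key containing det_chem.
    match = next((chem for chem in chem_db if det_chem in chem), None)
    if match is None:
        not_matched.append(det_chem)
    else:
        res.append({match: chem_db[match]})
```

With this sample data, every detected chemical ends up in exactly one of the two lists, so len(res) + len(not_matched) equals len(detected_chems).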
