Saturday, February 1, 2020

If/else conditions in a pandas DataFrame and extracting a column value

I have this DataFrame (df), which looks like this:

+-----------------+-----------+----------------+---------------------+--------------+-------------+
|      Gene       | Gene name |     Tissue     |      Cell type      |    Level     | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4     | adipose tissue | adipocytes          | Low          | Approved    |
| ENSG00000001561 | ENPP4     | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | glandular cells     | Medium       | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | lymphoid tissue     | Low          | Approved    |
| ENSG00000001561 | ENPP4     | bone marrow    | hematopoietic cells | Medium       | Approved    |
| ENSG00000002586 | CD99      | adipose tissue | adipocytes          | Low          | Supported   |
| ENSG00000002586 | CD99      | adrenal gland  | glandular cells     | Medium       | Supported   |
| ENSG00000002586 | CD99      | appendix       | glandular cells     | Not detected | Supported   |
| ENSG00000002586 | CD99      | appendix       | lymphoid tissue     | Not detected | Supported   |
| ENSG00000002586 | CD99      | bone marrow    | hematopoietic cells | High         | Supported   |
| ENSG00000002586 | CD99      | breast         | adipocytes          | Not detected | Supported   |
| ENSG00000003056 | M6PR      | adipose tissue | adipocytes          | High         | Approved    |
| ENSG00000003056 | M6PR      | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | lymphoid tissue     | High         | Approved    |
| ENSG00000003056 | M6PR      | bone marrow    | hematopoietic cells | High         | Approved    |
+-----------------+-----------+----------------+---------------------+--------------+-------------+

Expected output:


+-----------+--------+-------------------------------+
| Gene name | Level  |            Tissue             |
+-----------+--------+-------------------------------+
| ENPP4     | Low    | adipose tissue, appendix      |
| ENPP4     | High   | adrenal gland, bronchus       |
| ENPP4     | Medium | appendix, breast, bone marrow |
| CD99      | Low    | adipose tissue, appendix      |
| CD99      | High   | bone marrow                   |
| CD99      | Medium | adrenal gland                 |
| ...       | ...    | ...                           |
+-----------+--------+-------------------------------+

Code used (adapted from the question "multiple if else conditions in pandas dataframe and derive multiple columns"):

def text_df(df):
    if (df[df['Level'].str.match('High')]):
        return (df.assign(Level='High') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Medium')]):
        return (df.assign(Level='Medium') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Low')]):
        return (df.assign(Level='Low') + df['Tissue'].astype(str))

df = df.apply(text_df, axis = 1)

Error: KeyError: ('Level', 'occurred at index 172')

I can't understand what I am doing wrong. Any suggestions?
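One possible direction, offered as a sketch rather than a definitive answer: with axis=1, apply passes one row at a time (a Series) to text_df, so filtering with a boolean mask and calling df.assign inside the function cannot work as intended. The expected output looks like a grouping problem instead: group by 'Gene name' and 'Level', then join the tissues in each group. The example below rebuilds a few rows of the sample table; the helper name tissues_by_level is hypothetical, and dropping the "Not detected" rows is an assumption based on the expected output listing only Low, High and Medium.

```python
import pandas as pd

# A subset of the sample table above, with only the columns needed here.
df = pd.DataFrame(
    {
        "Gene name": ["ENPP4"] * 5 + ["CD99"] * 6,
        "Tissue": [
            "adipose tissue", "adrenal gland", "appendix", "appendix", "bone marrow",
            "adipose tissue", "adrenal gland", "appendix", "appendix", "bone marrow", "breast",
        ],
        "Level": [
            "Low", "High", "Medium", "Low", "Medium",
            "Low", "Medium", "Not detected", "Not detected", "High", "Not detected",
        ],
    }
)


def tissues_by_level(frame):
    """Hypothetical helper: one row per (Gene name, Level) with tissues joined."""
    # Assumption: rows with Level == "Not detected" are not wanted in the output.
    detected = frame[frame["Level"] != "Not detected"]
    return (
        detected.groupby(["Gene name", "Level"], sort=False)["Tissue"]
        # Join each group's tissues, keeping order of appearance and dropping repeats.
        .agg(lambda s: ", ".join(s.drop_duplicates()))
        .reset_index()
    )


result = tissues_by_level(df)
print(result)
```

sort=False keeps the groups in the order they first appear in the data rather than sorting them alphabetically, which is closer to the layout of the expected output.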
