mardi 12 novembre 2019

Python: How to translate multiple excel IF conditions to python coding?

I have a very huge excel dataset with multiple columns containing multiple IF conditions. One such columns has the following formula:

**bold =IF(A2532="Merlin",IF(LEFT(D2532,8)="RMZ.JPWK",RIGHT(D2532,LEN(D2532)-8),IF(LEFT(D2532,6)="RMZ.JP",RIGHT(D2532,LEN(D2532)-6),IF(LEFT(D2532,4)="RMZ.",RIGHT(D2532,LEN(D2532)-4),D2532))),IF(AND(A2532<>"Merlin",LEFT(D2532,4)="RMZ."),"",D2532)) **bold

Sample dataframe is as follows:

Country code
Merlin  010566
Merlin  RMZ.JP828802
Merlin  RMZ.JP828804
Merlin  RMZ.JP828806
Merlin  RMZ.JPNS9002
```code


The above condition is simple to be implemented in excel. Gist of the condition is:

```code
    If Country is Merlin, then:
        If first 8 characters of D2532 is RMZ.JPWK, then write first 8 characters of D2532
        else if first 6 characters of D2532 is RMZ.JP, then write first 6 characters of D2532
        else if first 4 characters of D2532 is RMZ., then write first 4 characters of D2532

    If country is not Merlin then:
        if first 4 characters of D2532 is RMZ., then write 'NAN'

```code

Expected output Dataframe:

```code
Country code    Result
Merlin  RMZ.JPWK821517  821517
Merlin  RMZ.JPWK14202   14202
Merlin  RMZ.JPWK14324   14324
Merlin  RMZ.JPWK4003175001  4003175001
Merlin  RMZ.JP828802    828802
Merlin  RMZ.JP828804    828804
Merlin  RMZ.JP828806    828806
Merlin  RMZ.02029182001 02029182001
Merlin  RMZ.02031790001 02031790001
Merlin  RMZ.02519593001 02519593001
Sorex   RMZ.02519593001 NAN
```code


I can write the solution for this by creating multiple columns satisfying each condition and then using if else loop, but that is taking a long time to run since the dataset is very huge.

Also it increases my work of cleaning up the dataframe because of the multiple columns created  for each and every condition which is not actually evaluated every time.

Please note the above dataframe is just a small part and actual database contains 100s of columns and 100 thousand rows.

Solution i am looking for is :
    1. Eliminate the need to create new columns for each condition and still be able to evaluate it in a single expression.
    2. Eliminate the use of If loop





Aucun commentaire:

Enregistrer un commentaire