lundi 28 décembre 2020

How to speed up conditional statement in python

I am trying to generate a new column in a pandas dataframe by loop over >100,000 rows and setting the value of the row conditional on an already existing row.

My current code is:

# if charging set to 1, if discharging set to -1.
# if -1 < IT100 < 1 then set CD to previous cells value
# Charging is defined as IT100 > 1 and Discharge is defined as IT100 < -1
def CD(dataFrame):


    for x in range(0,len(dataFrame.index)):
        print(x)
        current = dataFrame.loc[x,"IT100"]

        if x == 0:
            if dataFrame.loc[x+5,"IT100"] > -1:
                dataFrame.loc[x,"CD"] = 1
            else:
                dataFrame.loc[x,"CD"] = -1
        else:
            if current > 1:
                dataFrame.loc[x,"CD"] = 1
            elif current < -1:
                dataFrame.loc[x,"CD"] = -1
            else:
                dataFrame.loc[x,"CD"] = dataFrame.loc[x-1,"CD"]

Using if/Else loops is extremely slow. I see that people have suggested to use np.select() or pd.apply(), but I do not know if this will work for my example. I need to be able to index the column because one of my conditions is to set the value of the new column to the value of the previous cell in the column of interest.

Thanks for any help!

Aucun commentaire:

Enregistrer un commentaire