Tuesday, July 23, 2019

Create two new columns based on the series of events in another column

I have a dataset that looks like this (just a sample, which also shows what the orange-highlighted columns should look like):

[Image: sample dataset with the num_of_fails and run_number columns highlighted in orange]

The two different row colors just emphasize that the rows belong to different serial_num values. The orange-highlighted columns num_of_fails and run_number are new columns I made.

So I have to make sure everything is computed per serial_num. In other words, in the sample dataset above, the serial_num is 846 and it has 2 runs, as seen in run_number. If there happens to be another serial_num, let's say 847, then its run_number would start again at 1.

Additionally, num_of_fails increases by 1 with each fail, up to a total of 2. The counter restarts at 0 once num_of_fails has reached 2, when a brand new run begins, or when the serial_num changes.
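To make the rules above concrete, here is a minimal sketch of one way they could be expressed. It rests on several assumptions that the sample does not confirm: pass_fail is a boolean column where False means a fail, a hypothetical boolean column run_start marks the first row of each run (the question does not say how runs are delimited), and rows that pass keep the current count. The helper name add_fail_counters is made up for illustration.

```python
import pandas as pd

def add_fail_counters(df):
    """Return a copy of df with run_number and num_of_fails columns added."""
    out = df.copy()
    # ASSUMPTION: `run_start` is a hypothetical flag for the first row of a run.
    # run_number is the running count of run starts within each serial_num.
    out["run_number"] = (
        out["run_start"].astype(int).groupby(out["serial_num"]).cumsum()
    )

    counts = []
    prev_key, count = None, 0
    for serial, run, passed in zip(
        out["serial_num"], out["run_number"], out["pass_fail"]
    ):
        key = (serial, run)
        # Restart on a new serial_num, a new run, or once the counter hit 2.
        if key != prev_key or count == 2:
            count = 0
        # ASSUMPTION: pass_fail == False means the row is a fail.
        if not passed:
            count += 1
        counts.append(count)
        prev_key = key
    out["num_of_fails"] = counts
    return out

# Made-up sample data, not the data from the screenshot.
sample = pd.DataFrame({
    "serial_num": ["846"] * 6 + ["847"] * 2,
    "pass_fail": [True, False, False, True, False, True, False, False],
    "run_start": [True, False, False, False, True, False, True, False],
})
result = add_fail_counters(sample)
print(result)
```

An explicit loop is used here because the reset-at-2 rule depends on the previous counter value, which is awkward to express as a single vectorized operation.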

Here is my code:

df["num_of_fails"] = np.nan
df["run_number"] = np.nan
filter_list = ['846', '847']
sample_df = df[df.serial_num.isin(filter_list)]
sample_df_group = sample_df.groupby('serial_num')
num_of_fails = 0
for name, group in sample_df_group:
    if group.iloc[2] == False
        num_of_fails = 1
    if (group.iloc[2] == False and group.iloc[10] == 1):
        num_of_fails = 2
    else:
        num_of_fails 0

However, I get this error:

  File "<ipython-input-72-bfd213949d82>", line 3
    if group.iloc[2] == False
                             ^
SyntaxError: invalid syntax
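
The caret sits at the end of the `if` line because the statement has no trailing colon; the last line of the loop, `num_of_fails 0`, is also missing its `=`. Below is a minimal sketch of a loop with valid syntax. It assumes pass_fail is a boolean column and, purely for illustration, looks only at the first row of each group by column name rather than by the positional `.iloc[2]` and `.iloc[10]` lookups. The sample data and the name fails_per_serial are made up.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "serial_num": ["846", "846", "847"],
    "pass_fail": [False, True, True],
})

fails_per_serial = {}
for name, group in df.groupby("serial_num"):
    # Pull out a single scalar; comparing a whole row or group to False
    # does not yield one True/False value that `if` can use.
    first_passed = bool(group["pass_fail"].iloc[0])
    if not first_passed:  # an `if` header must end with a colon
        num_of_fails = 1
    else:
        num_of_fails = 0  # an assignment needs `=`
    fails_per_serial[name] = num_of_fails

print(fails_per_serial)
```

This only addresses the syntax; it does not implement the run and reset rules described earlier.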

I don't know if I am starting this logic correctly so that num_of_fails and run_number are populated based on the current serial_num and the pass_fail value.

Any advice?
