Monday, June 27, 2016

Python: How to "NaN" a range of data when a given criterion is met for a given period of time

I have a pandas DataFrame that contains activity count data from a Philips Actiwatch. When there is no activity count for 60 minutes or more, the user was probably not wearing the device, and that range should be removed.

How do I detect periods of 60 minutes or more (each line is 1 minute) in my DataFrame and remove the complete period? That is, if the activity count is 0 for 59 consecutive lines or fewer, nothing should happen, but if it is 0 for 60 consecutive lines or more (let's say 80 lines), that data should become NaN.

The CSV file with the data can be found here: http://ift.tt/28Ye7QT

This is where I got stuck. My attempt, written as an explicit loop:

# Set Activity to NaN wherever it is 0 for 60 or more consecutive minutes.
# Assumes `data` has one row per minute and an 'Activity' column.
import numpy as np

activity = data['Activity'].astype(float)  # float so the column can hold NaN
zero_count = 0
# Run one step past the end so a run of zeros at the very end is also handled.
for n in range(len(activity) + 1):
    if n < len(activity) and activity.iloc[n] == 0:
        zero_count = zero_count + 1
        continue
    # A non-zero value, a NaN, or the end of the data closes the current run.
    if zero_count >= 60:
        # NaN the last zero_count lines.
        activity.iloc[n - zero_count:n] = np.nan
    zero_count = 0
data['Activity'] = activity

What I am trying to do is check each line: if it is 0, add 1 to zero_count; when a non-zero value is read, check whether zero_count is 60 or more. If it is, NaN the whole range and reset zero_count. If it is less than 60, just reset zero_count without NaN-ing any data.
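The same idea can also be expressed without an explicit loop. What follows is only a minimal sketch, not a tested solution for the Actiwatch file: the function name mask_non_wear, the min_run parameter, and the demo series are made up for illustration, and it assumes one row per minute with no gaps in the recording. It labels each run of consecutive equal "is zero" flags with an id, measures the length of each run, and masks the zero runs that are long enough.

```python
import pandas as pd


def mask_non_wear(activity, min_run=60):
    """Return a float copy of `activity` with zero-runs of >= min_run rows set to NaN.

    Hypothetical helper for illustration; assumes one row per minute.
    """
    activity = activity.astype(float)
    is_zero = activity.eq(0)
    # A new run starts wherever the zero/non-zero flag changes;
    # the cumulative sum gives every run its own id.
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_len = is_zero.groupby(run_id).transform('size')
    # Replace values with NaN only inside zero-runs that are long enough.
    return activity.mask(is_zero & (run_len >= min_run))


# Made-up demo data: 59 zeros (kept), then 60 zeros (set to NaN).
demo = pd.Series([5] + [0] * 59 + [3] + [0] * 60 + [7])
cleaned = mask_non_wear(demo)
```

On the real data this would be called as data['Activity'] = mask_non_wear(data['Activity']), assuming the column is named 'Activity' as in the loop above.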

I hope someone understands what I am trying to do and can either 1) get the loop above working correctly on my data, or 2) suggest a better way of doing this.

Thanks to everyone who takes the time to read this post.

Best regards,

Rob
