I have a variable in my dataframe called "duration", but it is expressed in an unconventional format. Here are some examples:
```
df[:, ["duration"]]
>>>
duration
PT1H34M9S
PT25M10S
PT3H
PT19M20S
PT45M
...
```
The goal is to transform every value in the column into its duration in seconds (converting to minutes and hours is easy once we have the numbers). I think regular expressions and conditionals are one way forward, but there are some complications that I do not know how to express in the formula. For example, how do I specify in the code that the number of digits to the left of M may be one or two, and the same for seconds (S)? Also, not every value contains all three parts (for example PT3H or PT45M).
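A regular expression can handle both concerns at once: `\d+` matches one or more digits, so one- and two-digit values need no special case, and wrapping each component in an optional group `(?:...)?` lets missing parts simply come back as `None`. The sketch below assumes the values only ever use the H, M and S components seen in the examples (no days, no fractional seconds); `DURATION_RE` and `split_duration` are illustrative names, not part of any library.

```python
import re

# Assumption: only the H, M and S components from the examples appear.
# Each component is optional, and \d+ accepts any number of digits,
# so "PT3H", "PT45M" and "PT1H34M9S" all match the same pattern.
DURATION_RE = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def split_duration(value):
    """Return (hours, minutes, seconds) as digit strings, or None for missing parts."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return match.groups()


for value in ["PT1H34M9S", "PT25M10S", "PT3H", "PT19M20S", "PT45M"]:
    print(value, split_duration(value))
# PT1H34M9S ('1', '34', '9')
# PT25M10S (None, '25', '10')
# PT3H ('3', None, None)
# PT19M20S (None, '19', '20')
# PT45M (None, '45', None)
```

Using `fullmatch` rather than `search` makes the function reject strings that contain anything beyond the expected pattern, instead of silently matching part of them.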
Pseudocode could be something like this: each value in df["duration"] can contain three markers, H, M and S. For each marker, take the one or more digits immediately to its left and convert them to an integer. Once they are split out (maybe splitting is not even necessary), multiply the H integer by 60 x 60 to get seconds (HH x 60 x 60), multiply the M integer by 60 (MM x 60), and take the S integer as is. Then sum these three values to get the total duration in seconds.
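The pseudocode translates almost line for line into a small function. This is a minimal sketch, assuming the same H/M/S-only format as the examples; `duration_to_seconds` is a hypothetical helper name, and a missing component is treated as zero.

```python
import re

# Assumption: values only use the H, M and S components seen in the examples
# (no days, no fractional seconds).
DURATION_RE = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def duration_to_seconds(value):
    """Convert a string such as 'PT1H34M9S' to a total number of seconds."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    # A missing component (e.g. no 'S' in 'PT45M') counts as zero.
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    # HH x 60 x 60 + MM x 60 + SS, as in the pseudocode.
    return hours * 3600 + minutes * 60 + seconds


durations = ["PT1H34M9S", "PT25M10S", "PT3H", "PT19M20S", "PT45M"]
print([duration_to_seconds(d) for d in durations])
# [5649, 1510, 10800, 1160, 2700]
```

If `df` is a pandas DataFrame, the function can be applied to the whole column with `df["duration"].map(duration_to_seconds)`. A vectorized alternative is `df["duration"].str.extract(...)` with the same pattern, which returns one column per capture group that can then be filled with 0, cast to integers and combined with the same arithmetic.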
I decided to share this in case someone has faced something similar and can shed some light.