lundi 2 août 2021

python to get substring of an excel cell with a condition

I'm trying to concatenate several csv files and create a 'DATE_TIME' column based on the specific column 'NAME'. I apply an IF condition for extract the substring of DATE_TIME from NAME based on the appearance of a string 'RE'.

Here's my code:

import pandas as pd
import numpy as np
import os

path_dataset = r'C:\Users\test'

def get_file(path_dataset):
    files = os.listdir(path_dataset) #check file list
    files.sort() #sort file
    file_list = []
    for file in files:
        path = path_dataset + "\\" + file
        if (file.startswith("test")) and (file.endswith(".csv")):
        file_list.append(path)
    return (file_list)

read_columns = ['NAME']

read_files = get_file(path_dataset)

all_df = []

for file in read_files:
    df = pd.read_csv(file, usecols = read_columns)

    if (str(df['NAME'].astype(str).str[23:25]) == 'RE-'):
        df['DATE_TIME'] = df['NAME'].astype(str).str[26:40]
    else:
        df['DATE_TIME'] = df['NAME'].astype(str).str[22:37]
    
all_df.append(df)

Concat_table = pd.concat(all_df, axis=0)
Concat_table = Concat_table.sort_values(['DATE_TIME'])

Concat_table.head()
Concat_table.to_csv(os.path.join(path_dataset, 'Concate_all.csv'), index=False)

My question is at the IF-ELSE statement. They return the same position of the substring, so that I get result below. This is not what I want, with the name 'RE' I still want to extract the date and time. What did I make a mistake for the code? Thank you.

enter image description here

Aucun commentaire:

Enregistrer un commentaire