Thursday, October 22, 2020

Comparing Multiple Sets of Duplicates - Odd Loop Logic Python

I have a list of many reports with duplicate names. Each one has a unique path (or reportFullName in code). Each also has a last updated date. A data set sample is as follows:

Report Name,Report Last Updated,Report Full Name
201RPT data only,2017-10-10T07:51:27.479-04:00,CAMID(
201RPT data only,2016-12-14T10:30:16.466-05:00,CAMID(
201RPT data only,2016-12-14T10:30:16.466-05:00,CAMID(
201RPT data only,2019-08-13T10:34:52.171-04:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2020-09-08T09:02:28.379-04:00,/content/folder
201RPT for Actuals Budget & Forecast (with Children),2019-02-21T08:43:16.832-05:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2020-03-26T09:15:56.957-04:00,/content/folder
201RPT for Actuals Budget & Forecast (with Children),2020-09-03T17:44:40.512-04:00,/content/folder
201RPT for Actuals Budget & Forecast (with Children),2019-01-11T15:32:26.263-05:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2019-02-21T08:43:16.832-05:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2020-04-28T12:33:27.978-04:00,/content/folder
201RPT for Actuals Budget & Forecast (with Children),2016-12-05T08:29:21.853-05:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2016-12-02T14:56:10.854-05:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2020-08-31T13:32:38.548-04:00,/content/folder
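
A note on the dates in the sample: they carry two different UTC offsets (-04:00 and -05:00), so comparing them as plain strings can pick the wrong one when two timestamps are close together. Below is a minimal sketch, assuming lastUpdate is stored as a string in this format and that Python 3.7 or later is available for datetime.fromisoformat. The helper name parse_last_update is made up, and the two timestamps are invented purely to show the effect.

```python
from datetime import datetime


def parse_last_update(text):
    """Parse a timestamp such as '2019-08-13T10:34:52.171-04:00'."""
    return datetime.fromisoformat(text)


# Hypothetical timestamps (not from the sample) that differ in UTC offset.
a = "2020-01-01T10:00:00.000-05:00"  # 15:00 UTC
b = "2020-01-01T10:30:00.000-04:00"  # 14:30 UTC

later_as_text = max(a, b)                         # string comparison picks b
later_as_time = max(a, b, key=parse_last_update)  # real-time comparison picks a

print(later_as_text)
print(later_as_time)
```

If lastUpdate is already a timezone-aware datetime object, the parsing step is unnecessary and max can be applied to the values directly.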

I want to cycle through only the reports with duplicate names. In the sample above, that means checking 201RPT data only and 201RPT for Actuals Budget & Forecast (with Children) against the following condition: if any report in the set has a Report Full Name that starts with the string '/content/folder/', compare the dates of those reports only and use the one with the most recent date. Otherwise, compare all the dates in the set and use the report with the most recent date. I have tried to accomplish this by building a goodReportList, where the 'winning report' from each set of duplicates is stored. The reports are custom objects, so I have used two lists: one to store the objects and one to store the names for comparison. My code is as follows:

def reportHandle(reportList):

    goodReportList = []   
    reportObjectList = []
    singleReportDupList = []
    initialReport = reportList[0].reportName
    for r in reportList:
        
        if r.reportName == initialReport:
            singleReportDupList.append(r.reportName)
            reportObjectList.append(r)
        else:
            if len(reportObjectList) > 1:
                for rep in reportObjectList:
                    dummyList = []
                    dates = []
                    dates.append(rep.lastUpdate)
                    if rep.reportFullName.startswith("/content/folder"):
                        dummyList.append(rep)
                        for d in dummyList:
                            if d.lastUpdate == max(dates):
                                goodReportList.append(d)
                            
                    elif r.lastUpdate == max(dates):
                        goodReportList.append(rep)                
            
                
            print(len(reportObjectList))   
            initialReport = r.reportName
            reportObjectList = []
            singleReportDupList = []
            singleReportDupList.append(r.reportName)
            reportObjectList.append(r)      
            
    return goodReportList

So, I am initializing a few lists and checking the first name. On each pass through the loop, if the name matches the current name, the report's name is added to singleReportDupList, which is where I am storing duplicates of the same name, and the report itself is added to reportObjectList. The first if builds the object and name lists; the else loops through the list of report objects that all share a name. It stores the date, checks the string, and adds the report if it is the most recent. Otherwise, it just checks the dates. Then I clear out the lists and reset the report name to the new one. At least that's how it should work.
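
One way to avoid tracking the current name by hand might be to let itertools.groupby split the list into runs of equal names. The sketch below is only an illustration under assumptions: Report is a hypothetical namedtuple standing in for the custom report object (only the attribute names reportName, lastUpdate, and reportFullName come from the code above), group_by_name is an invented helper name, and the demo rows are made up. Sorting first keeps the grouping correct even if the input is not already ordered by name.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for the custom report object.
Report = namedtuple("Report", ["reportName", "lastUpdate", "reportFullName"])


def group_by_name(report_list):
    """Return a list of (name, reports) pairs, one per distinct report name."""
    by_name = attrgetter("reportName")
    ordered = sorted(report_list, key=by_name)  # groupby needs equal names adjacent
    return [(name, list(group)) for name, group in groupby(ordered, key=by_name)]


# Made-up rows: "B" is duplicated, "A" appears only once.
reports = [
    Report("B", "2020-01-02T00:00:00.000-05:00", "CAMID("),
    Report("A", "2020-01-01T00:00:00.000-05:00", "CAMID("),
    Report("B", "2020-01-03T00:00:00.000-05:00", "/content/folder"),
]

for name, group in group_by_name(reports):
    print(name, len(group))
```

A name that appears once simply yields a group of length one, and the last group is emitted like any other, so neither case needs special handling at the end of the loop.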

The function breaks in a few different spots. It compares all the dates instead of just going with the most recent date among the /content/folder reports, and I am not sure how to compare the dates in this case. It also does not return reports whose names are not duplicated; it seems like it should just add them and move on, but it does not. For me, this is a complicated loop and comparison, so I am not 100% sure of every place it is breaking. The difficulty is that the comparison has to be stopped and restarted every time the report name changes. My ideal output for the sample would be:

Report Name,Report Last Updated,Report Full Name
201RPT data only,2019-08-13T10:34:52.171-04:00,CAMID(
201RPT for Actuals Budget & Forecast (with Children),2020-09-08T09:02:28.379-04:00,/content/folder
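
For reference, the selection rule described above can be expressed as a single ranking key, which removes the need to restart a comparison when the name changes. This is a rough sketch under stated assumptions, not a drop-in replacement for reportHandle: Report is a hypothetical namedtuple in place of the custom object, rank and pick_winners are invented names, lastUpdate is assumed to be a string in the format shown in the sample, and the paths are left truncated exactly as they appear there.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the custom report object; only the three
# attribute names come from the question.
Report = namedtuple("Report", ["reportName", "lastUpdate", "reportFullName"])

PREFERRED_PREFIX = "/content/folder"


def rank(report):
    """Ranking key: reports under the preferred folder win, then the latest date."""
    in_folder = report.reportFullName.startswith(PREFERRED_PREFIX)
    return (in_folder, datetime.fromisoformat(report.lastUpdate))


def pick_winners(report_list):
    """Return one winning report per name, in first-seen order."""
    winners = {}
    for report in report_list:
        current = winners.get(report.reportName)
        if current is None or rank(report) > rank(current):
            winners[report.reportName] = report
    return list(winners.values())


# A few rows copied from the sample data.
name_a = "201RPT data only"
name_b = "201RPT for Actuals Budget & Forecast (with Children)"
sample = [
    Report(name_a, "2017-10-10T07:51:27.479-04:00", "CAMID("),
    Report(name_a, "2019-08-13T10:34:52.171-04:00", "CAMID("),
    Report(name_b, "2020-09-08T09:02:28.379-04:00", "/content/folder"),
    Report(name_b, "2019-02-21T08:43:16.832-05:00", "CAMID("),
    Report(name_b, "2020-09-03T17:44:40.512-04:00", "/content/folder"),
]

for winner in pick_winners(sample):
    print(",".join(winner))
```

Because the key is a tuple, True sorts above False, so any /content/folder report outranks every other report with the same name, and the date only breaks ties within each class. Names that occur once are kept as they are, since the first report seen for a name is always stored.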

Any ideas?

Thank you for reading a long question.
