Sunday, August 1, 2021

Optimize retrieval of data from a dictionary dataset

I am using a dataset structured as follows:

{
"TableName": "T60300C",
"SeriesCode": "B4186C",
"LineNumber": "86",
"LineDescription": "Government enterprises",
"TimePeriod": "1999",
"METRIC_NAME": "Current Dollars",
"CL_UNIT": "Level",
"UNIT_MULT": "6",
"DataValue": "38,275",
"NoteRef": "T60300C,T60300C.1"
},
{
"TableName": "T60300C",
"SeriesCode": "B4186C",
"LineNumber": "86",
"LineDescription": "Government enterprises",
"TimePeriod": "2000",
"METRIC_NAME": "Current Dollars",
"CL_UNIT": "Level",
"UNIT_MULT": "6",
"DataValue": "40,810",
"NoteRef": "T60300C,T60300C.1"
},

Here is another entry from the same dataset, showing which fields change:

{
"TableName": "T60300C",
"SeriesCode": "A4183C",
"LineNumber": "83",
"LineDescription": "General government",
"TimePeriod": "2000",
"METRIC_NAME": "Current Dollars",
"CL_UNIT": "Level",
"UNIT_MULT": "6",
"DataValue": "543,989",
"NoteRef": "T60300C"
},

As you can see, LineDescription is repeated on every entry that belongs to it. SeriesCode and LineNumber identify the LineDescription, and TableName stays the same throughout.
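Because LineNumber and LineDescription repeat on every entry for a given SeriesCode, one option is to store them once in a separate lookup. This is only a minimal sketch, assuming the dataset is a list of dicts shaped like the samples above; the names `sample_entries` and `build_series_info` are hypothetical and not part of my actual code:

```python
def build_series_info(entries):
    """Map each SeriesCode to its (LineNumber, LineDescription), stored once."""
    series_info = {}
    for entry in entries:
        # setdefault keeps the first value seen and ignores later repeats
        series_info.setdefault(
            entry["SeriesCode"],
            (entry["LineNumber"], entry["LineDescription"]),
        )
    return series_info


# Trimmed copies of the sample entries (only the fields used here)
sample_entries = [
    {"SeriesCode": "B4186C", "LineNumber": "86",
     "LineDescription": "Government enterprises", "TimePeriod": "1999"},
    {"SeriesCode": "B4186C", "LineNumber": "86",
     "LineDescription": "Government enterprises", "TimePeriod": "2000"},
    {"SeriesCode": "A4183C", "LineNumber": "83",
     "LineDescription": "General government", "TimePeriod": "2000"},
]

series_info = build_series_info(sample_entries)
print(series_info["B4186C"])
```

With a lookup like this, the per-series lists only need to carry the values that actually change from entry to entry.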

I would like to restructure this information for easier processing into a dataframe. Since all of the entries correspond to the same subject matter, I wish to create a dictionary as follows:

dataEntries = {}
dataEntries['T60300C'] = {'B4186C':[(DataValue,TimePeriod),(DataValue,TimePeriod), (etc.) ], 'A4183C':[(DataValue,TimePeriod),(DataValue,TimePeriod), (etc.)], etc.}
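To make the target structure concrete, here is a rough sketch of how it might be built in a single pass over the data. It assumes the dataset is already loaded as a list of dicts like the samples above; `group_by_series` and `sample_entries` are hypothetical names used only for illustration:

```python
from collections import defaultdict


def group_by_series(entries):
    """Build {TableName: {SeriesCode: [(DataValue, TimePeriod), ...]}} in one pass."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[entry["TableName"]][entry["SeriesCode"]].append(
            (entry["DataValue"], entry["TimePeriod"])
        )
    # Convert to plain dicts so a missing key raises KeyError instead of being created
    return {table: dict(series) for table, series in grouped.items()}


# Trimmed copies of the sample entries (only the fields used here)
sample_entries = [
    {"TableName": "T60300C", "SeriesCode": "B4186C",
     "TimePeriod": "1999", "DataValue": "38,275"},
    {"TableName": "T60300C", "SeriesCode": "B4186C",
     "TimePeriod": "2000", "DataValue": "40,810"},
    {"TableName": "T60300C", "SeriesCode": "A4183C",
     "TimePeriod": "2000", "DataValue": "543,989"},
]

dataEntries = group_by_series(sample_entries)
print(dataEntries["T60300C"]["B4186C"])
```

Each entry is filed directly under its own TableName and SeriesCode, so no list of unique codes is needed up front.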

Right now I loop over a set of unique SeriesCodes and, inside that loop, go through the whole dataset, using an if statement to pick out the matching entries.

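# Assumes dataset is a list of entry dicts and datasetSeriesCodes is the set of unique SeriesCode values
listoftupes = []
for uniqueSeriesCode in datasetSeriesCodes:
    for entry in dataset:  # full scan of the dataset for every unique code
        if entry['SeriesCode'] == uniqueSeriesCode:
            tupe = (entry['DataValue'], entry['TimePeriod'])
            listoftupes.append(tupe)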

My question is: Is this the fastest (most performant) way to traverse the dataset? Is there a better way? This is a very large dataset.
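To put a number on the concern, the sketch below counts how many entries each approach touches on synthetic data. It assumes the list-of-dicts layout; the series codes `S000` to `S049`, the generated values, and the function names `nested_scan` and `single_pass` are made up for illustration and are not from the real dataset:

```python
from collections import defaultdict


def nested_scan(entries, unique_codes):
    """One full scan of the dataset per unique code: N * K comparisons."""
    comparisons = 0
    result = {}
    for code in unique_codes:
        matches = []
        for entry in entries:
            comparisons += 1
            if entry["SeriesCode"] == code:
                matches.append((entry["DataValue"], entry["TimePeriod"]))
        result[code] = matches
    return result, comparisons


def single_pass(entries):
    """One scan in total: each entry is visited once and filed under its own code."""
    visits = 0
    result = defaultdict(list)
    for entry in entries:
        visits += 1
        result[entry["SeriesCode"]].append((entry["DataValue"], entry["TimePeriod"]))
    return dict(result), visits


# Synthetic data: 50 hypothetical series codes with 20 time periods each (N = 1000)
entries = [
    {"SeriesCode": f"S{code:03d}", "TimePeriod": str(2000 + year),
     "DataValue": str(code * year)}
    for code in range(50)
    for year in range(20)
]
unique_codes = {entry["SeriesCode"] for entry in entries}

nested_result, nested_work = nested_scan(entries, unique_codes)
single_result, single_work = single_pass(entries)

print(nested_result == single_result)
print(nested_work, single_work)
```

On this synthetic input both functions produce the same grouping, but the nested version performs 50,000 comparisons while the single pass visits 1,000 entries, so the gap grows with the number of unique SeriesCodes.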
