I can't understand why nothing gets appended to my list in PySpark. Can someone help me look at my code?
import json

input_file = sc.textFile("data.json")

def extract_func(data):
    c_list = []
    neighborhoods = data.get('neighborhoods', None)
    for n in neighborhoods:
        if n == []:
            c_list.append('Unknown')
        else:
            c_list.append(n)
    return c_list

Example data entry:
{'attributes': {'Accepts Credit Cards': True},
 'city': 'Edinburgh',
 'name': 'Conan Doyle',
 'neighborhoods': [],
 'stars': 3.5,
 'state': 'EDH'}

This example entry has no neighborhoods listed, so I want to append 'Unknown' to the list. Other entries have multiple neighborhoods, so I want to append each one individually in the for loop.
When I run dat = input_file.map(lambda line: json.loads(line)) followed by dat = dat.flatMap(extract_func), the result contains no 'Unknown' entries for records like this one, whose neighborhoods list is empty.
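To rule out Spark itself, here is a minimal sketch that calls the same function in plain Python. The two entries are hypothetical, trimmed-down versions of my data (the name 'Some Place' and the neighborhood names are made up), and I am assuming 'neighborhoods' sits at the top level of each parsed record.

```python
def extract_func(data):
    c_list = []
    neighborhoods = data.get('neighborhoods', None)
    for n in neighborhoods:
        if n == []:
            c_list.append('Unknown')
        else:
            c_list.append(n)
    return c_list

# Hypothetical record with an empty neighborhoods list, like the example entry.
empty_entry = {'city': 'Edinburgh', 'name': 'Conan Doyle',
               'neighborhoods': [], 'stars': 3.5, 'state': 'EDH'}

# Hypothetical record with several neighborhoods (made-up values).
multi_entry = {'name': 'Some Place',
               'neighborhoods': ['Old Town', 'Newington']}

empty_result = extract_func(empty_entry)
multi_result = extract_func(multi_entry)

print(empty_result)  # [] -- I expected ['Unknown'] here
print(multi_result)  # ['Old Town', 'Newington']
```

So the function already returns an empty list for this entry before Spark is involved, and flatMap then flattens that empty list into zero output elements, which would explain why the record disappears from the result.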
I've been checking this for hours and can't figure out what's wrong. What am I missing here?