Sunday, 24 September 2017

if else statement in a for loop to append to a list

I can't understand why my list isn't being appended to in PySpark. Can someone take a look at my code?

import json

input_file = sc.textFile("data.json")

def extract_func(data):
    c_list = []
    neighborhoods = data.get('neighborhoods', None)

    for n in neighborhoods:
        if n == []:
            c_list.append('Unknown')
        else:
            c_list.append(n)

    return c_list

Example data entry:

{'attributes': {'Accepts Credit Cards': True},
 'city': 'Edinburgh',
 'name': 'Conan Doyle',
 'neighborhoods': [],
 'stars': 3.5,
 'state': 'EDH'}

This example entry has no neighborhoods listed, so I want to append 'Unknown' to the list. Some other data entries have multiple neighborhoods, so I want to append each of them individually in the for loop.

When I run dat = input_file.map(lambda line: json.loads(line)) followed by dat = dat.flatMap(extract_func), it doesn't give me the Unknown neighborhood entries.

I've been checking this for hours and can't figure out what's wrong. What am I missing here?
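
One likely cause, sketched in plain Python so it can be run without Spark: a for loop over an empty list never executes its body, so a check like n == [] inside the loop is never reached for entries with no neighborhoods. The sketch below moves the emptiness check onto the list itself. The function name extract_neighborhoods, the sample records, and the list comprehension standing in for flatMap are assumptions made for illustration, not part of the original code.

```python
def extract_neighborhoods(record):
    """Return the record's neighborhoods, or ['Unknown'] if there are none."""
    # Hypothetical helper: `record` is assumed to be a dict parsed from one JSON line.
    # `or []` also covers a missing key or an explicit None value.
    neighborhoods = record.get('neighborhoods') or []

    # An empty list has no elements, so looping over it runs the body zero times.
    # The emptiness check therefore has to be made on the list, not on its items.
    if not neighborhoods:
        return ['Unknown']
    return list(neighborhoods)


# Made-up sample records; the second one is purely illustrative.
records = [
    {'name': 'Conan Doyle', 'neighborhoods': []},
    {'name': 'Hypothetical Cafe', 'neighborhoods': ['Old Town', 'New Town']},
]

# Plain-Python stand-in for flatMap: apply the function, then flatten the results.
flattened = [n for r in records for n in extract_neighborhoods(r)]
print(flattened)  # ['Unknown', 'Old Town', 'New Town']
```

Assuming the RDD is built as described above, the same function could be passed to flatMap in place of extract_func.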
