Sunday, February 9, 2020

Python code to extract similar strings from 2 lists

I have two lists of player names from two different sources.

names1 = ['C.J. McCollum', 'Metta World Peace', 'LeBron James', 'Stephen Curry']

names2 = ['Metta World Peace', 'Steph Curry', 'Kevin Durant', 'CJ McCollum']

The problem is that although they are the same players, their names are written slightly differently in the two sources. I used the following code to find the similar names:

idx = np.zeros(3)
i = 0
for x, y in enumerate(names1):
    for z, w in enumerate(names2):
        if y in z:
            idx[i] = x
            i = i+1

For each entry of names1, the code loops over all the entries of names2 and records the index of the names1 entry that is similar to an entry in names2. idx is a list that should contain the indices of the similar strings, and i is the current position in idx. Every time a similar string is found, its index is stored in idx[i] and i is increased by 1, so that the next match is recorded at the following position of idx.

Expected Answer: idx = [0, 1, 3]

However, I get the following error: list assignment index out of range

How can I fix the code and is there a better way to solve this problem?
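
One possible direction, offered as a rough sketch rather than a definitive answer: in the loop above, `y in z` tests a name against the integer index z instead of the name w. Even with that corrected, a plain substring test would not match 'Steph Curry' to 'Stephen Curry'. The example below instead scores each pair with difflib.SequenceMatcher from the standard library and collects matches by appending to a list, so no size has to be guessed in advance. The helper names normalize and similar_indices, and the 0.8 threshold, are assumptions made for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher

names1 = ['C.J. McCollum', 'Metta World Peace', 'LeBron James', 'Stephen Curry']
names2 = ['Metta World Peace', 'Steph Curry', 'Kevin Durant', 'CJ McCollum']


def normalize(name):
    # Hypothetical helper: lowercase the name and drop punctuation,
    # so 'C.J. McCollum' and 'CJ McCollum' compare as equal.
    return ''.join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())


def similar_indices(source, candidates, threshold=0.8):
    # Return the indices of entries in `source` whose best similarity
    # ratio against any entry of `candidates` reaches `threshold`.
    # The threshold of 0.8 is an assumed value, not a recommended one.
    idx = []
    for i, name in enumerate(source):
        best = max(
            SequenceMatcher(None, normalize(name), normalize(other)).ratio()
            for other in candidates
        )
        if best >= threshold:
            idx.append(i)  # append instead of assigning to a fixed slot
    return idx


print(similar_indices(names1, names2))  # [0, 1, 3]
```

Appending to a list avoids the out-of-range problem because the result grows only when a match is found. For large lists, difflib.get_close_matches or a dedicated fuzzy-matching library may be worth considering, since this sketch compares every pair.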
