Tuesday, 2 January 2018

Not able to use a list in an if-statement

I'm back with another question.

I'm trying to write a loop that retrieves the tokenized cell values from a list, checks each tokenized value for stop words, and appends the result to a new list.

# Importing the packages to be used

import xlrd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Declaration of file path of the data and opening of workbook and worksheet

file_path = "C:/Users/L31101/Documents/Data/Copy_1.xlsx"
workbook = xlrd.open_workbook(file_path)
worksheet = workbook.sheet_by_name("ConsolidateModuleQnComment")

# Grabs the number of rows and columns of the worksheet

rowcount = worksheet.nrows
columncount = worksheet.ncols

# Prints the number of rows and columns

print("\nRow count: %d" % rowcount)
print("Column count: %d" % columncount)

# Grabbing the cell values and placing them inside a list named data_value

data_value = []

for rowindex in range(2, rowcount):
    # print("\nCurrent row number: %d" % rowindex)
    # print(worksheet.cell_value(rowindex, 6))
    data_value.append(worksheet.cell_value(rowindex, 6))

# Tokenizing each value in data_value and appending the resulting token list to the data_tokenized list

data_tokenized = []

for valueindex in range(0, len(data_value)):
    data_tokenized.append(word_tokenize(data_value[valueindex]))

# Grabbing the tokenized values from the data_tokenized list and removing the stopwords

stop_words = set(stopwords.words("english"))

data_stopword_removed = []

for tokenizedindex in range(0, len(data_tokenized)):
    test_variable = data_tokenized[1]
    if test_variable not in stop_words:
        data_stopword_removed.append(test_variable)

print("\nNumber of records: %d" % len(data_stopword_removed))

It gives the following error message:

C:\Users\L31101\PycharmProjects\Year3\venv\Scripts\python.exe C:/Users/L31101/PycharmProjects/Year3/SentimentAnalysis.py

Row count: 5792
Column count: 7
Traceback (most recent call last):
  File "C:/Users/L31101/PycharmProjects/Year3/SentimentAnalysis.py", line 47, in <module>
    if test_variable not in stop_words:
TypeError: unhashable type: 'list'

Process finished with exit code 1
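
For reference, the same error can be reproduced without the spreadsheet or NLTK. The sketch below is only an illustration: the small stop_words set and the tokens list are hypothetical stand-ins for stopwords.words("english") and for one entry of data_tokenized, and membership_error is a helper made up for this example. It suggests that the `in` test against a set fails when the left-hand side is a whole list (a set has to hash the value it looks up), while a single string works.

```python
# Hypothetical stand-in for set(stopwords.words("english"))
stop_words = {"the", "is", "a"}
# Hypothetical stand-in for one entry of data_tokenized:
# word_tokenize returns a list of strings for each cell value
tokens = ["the", "module", "is", "useful"]


def membership_error(value, collection):
    """Return the TypeError message raised by `value in collection`, or None."""
    try:
        value in collection
    except TypeError as exc:
        return str(exc)
    return None


# Looking up a whole list in a set requires hashing the list, which fails
whole_list_error = membership_error(tokens, stop_words)
# Looking up a single string is fine, because str is hashable
single_word_error = membership_error(tokens[0], stop_words)

print(whole_list_error)
print(single_word_error)
```

In the question's loop, test_variable is one whole entry of data_tokenized, that is, a list of tokens, so the check on it runs into the first case.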

I've asked friends at my school, but none of them could explain this error, so I'm hoping the community can help :)
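
One possible way around the error, sketched under assumptions: compare each word against stop_words instead of comparing the whole token list. The function name remove_stop_words and the sample data are hypothetical; in the real script, data_tokenized would come from word_tokenize and stop_words from stopwords.words("english"). Note also that the loop in the question reads data_tokenized[1] on every pass rather than data_tokenized[tokenizedindex]; iterating over the rows directly avoids that as well.

```python
def remove_stop_words(tokenized_rows, stop_words):
    """Filter stop words out of each row of tokens, keeping one list per row."""
    cleaned_rows = []
    for tokens in tokenized_rows:
        # Test each word (a hashable str) against the set, not the whole list.
        # Lowercasing is a choice made here because NLTK's English stop words
        # are all lowercase.
        cleaned_rows.append(
            [word for word in tokens if word.lower() not in stop_words]
        )
    return cleaned_rows


# Hypothetical sample data standing in for the spreadsheet contents
stop_words = {"the", "is", "a", "was"}
data_tokenized = [
    ["The", "module", "is", "useful"],
    ["Lecturer", "was", "helpful"],
]

data_stopword_removed = remove_stop_words(data_tokenized, stop_words)
print(data_stopword_removed)  # [['module', 'useful'], ['Lecturer', 'helpful']]
print("Number of records: %d" % len(data_stopword_removed))  # Number of records: 2
```

This keeps one filtered list per spreadsheet row, so the record count still matches the number of rows read. If a single flat list of words is wanted instead, the inner list could be added with extend rather than append.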
