Sunday, April 8, 2018

Detect how many rows contain Chinese in a DataFrame with Python

I've got 10,000 rows of data, and I want to see how many of those rows contain Chinese text in the 'txt' column of the DataFrame.

I've tried using langdetect, but I get an error stating that it only works on strings.

Current method is:

import csv

counter = 0

with open("annotation_sample.csv", "r") as f:
    csvreader = csv.reader(f, delimiter=",")
    for row in csvreader:
        # Only matches rows whose second column contains this exact character
        if "汉" in row[1]:
            counter += 1

print(counter)

This works, but only for rows that contain '汉', which is just one of the thousands of possible Chinese characters, so it doesn't return correct results.

Where am I going wrong? Am I tackling this the wrong way? I'm quite new to pandas and Python in general, so any help would be great!
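
One possible direction, as a rough sketch rather than a definitive answer: instead of testing for a single character, test each value against a range of code points. The version below assumes "Chinese" means at least one character in the CJK Unified Ideographs block (U+4E00 to U+9FFF). That range leaves out the extension blocks and also matches Japanese kanji, so it is a character check and not true language detection. The names HAN_RE, contains_han and count_han_rows are made up for illustration.

```python
import re

# Assumption: a row "contains Chinese" if it has at least one character in
# the CJK Unified Ideographs block (U+4E00 to U+9FFF). Extension blocks are
# not covered, and Japanese kanji fall in the same range.
HAN_RE = re.compile("[\u4e00-\u9fff]")


def contains_han(value):
    """Return True if value is a string with at least one Han character."""
    # Non-string values (None, NaN, numbers) are treated as "no match", which
    # avoids the kind of "only works on strings" error seen with other tools.
    return isinstance(value, str) and HAN_RE.search(value) is not None


def count_han_rows(values):
    """Count how many items in an iterable of cell values contain Han characters."""
    return sum(contains_han(v) for v in values)


sample = ["hello", "你好世界", "mixed 汉字 text", None, 42]
print(count_han_rows(sample))
```

With a pandas DataFrame, the same check could presumably be applied to the column directly, for example count_han_rows(df["txt"]), since iterating over a Series yields its values. In the CSV loop above, the condition would become contains_han(row[1]).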
