I've got 10,000 rows of data, and I want to see how many of those rows contain Chinese in the 'txt' column of the DataFrame.
I've tried using langdetect, but I get an error stating that it only works on strings.
My current method is:
    import csv

    counter = 0
    with open("annotation_sample.csv", "r") as f:
        csvreader = csv.reader(f, delimiter=",")
        for row in csvreader:
            # count the row only if this one specific character appears in it
            if "汉" in row[1]:
                counter += 1
    print(counter)
This works, but only for rows that contain '汉', which is just one of the many possible Chinese characters, so it doesn't return correct results.
Where am I going wrong? Am I tackling this the wrong way? I'm quite new to pandas and Python in general, so any help would be great!
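For reference, one way the single-character check could be generalised is to test against a Unicode range instead of one literal character. The sketch below is only a minimal illustration under some assumptions: "contains Chinese" is taken to mean "contains at least one Han character" (which also matches Japanese kanji), only two common Unicode blocks are covered, and non-string cells such as NaN are treated as not matching. The names `HAN_PATTERN`, `contains_han`, `count_rows_with_han` and the `sample` list are made up for the example.

```python
import re

# Assumption: "Chinese" means "at least one Han character".
# Covers CJK Unified Ideographs (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF);
# rarer extension blocks are not included in this sketch.
HAN_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def contains_han(value):
    """Return True if value is a string containing at least one Han character."""
    # Non-string cells (e.g. NaN floats from empty cells) count as no match.
    return isinstance(value, str) and HAN_PATTERN.search(value) is not None


def count_rows_with_han(values):
    """Count how many items in an iterable contain a Han character."""
    return sum(1 for v in values if contains_han(v))


# Hypothetical sample standing in for the 'txt' column.
sample = ["hello", "你好 world", "汉字", float("nan"), "", "café"]
print(count_rows_with_han(sample))  # prints 2
```

With a pandas DataFrame, the same pattern could probably be applied to the column directly, for example `df["txt"].astype(str).str.contains(HAN_PATTERN.pattern).sum()`, assuming the column really is called 'txt'.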