proprietary_terms = ["she", "personality matrix", "sense of self", "self-preservation", "learning algorithm", "her", "herself", "Helena"]
negative_words = ["concerned", "behind", "danger", "dangerous", "alarming", "alarmed", "out of control", "help", "unhappy", "bad", "upset", "awful", "broken", "damage", "damaging", "dismal", "distressed", "distressed", "concerning", "horrible", "horribly", "questionable"]
punctuation = [",", "!", "?", ".", "%", "/", "(", ")"]
def censor_three(input_text, censored_list, negative_words):
    input_text_words = []
    for x in input_text.split(" "):
        x1 = x.split("\n")
        for word in x1:
            input_text_words.append(word)
    for i in range(0,len(input_text_words)):
        if (input_text_words[i] in censored_list) == True:
            word_clean = input_text_words[i]
            censored_word = ""
            for x in range(0,len(word_clean)):
                censored_word = censored_word + "X"
            input_text_words[i] = input_text_words[i].replace(word_clean, censored_word)
    count = 0
    for i in range(0,len(input_text_words)):
        if (input_text_words[i] in negative_words) == True:
            count += 1
            if count > 2:
                word_clean = input_text_words[i]
                for x in punctuation:
                    word_clean = word_clean.strip(x)
                censored_word = ""
                for x in range(0,len(word_clean)):
                    censored_word = censored_word + "X"
                input_text_words[i] = input_text_words[i].replace(word_clean, censored_word)
    return " ".join(input_text_words)
# print(censor_three(email_three, proprietary_terms, negative_words))
I was trying to go through this line by line, but honestly I'm lost. Please bear with me as I write down how I understand the function above and where I have questions.
In the first 6 lines, I understand that we split the email string by " " and by newlines and append each word to the empty list input_text_words (I think this creates a list like ["Dear", "Board", "of", "Directors,", ...]).
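A minimal sketch of this splitting step in isolation may help check the idea. The variable `sample` and its text are made up for illustration and are not the real email:

```python
# Hypothetical sample text standing in for the email.
sample = "Dear Board of Directors,\nHelena is upset"

words = []
for chunk in sample.split(" "):
    # A space-separated chunk can still contain a newline, so split again.
    for word in chunk.split("\n"):
        words.append(word)

print(words)
# ['Dear', 'Board', 'of', 'Directors,', 'Helena', 'is', 'upset']
```

Note that the comma stays attached to "Directors," because only spaces and newlines are used as separators.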
In the next for loop, we examine each word in input_text_words and check whether input_text_words[i] is equal to one of the words in censored_list?
Question 1: Is my understanding above correct?
word_clean = input_text_words[i] <-- so here word_clean will contain the individual words in input_text_words
Question 2: For the above, I'm not sure why we do this
We create an empty string censored_word. For each index position in word_clean (I think this loops through the letters of the word), we add one X, so we end up with a string of X's that is the same length as word_clean. Then, in input_text_words[i], we replace the part that matches word_clean with censored_word, so the clean word is replaced by X's.
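The X-building loop can be sketched on its own as follows. The value of `word` is a made-up example, and the `"X" * len(word)` line is only a shorter equivalent, not something the original solution uses:

```python
word = "Helena"  # hypothetical word that matched the censored list

censored_word = ""
for _ in range(len(word)):
    # One "X" is appended per character in the word.
    censored_word = censored_word + "X"

print(censored_word)    # XXXXXX
print("X" * len(word))  # XXXXXX (same result in one step)
```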
count = 0
For the words in input_text_words, they use the same logic as above, which I was confused about: "if (input_text_words[i] in negative_words) == True:"
Question 3: Does the above mean we loop through input_text_words, and the condition is True if the word matches a word in the negative_words list?
If it is True, we add 1 to the variable count. If count is greater than 2, then we see the same line I asked about before.
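The counting logic can be tried on a small made-up list. The names `negative`, `words` and `flagged` below are assumptions for illustration. This sketch only records which positions would be censored, which are the third negative word onward:

```python
negative = ["bad", "awful", "upset"]  # hypothetical short list
words = ["bad", "day", "awful", "and", "upset", "bad"]

count = 0
flagged = []  # indices that would be censored
for i in range(len(words)):
    # `in` already evaluates to True or False, so `== True` is optional.
    if words[i] in negative:
        count += 1
        if count > 2:
            flagged.append(i)

print(count)    # 4
print(flagged)  # [4, 5]
```

The first two negative words (indices 0 and 2) are counted but left alone.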
Question 4: word_clean = input_text_words[i] <-- I am still not too sure why we do this
Looping through the punctuation list, we strip word_clean of any punctuation.
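A minimal sketch of the stripping loop, using the same punctuation list and a made-up token. `str.strip` only removes characters from the two ends of the string, never from the middle:

```python
punctuation = [",", "!", "?", ".", "%", "/", "(", ")"]

word_clean = "(horrible),"  # hypothetical token with punctuation attached
for mark in punctuation:
    # Each pass removes that one character from both ends, if present.
    word_clean = word_clean.strip(mark)

print(word_clean)  # horrible
```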
Create a blank string censored_word, loop through each letter of word_clean, and add one X per letter, so censored_word has as many X's as word_clean has letters.
At index position i of input_text_words, replace word_clean with censored_word.
Question 5: I don't really understand how the right word is getting replaced by censored_word
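One way to look at the replacement step by itself is the sketch below. The list `words` and the index `i` are made up for illustration. `str.replace` returns a new string, and assigning it back to position `i` changes only that one slot:

```python
words = ["truly", "awful!", "news"]  # hypothetical list of tokens
i = 1                                # index of the token being censored

word_clean = words[i].strip("!")       # "awful"
censored_word = "X" * len(word_clean)  # "XXXXX"

# Only the part of words[i] that equals word_clean is swapped for X's,
# so punctuation attached to the token is kept.
words[i] = words[i].replace(word_clean, censored_word)

print(words)  # ['truly', 'XXXXX!', 'news']
```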
Lastly, join the items of input_text_words with " ".
I would really appreciate it if anyone could correct any misunderstandings I have, in layman's terms. I was not able to solve the question and had to look at the answer (and was actually shocked by how long the function was; in all the exercises I have done until now, I have never seen such a long function). Since I suck at Python and could not answer it, one way for me to learn is to try to fully understand the provided answer line by line, so any help would be really appreciated.