I've implemented a program that does the following:
- Scans all of the words in a web page into a string (using jsoup)
- Filters out all of the HTML markup and code
- Passes these words to a spell-checking program that offers suggestions
The spell checking program loads a dictionary.txt file into an array and compares the string input to the words inside the dictionary.
My current problem is that when the input contains the same misspelled word multiple times, such as "teh program is teh worst", the code prints:
    You entered 'teh', did you mean 'the'?
    You entered 'teh', did you mean 'the'?
Some websites repeat the same misspelled word many times, so the output becomes messy.
Ideally, each misspelled word would be printed once along with the number of times it was misspelled, but limiting each word to a single line of output would be good enough.
My program has two classes and a handful of methods; the spell-checking method is below.
Note: the original code contains some 'if' statements that remove punctuation marks, but I've removed them for clarity.
    static boolean suggestWord;

    public static String checkWord(String wordToCheck) {
        String wordCheck, unpunctWord;
        String word = wordToCheck.toLowerCase();
        if ((wordCheck = (String) dictionary.get(word)) != null) {
            // No need to ask for a suggestion for a correct word.
            suggestWord = false;
            return wordCheck;
        }
        // If after all of these checks a word could not be corrected, return it
        // as a misspelled word.
        return word;
    }
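For reference, the "print each misspelled word once, with a count" behaviour described above could be sketched roughly as follows. This is only a minimal illustration, not the actual program: the class `MisspellingCounter`, the tiny `KNOWN_WORDS` set, and the `isMisspelled` and `countMisspellings` helpers are all hypothetical stand-ins, and the real code would call its own dictionary lookup instead. The idea is to tally misspellings in a `LinkedHashMap` first and print only after the whole input has been scanned.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MisspellingCounter {

    // Hypothetical stand-in for the real dictionary loaded from dictionary.txt.
    static final Set<String> KNOWN_WORDS =
            new HashSet<>(Arrays.asList("the", "program", "is", "worst"));

    // Hypothetical check; the real program would use its own dictionary lookup.
    static boolean isMisspelled(String word) {
        return !KNOWN_WORDS.contains(word);
    }

    // Tallies how often each misspelled word occurs.
    // LinkedHashMap keeps the words in the order they were first seen.
    static Map<String, Integer> countMisspellings(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty token for blank or padded input
            }
            if (isMisspelled(token)) {
                counts.merge(token, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countMisspellings("teh program is teh worst");
        // One line per distinct misspelled word.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("You entered '" + entry.getKey() + "' "
                    + entry.getValue() + " time(s).");
        }
        // prints: You entered 'teh' 2 time(s).
    }
}
```

If only the "print once" behaviour is needed, a `Set<String>` of words already reported would be enough: `add` returns false when the word has been seen before, so the message can be skipped in that case.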