Commit

fix: update nltk to use punkt_tab
cunla committed Aug 26, 2024
1 parent fdf3a7c commit f462a58
Showing 2 changed files with 12 additions and 8 deletions.
2 changes: 1 addition & 1 deletion forum/similarity/algo/tfidf.py
@@ -14,7 +14,7 @@ def initialize_tfidf():
     try:
         nltk.data.find("tokenizers/punkt")
     except LookupError:
-        nltk.download("punkt")
+        nltk.download("punkt_tab")
 
     stemmer = nltk.stem.porter.PorterStemmer()
     remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
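
Note on the hunk above: the lookup still probes "tokenizers/punkt" while the download now fetches "punkt_tab". If only punkt_tab is installed, the probe will likely raise LookupError on every call and re-trigger the download step. A minimal sketch of keeping the probe and the download in sync follows. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical and not part of this commit, and the resource path "tokenizers/punkt_tab" is an assumption about where NLTK unpacks the package.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only when `resource_path` is not installed.

    Hypothetical helper, not from the commit. `find` and `download`
    default to nltk.data.find and nltk.download; they are injectable
    so the logic can be exercised without network access.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stubs can be used without nltk

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # Probe for the same resource that would be downloaded.
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True
```

Inside `initialize_tfidf()`, the try/except could then become a single call such as `ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")`, again assuming that path matches the installed layout.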
18 changes: 11 additions & 7 deletions poetry.lock

Some generated files are not rendered by default.
