Best way to set an entirely custom stop words set #10942
Answered by polm
probavee asked this question in Help: Coding & Implementations
I can't set a completely custom stop-word set by assigning it right after loading the model:

```python
import spacy

nlp = spacy.load("fr_core_news_sm")
custom_stops = {"faire", "un"}
nlp.Defaults.stop_words = custom_stops
print(nlp("faire")[0].is_stop)  # False
```

Removing every default stop word that isn't in my set and then adding mine does work:

```python
import spacy

nlp = spacy.load("fr_core_news_sm")
custom_stops = {"faire", "un"}

# Collect every default stop word that is not in the custom set
to_remove = set()
for stop in nlp.Defaults.stop_words:
    if stop not in custom_stops:
        to_remove.add(stop)

# Modify the existing set in place: drop the unwanted words, then add the custom ones
nlp.Defaults.stop_words.difference_update(to_remove)
nlp.Defaults.stop_words |= custom_stops
print(nlp("faire")[0].is_stop)  # True
```

But what if I want only my own stop words? Is there a better way?
Answered by polm on Jun 10, 2022
Replies: 1 comment
Answer selected by probavee
`Defaults` is a class, so if you modify the class after your pipeline has been created, it doesn't matter, because there's already an instance with the old data. You need to modify the defaults before you create the pipeline, like this.
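To make that reasoning concrete, here is a simplified pure-Python model. It is not spaCy code: `Pipeline`, `make_defaults` and this `is_stop` are hypothetical stand-ins. It assumes, based on my understanding of spaCy v3, that the stop-word check is bound to the set object `Defaults.stop_words` points to at the moment the pipeline's vocab is built. Under that assumption it reproduces both results from the question and the fix suggested in the answer.

```python
import functools


def is_stop(word, stops):
    # Simplified lexical-attribute getter: lowercase lookup in a captured set.
    return word.lower() in stops


def make_defaults():
    # Hypothetical stand-in for a language's Defaults class; a fresh class
    # is built per scenario so the scenarios do not affect each other.
    class Defaults:
        stop_words = {"le", "la", "un", "de"}

    return Defaults


class Pipeline:
    # Hypothetical stand-in for a loaded pipeline (not spaCy's real class).
    def __init__(self, defaults):
        self.Defaults = defaults
        # The getter is bound to the set object that Defaults.stop_words
        # refers to at creation time, not to the attribute name.
        self._is_stop = functools.partial(is_stop, stops=defaults.stop_words)

    def is_stop(self, word):
        return self._is_stop(word)


custom_stops = {"faire", "un"}

# 1. Rebinding after creation: the pipeline still holds the old set.
nlp = Pipeline(make_defaults())
nlp.Defaults.stop_words = custom_stops
rebind_after = nlp.is_stop("faire")

# 2. Mutating the same set in place after creation: the change is visible,
#    which is why the difference_update / |= approach in the question works.
nlp = Pipeline(make_defaults())
nlp.Defaults.stop_words.clear()
nlp.Defaults.stop_words.update(custom_stops)
mutate_after = (nlp.is_stop("faire"), nlp.is_stop("le"))

# 3. Setting the defaults before creating the pipeline.
defaults = make_defaults()
defaults.stop_words = set(custom_stops)
nlp = Pipeline(defaults)
set_before = (nlp.is_stop("faire"), nlp.is_stop("le"))

print(rebind_after)  # False
print(mutate_after)  # (True, False)
print(set_before)    # (True, False)
```

In spaCy terms, case 3 would presumably mean assigning to the language class's defaults before loading, for example `French.Defaults.stop_words = {"faire", "un"}` (with `French` imported from `spacy.lang.fr`) before calling `spacy.load("fr_core_news_sm")`. Treat that as an untested sketch, not the code from the original reply. Because it changes a class attribute, it would apply to every French pipeline created afterwards in the same process. The model also leaves out lexeme caching: as far as I know, spaCy computes flags such as `is_stop` when a word's lexeme is first created, so in-place edits after loading may not reach words the pipeline has already processed.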