Replies: 1 comment
-
If you want to remove stop words before the clustering step, you would have to clean your documents beforehand and then pass them to BERTopic. Do note that the |
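As a rough illustration of the "clean your documents beforehand" step, here is a minimal sketch. The `STOP_WORDS` set, the `remove_stop_words` helper, and the sample `docs` are all made up for this example and are not part of BERTopic; a fuller list such as scikit-learn's `ENGLISH_STOP_WORDS` could be used instead. The BERTopic call is left as a comment so the snippet runs without the library installed.

```python
import re

# Illustrative stop-word list (an assumption for this sketch); a fuller list such as
# sklearn.feature_extraction.text.ENGLISH_STOP_WORDS could be swapped in.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for", "it", "was"}


def remove_stop_words(doc, stop_words=STOP_WORDS):
    """Lowercase the document, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", doc.lower())
    return " ".join(t for t in tokens if t not in stop_words)


# Hypothetical sample documents; a real corpus would be much larger.
docs = [
    "The cat is on the mat.",
    "Topic models are useful for grouping a collection of documents.",
]
cleaned_docs = [remove_stop_words(d) for d in docs]

# The cleaned documents would then be passed to BERTopic in place of the originals:
# from bertopic import BERTopic
# topic_model = BERTopic()
# topics, probs = topic_model.fit_transform(cleaned_docs)
```

Because the cleaned text is what gets embedded, the stop words are absent from the clustering step as well as from the topic representation. Note that this sketch also lowercases the text and strips punctuation, which may or may not be what you want for your embedding model.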
-
Hi,
following one of the examples you have provided, I was trying to do:
`vectorizer = TfidfVectorizer(min_df=5)
embeddings = vectorizer.fit_transform(docs)
# Train our topic model using TF-IDF vectors
topic_model = BERTopic(stop_words="english")`
However, I get an error saying that BERTopic does not accept `stop_words`.
What I would like to do is remove the stop words before the clustering. My intention is to get rid of the stop words not only in the topic representation but also in the clustering step.
Thank you and Best Regards,
Avafor