Getting Representative Documents for Topics: bertopic==0.9.2 #285
Comments
I see, this happens because there is a However, there is a quick fix available here that fixes the issue. I will most likely release that fix to PyPI today.
I was also confused why
@gsalfourn @MitraMitraMitra A new version of BERTopic (v0.9.3) was released that should fix this issue and some others that should be helpful. You can install that version through
Hey Maarten, I was running BERTopic on user reviews of an app. My goal is to perform sentiment analysis on the reviews per topic. I managed to get the topics, but now I need to print the reviews per topic along with their sentiment label (1 or 0). topic_model.get_representative_docs() only prints the reviews with their topic. Is there a way to keep other columns, like the sentiment label and star rating, so I can perform sentiment analysis per topic?
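One possible way to approach the question above, as a rough sketch rather than an official BERTopic feature: the per-document topic list that fitting returns is in the same order as the input documents, so the extra columns can be kept alongside it by position. The review data, the `topics` list, and the helper names `group_by_topic` and `positive_share` below are all hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical review data; in practice these would be columns of your own dataset.
reviews = [
    {"text": "Love the new design", "sentiment": 1, "stars": 5},
    {"text": "App crashes on login", "sentiment": 0, "stars": 1},
    {"text": "Login keeps failing", "sentiment": 0, "stars": 2},
    {"text": "Beautiful interface", "sentiment": 1, "stars": 4},
]

# Stand-in for the per-document topics that fitting returns, e.g.
#   topics, probs = topic_model.fit_transform([r["text"] for r in reviews])
# The values here are made up so the sketch runs without a trained model.
topics = [0, 1, 1, 0]


def group_by_topic(rows, topics):
    """Pair each row with its topic by position and collect rows per topic."""
    grouped = defaultdict(list)
    for row, topic in zip(rows, topics):
        grouped[topic].append(row)
    return dict(grouped)


def positive_share(rows):
    """Fraction of rows whose sentiment label is 1."""
    return sum(r["sentiment"] for r in rows) / len(rows)


by_topic = group_by_topic(reviews, topics)
for topic in sorted(by_topic):
    rows = by_topic[topic]
    print(topic, f"{positive_share(rows):.2f}", [r["text"] for r in rows])
```

Because the grouping relies only on position, it works with every document in a topic, not just the few representative ones.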
Hey @MaartenGr, I have been working with BERTopic for a while and it is really awesome; I appreciate your dedication and help! I am trying to get representative documents for all topics (minus the -1 topic). When I use Here is the code I am using to get the representative sentences:
@mdcox It seems that your code is correct, so it is indeed strange that you are not getting the topics you are looking for. Having said that, it is difficult to say more without additional details. Which version of BERTopic are you using? Could you share your entire code for training BERTopic?
@MaartenGr thank you for the quick reply! We are currently using
@mdcox There have been some significant fixes since v0.9.4, and I believe that correctly finding representative documents might be one of them. I would advise using the newest version, as those issues might already be fixed.
Okay, good note! It turns out there was a small bug that I just found as well. Thank you very much for the help; I will definitely start using the updated version.
@maarten,
In the link: https://maartengr.github.io/BERTopic/api/bertopic.html#bertopic._bertopic.BERTopic.get_representative_docs you show how to extract representative documents for all topics or a single topic.
To extract the representative docs of all topics you suggest using
representative_docs = topic_model.get_representative_docs()
and to get the representative docs of a single topic, to use
representative_docs = topic_model.get_representative_docs(topic=12)
Getting the representative docs for a single topic works as you suggested; however, there appears to be a problem with getting representative docs for all topics using the approach you suggested. When I try to get representative docs for all topics with
topic_model.get_representative_docs()
it gives me an error message suggesting that I am missing an argument:
The interesting thing is that when I use the following three approaches:
none of them give any error messages; they all give me an unordered dictionary of representative docs for all topics.
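Since the result described above is a dictionary keyed by topic, it can be put in topic order with plain Python. The following is only a sketch: the `representative_docs` dictionary is a made-up stand-in for what `topic_model.get_representative_docs()` returns, and `ordered_without_outliers` is a hypothetical helper, not part of BERTopic. It also drops the -1 topic, which is the outlier topic mentioned elsewhere in this thread.

```python
# Hypothetical stand-in for the dictionary returned by
# topic_model.get_representative_docs(): topic id -> list of documents.
representative_docs = {
    2: ["doc about pricing"],
    -1: ["outlier doc"],
    0: ["doc about login", "doc about passwords"],
    1: ["doc about design"],
}


def ordered_without_outliers(docs_by_topic, outlier_topic=-1):
    """Return a new dict sorted by topic id, leaving out the outlier topic."""
    return {
        topic: docs_by_topic[topic]
        for topic in sorted(docs_by_topic)
        if topic != outlier_topic
    }


ordered = ordered_without_outliers(representative_docs)
for topic, docs in ordered.items():
    print(topic, docs)
```

Dictionaries preserve insertion order in Python 3.7 and later, so iterating over `ordered` visits the topics in ascending order.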