[Question] Sentence segmentation in gensim #1135
Comments
No, gensim generally expects such tokenization to happen elsewhere. Popular Python options for this include NLTK and spaCy.
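To illustrate what "tokenization elsewhere" can look like, here is a minimal sketch that segments a text stream before it reaches gensim, using only the standard library. The helper names `split_sentences` and `iter_tokenized_sentences` are hypothetical, and the regex is a naive stand-in rather than what NLTK or spaCy do internally; in practice something like NLTK's `sent_tokenize` would be a more robust choice.

```python
import re

# Naive boundary: sentence-ending punctuation, then whitespace, then an uppercase letter.
# This is an assumption for illustration; it mis-splits abbreviations such as "Dr. Smith".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
_TOKEN = re.compile(r"\w+")


def split_sentences(text):
    """Split a string into sentences with a naive regex (hypothetical helper)."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def iter_tokenized_sentences(lines):
    """Yield one list of lowercase tokens per sentence found in a stream of lines."""
    for line in lines:
        for sentence in split_sentences(line):
            tokens = [t.lower() for t in _TOKEN.findall(sentence)]
            if tokens:
                yield tokens


stream = [
    "I need sentences. Is there a utility? Not in gensim itself!",
    "Use another library.",
]
sentences = list(iter_tokenized_sentences(stream))
```

The result is a list of token lists, one per sentence, which is the shape that gensim models taking a list of sentences expect.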
This is basic, must-have functionality in an NLP library. gensim provides functions for parsing text from a corpus, and it has models that take a list of sentences as an argument. But there is no utility for segmenting text into sentences, which is very disappointing. 😞
I agree. Unsupervised segmentation (into blocks/sentences/words) falls nicely under gensim's mission. Pull requests welcome :) CC @tmylk.
Added to wiki.
The summarization code splits text into sentences. Based on the English summary results, it does a great job. https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/summarization/textcleaner.py
I need to split a corpus text stream into sentences for further processing. I checked the gensim documentation but could not find anything on sentence segmentation.
Is there any utility available in gensim for sentence segmentation?