Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the code for POS tags #57

Closed
wants to merge 3 commits into from

Conversation

samirghouri
Copy link

Added the hero.pos in the nlp module.

@samirghouri
Copy link
Author

Hey @jbesomi , Can you please check this PR?

@jbesomi
Copy link
Owner

jbesomi commented Jul 10, 2020

Hi @samirghouri! Thanks a lot for your review and welcome to Texthero! 🎉

If you haven't already done, please read throughout CONTRIBUTING.md.

Your PR is also modifying the documentation. This should be avoided as this is not part of the POS. We will make the change on the website only after the POS function will be integrated into Texthero, starting from the next version. If you feel like the CONTRIBUTING.md document wasn't enough clear on that point, do not hesitate to improve it and open another PR.

Some comments:

  1. For the POS, do we need both the tagger and parser? why is that? (you might need to refer to the spaCy )
  2. As the contributing document explains, Texthero's workhorse is the quality of the documentation. We need to make sure the docstring is cristal clean. Have a look at texthero.nlp.named_entities for an idea (look also at the source code). Improvements consist in:
  • Having a better heading title "Return a Pandas Series with part-of-speech tagging"
  • Show a list of possible POS tags as well as what they mean
  • Link to other resources (for example with the See Also section in the docstring). Refer to the numpy docstring guide for that (see CONTRIBUTING)

Extra Improvements:

  1. batch_size=32. It would be great if you can do some research and understand which values we should use there. Even better, would be to test it on a concrete example. An alternative would also be to use apply instead of the for-cycle (spacy pipe shuld be multi-threading and that's why I picked this option but making a more in-depth analysis might be useful as-well) It's something you might be interested in? You can start by opening an issue for instance and says that we should focus more on that part. Thanks!

Regards,

@jbesomi jbesomi marked this pull request as draft July 15, 2020 08:24
@jbesomi
Copy link
Owner

jbesomi commented Jul 15, 2020

Hey @samirghouri, any news regarding that? :)

@jbesomi jbesomi linked an issue Jul 15, 2020 that may be closed by this pull request
@jbesomi jbesomi mentioned this pull request Jul 18, 2020
@ghost ghost mentioned this pull request Jul 18, 2020
@henrifroese henrifroese added the enhancement New feature or request label Aug 3, 2020
@jbesomi
Copy link
Owner

jbesomi commented Aug 13, 2020

I'm closing this as there hasn't been any update recently

@jbesomi jbesomi closed this Aug 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add POS tagging
3 participants