count(s) and term_frequency(s) #92
@jbesomi Can I have a review? |
Seems like it's failing the tests:
======================================================================
FAIL: test_correct_index_26_term_frequency (tests.test_indexes.AbstractIndexTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/parameterized/parameterized.py", line 530, in standalone_func
return func(*(a + p.args), **p.kwargs)
File "/home/travis/build/jbesomi/texthero/tests/test_indexes.py", line 96, in test_correct_index
self.assertTrue(result_s.index.equals(t_same_index.index))
AssertionError: False is not true
======================================================================
FAIL: test_incorrect_index_26_term_frequency (tests.test_indexes.AbstractIndexTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/parameterized/parameterized.py", line 530, in standalone_func
return func(*(a + p.args), **p.kwargs)
File "/home/travis/build/jbesomi/texthero/tests/test_indexes.py", line 103, in test_incorrect_index
self.assertFalse(result_s.index.equals(t_different_index.index))
AssertionError: True is not false
---------------------------------------------------------------------- |
Thank you, that looks like a great start! @henrifroese is working on #90; we should probably wait for his merge before we continue with this. For a broader view, you can have a look at #85. As you have some JavaScript knowledge (and I assume also some web-development knowledge), would you be interested in helping out with #40? I can support you there. This is quite an interesting subject, as we will have to work with Sphinx, CSS, HTML and maybe even a bit of JS. Otherwise, if you are more interested in the software development, what about #65? |
Hey @jbesomi Yes, I can work with both. Should I proceed with this PR? I have to add just a couple of test cases to fix this. |
For this PR, you will need to wait a bit. I'm making some important changes. I will let you know when it is done (in an hour or so). If you want to keep going in the meantime, could you have a look there and ask questions or add your opinion? #40 |
Hey @ishanarora04, you can keep working on it! Just so you know, once you are finished, I will try to unify all three functions so that they all have the same arguments ( |
(force-pushed from caee34a to 590bd94)
@jbesomi Should term_frequency be part of test_indexes.py since we are generating a new series? |
Hey! Yes, it should be. What do you mean by "since we are generating a new series"? Also, term_frequency is basically count normalized by the number of words in the document. You are doing something different right now, aren't you? |
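To make the distinction in the comment above concrete, here is a minimal pure-Python sketch (not the texthero implementation): `count` is the raw number of occurrences of each word in a document, and `term_frequency` is that count divided by the number of words in the same document. The helper names `count_terms` and `term_frequencies` and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def count_terms(document: str) -> Counter:
    """Raw occurrences of each whitespace-separated token (hypothetical helper)."""
    return Counter(document.split())


def term_frequencies(document: str) -> dict:
    """Counts normalized by the number of words in the document (hypothetical helper)."""
    counts = count_terms(document)
    total = sum(counts.values())
    if total == 0:
        # An empty document has no terms, so there is nothing to normalize.
        return {}
    return {term: n / total for term, n in counts.items()}


doc = "the cat sat on the mat"
counts = count_terms(doc)
tf = term_frequencies(doc)

print(counts["the"])  # raw count: "the" appears twice
print(tf["the"])      # normalized: 2 occurrences out of 6 words
```

With this definition the term frequencies of a non-empty document always sum to 1, which is a quick sanity check that the function really normalizes rather than just counting.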
(force-pushed from 07487ca to 98d45a1)
Hey, can I have a review here? |
Hey @ishanarora04, thanks, amazing! I'm out of town for the weekend; I will look into that tomorrow evening (ECT) or at the latest on Monday. If you are interested in contributing more, this issue needs some help: #65 👍 Thank you for your patience! |
This is up for review. |
Hey @ishanarora04, thank you! 🎉 When reviewing, I noticed that @henrifroese and @mk2510 recently made some big improvements to the documentation and the preprocessing file. You can see the main changes there: #107 If you are okay with that, I will first merge PR #107 and then merge yours; that way we can reduce merge conflicts. Review for now (for efficiency and to avoid doing the work twice, you might want to wait until we merge #107 before making any changes):
To involve you in the discussion: initially, preprocessing functions received a Text Series and scikit-learn's default settings were used. By default, scikit-learn lowercases text and removes punctuation, which is why we added tests such as *_punctuation_are_kept and *_not_lowercase. Now, |
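As a rough illustration of the scikit-learn defaults mentioned in the comment above: `CountVectorizer` lowercases its input by default (`lowercase=True`) and tokenizes with the default `token_pattern` `(?u)\b\w\w+\b`, which drops punctuation and one-character tokens. The sketch below only emulates those two defaults with the standard library; `sklearn_like_tokens` is a hypothetical helper, not a scikit-learn or texthero function.

```python
import re

# Same regular expression as CountVectorizer's documented default token_pattern.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def sklearn_like_tokens(text: str) -> list:
    """Emulate the default lowercasing and tokenization (hypothetical helper)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())


text = "Hello, World! I am here."

# What the default settings would feed into the counts.
default_tokens = sklearn_like_tokens(text)

# What a plain whitespace split keeps: case and punctuation survive.
raw_tokens = text.split()

print(default_tokens)
print(raw_tokens)
```

Comparing the two lists shows why tests in the style of `*_punctuation_are_kept` and `*_not_lowercase` are useful: they catch the case where the vectorizer's defaults silently alter the tokens.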
Thanks for the review. Yes, we can wait for #107 to be merged. I will incorporate all the suggestions. |
Incorporated the suggestions. Meanwhile, I can start working on #65 |
Top! 🎉 |
Hey @ishanarora04. Thank you for your PR! Just merged 🎉 🎉 |
Thanks
On Wed, Jul 22, 2020 at 1:42 PM Jonathan Besomi wrote:
> Hey @ishanarora04. Thank you for your PR! Just merged 🎉 🎉
|
Replaces term frequency with count and creates a new method, term_frequency