Spelling checker has been modified #71

bitanb1999 · 2023-03-11T17:34:29Z

Please check the options that you have completed and strike out the options that do not apply via this pull request:

a clear title and description of the Pull Request has been provided
you have read the Contributing doc and the Developer Guide
the pull request passes the tests (`./test-coverage "tests slow-tests"`) - this will also be visible via the Code coverage report and CI/CD task on the Pull Request
you have performed some kind of smoke test by running your changes in an isolated environment, e.g. a Docker container, Google Colab, Kaggle, etc.
[] ~~the notebooks are updated (see notebooks folder, read the Notebooks docs)~~
CHANGELOG.md has been updated (please follow the existing format)

Goal or purpose of the PR

The spelling checker previously used TextBlob and required tokenization for the spelling checking and spelling quality summarisation. This took significant time and the result score calculated was also not satisfactory.

Changes implemented in the PR

I replaced the checker function with Symspellpy, a package that claims to be much faster than TextBlob and jamspell. Previously, the score was based entirely on the ratio of the number of misspelled words to the total length of the string, which doesn't take ease of reading or "whether the phrase makes sense" into account. To resolve these issues, I used fuzzy-matching techniques that compare the original text with the corrected text and score the text accordingly.
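The difference between the two scoring approaches described above can be illustrated with a minimal sketch. This is not the code from the PR: the standard library's `difflib` stands in for the fuzzy-matching package, `TOY_CORRECTIONS` and `correct_text` are hypothetical stand-ins for the corrector's output, and the ratio score is simplified to a per-word fraction.

```python
from difflib import SequenceMatcher

# Hypothetical toy corrections table standing in for a real spelling corrector.
TOY_CORRECTIONS = {"speling": "spelling", "chekcer": "checker", "teh": "the"}


def correct_text(text: str) -> str:
    """Replace each known misspelling (stand-in for the corrector's output)."""
    return " ".join(TOY_CORRECTIONS.get(word, word) for word in text.split())


def misspelled_ratio_score(text: str) -> float:
    """Ratio-style score: fraction of words that needed no correction."""
    words = text.split()
    if not words:
        return 1.0
    misspelled = sum(1 for word in words if word in TOY_CORRECTIONS)
    return 1.0 - misspelled / len(words)


def similarity_score(text: str) -> float:
    """Fuzzy-style score: similarity of the original text to its corrected form."""
    return SequenceMatcher(None, text, correct_text(text)).ratio()


if __name__ == "__main__":
    sample = "teh speling chekcer works"
    print(misspelled_ratio_score(sample), similarity_score(sample))
```

In this sketch the ratio score treats every misspelled word as a complete miss, while the similarity score drops only in proportion to how far the original is from its corrected form.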

sourcery-ai · 2023-03-11T17:34:35Z

Sourcery Code Quality Report

❌ Merging this PR will decrease code quality in the affected files by 2.06%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 3.06 ⭐ | 3.51 ⭐ | 0.45 👎 |
| Method Length | 38.27 ⭐ | 43.00 ⭐ | 4.73 👎 |
| Working memory | 4.81 ⭐ | 5.10 ⭐ | 0.29 👎 |
| Quality | 86.71% ⭐ | 84.65% ⭐ | -2.06% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 137 | 154 | 17 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| nlp_profiler/high_level_features/ease_of_reading_check.py | 85.73% ⭐ | 85.18% ⭐ | -0.55% 👎 |
| nlp_profiler/high_level_features/spelling_quality_check.py | 87.36% ⭐ | 84.28% ⭐ | -3.08% 👎 |

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see our documentation here for details on how these metrics are calculated.


nlp_profiler/high_level_features/ease_of_reading_check.py

neomatrix369 · 2023-03-12T10:33:10Z

This PR depends on PR #69 being merged. Once that PR is merged, this one can proceed; until then, let's resolve any comments on this PR.

neomatrix369 · 2023-03-12T10:37:20Z

Please also do one last check of https://github.com/neomatrix369/nlp_profiler/blob/master/CONTRIBUTING.md to see if any dependent files need changing, e.g. re-running notebooks. The Developer Guide is also worth reviewing as a closing action.

Maybe you can enhance the existing grammar check example in the notebook(s) to illustrate the new package's features.

There are notebooks on this repo, please take a look at them and re-run them on your local machine to see if your changes have taken effect and no issues have arisen.

There are also markdown files in this repo, they may need a touch-up due to this change - can you pls check if that's the case?

neomatrix369

LGTM - just a few changes requested

nlp_profiler/high_level_features/spelling_quality_check.py

neomatrix369 · 2023-03-12T10:42:08Z

This PR is related to #8, have a good read of the issue to see if all or most of the requirements there are resolved by this PR

bitanb1999 · 2023-03-12T12:56:47Z

> This PR is related to #8, have a good read of the issue to see if all or most of the requirements there are resolved by this PR

I checked #8 and #2, and this PR addresses both issues. The scoring now uses a fuzzy-matching algorithm that penalizes each misspelled word as well as the arrangement of tokens. See this article: https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe
Also, multiple articles state that the Symspell package is much faster than TextBlob, so #2 is addressed as well.
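To make "arrangement of tokens" concrete, here is a small sketch of how an order-sensitive ratio differs from one that ignores word order. It uses only the standard library's `difflib`; the function names `plain_ratio` and `token_sort_ratio` are illustrative, and which ratio the PR actually uses is not stated in this thread.

```python
from difflib import SequenceMatcher


def plain_ratio(a: str, b: str) -> float:
    """Character-level similarity; changes in token order lower the score."""
    return SequenceMatcher(None, a, b).ratio()


def _sorted_tokens(text: str) -> str:
    """Lower-case the text and rejoin its tokens in sorted order."""
    return " ".join(sorted(text.lower().split()))


def token_sort_ratio(a: str, b: str) -> float:
    """Similarity after sorting tokens, so word order is ignored."""
    return SequenceMatcher(None, _sorted_tokens(a), _sorted_tokens(b)).ratio()


if __name__ == "__main__":
    original = "spelling checker fast"
    rearranged = "fast spelling checker"
    print(plain_ratio(original, rearranged))
    print(token_sort_ratio(original, rearranged))
```

A score that should penalize rearranged tokens would rely on something like the plain ratio; a score that should only react to misspellings would compare sorted tokens instead.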

nlp_profiler/high_level_features/grammar_quality_check.py

neomatrix369 · 2023-03-12T16:16:24Z

One last thing to do is update the CHANGELOG.md for this change - it's very easy to do, see how the previous ones are done

spelling checker modified

d292883

sourcery-ai bot mentioned this pull request Mar 11, 2023

Spelling checker has been modified (Sourcery refactored) #72

Closed

bitanb1999 added 3 commits March 11, 2023 23:07

spelling checker modified and requirement txt too

067a9f3

spelling checker modified and requirement txt too

f075699

ease of reading check debugged

0e2de48

neomatrix369 reviewed Mar 12, 2023

nlp_profiler/high_level_features/ease_of_reading_check.py

neomatrix369 assigned bitanb1999 Mar 12, 2023

neomatrix369 added the enhancement and high-level feature(s) labels Mar 12, 2023

neomatrix369 self-requested a review March 12, 2023 10:36

neomatrix369 requested changes Mar 12, 2023

nlp_profiler/high_level_features/spelling_quality_check.py (outdated)

neomatrix369 linked an issue Mar 12, 2023 that may be closed by this pull request

Improve logic behind spell checking text #8

Closed

2 tasks

code cleaned

24a3dc3

neomatrix369 reviewed Mar 12, 2023

nlp_profiler/high_level_features/grammar_quality_check.py (outdated)

neomatrix369 mentioned this pull request Mar 12, 2023

Changes made to grammar check function #69

Merged

6 tasks

bitanb1999 and others added 7 commits March 12, 2023 23:45

code cleaned with black

fe92433

Update CHANGELOG.md

ef91966

code cleaned

fddbe1e

code cleaned with black

9298299

fixed merge conflicts

b3c0e31

changelog has been cleaned

243fbc9

Merge branch 'master' into spelling_check

d05006b

neomatrix369 merged commit 0100ac0 into neomatrix369:master Mar 12, 2023

This was referenced Mar 12, 2023

Revert "Spelling checker has been modified" #75

Merged

[BUG] Improving/changing the spell checker leads to tests breaking, implementation changing #79

Open

neomatrix369 added the good first issue label Mar 13, 2023
