⬆️ Propose v/1.4.0 | ⚡ Few Improvements | ➖ Drop dependencies to reduce the footprint #41

Merged
merged 38 commits into from
May 20, 2021

Ousret
Member

@Ousret Ousret commented May 16, 2021

Changes :

  • Dependency: ➖ Use the standard library logging module instead of the loguru package.
  • Dependency: ➖ Drop the nose test framework in favor of the maintained pytest.
  • Dependency: ➖ Stop using the dragonmapper package to help detect gibberish Chinese/CJK text; rely on character occurrences instead.
  • Dependency: 🔧 ➖ Require cached_property only on Python 3.5, due to a constraint; drop it for every other interpreter version.
  • Bugfix: 🐛 The BOM marker in a CharsetNormalizerMatch instance could, in rare cases, be False even when a BOM was obviously present, due to the sub-match factoring process.
  • Improvement: ❇️ Return ASCII if the given sequence fits.
  • Performance: ⚡ Huge improvement on large payloads.
  • Change: 🔥 Drop support for UTF-7. This could be reverted.
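
To illustrate three of the items above (standard logging, BOM detection, and the ASCII shortcut), here is a minimal sketch. It is not the code from this PR: the logger name `charset_sketch` and the helpers `sniff_bom` and `fits_ascii` are hypothetical names chosen for the example, and only standard library calls are used.

```python
import codecs
import logging
from typing import Optional

# Library-style logger from the standard library: it stays silent unless the
# host application configures logging. The logger name is hypothetical.
logger = logging.getLogger("charset_sketch")
logger.addHandler(logging.NullHandler())

# Order matters: the UTF-32 LE BOM begins with the same two bytes as the
# UTF-16 LE BOM, so the longer signatures are checked first.
_BOMS = (
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
)


def sniff_bom(payload: bytes) -> Optional[str]:
    """Return the codec implied by a leading BOM, or None if there is none."""
    for mark, name in _BOMS:
        if payload.startswith(mark):
            logger.debug("BOM for %s found", name)
            return name
    return None


def fits_ascii(payload: bytes) -> bool:
    """Return True when the whole payload decodes as plain ASCII."""
    try:
        payload.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True
```

Checking the raw bytes directly, as above, keeps the BOM answer independent of any later merging of candidate matches. `fits_ascii` uses `decode` rather than `bytes.isascii()` so that the sketch also runs on the older interpreters mentioned in this PR.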

@Ousret Ousret added the enhancement New feature or request label May 16, 2021
@codecov-commenter

codecov-commenter commented May 16, 2021

Codecov Report

Merging #41 (716f9b0) into master (65dea98) will decrease coverage by 0.80%.
The diff coverage is 88.41%.

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
- Coverage   84.56%   83.76%   -0.81%     
==========================================
  Files          13       14       +1     
  Lines         920      967      +47     
==========================================
+ Hits          778      810      +32     
- Misses        142      157      +15     
Impacted Files                                    Coverage Δ
charset_normalizer/probe_inherent_sign.py         88.88% <ø> (+38.88%) ⬆️
charset_normalizer/cached_property/__init__.py    77.08% <77.08%> (ø)
charset_normalizer/probe_chaos.py                 80.83% <84.00%> (-1.35%) ⬇️
charset_normalizer/normalizer.py                  82.68% <85.71%> (-2.93%) ⬇️
charset_normalizer/unicode.py                     93.60% <94.11%> (-0.57%) ⬇️
charset_normalizer/__init__.py                    100.00% <100.00%> (ø)
charset_normalizer/probe_coherence.py             94.89% <100.00%> (-0.04%) ⬇️
charset_normalizer/probe_words.py                 88.88% <100.00%> (+2.77%) ⬆️
charset_normalizer/version.py                     100.00% <100.00%> (ø)
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Ousret Ousret changed the title 🔧 Using pytest instead of the deprecated nose ⬆️ Propose v/1.4.0 | ⚡ Few Improvements | ➖ Drop dependencies to reduce the footprint May 18, 2021
@Ousret Ousret added documentation Improvements or additions to documentation help wanted Extra attention is needed labels May 18, 2021
@Ousret
Member Author

Ousret commented May 18, 2021

For now, we observe a slight decrease in detection accuracy for Chinese-related code pages. This should be fixed soon; I am thinking of ways to resolve it.

@Ousret
Member Author

Ousret commented May 18, 2021

Otherwise,

  • The dependency tree has been reduced to the minimum
  • ASCII is returned when the content fits it
  • Fixed nonsensical UTF-16 detection when the content contains only numbers (~1KB)
  • Performance on large files is still to be resolved
  • Chinese detection is still to be improved; it needs to reach the previous detection threshold/success rate
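
The UTF-16 item above is easier to see with a small example. Any even-length run of ASCII digits decodes as UTF-16 without raising an error, because each pair of digit bytes forms a valid non-surrogate code unit, so "decodes without error" alone says nothing about plausibility. The sketch below is an illustration under that assumption, not the detection logic of this PR; `decodes_cleanly` and `guess` are hypothetical helpers, and the resolution order in `guess` is chosen only for the demonstration.

```python
def decodes_cleanly(payload: bytes, encoding: str) -> bool:
    """Return True when the payload decodes without a UnicodeDecodeError."""
    try:
        payload.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def guess(payload: bytes) -> str:
    """Hypothetical resolution order: plain ASCII is tried before wider codecs."""
    for encoding in ("ascii", "utf_8", "utf_16_le"):
        if decodes_cleanly(payload, encoding):
            return encoding
    return "unknown"


# About 1KB of digits only, even length.
digits = b"1234567890" * 100

# The bytes 0x31 0x32 ("12") read as one little-endian code unit give U+3231,
# so the decoded text is non-ASCII symbols, not the digits that were written.
misread = digits.decode("utf_16_le")
```

Here `decodes_cleanly(digits, "utf_16_le")` is true even though `misread` contains no digit at all, while `guess(digits)` settles on `ascii` because the cheaper check runs first.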

@Ousret Ousret merged commit 98d12fa into master May 20, 2021
@Ousret Ousret deleted the patch-refresh branch May 20, 2021 06:36
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

2 participants