🐛 Fix very rare case of ?bad? bytes payload that cause decode/encode errors #40

Merged: Ousret merged 1 commit into master from bugfix-bad-unicode-from-source on May 13, 2021


Ousret (Member) commented May 13, 2021

codecov-commenter commented May 13, 2021

Codecov Report

Merging #40 (702735d) into master (523c71b) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master      #40   +/-   ##
=======================================
  Coverage   84.56%   84.56%           
=======================================
  Files          13       13           
  Lines         920      920           
=======================================
  Hits          778      778           
  Misses        142      142           
| Impacted Files | Coverage Δ |
|---|---|
| charset_normalizer/normalizer.py | 85.60% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

potiuk commented May 13, 2021

Cool

Ousret (Member, Author) commented May 13, 2021

So I can confirm that this patch fixes the issue. I noticed that the websites that raise it are behind the Cloudflare app firewall.
It does not seem to happen every time, as you noticed.

Ousret (Member, Author) commented May 13, 2021

I am merging this now as it does not interfere with anything else. This 'bug' is very much an edge case and seems related to errors in transmission (TX).
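For readers who want to see the failure mode being discussed, here is a minimal sketch of how a damaged byte payload produces a decode error. It is an illustration only and not the code changed in this pull request: the sample string, the `corrupted` payload, and the `try_decode` helper are all hypothetical, and the sketch assumes the damage is a multi-byte UTF-8 sequence cut short in transit.

```python
# Hypothetical illustration; NOT the code changed in this pull request.

# A valid UTF-8 payload: "é" is encoded as the two bytes 0xC3 0xA9.
payload = "café".encode("utf-8")

# Simulate a transmission error by dropping the last byte,
# which leaves a dangling lead byte (0xC3) at the end.
corrupted = payload[:-1]


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if `data` is not valid in `encoding`."""
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are invalid for this encoding.
        # LookupError: the encoding name is unknown to Python.
        return None


# Strict decoding fails on the damaged payload, so the helper returns None.
strict = try_decode(corrupted, "utf-8")

# A lenient decode substitutes U+FFFD for the incomplete sequence instead.
lenient = corrupted.decode("utf-8", errors="replace")
```

Catching the error and treating the candidate as unusable keeps a single bad payload from aborting the whole run, at the cost of silently skipping that input; whether the actual patch takes this approach is not stated in this thread.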

Ousret changed the title from "🐛 Fix very rare case of ?bad? characters that cannot be translated to Unicode" to "🐛 Fix very rare case of ?bad? bytes payload that cause decode/encode errors" May 13, 2021

potiuk commented May 13, 2021

Cool. I can re-run the test tomorrow for all 33,000 sites; it was pretty repeatable, with 16-18 cases per run.

Ousret merged commit d2fac2c into master May 13, 2021
Ousret deleted the bugfix-bad-unicode-from-source branch May 13, 2021 20:33
3 participants