-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cChardet fork #7126
Comments
Any updates on this? |
I suggest dropping all character set guessing technologies. |
I forgot this PR was opened. I'm the current maintainer of https://github.com/faust-streaming/cChardet and the only issue I have at the moment is support for macOS ARM builds is unavailable. Other than that I think this is a safe merge. |
I suggest using faust-streaming/cChardet for the speedup, but giving the user the option to disable charset guessing technologies altogether. |
@asvetlov I think, character set guessing technologies are obsolete nowadays. I vote for removal from aiohttp. What you think? |
For my own edification, what replaced character set guessing techniques? It'd be useful to know in order to understand cChardet's purpose in this project to begin with, and what would replace it. |
@wbarnha Nothing replaces. Guessing is not required anymore. Everyone has to setup correct content-types. Also, UTF-8 is a winner. If one wants something else, he MUST specify it explicitly. No implicit encodings anymore. That's my thoughts. |
To be honest, that sounds pretty reasonable. Just looking through the code, it appears that it is only used for |
Also, if the mimetype is set correctly (which |
All of the major web browsers have some faculty for character set guessing. Here's an interesting blog post on the matter: https://hsivonen.fi/chardetng/ |
Yes, but we're not a browser, so legacy compatibility is a little less relevant, while performance is more relevant. We shouldn't penalise performance for something that is probably needed for 1 in a million requests or less. It should be trivial for a user to add charset guessing directly in their code, so I'm still thinking that we should still remove this library. |
Guessing anything is definitely a bad thing. For example, if a Russian text starts with an English phrase, a typical guesser says it's English. I strongly consider, earlier or sooner, these installations must fail. I think the time has come. Absolutely agreed about performance. Moreover, cchardet is an extra dependency. “Supply chain” problem is also important. |
We've removed charset guessing from aiohttp itself, but added a new Once backports are done, we should have a transitionary release in 3.8.6, where the new parameter will be present, but defaults to the older behaviour of trying charset-normalizer. |
Is your feature request related to a problem?
cChardet is probably abandoned and charset-normalizer seems to be slower according to #6819.
Describe the solution you'd like
Several days ago cChardet was forked and patches for 3.10 and 3.11 were applied.
Describe alternatives you've considered
Check out original issue #6819 and PyYoshi/cChardet#81
Related component
Server
Additional context
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: