UnicodeDecodeError with chardet 3 #225
Comments
For reference: chardet/chardet#128. |
I think vint's |
I can confirm on my side that the same example file yields similar results:
scriptencoding utf-8
" :purple_heart: 💜
" set list listchars=tab:»·,trail:·,eol:¬,nbsp:_,extends:❯,precedes:❮
I'm using vint v0.3.12 on a freshly installed computer. I just tested on an older computer, and the exact same code works there. Both are running Ubuntu 16.04 and installed vint through pip. I checked both
Hope that helps |
Same here.
Any news on this topic? |
Yeah I have some fancy comments in my vimrc and it totally kills vint |
Hey @Kuniwak, I saw you committed a fix for this issue recently. Thanks a lot for that! Do you have an ETA for the next
Thanks again |
These commits are in review. I'm going to ship them when the review is finished. |
* WIP
* Make debugging easy for fix encoding bugs
* Fix encoding problem that is #225 #242
* More simple implementation for bytes compatible
* Make more simple
* Remove debugging code
* It is a classmethod, not instance method
* Add a test case for suddn EOF
* Rename to the correct name
* Care multiple scriptencoding
* Fix a problem about debug_hint overwriting
* Care single line scriptencoding
* decoding error is not a RuntimeError but Exception
* More debug_hint
* Fix a problem about missing last char
* Change Chardet priority
* Revert "WIP" This reverts commit 1fb7dfc.
* Split files
* Try to resolve module name conflict
* Cosmetic changes
* Compose strategies to decoding_strategy
This bugfix was shipped in v0.3.15. |
Will try. It seems however that 0.3.15 has some output of debug info left:
|
The bug itself is fixed, thanks a lot for this huge improvement! 💜 |
The debug code was removed in v0.3.16. Sorry for my mistake. |
Closing as fixed. |
The following minimal vim file will cause an error:
encoding_hint in parse_file, from chardet.detect(bytes_seq), is:
{'encoding': 'Windows-1254', 'confidence': 0.5658124254347925, 'language': 'Turkish'}
With chardet 2.3 it is:
{'encoding': 'ISO-8859-2', 'confidence': 0.6680924803464797}
They seem to have temporarily disabled ISO-8859-2, as per the README on PyPI.
But anyway, since scriptencoding is present, it should be used by vint directly, and parse_file should probably fall back to utf-8 anyway in case of errors?!
b'scriptencoding' in bytes_seq could be used here for starters.
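To make the suggestion concrete, here is a rough sketch of "trust scriptencoding first, then fall back to utf-8". It is not vint's actual code: the helper name decode_vim_source and the regular expression are assumptions made up for illustration, and it only recognises the full scriptencoding spelling, not Vim's abbreviated forms.

```python
import re

# Hypothetical pattern: a `scriptencoding <name>` command at the start of a line.
# A commented-out declaration (a line starting with ") will not match.
_SCRIPTENCODING = re.compile(
    rb"^[ \t]*scriptencoding[ \t]+([A-Za-z0-9_.-]+)", re.MULTILINE
)


def decode_vim_source(bytes_seq):
    """Decode a Vim script given as bytes (hypothetical helper, not vint's API).

    1. If a scriptencoding declaration is found, try that encoding.
    2. Otherwise, or if that fails, decode as UTF-8 and replace bad bytes
       instead of raising UnicodeDecodeError.
    """
    match = _SCRIPTENCODING.search(bytes_seq)
    if match:
        declared = match.group(1).decode("ascii")
        try:
            return bytes_seq.decode(declared)
        except (LookupError, UnicodeDecodeError):
            # Unknown encoding name, or the file does not match its declaration.
            pass
    return bytes_seq.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = 'scriptencoding utf-8\n" :purple_heart: \U0001F49C\n'.encode("utf-8")
    print(decode_vim_source(sample))
```

With this ordering, a statistical guess such as the Windows-1254 result above would only be needed for files that carry no declaration at all, and a wrong guess could no longer crash the linter.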