Switch all encoding to UTF-8 #256
Comments
While copying WoRMS data into our DB:
The problem is 'Radwańska, 1996', I guess.
I'm very +1 on this!
The DB is UTF-8, the API is Unicode, and so is all the Python code. The file format still needs to be cleared up.
Trying to set '🦠 unicode!' in a text field, I got the following during the export:
Writing of files:
Reading of files:
… option for using old encoding.
Do you really insist on BOMs? It only makes reading the files harder, see here, for example: You need to read the files with
Hello, we tested this as ordinary users on many OSes, and the BOM works fine for spreadsheet apps.
Fixed in 2.5.12
Hmm... But even regular users don't always use spreadsheet apps. I really don't think that a BOM is the right way these days, when UTF-8 is the standard everywhere (even Windows is starting to use it). But you're the boss...
This means that we might want a simple charset detection in pyecotaxa so the user does not have to worry about this.
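To make the "simple charset detection" idea concrete, here is a minimal sketch. `guess_encoding` is a hypothetical helper and not part of pyecotaxa's actual API; it assumes the only candidates are UTF-8 (with or without BOM) and Latin-1, as discussed in this thread.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess a codec name for raw file bytes.

    Hypothetical helper for illustration, not pyecotaxa API.
    Assumes the file is either UTF-8 (with or without BOM) or Latin-1.
    """
    # A leading UTF-8 BOM is an unambiguous marker.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict UTF-8 decoding fails on most non-ASCII Latin-1 text.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so it never fails;
        # it is only a fallback, not a positive identification.
        return "latin-1"
```

Since `utf-8-sig` also reads BOM-less UTF-8, a caller could simply treat both UTF-8 results the same way and only special-case the Latin-1 fallback.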
I had not followed the export part of this. By "leave the choice to users" you mean a checkbox at export time asking to choose the encoding? I think most users will not understand it, and those who do would be knowledgeable enough to deal with this afterwards. So I think all exports should be UTF-8 (and if some obscure Windows-only utility fails, then too bad). As for BOM or no BOM, I don't know. @moi90: Rainer tested opening Laurent's file (which I assume is UTF-8 with BOM) with Python and found no problem.
I found out the following:
(https://stackoverflow.com/a/44573867/1116842) So I have no problem with BOM anymore and will make utf-8-sig the default when reading EcoTaxa files.
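For readers who want to check this themselves, the short sketch below shows how Python's standard `utf-8-sig` codec behaves. The sample string is taken from this thread; the variable names are only illustrative.

```python
import codecs

text = "Radwańska, 1996"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

# utf-8-sig strips a leading BOM if present...
assert with_bom.decode("utf-8-sig") == text
# ...and also reads BOM-less UTF-8 unchanged.
assert without_bom.decode("utf-8-sig") == text

# Plain utf-8 keeps the BOM as a U+FEFF character at the start of the text.
assert with_bom.decode("utf-8") == "\ufeff" + text

# When writing, utf-8-sig prepends the BOM.
assert text.encode("utf-8-sig") == with_bom
```

The same codec name works with `open(path, encoding="utf-8-sig")`, so one reader setting covers files with and without a BOM.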
…a web application. See ecotaxa/ecotaxa_front#256 Closes ecotaxa#3.
Latin-1 is not the preferred standard.
Input files in Latin-1 should be detected and converted to UTF-8 upon import. Field names (mapping) should all be UTF-8.
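A rough sketch of what "detected and converted upon import" could look like, assuming Latin-1 is the only legacy encoding to support. `to_utf8` is a hypothetical name and does not describe how EcoTaxa actually implements its import.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return the content of an input file re-encoded as BOM-less UTF-8.

    Hypothetical helper for illustration. Assumes the input is either
    UTF-8 (with or without BOM) or Latin-1.
    """
    try:
        # Valid UTF-8 passes through; a leading BOM is dropped.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fall back to Latin-1. This never fails, so files in other
        # legacy encodings would be converted silently and wrongly.
        text = raw.decode("latin-1")
    return text.encode("utf-8")
```

Applying the conversion to the whole file before parsing means the header line, and therefore the field names used for the mapping, ends up in UTF-8 as well.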