readstat not converting encoding of sas7bcat labels #152
Hi, thanks for the report. The issue should be fixed now; please update and try with the latest code.
Thanks! What about the other proposal? It would be good to have consistent behavior between reading sas7bdat and sas7bcat.
Okay, I’ve changed the behavior to match SAS7BDAT. Thanks.
On Oct 11, 2018, at 11:54, Otto Fajardo wrote:
> Thanks!
> What about the other proposal? It would be good in order to have consistency between reading sas7bdat and sas7bcat
Awesome! Thanks again!
This correction will probably solve this issue on Haven.
Sas7bcat labels with special characters are not correctly translated into UTF-8.
As an example taken from this R-Haven issue, when reading the file "formats.sas7bcat" coming from here, I get labels like "modalit\xe9 \xe01", which are not valid UTF-8 but are valid windows-1252 or latin1. The thing is that readstat correctly sets file.encoding to windows-1252, so that string should already be valid UTF-8 by the time my readstat_value_label_handler function receives it. This happens in pyreadstat, in R-Haven, and when debugging readstat with gdb. A user found a similar issue in pyreadstat with another file of his.
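To make the symptom concrete, here is a minimal Python sketch (not part of readstat or pyreadstat) that checks the reported bytes. The helper name `is_valid_utf8` is made up for illustration. The only input is the label string quoted above.

```python
# The raw label bytes as reported in the issue.
raw = b"modalit\xe9 \xe01"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The bytes are not valid UTF-8, but they decode under both
# windows-1252 and latin1, giving the same text for these characters.
label_cp1252 = raw.decode("cp1252")
label_latin1 = raw.decode("latin-1")

# What the handler would be expected to receive after a correct conversion.
utf8_bytes = label_cp1252.encode("utf-8")

print(is_valid_utf8(raw))         # False
print(label_cp1252)               # modalité à1
print(is_valid_utf8(utf8_bytes))  # True
```

Decoding with the file's declared encoding and re-encoding as UTF-8 is the effect the label conversion should have before the value label handler is called.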
Looking at readstat_sas7bcat_read.c, in the function sas7bcat_parse_value_labels, it seems to me that the variable label never gets converted. I inserted the following after line 91 and it cures the problem:
As my understanding of readstat and iconv is still limited (I hope to improve it!), I am not sure whether this is the proper solution, so I did not dare to send a PR, but I can do so after your suggestions.
Another smaller but still confusing thing: if I set the encoding manually with readstat_set_file_character_encoding to, say, LATIN1, and later try to retrieve the file encoding with readstat_get_file_encoding, I still get WINDOWS-1252. I think the reason is that in readstat_sas7bcat_read.c line 371:
should be:
as it is in readstat_sas7bdat_read.c line 594, to reflect that the user set the encoding manually.
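The intended precedence can be sketched with a toy model in Python. This is not readstat's code: `ToyParser`, `reported_encoding_buggy`, and `reported_encoding_fixed` are hypothetical names, and the model only mirrors the behavior described above, where a user-set encoding should take priority over the one detected in the file header.

```python
from typing import Optional


class ToyParser:
    """Hypothetical stand-in for a parser object; not readstat's actual struct."""

    def __init__(self) -> None:
        # Encoding chosen by the user, or None when nothing was set.
        self.input_encoding: Optional[str] = None

    def set_file_character_encoding(self, encoding: str) -> None:
        self.input_encoding = encoding


def reported_encoding_buggy(parser: ToyParser, header_encoding: str) -> str:
    # Models the reported behavior: the user's choice is ignored
    # and the header-detected encoding is always reported.
    return header_encoding


def reported_encoding_fixed(parser: ToyParser, header_encoding: str) -> str:
    # Models the proposed behavior: prefer the user's choice and
    # fall back to the header-detected encoding only when unset.
    return parser.input_encoding or header_encoding


parser = ToyParser()
parser.set_file_character_encoding("LATIN1")
print(reported_encoding_buggy(parser, "WINDOWS-1252"))  # WINDOWS-1252
print(reported_encoding_fixed(parser, "WINDOWS-1252"))  # LATIN1
```

With no manual setting, both functions report the header-detected encoding, so the change only affects callers who override the encoding.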