ODBC.jl does not process latin1 strings correctly on Windows #379

Open
ahjulstad opened this issue Oct 26, 2023 · 1 comment

@ahjulstad

On Windows 10, the ODBC driver appears to return strings encoded in the local system locale (code page), while ODBC.jl erroneously assumes they arrive as UTF-8.
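To illustrate the suspected mismatch, here is a minimal sketch in plain Julia that does not touch ODBC.jl at all. It assumes the driver hands back one byte per character in Latin-1 (ISO-8859-1); Windows-1252 differs for bytes 0x80 to 0x9F, so this is not a general code page decoder. The name `redecode_latin1` is hypothetical and is not part of ODBC.jl.

```julia
# Illustration only: none of these names come from ODBC.jl.
original = "Bjørnefjært i går"

# Simulate what a driver using a Latin-1 code page would hand back:
# one byte per character (UInt8(c) throws for code points above 0xFF).
latin1_bytes = UInt8[UInt8(c) for c in original]

# Treating those bytes as UTF-8 yields a malformed string.
# String(v) takes ownership of v, hence the copy.
misread = String(copy(latin1_bytes))
println(isvalid(misread))

# Latin-1 byte values coincide with Unicode code points U+0000 to U+00FF,
# so each byte can be mapped directly to a Char.
decoded = join(Char(b) for b in latin1_bytes)
println(decoded == original)

# Hypothetical helper: re-decode a string only if it is not valid UTF-8.
redecode_latin1(s::String) =
    isvalid(s) ? s : join(Char(b) for b in codeunits(s))
```

A helper like this could only repair text that fits in the code page. Characters outside it, such as the Chinese sample in the test code, cannot be represented in Latin-1 in the first place, so they would presumably be lost before the bytes ever reach Julia.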

Tested on Windows 10 with ODBC Driver 18, against SQL Server from the mcr.microsoft.com/mssql/server:2022-preview-ubuntu-22.04 image.

If I change the system locale to UTF-8 as discussed here, strings are parsed correctly.

The test code I used is below. (DockerCompose is just a small package I threw together to make it easy to spin up fresh MSSQL instances for testing.)

Something should perhaps be fixed in ODBC.jl, if the scenario of ODBC against MSSQL on Windows is important. In the meantime, a local workaround is to force UTF-8 as described in the link.

using DockerCompose     #  https://github.com/ahjulstad/DockerCompose.jl
using DBInterface
using DBInterface: execute
using DataFrames

c = DockerCompose.sqlserver()

conn = try
    DockerCompose.connect(c)
catch
    sleep(10)
    DockerCompose.connect(c)
end

execute(conn, "CREATE DATABASE withutf8 COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8") 
execute(conn, "USE withutf8")
execute(conn, "CREATE TABLE textutf8 (
    t  [nvarchar](max) NULL,
)")

execute(conn, "INSERT INTO textutf8 (t) values 
    ('Bjørnefjært i går')
")
execute(conn, "INSERT INTO textutf8 (t) values 
('望研測来白制父委供情治当認米注。規')
")


res = execute(conn, "SELECT * from textutf8") 
DataFrame(res)


@taylorwarczinsky

Thanks @ahjulstad for your issue. It saved me after days of tracking down why Greek symbols were not working correctly.
I agree that it would be great if this were fixed, to avoid having to check the "use unicode" box on every machine you run on.
