Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix utf8 support with MS SQL Server ODBC driver #323

Closed
wants to merge 2 commits into from

Conversation

lvoinea
Copy link

@lvoinea lvoinea commented Mar 31, 2023

Hi Mark,

This pull request addresses one critical issue for us: handling of UTF-8 statements under Linux when using the MS provided ODBC driver. Basically, something like insert into ΩDBC.dbo.Tαble2 ([Col Ω], [Col α]) values (N'an Ω', 'other') would not work without this fix.

The problem can be located in several places, but isql, the command line tool bundled with unixODBC, is able to handle this according to our expectations. This led us to investigate a fix within node-odbc.

The fix is rather trivial, and follows the same approach as in isql. It involved simply setting the locale to the values configured by the user (i.e., setlocale(LC_ALL, "");).

Our dev environment is made up of:

  • macOS 11.6 workstations
  • unixODBC 2.3.11
  • libiconv 1.17 (separately installed from HomeBrew)
  • MS SQL Server 2017 (RTM-CU24) (KB5001228) - 14.0.3391.2 (X64) running in a Docker container.

Output of locale command:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

NOTE: We tested the fix under Windows 10 as well and it works, although the behaviour is somewhat different. Under macOS emoji characters are not accepted, as expected (as they fall outside the UCS-2 set). However, they are accepted under Windows 10. Also worth mentioning: UTF-8 is handled by node-odbc under Windows 10 without the fix.

Lucian Voinea added 2 commits March 29, 2023 15:38
Behaviour is different accros platforms.
Not sure what the specification should be in this case.
@@ -66,6 +67,8 @@ SQLHENV ODBC::hEnv;

Napi::Value ODBC::Init(Napi::Env env, Napi::Object exports) {

setlocale(LC_ALL, "");
Copy link
Member

@kadler kadler Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling setlocale like this is a process-wide effect which can change the behavior of every other user of C library code.

It is usually advised to do this by the main program, not in library or middleware code (which node-odbc is).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. However, there seems to be no out-of-the-box way of doing that in a JavaScript application on Node.js (although interest seems to be there - see nodejs/node#28099). Or am I wrong?

@kadler
Copy link
Member

kadler commented Mar 31, 2023

Is there no connection option for the driver to tell it which locale to use?

@kadler
Copy link
Member

kadler commented Mar 31, 2023

Or maybe the issue is that the MS SQL driver only provides wide-character interfaces and unixODBC is doing the conversion (and that uses the locale settings to do so). This would explain the differences seen on Windows, since the Windows build sets UNICODE, so everything is bound as SQLWCHAR and converted to/from UTF-16 instead of UTF-8.

If that's the case, I think the better solution is to implement #292

@lvoinea
Copy link
Author

lvoinea commented Apr 3, 2023

It looks like the MS SQL Server driver provides indeed only the wide character interfaces:

> nm /usr/local/lib/libmsodbcsql.18.dylib | grep SQLConnect      
000000000006dce0 T _SQLConnectW
> nm /usr/local/lib/libmsodbcsql.18.dylib | grep SQLExec         
000000000003db90 T _SQLExecDirectW
000000000003b820 T _SQLExecute

@lvoinea
Copy link
Author

lvoinea commented Apr 3, 2023

If that's the case, I think the better solution is to implement #292

Not sure how this would help under Linux. I guess unixODBC would continue to assume I'm feeding in ASCII and try to convert it to UCS-2 byte by byte, which would not work.

@kadler
Copy link
Member

kadler commented Apr 3, 2023

#292 would help because node-odbc would bind as SQLWCHAR and then convert to/from UTF-16. This bypasses the need for unixODBC to do any conversion.

@lvoinea
Copy link
Author

lvoinea commented Apr 3, 2023

Does this mean node-odbc would call de wide-character interfaces directly?

@kadler
Copy link
Member

kadler commented Apr 3, 2023

Sorry, yeah the solution to #292 is a slightly different problem than what you're facing. What you need is a way for node-odbc to call the wide ODBC APIs instead of the narrow ones. There is actually a way to do that, which we use on Windows, by defining UNICODE. However, this is a compile-time option and you would need to modify bindings.gyp to add this option.

Mark and I did talk about how to make this easier for an application to use this support. One way would be to provide a way for the application to specify a use_wide (or whatever) option on the connection to switch between the methods.

@stale
Copy link

stale bot commented May 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label May 3, 2023
@kadler kadler removed the stale This issue hasn't seen any interaction in 30 days. label May 3, 2023
@stale
Copy link

stale bot commented Jun 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Jun 8, 2023
@stale stale bot closed this Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale This issue hasn't seen any interaction in 30 days.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants