Fix utf8 support with MS SQL Server ODBC driver #323

lvoinea · 2023-03-31T11:43:53Z

Hi Mark,

This pull request addresses one critical issue for us: handling of UTF-8 statements under Linux when using the MS-provided ODBC driver. Without this fix, a statement such as insert into ΩDBC.dbo.Tαble2 ([Col Ω], [Col α]) values (N'an Ω', 'other') does not work.

The problem can be located in several places, but isql, the command line tool bundled with unixODBC, is able to handle this according to our expectations. This led us to investigate a fix within node-odbc.

The fix is rather trivial and follows the same approach as isql: it simply sets the locale to the values configured by the user (i.e., setlocale(LC_ALL, "");).
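To illustrate why the locale matters here, the following is a minimal sketch (not code from node-odbc or unixODBC) of how the C library's multibyte-to-wide conversion depends on the active LC_CTYPE. The helper names adopt_user_locale and decode_with_current_locale are hypothetical. The sketch assumes that a layer converting narrow strings to wide ones relies on locale-dependent functions such as mbstowcs; how non-ASCII bytes are treated in the default "C" locale varies between C libraries.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: switch from the startup "C" locale to whatever the
// environment (LANG, LC_ALL, ...) requests. Returns false if that locale
// is not available, in which case the current locale stays unchanged.
bool adopt_user_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: decode a multibyte string according to the current
// LC_CTYPE. Returns false if the bytes are not valid in that encoding.
bool decode_with_current_locale(const char* bytes, std::wstring& out) {
    // Passing a null destination to obtain the length is a widely
    // supported extension (specified by POSIX).
    const std::size_t n = std::mbstowcs(nullptr, bytes, 0);
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(n, L'\0');
    if (n > 0) {
        std::mbstowcs(&out[0], bytes, n);
    }
    return true;
}
```

With a UTF-8 locale active after adopt_user_locale(), the two bytes encoding Ω would be expected to decode to a single wide character; without it, the result depends on how the C library's "C" locale handles bytes above 0x7F.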

Our dev environment is made up of:

macOS 11.6 workstations
unixODBC 2.3.11
libiconv 1.17 (separately installed from HomeBrew)
MS SQL Server 2017 (RTM-CU24) (KB5001228) - 14.0.3391.2 (X64) running in a Docker container.

Output of locale command:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

NOTE: We tested the fix under Windows 10 as well and it works, although the behaviour is somewhat different. Under macOS emoji characters are not accepted, as expected (as they fall outside the UCS-2 set). However, they are accepted under Windows 10. Also worth mentioning: UTF-8 is handled by node-odbc under Windows 10 without the fix.

kadler · 2023-03-31T14:49:05Z

src/odbc.cpp

@@ -66,6 +67,8 @@ SQLHENV ODBC::hEnv;

 Napi::Value ODBC::Init(Napi::Env env, Napi::Object exports) {

+  setlocale(LC_ALL, "");


Calling setlocale like this has a process-wide effect that can change the behavior of every other user of C library code.

This is usually something the main program is advised to do, not library or middleware code (which node-odbc is).

lvoinea · 2023-04-03

Good point. However, there seems to be no out-of-the-box way of doing that in a JavaScript application on Node.js (although there seems to be interest, see nodejs/node#28099). Or am I wrong?
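The process-wide effect raised in this review thread can be made concrete with a small sketch. The ScopedLocale class below is hypothetical and is not proposed as the fix; it only shows that setlocale changes global state that has to be saved and restored explicitly, and that even then the change is visible to every thread while it is in effect.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII guard: switches the process-wide locale for the lifetime
// of the object and restores the previous one afterwards.
// Not thread-safe: setlocale affects every thread in the process.
class ScopedLocale {
public:
    ScopedLocale(int category, const char* name) : category_(category) {
        // Querying with nullptr returns the current locale name; copy it,
        // because the returned buffer may be overwritten by later calls.
        const char* current = std::setlocale(category, nullptr);
        previous_ = current ? current : "C";
        active_ = std::setlocale(category, name) != nullptr;
    }
    ~ScopedLocale() {
        if (active_) {
            std::setlocale(category_, previous_.c_str());
        }
    }
    ScopedLocale(const ScopedLocale&) = delete;
    ScopedLocale& operator=(const ScopedLocale&) = delete;

    // True if the requested locale was available and is now in effect.
    bool active() const { return active_; }

private:
    int category_;
    std::string previous_;
    bool active_;
};
```

A caller would write something like ScopedLocale guard(LC_ALL, ""); around the code that needs the user's locale. This narrows the window in which other code sees the changed locale, but does not remove the concern for multi-threaded hosts.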

kadler · 2023-03-31T14:49:59Z

Is there no connection option for the driver to tell it which locale to use?

kadler · 2023-03-31T14:54:37Z

Or maybe the issue is that the MS SQL driver only provides wide-character interfaces and unixODBC is doing the conversion (and that uses the locale settings to do so). This would explain the differences seen on Windows, since the Windows build sets UNICODE, so everything is bound as SQLWCHAR and converted to/from UTF-16 instead of UTF-8.

If that's the case, I think the better solution is to implement #292

lvoinea · 2023-04-03T07:59:19Z

It looks like the MS SQL Server driver does indeed provide only the wide-character interfaces:

> nm /usr/local/lib/libmsodbcsql.18.dylib | grep SQLConnect      
000000000006dce0 T _SQLConnectW
> nm /usr/local/lib/libmsodbcsql.18.dylib | grep SQLExec         
000000000003db90 T _SQLExecDirectW
000000000003b820 T _SQLExecute

lvoinea · 2023-04-03T11:25:13Z

> If that's the case, I think the better solution is to implement #292

Not sure how this would help under Linux. I guess unixODBC would continue to assume I'm feeding in ASCII and try to convert it to UCS-2 byte by byte, which would not work.
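The failure mode guessed at in this comment can be sketched as follows. This is an illustration only, under the assumption that each input byte is widened to one 16-bit unit; it is not unixODBC's actual conversion code, and the function name widen_bytewise is hypothetical.

```cpp
#include <string>

// Hypothetical illustration: treat every byte as its own character and
// widen it to one 16-bit unit. Correct for ASCII, wrong for UTF-8
// multi-byte sequences.
std::u16string widen_bytewise(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

For the UTF-8 encoding of Ω (bytes 0xCE 0xA9) this yields two units, 0x00CE and 0x00A9, rather than the single unit 0x03A9, which is why the identifier would no longer match on the server.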

kadler · 2023-04-03T13:54:52Z

#292 would help because node-odbc would bind as SQLWCHAR and then convert to/from UTF-16. This bypasses the need for unixODBC to do any conversion.
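As a rough sketch of the conversion described here, the function below turns UTF-8 bytes into UTF-16 code units without consulting the locale. It is not node-odbc's implementation; the name utf8_to_utf16 is hypothetical, and for brevity the decoder does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert UTF-8 bytes to UTF-16 code units, independent of the C locale.
// Throws std::invalid_argument on malformed input (overlong forms are not
// detected in this sketch).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit (the UCS-2 range).
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane (e.g. emoji): a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Characters such as Ω fit in one code unit, while an emoji needs a surrogate pair. That is the distinction between UCS-2 and UTF-16 mentioned in the pull request description when comparing macOS and Windows behaviour.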

lvoinea · 2023-04-03T15:51:15Z

Does this mean node-odbc would call the wide-character interfaces directly?

kadler · 2023-04-03T18:59:31Z

Sorry, yeah, the solution to #292 addresses a slightly different problem than the one you're facing. What you need is a way for node-odbc to call the wide ODBC APIs instead of the narrow ones. There is actually a way to do that, which we use on Windows, by defining UNICODE. However, this is a compile-time option and you would need to modify binding.gyp to add this option.

Mark and I did talk about how to make it easier for an application to use this support. One way would be to let the application specify a use_wide (or whatever) option on the connection to switch between the methods.
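The compile-time switch described in this comment can be sketched roughly as below. This is a self-contained illustration, not node-odbc source: DemoSqlChar, kUsesWideApi and buffer_bytes are hypothetical names, and the 16-bit character type stands in for the wide ODBC character type under the assumption that it is a UTF-16 code unit.

```cpp
#include <cstddef>

// When the build defines UNICODE (for example via -DUNICODE), select the
// wide character type; otherwise fall back to narrow characters.
#ifdef UNICODE
using DemoSqlChar = char16_t;            // stand-in for a wide ODBC character
constexpr bool kUsesWideApi = true;
#else
using DemoSqlChar = char;                // stand-in for a narrow ODBC character
constexpr bool kUsesWideApi = false;
#endif

// Bytes needed for a buffer holding `chars` characters plus a terminator.
constexpr std::size_t buffer_bytes(std::size_t chars) {
    return (chars + 1) * sizeof(DemoSqlChar);
}
```

Because the choice is baked in by the preprocessor, switching between the two modes requires a rebuild, which is the limitation a runtime option such as use_wide would be meant to remove.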

stale · 2023-05-03T19:42:47Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2023-06-08T10:36:38Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Lucian Voinea added 2 commits March 29, 2023 15:38

Fixed UTF8 input handling with MS SQL Server ODBC driver. Tested unde…

e52a983

…r MacOS.

Skipped tests that check rejecting non-UCS-2 compliant unicode input.

978ab52

Behaviour is different across platforms. Not sure what the specification should be in this case.

kadler reviewed Mar 31, 2023

stale bot added the stale This issue hasn't seen any interaction in 30 days. label May 3, 2023

kadler removed the stale This issue hasn't seen any interaction in 30 days. label May 3, 2023

stale bot added the stale This issue hasn't seen any interaction in 30 days. label Jun 8, 2023

stale bot closed this Jun 22, 2023
