Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Support for DSN flag - sendStringParametersAsUnicode #180

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sinhashubham95
Copy link

This is a feature addition for a new DSN flag - sendStringParametersAsUnicode which basically allows us to specify if the Go default string types should be sent as UNICODE(if true) or not(if false).

Official Support for sendStringParametersAsUnicode in JDBC

@sinhashubham95
Copy link
Author

@microsoft-github-policy-service agree

@sinhashubham95
Copy link
Author

sinhashubham95 commented Mar 5, 2024

@stuartpa @apoorvdeshmukh @shueybubbles can you please check this PR?

@@ -68,6 +68,9 @@ Other supported formats are listed below.
* `multisubnetfailover`
* `true` (Default) Client attempt to connect to all IPs simultaneously.
* `false` Client attempts to connect to IPs in serial.
* `sendStringParametersAsUnicode`
* `true` (Default) Go default string types sent as `nvarchar`.
* `false` Go default string types sent as `varchar`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ing types sent as varchar.

without any way of indicating the code page, how does this work? Go strings are UTF8

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shueybubbles it's upto the clients to decide how they want to use Go strings. In the same way other drives like postgres for Go also works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced it's a good idea for the driver to have such a broad toggle without a more comprehensive fix for mixing string encodings AND having a way for clients to specify the code page of the input.

Note the driver already has a way to designate input parameter strings as varchar to avoid the conversion - you need to use the driver-specific VarChar
Using that type has the benefit of apps being explicit about the desired conversion behavior and being able to mix unicode strings with non-unicode strings in the same connection.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shueybubbles the fact that this flag is made available in the official support for sqlserver is because it makes the applications which work by default with non-unicode strings have lesser chances of failure. With this once and for all we can add this in the connection string, and nowhere explicit type casts are required on top of a string to varchar, without which any database columns having varchar as a type, will work the way as expected(for example index queries, etc). We are just trying to adhere to the official documentation of sqlserver here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shueybubbles can you help close this? This has been pending since long. We can have a quick connect on this if needed.

@shueybubbles
Copy link
Collaborator

shueybubbles commented Mar 5, 2024

please add some functional tests showing what problem this is solving.
JDBC differentiates parameters by type and only applies this flag when the input parameter is not a wide character type. Your change would apply the conversion to all input strings regardless of whether the server type is a wide char type which seems incorrect.

@moizm89
Copy link

moizm89 commented Apr 17, 2024

@shueybubbles problem this PR solves is mentioned here - #40

@sinhashubham95
Copy link
Author

sinhashubham95 commented Apr 20, 2024

@shueybubbles I agree to your point that it will change for all input parameters that are of the string type, but it will only happen for the clients opting for it specifically. It's a backward compatible change as by default this flag is true itself unless specified.

@sinhashubham95
Copy link
Author

@shueybubbles have added test cases for the above please check, also this feature is in accordance with the Official JDBC driver as well.

@codecov-commenter
Copy link

codecov-commenter commented Apr 20, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 74.54%. Comparing base (fe7c3d4) to head (28f3fbf).
Report is 2 commits behind head on main.

❗ Current head 28f3fbf differs from pull request most recent head 0b266b8. Consider uploading reports for the commit 0b266b8 to get more accurate results

Files Patch % Lines
mssql.go 85.71% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #180      +/-   ##
==========================================
- Coverage   74.72%   74.54%   -0.18%     
==========================================
  Files          32       32              
  Lines        6298     6341      +43     
==========================================
+ Hits         4706     4727      +21     
- Misses       1316     1331      +15     
- Partials      276      283       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shueybubbles
Copy link
Collaborator

shueybubbles commented Apr 30, 2024

@shueybubbles have added test cases for the above please check, also this feature is in accordance with the Official JDBC driver as well.

Your proposed change is quite different from the JDBC model.
In JDBC, the application can still use alternative methods and types to pass UNICODE data to the server. With your proposal, there's no way for an app to say "please send this string as unicode" .
By comparison, with the current VarChar type declared in the go-mssqldb driver, it's entirely possible for an app to declare a parameter as MBCS (really, UTF-8) so it doesn't get converted to UTF-16 on the way to the server. What's the default MBCS encoding for a JDBC string? Is it system-dependent? If so, comparing that model to the Go model where strings are always UTF8 is dicy.

As the docs for JDBC point out, blindly sending non-unicode data to the server creates a lot of chances for encoding errors due to collation mismatches. If someone just changes the connection string without changing their code, apps that rely on the current behavior to avoid those mismatches will be broken, in ways that can be hard to detect until much later when someone notices data quality issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants