Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support gbk Encoding To Fix #532 #533

Merged
merged 9 commits into from
Mar 14, 2018
Merged

Support gbk Encoding To Fix #532 #533

merged 9 commits into from
Mar 14, 2018

Conversation

ceshihao
Copy link
Contributor

@ceshihao ceshihao commented Jan 8, 2018

To fix #532

Description

This PR adds gbk encoding support.

@ceshihao
Copy link
Contributor Author

ceshihao commented Jan 8, 2018

also related to #228

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Jan 8, 2018

Thank you. Adding GBK will create a different problem as per https://github.com/go-sql-driver/mysql#interpolateparams. We would need to turn off interpolateParams, meaning we'd need to turn on prepared statements.

  1. Is this fix only applicable when the migrated table has a column of CHARACTER SET = GBK?
  2. Or is there a possibility that some utf8 column would have GBK encoding?

If the former, then we can conditionally disable interpolateParams for migrations where GBK is present.

@ceshihao
Copy link
Contributor Author

ceshihao commented Jan 8, 2018

I add an option --include-multibyte-charset to indicate charset includes multibyte encoding which can be used to determine whether disable interpolateParams.

@shlomi-noach
Copy link
Contributor

Do you think you have the answer though?

@ceshihao
Copy link
Contributor Author

ceshihao commented Jan 8, 2018

I think this PR only handles the case with gbk encoding columns.

For question 2, i am not sure about the solution.

@shlomi-noach
Copy link
Contributor

Thank you. Please be advised that #526 is to be merged (if all goes well), and will affect your PR once merged, since it also touches (and moves) MySQL URI definitions.

@ceshihao
Copy link
Contributor Author

ceshihao commented Jan 9, 2018

  1. Or is there a possibility that some utf8 column would have GBK encoding?

Do you have any idea about this case?

@shlomi-noach
Copy link
Contributor

Do you have any idea about this case?

I don't...

@shlomi-noach shlomi-noach self-assigned this Jan 10, 2018
return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?interpolateParams=true&autocommit=true&charset=utf8mb4,utf8,latin1", this.User, this.Password, hostname, this.Key.Port, databaseName)
var multibyteCharset string
if includeMultibyte {
multibyteCharset = ",gbk,gb2312,big5,cp932,sjis"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is multibyteCharset a correct description? utf8 is also a multibyte charset. Is there a better definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about riskCharset and includeRiskCharset?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed according to your comments.

if includeMultibyte {
multibyteCharset = ",gbk,gb2312,big5,cp932,sjis"
}
return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?interpolateParams=%t&autocommit=true&charset=utf8mb4,utf8,latin1%s", this.User, this.Password, hostname, this.Key.Port, databaseName, !includeMultibyte, multibyteCharset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd create a variable called interpolateParams that happens to be set to !includeMultibyte, and use interpolateParams in the Sprintf statement. The logic to [not] use interpolateParams may end up considering additional inputs to the charset.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks
We need manual close interpolateParams if column charset with multibyte encodings like BIG5, CP932, GB2312, GBK or SJIS ?
riskCharset , Is there any risk for this fix ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no risk for disabling interpolateParams, it is a matter of performance.

@ceshihao
Copy link
Contributor Author

resolve conflict after #526 merged

@cenkore
Copy link

cenkore commented Feb 26, 2018

Any progress about this issue ?

@shlomi-noach
Copy link
Contributor

@cenkore apologies for the delay! I hope to work on this in the next few days.

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing March 7, 2018 14:38 Inactive
@shlomi-noach
Copy link
Contributor

Sent to testing.

@ceshihao
Copy link
Contributor Author

ceshihao commented Mar 8, 2018

Sent to testing.

Glad to see it.

@shlomi-noach
Copy link
Contributor

Can you please design a test (similar to https://github.com/github/gh-ost/tree/master/localtests/utf8mb4) that would express the support for the new changes?

@shlomi-noach
Copy link
Contributor

Ignore the above. I will use #532 as a test.

@shlomi-noach
Copy link
Contributor

Here's a funny thing: I added this test and it passes with a "normal" run, without adding the --include-risk-charset flag.

@ceshihao
Copy link
Contributor Author

ceshihao commented Mar 8, 2018

Here's a funny thing: I added this test and it passes with a "normal" run, without adding the --include-risk-charset flag.

https://github.com/go-sql-driver/mysql/blob/master/dsn.go#L73

It seems that the validation in driver is on Collation instead of Charset.

Perhaps, we do NOT need the flag.

@shlomi-noach shlomi-noach mentioned this pull request Mar 12, 2018
@shlomi-noach shlomi-noach mentioned this pull request Mar 12, 2018
@shlomi-noach shlomi-noach merged commit a17f2df into github:master Mar 14, 2018
@shlomi-noach
Copy link
Contributor

Superseded by #560

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

error 1366 on invalid character
3 participants