Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: fix partition definition for literal with non-utf8 charsets + binary column | tidb-test=pr/2262 (#49229) #51601

Conversation

ti-chi-bot
Copy link
Member

This is an automated cherry-pick of #49229

What problem does this PR solve?

Issue Number: close #36433, close #49251

Problem Summary:

The string literal in the partition clause is stored as a formated string literal. For example, the '你好' will be kept as '你好' in the schema, and the collation information is lost. However, when running the getRangeLocateExprs, the expression is constructed with the DDL session, which always use utf8mb4 collation.

When inserting the row, the string (after converted to the collation corresponding to the column) will be compared with the utf8mb4 literal string, which may cause unexpected behavior. In the issue #36433, the insert value 你好 is converted to binary encoding GBK \xc4\xe3\xba\xc3, and compare it with the utf8 string '你好' through binary collation directly, then former is smaller than the later one.

MySQL records the binary representation in table schema. If we run show create table a in MySQL, it'll give a hex literal with _binary prefix in the partition clause.

What changed and how does it work?

If the target is binary charset, use the hex literal to represent the string. Actually this PR does two change:

  1. Always use the evaluated string to represent the partition clause.
  2. If the target column collation is binary, use the _binary + hex literal to represent the definition, as it may be an invalid utf8 string.

As TiDB always use utf8 to represent a string internally (except binary), it's fine to not consider other charset like GBK.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Release note

Fix the issue that the partition clause is compared through the collation of `utf8` charset, but not the column charset.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-5.4 This PR is cherry-picked to release-5.4 from a source PR. labels Mar 8, 2024
Copy link

ti-chi-bot bot commented Mar 8, 2024

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must be approved by the approvers firstly.
  2. AFTER it has been approved by approvers, please wait for the cherry-pick merging approval from triage owners.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added do-not-merge/cherry-pick-not-approved and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 8, 2024
Copy link

ti-chi-bot bot commented Mar 8, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign you06 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 8, 2024
@YangKeao
Copy link
Member

Close, because it will change the behavior.

@YangKeao YangKeao closed this May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/cherry-pick-not-approved release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-5.4 This PR is cherry-picked to release-5.4 from a source PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants