feat(query): add collation #8610

Merged
merged 3 commits into from
Nov 3, 2022

sundy-li (Member) commented Nov 2, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR

Fixes #8104

mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) on Nov 2, 2022

sundy-li (Member, Author) commented Nov 2, 2022

MySQL [(none)]> select substr('城区-主城区-其他', 1, 6), length('我爱中国');
+---------------------------------------------------+------------------------+
| substring('城区-主城区-其他' from 1 for 6)        | length('我爱中国')     |
+---------------------------------------------------+------------------------+
| 城区                                              |                     12 |
+---------------------------------------------------+------------------------+
1 row in set (0.030 sec)

MySQL [(none)]> set collation = 'utf8';
Query OK, 0 rows affected (0.028 sec)

MySQL [(none)]> select substr('城区-主城区-其他', 1, 6), length('我爱中国');
+---------------------------------------------------+------------------------+
| substring('城区-主城区-其他' from 1 for 6)        | length('我爱中国')     |
+---------------------------------------------------+------------------------+
| 城区-主城区                                       |                      4 |
+---------------------------------------------------+------------------------+
1 row in set (0.036 sec)
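
The two result sets above differ only in whether positions and lengths are counted in bytes or in Unicode characters. The following is a minimal Rust sketch of that distinction, not the code merged in this PR: the function names are hypothetical, and it assumes 1-based positions and ignores zero or negative positions.

```rust
/// Hypothetical helper: byte-oriented substring, as with collation "binary".
/// `pos` is 1-based; both `pos` and `len` are counted in bytes.
fn substr_binary(s: &str, pos: usize, len: usize) -> &[u8] {
    let bytes = s.as_bytes();
    let start = pos.saturating_sub(1).min(bytes.len());
    let end = start.saturating_add(len).min(bytes.len());
    &bytes[start..end]
}

/// Hypothetical helper: character-oriented substring, as with collation "utf8".
/// `pos` is 1-based; both `pos` and `len` are counted in Unicode scalar values.
fn substr_utf8(s: &str, pos: usize, len: usize) -> String {
    s.chars().skip(pos.saturating_sub(1)).take(len).collect()
}

/// Length in bytes (the "binary" interpretation).
fn length_binary(s: &str) -> usize {
    s.len()
}

/// Length in characters (the "utf8" interpretation).
fn length_utf8(s: &str) -> usize {
    s.chars().count()
}
```

Each CJK character in the sample strings occupies three bytes in UTF-8, so six bytes cover only 城区, while six characters cover 城区-主城区. Likewise, 我爱中国 is 12 bytes but 4 characters.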

BohuTANG requested a review from FANNG1 on Nov 2, 2022, 13:49

BohuTANG (Member) commented Nov 2, 2022

The CI checks are failing :)

BohuTANG self-requested a review on Nov 2, 2022, 14:12
BohuTANG merged commit ac98a1b into databendlabs:main on Nov 3, 2022
),
level: ScopeLevel::Session,
desc: "Char collation, support \"binary\" \"utf8\" default value: binary",
possible_values: Some(vec!["binary", "utf8"]),

Why is "binary" the default value rather than "utf8"?

sundy-li (Member, Author) replied:

Because the string column is stored as binary by default.

You can set it globally to make utf8 the default.
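
To illustrate the thread above, here is a minimal sketch of how a setting restricted to the two listed values, with "binary" as the default, might be parsed and then used to pick a length function. The `Collation` enum, its `FromStr` implementation, and `length_with` are hypothetical names introduced for illustration; they are not taken from the Databend codebase.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the two entries in `possible_values`.
/// `Binary` is the default, matching the setting's documented default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Collation {
    #[default]
    Binary,
    Utf8,
}

impl FromStr for Collation {
    type Err = String;

    /// Accept only the values the setting advertises; reject anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "binary" => Ok(Collation::Binary),
            "utf8" => Ok(Collation::Utf8),
            other => Err(format!(
                "invalid collation {other:?}, expected \"binary\" or \"utf8\""
            )),
        }
    }
}

/// Count bytes under `Binary` and Unicode scalar values under `Utf8`.
fn length_with(s: &str, collation: Collation) -> usize {
    match collation {
        Collation::Binary => s.len(),
        Collation::Utf8 => s.chars().count(),
    }
}
```

Because the default is `Binary`, a session that never sets the collation keeps byte semantics, which is consistent with the explanation that string columns are stored as binary.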

Successfully merging this pull request may close these issues.

bug: substr should consider string encodings