Support utf8mb4_zh_0900_as_cs for utf8mb4 character set #19747
Comments
@bb7133 PTAL
BTW, MySQL supports lots of collations for
Adding a new collation is easy, but adding a new charset is hard now.
MySQL has a framework to support language-specific collations: it is based on UCA 9.0.0 (the "0900" in names such as utf8mb4_zh_0900_as_cs) and accepts different tailoring rules.
@xiongjiwei How much performance would we gain compared with the unified collation framework?
@zz-jason I have no idea; I am not deeply familiar with this. But my instinct is that if we implement the framework very carefully, we may only lose performance at startup. It is hard, but there is still a way.
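One way to confine the cost to startup, as suggested here, is to build the tailored weight table exactly once and share it across all comparisons. A minimal Go sketch, assuming a hypothetical `loadTable` that stands in for parsing the UCA data and applying tailoring rules; the weights are made up and only one level is compared.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	tableOnce sync.Once
	table     map[rune]uint16
)

// loadTable is hypothetical: a real one would parse the UCA base data and
// apply tailoring rules. Here it only fills in a few made-up weights.
func loadTable() {
	table = map[rune]uint16{'a': 1, 'b': 2, 'c': 3}
}

// weight returns the primary weight of r, building the table on first use.
func weight(r rune) uint16 {
	tableOnce.Do(loadTable)
	if w, ok := table[r]; ok {
		return w
	}
	return 0xFFFF // hypothetical fallback: unknown runes sort last
}

// compare compares two strings rune by rune using primary weights only;
// if one is a prefix of the other, the shorter string sorts first.
func compare(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	for i := 0; i < len(ra) && i < len(rb); i++ {
		wa, wb := weight(ra[i]), weight(rb[i])
		if wa != wb {
			if wa < wb {
				return -1
			}
			return 1
		}
	}
	switch {
	case len(ra) < len(rb):
		return -1
	case len(ra) > len(rb):
		return 1
	}
	return 0
}

func main() {
	fmt.Println(compare("ab", "ac"), compare("b", "ab"), compare("abc", "abc")) // -1 1 0
}
```

With `sync.Once` the first comparison pays the build cost; calling `tableOnce.Do(loadTable)` from an `init` function would move that cost to process startup instead.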
Do you mean when the tidb-server instance starts up?
Right.
OK. I think that's affordable. Would you like to write a proposal for this framework or other algorithms?
Sure.
This feature request is related to #10192, because I believe they actually aim at the same goal: the pinyin order of Chinese characters. Basically, we have two alternatives:
I feel option 2 is better for now, considering the cost. It does not prevent us from going for option 1 in the future if needed.
Hi @zz-jason, regarding bb7133's opinion, it is much easier to
if we just need pinyin order. If we need full compatibility with MySQL, we may have a lot to do. We should discuss which way to achieve pinyin order. I'd like to write a proposal if we choose full compatibility with MySQL.
@bb7133 @ilovesoup It's indeed easy to implement, but introducing a new collation
We aim to make the Chinese characters of
It will not change; no collation is changed once it has been released.
We may not support it, and this is not something new: there are already some features in TiDB that don't exist in MySQL (Sequences, Placement Rules, ...). However, if the
The algorithm of
Example program:

```go
package main

import (
	"fmt"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

func main() {
	testCases := []string{"一", "二", "三", "四", "五", "1", "one", "yi", "YI", "yī", "🤔"}
	// Build a collator for the Chinese (zh) language tag.
	collator := collate.New(language.Chinese)
	// Reuse a single buffer for all sort keys.
	var buffer collate.Buffer
	for _, tc := range testCases {
		// Print each string next to its hex-encoded sort key.
		fmt.Printf("%s\t%x\n", tc, collator.KeyFromString(&buffer, tc))
		buffer.Reset()
	}
}
```

Output:
On MySQL 8.0:

```sql
select column_0, weight_string(column_0 collate utf8mb4_zh_0900_as_cs)
from (values
  row('一'), row('二'), row('三'), row('四'), row('五'), row('1'), row('one'), row('yi'), row('YI'), row('yī'), row('🤔')
) a;
```

Output:
Well, this issue is about
That's unfortunate, but there's still no need to reimplement UCA by hand in Rust; reusing ICU's
Go's sort key is not the same as ICU's sort key; the sort key format is implementation-defined. Besides, Go's collate package is still unreleased, and it tends to be buggy: golang/go#38726.
Feature Request
Is your feature request related to a problem? Please describe:
It's not possible to order by a column based on its pinyin order. For example:
Insert some data:
A query needs to order by column `a` in its pinyin order:
Describe the feature you'd like:
Support the utf8mb4_zh_0900_as_cs collation like MySQL does; the output should be:
Describe alternatives you've considered:
N/A
Teachability, Documentation, Adoption, Migration Strategy:
N/A