From 9a7c27e4649962fdcb78c4fa10a0bd35097a2822 Mon Sep 17 00:00:00 2001
From: jwxiong
Date: Tue, 15 Sep 2020 17:47:51 +0800
Subject: [PATCH] more detail

---
 .../2020-09-12-utf8mb4_zh_0900_as_cs.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/docs/design/2020-09-12-utf8mb4_zh_0900_as_cs.md b/docs/design/2020-09-12-utf8mb4_zh_0900_as_cs.md
index b92976d4c39ab..051a525ef45f8 100644
--- a/docs/design/2020-09-12-utf8mb4_zh_0900_as_cs.md
+++ b/docs/design/2020-09-12-utf8mb4_zh_0900_as_cs.md
@@ -32,11 +32,7 @@ select * from t order by a;
 
 ## Proposal
 
-`pinyin` order for Chinese character supported by this proposal will add a new collation named `utf8mb4_general_zh_cs` which are exactly same with `gbk_bin`.
-
-choose collation ID `2048` for `utf8mb4_general_zh_cs`
-
-> MySQL supports two-byte collation IDs. The range of IDs from 1024 to 2047 is reserved for user-defined collations. [see also](https://dev.mysql.com/doc/refman/8.0/en/adding-collation-choosing-id.html)
+To support `pinyin` order for Chinese characters, this proposal adds a new collation named `utf8mb4_general_zh_cs`, which behaves exactly the same as `gbk_bin`. Collation `utf8mb4_general_zh_cs` belongs to charset `utf8mb4`.
 
 Following SQL statements should have same result.
 ```sql
@@ -51,13 +47,19 @@ select weight_string(convert(a using gbk) collate gbk_bin);
 
 ## Rationale
 
-### Compare
+### How to implement
+
+Collation `utf8mb4_general_zh_cs` converts `utf8mb4` strings to `gbk` code points and then does exactly what collation `gbk_bin` does. For compatibility with MySQL, the conversion step must match MySQL's `convert(... using gbk)` exactly.
 
-convert `utf8mb4` encode to `gbk` code point and sort them as code point order.
+### Parser
+
+Choose collation ID `2048` for `utf8mb4_general_zh_cs`, just above the range that MySQL reserves for user-defined collations, and add it to the parser.
+
+> MySQL supports two-byte collation IDs. The range of IDs from 1024 to 2047 is reserved for user-defined collations. [see also](https://dev.mysql.com/doc/refman/8.0/en/adding-collation-choosing-id.html)
 
-### Key
+### Compatibility with current collations
 
-The `key` of `utf8mb4_general_zh_cs` is same as collation `gbk_bin`.
+`utf8mb4_general_zh_cs` has the same priority as `utf8mb4_unicode_ci` and `utf8mb4_general_ci`, which means these three collations are incompatible with each other.
 
 ### Alternative
 MySQL has a lot of language specific collation, for `pinyin` order, MySQL use collation `utf8mb4_zh_0900_as_cs`.