Commit 9a7c27e: more detail

xiongjiwei committed Sep 15, 2020 (parent c308a56)

1 changed file, `docs/design/2020-09-12-utf8mb4_zh_0900_as_cs.md`: 11 additions, 9 deletions

## Proposal

To support `pinyin` order for Chinese characters, this proposal adds a new collation named `utf8mb4_general_zh_cs` for the `utf8mb4` charset. It behaves exactly the same as `gbk_bin`.

The following SQL statements should return the same result:
```sql
-- ...
select weight_string(convert(a using gbk) collate gbk_bin);
```

## Rationale

### How to implement

Collation `utf8mb4_general_zh_cs` converts `utf8mb4` characters to `gbk` code points and then compares them exactly as collation `gbk_bin` does, in code point order. For compatibility with MySQL, this conversion step must produce exactly the same result as MySQL's `convert(... using gbk)`.
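A minimal Go sketch of the comparison, under stated assumptions: `gbkExcerpt`, `convertToGBK`, and `compareZhCS` are hypothetical names; the table is a three-character excerpt standing in for the full mapping; unconvertible characters become `?`, which we assume matches MySQL's `convert(... using gbk)`; and trailing spaces are ignored on the assumption that `gbk_bin` has PAD SPACE semantics like other non-0900 MySQL collations.

```go
package main

import (
	"bytes"
	"fmt"
)

// gbkExcerpt is a hypothetical, tiny excerpt of the utf8mb4 -> GBK table.
var gbkExcerpt = map[rune][]byte{
	'啊': {0xB0, 0xA1}, // a
	'汉': {0xBA, 0xBA}, // han
	'中': {0xD6, 0xD0}, // zhong
}

// convertToGBK stands in for MySQL's convert(... using gbk): ASCII bytes are
// kept as-is, mapped runes become their GBK bytes, and any other rune becomes
// '?' (assumed to match MySQL's handling of unconvertible characters).
func convertToGBK(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r < 0x80 {
			out = append(out, byte(r))
		} else if code, ok := gbkExcerpt[r]; ok {
			out = append(out, code...)
		} else {
			out = append(out, '?')
		}
	}
	return out
}

// compareZhCS returns -1, 0 or +1. It converts both strings to GBK, drops
// trailing spaces (assumed PAD SPACE), and compares the remaining bytes in
// code point order. A space (0x20) can never be the trail byte of a
// two-byte GBK character, so trimming cannot cut a character in half.
func compareZhCS(a, b string) int {
	ka := bytes.TrimRight(convertToGBK(a), " ")
	kb := bytes.TrimRight(convertToGBK(b), " ")
	return bytes.Compare(ka, kb)
}

func main() {
	fmt.Println(compareZhCS("汉", "中"))     // -1: han (0xBABA) sorts before zhong (0xD6D0)
	fmt.Println(compareZhCS("abc  ", "abc")) // 0: trailing spaces are ignored
	fmt.Println(compareZhCS("😀", "?"))      // 0: no GBK form, so it becomes '?'
}
```

The last case shows why matching `convert(... using gbk)` matters: every character without a GBK form collapses to the same `?` key.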

### Parser

Choose collation ID `2048` for `utf8mb4_general_zh_cs` and add it to the parser. This ID lies just above the range that MySQL reserves for user-defined collations:

> MySQL supports two-byte collation IDs. The range of IDs from 1024 to 2047 is reserved for user-defined collations. [see also](https://dev.mysql.com/doc/refman/8.0/en/adding-collation-choosing-id.html)
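The registration step can be sketched in Go as a name-to-ID table plus a range check. `collationIDs`, `lookupCollationID`, and `inUserDefinedRange` are hypothetical names, not the parser's actual API; the bounds come from the MySQL manual quoted above. The lookup is case-insensitive because MySQL accepts collation names in any letter case.

```go
package main

import (
	"fmt"
	"strings"
)

// Bounds from the MySQL manual: IDs 1024..2047 are reserved for
// user-defined collations.
const (
	userDefinedMinID = 1024
	userDefinedMaxID = 2047
)

// collationIDs is a hypothetical name -> ID table the parser could consult;
// only the entry proposed here is shown.
var collationIDs = map[string]int{
	"utf8mb4_general_zh_cs": 2048,
}

// lookupCollationID resolves a collation name case-insensitively.
func lookupCollationID(name string) (int, bool) {
	id, ok := collationIDs[strings.ToLower(name)]
	return id, ok
}

// inUserDefinedRange reports whether id falls in MySQL's user-defined range.
func inUserDefinedRange(id int) bool {
	return id >= userDefinedMinID && id <= userDefinedMaxID
}

func main() {
	id, ok := lookupCollationID("UTF8MB4_GENERAL_ZH_CS")
	fmt.Println(id, ok, inUserDefinedRange(id)) // 2048 true false
	fmt.Println(id <= 0xFFFF)                    // true: still a two-byte ID
}
```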
### Compatibility with current collations

`utf8mb4_general_zh_cs` has the same priority as `utf8mb4_unicode_ci` and `utf8mb4_general_ci`, which means these three collations are incompatible with one another.
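A minimal Go sketch of what "incompatible" means here, assuming both operands have the same coercibility (for example, two columns): when two different collations from this priority level meet, no result collation can be chosen and the comparison is rejected. `samePriority` and `mergeCollation` are hypothetical, only the rule stated in this section is modelled, and the error text is illustrative rather than a real error message.

```go
package main

import "fmt"

// samePriority lists the three collations that, per this proposal, share
// one priority level. Hypothetical structure, not an existing API.
var samePriority = map[string]bool{
	"utf8mb4_general_ci":    true,
	"utf8mb4_unicode_ci":    true,
	"utf8mb4_general_zh_cs": true,
}

// mergeCollation picks the collation for comparing two operands of equal
// coercibility. Identical collations merge trivially; two different
// collations of the same priority cannot be merged.
func mergeCollation(a, b string) (string, error) {
	if a == b {
		return a, nil
	}
	if samePriority[a] && samePriority[b] {
		return "", fmt.Errorf("illegal mix of collations %s and %s", a, b)
	}
	return "", fmt.Errorf("pair (%s, %s) is outside this sketch", a, b)
}

func main() {
	fmt.Println(mergeCollation("utf8mb4_general_zh_cs", "utf8mb4_general_zh_cs"))
	_, err := mergeCollation("utf8mb4_general_zh_cs", "utf8mb4_general_ci")
	fmt.Println(err)
}
```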

### Alternative
MySQL has many language-specific collations; for `pinyin` order, MySQL uses the collation `utf8mb4_zh_0900_as_cs`.
