bytes, strings: Title does not treat Unicode punctuation as separators #34994
Comments
Change https://golang.org/cl/202077 mentions this issue:
Potential duplicate of #6801, although this is specific to permitting Unicode punctuation. I'm not sure that the return value of Title can be changed now.
It's not obvious to me that we can change this now. If we do change it, does Unicode define the set of characters that break words? Is that locale dependent? CC @mpvl
It looks like the Unicode standard defines what a word boundary is (http://unicode.org/reports/tr29/#Word_Boundaries), but, judging from the comments in the linked issue, that is not something that could be incorporated into `strings.Title`. Title already treats ASCII punctuation (except the underscore) as word boundaries, but it does not do the same for Unicode punctuation. It looks to me as if it was supposed to be changed to support the latter as well, but for some reason no one ever did that. If the reason is that the output of Title must not be changed now, should the documentation describing this behavior still be a BUG notice?
On 19 Oct 2019, at 22:08, Ian Lance Taylor wrote:
> It's not obvious to me that we can change this now.
> If we do change it, does Unicode define the set of characters that break words? Is that locale dependent? CC @mpvl
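To make the behavior discussed above concrete, here is a minimal sketch that runs the three cases mentioned in this thread through `strings.Title` (deprecated, but still present in the standard library). The wrapper name `currentTitle` and the sample inputs are illustrative assumptions, not taken from the playground link in the report.

```go
package main

import (
	"fmt"
	"strings"
)

// currentTitle is a hypothetical wrapper around the deprecated strings.Title,
// used only to make the behavior described in this thread easy to observe.
func currentTitle(s string) string {
	return strings.Title(s)
}

func main() {
	inputs := []string{
		"hello.world",      // ASCII period: treated as a separator
		"hello_world",      // underscore: explicitly not a separator
		"hello\u2024world", // U+2024 ONE DOT LEADER: not treated as a separator
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, currentTitle(in))
	}
}
```

Only the first input gets its second word capitalized; the underscore and the U+2024 cases leave `world` in lower case.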
Title has been deprecated.
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Affirmative.
What operating system and processor architecture are you using (`go env`)?
`go env` Output
What did you do?
The bug that prevents Title from capitalizing letters that begin words preceded by Unicode punctuation is mentioned here: https://github.com/golang/go/blob/master/src/strings/strings.go#L713
Also here: https://github.com/golang/go/blob/master/src/bytes/bytes.go#L652
Simple recipe reproducing the bug: https://play.golang.org/p/b1PyVSETmV3
Output:
What did you expect to see?
No output (every word in the processed string should be capitalized).
What did you see instead?
The word after U+2024 ONE DOT LEADER (․) remained uncapitalized.
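For illustration only, the expected behavior could be sketched as follows. This is not the change proposed in the linked CL: `isSeparatorSketch` and `titleSketch` are hypothetical names, and the rule (keep the ASCII behavior, and additionally treat any non-ASCII rune for which `unicode.IsPunct` reports true as a separator) is an assumption about what "Unicode punctuation as separators" would mean. It does not implement the UAX #29 word-boundary rules.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isSeparatorSketch is a hypothetical separator test. For ASCII it keeps the
// rule described in this thread (everything except letters, digits and the
// underscore separates words); for other runes it treats whitespace and
// Unicode punctuation as separators. The punctuation part is the assumption.
func isSeparatorSketch(r rune) bool {
	if r <= unicode.MaxASCII {
		isAlnum := ('0' <= r && r <= '9') || ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z')
		return !isAlnum && r != '_'
	}
	if unicode.IsLetter(r) || unicode.IsDigit(r) {
		return false
	}
	return unicode.IsSpace(r) || unicode.IsPunct(r)
}

// titleSketch title-cases every rune that directly follows a separator
// (or starts the string) and leaves all other runes unchanged.
func titleSketch(s string) string {
	prev := ' ' // pretend the string is preceded by a space
	return strings.Map(func(r rune) rune {
		startsWord := isSeparatorSketch(prev)
		prev = r
		if startsWord {
			return unicode.ToTitle(r)
		}
		return r
	}, s)
}

func main() {
	fmt.Println(titleSketch("hello\u2024world"))
}
```

With this rule the word after U+2024 is capitalized, while `hello_world` still becomes `Hello_world`. Whether such a change would be acceptable for existing callers is exactly the compatibility question raised in the comments.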