proposal: runes: create new package analogous to bytes, for rune slices #34313
Comments
This would not be necessary once (if) generics are implemented. |
On 9/16/19, Burak Serdar ***@***.***> wrote:
This would not be necessary once (if) generics are implemented.
You could also claim (I do) that if something like this came along,
there would be less justification for "generics".
Frankly, generics require a much shallower boundary between intrinsic
and user-defined objects or, perhaps more usefully, but much more
difficult to do right, a much richer "type" mechanism with open-ended
attributes.
Go with generics then becomes either a beautiful academic artifact or
a Frankenstein monster of a language. Guess which is more likely to
happen first.
Incidentally, even knowing that the Go Team's efforts put the
integrity of the language very high on the list of objectives, it is
still quite revealing that there is no "Go with Generics" in the wild,
whether to be disparaged or to be revered.
Lucio.
|
On Sun, Sep 15, 2019 at 10:00 PM lootch ***@***.***> wrote:
On 9/16/19, Burak Serdar ***@***.***> wrote:
> This would not be necessary once (if) generics are implemented.
>
You could also claim (I do) that if something like this came along,
there would be less justification for "generics".
Frankly, generics require a much shallower boundary between intrinsic
and user-defined objects or, perhaps more usefully, but much more
difficult to do right, a much richer "type" mechanism with open-ended
attributes.
Go with generics then becomes either a beautiful academic artifact or
a Frankenstein monster of a language. Guess which is more likely to
happen first.
I disagree. I think the latest generics proposal has a chance to be useful
without becoming a monster. The idea that in order to implement generics
you have to define the semantics of the generic types precisely is what
created c++/Java generics. Defining generics in terms of existing types has
a better chance of being used correctly because it demands less from the
author and from the reader.
Incidentally, even knowing that the Go Team's efforts put the
integrity of the language very high on the list of objectives, it is
still quite revealing that there is no "Go with Generics" in the wild,
whether to be disparaged or to be revered.
I think the reason for this is the experience with the c++/java generics,
and despite all the efforts, many counter-proposals ended up offering
similar solutions.
…
Lucio.
|
I have trouble with your opening sentence: "Working with and manipulating non-English data requires us to use runes slices." That is presented as a fact but is an opinion, one I just don't think is true. I speak only English but I have spent a lot of time working with text that is not ASCII and, although it can be attractive to work with rune slices, they are not really a good solution. In fact, I think they are a trap: they don't answer most of the questions that persist with multilingual text because, despite what many want to believe, a rune is not a character. (See blog.golang.org/strings for an explanation of this.) I would therefore prefer not to add such a package as it would promote bad practice. |
@robpike I hear you, but now I'm really puzzled. My takeaway from your blog post (which I have revisited many times over the years, including just before making this proposal today) is that runes are a better way than bytes to deal with non-English characters, smileys and whatnot. Ranging over a string gives runes. Now, I do recall from reading the article linked in your blog that some Unicode code points are modifiers, and that some characters can be made from multiple combinations of Unicode code points, which can mess things up. But what's a better way to deal with mutable collections of Unicode code points than the slice of runes that Go makes available? |
Runes are code points, from which characters are made. Bytes are also things from which characters are made. Why use both? Sometimes we need the code points themselves, but providing a package that handles slices of them will encourage the poor practice of converting back and forth between rune slices and byte slices/strings rather than the more efficient method of just iterating the bytes appropriately. |
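As a rough illustration of "iterating the bytes appropriately", the sketch below walks a UTF-8 byte slice in place with `utf8.DecodeRune` from the standard library, without converting to `[]rune` first. The helper name `countRunes` is hypothetical and not part of any package discussed in this thread.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes (hypothetical helper) walks a UTF-8 byte slice one code point
// at a time and returns the number of code points seen and how many of them
// were malformed encodings.
func countRunes(b []byte) (n, invalid int) {
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			// Malformed byte: DecodeRune reports it as RuneError with width 1.
			invalid++
		}
		n++
		b = b[size:] // advance by the encoded width, not by one byte
	}
	return n, invalid
}

func main() {
	// "héllo, 世界" written with escapes so the encoding is unambiguous.
	n, bad := countRunes([]byte("h\u00e9llo, \u4e16\u754c"))
	fmt.Println(n, bad)
}
```

For a `string`, a plain `for i, r := range s` loop performs the same decoding, yielding the byte offset and the code point at each step.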
May I share an example use case? Suppose we're building a simple text editor. When people enter text, they enter Unicode code points to make characters. If we use rune slices, we can simply insert the required rune at the right position. If we are using byte slices, then for each insertion or deletion we would have to iterate over the slice with a function that parses Unicode, find the right position to insert or delete, and make the change. Since this iteration can throw an error, we'd have to check for errors. If we are using strings, we'd have to reallocate for every single insertion or deletion and then run the iteration again. Essentially, if we want to work with mutable sets of Unicode characters, neither the bytes solution nor the strings solution seems efficient. |
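To make the two approaches in this comment concrete, here is a minimal sketch of inserting a code point at a given rune index in a `[]rune` buffer and in a UTF-8 `[]byte` buffer. Both function names are hypothetical, both assume `0 <= i <= ` the number of code points in the buffer, and neither addresses multi-rune characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// insertRune (hypothetical) inserts r before rune index i in a []rune buffer.
func insertRune(buf []rune, i int, r rune) []rune {
	buf = append(buf, 0)     // grow by one element
	copy(buf[i+1:], buf[i:]) // shift the tail right by one
	buf[i] = r
	return buf
}

// insertRuneUTF8 (hypothetical) inserts r before the i-th code point of a
// UTF-8 encoded buffer. It must scan from the start to find the byte offset.
func insertRuneUTF8(buf []byte, i int, r rune) []byte {
	off := 0
	for n := 0; n < i && off < len(buf); n++ {
		_, size := utf8.DecodeRune(buf[off:])
		off += size
	}
	var enc [utf8.UTFMax]byte
	w := utf8.EncodeRune(enc[:], r)
	buf = append(buf, enc[:w]...) // grow by w bytes
	copy(buf[off+w:], buf[off:])  // shift the tail right by w bytes
	copy(buf[off:], enc[:w])      // write the encoded rune into the gap
	return buf
}

func main() {
	fmt.Println(string(insertRune([]rune("h\u00e9lo"), 2, 'l')))
	fmt.Println(string(insertRuneUTF8([]byte("h\u00e9lo"), 2, 'l')))
}
```

Note that `utf8.DecodeRune` does not return an error value: malformed input is reported as `utf8.RuneError` with a width of 1. Both versions still shift the tail of the buffer on every insertion; the byte version additionally pays for the scan to locate the offset.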
off topic, but I thought to mention Perl6 here https://www.evanmiller.org/a-review-of-perl-6.html cf: Strings and Regexes caveat, see footnote 2 a contributor to Perl6 also wrote this module |
the idea of using rope data structures in an editor intrigued me at one point but I've never taken the time to look into it |
And the runes solution is misleading and leads to incorrect thinking. Text is hard, and rune slices solve almost none of what makes text hard. |
on a side note A Philosophy of Software Design by J. Ousterhout The book includes commentary on a student project of writing a text editor. |
Using runes in a text editor seems like a good idea at first, but it fails badly once you get to Unicode compose sequences, like e + composing acute vs é. The former is two runes while the latter is one. And for some sequences there's not even a single-rune sequence. In general Unicode text processing requires considering largish sequences of input, not just a single byte and not just a single rune either. There's little benefit to []rune as the representation, and there are real drawbacks to having two representations. So Go has standardized on []byte/string and UTF-8. If you find that []rune works really well for your editor somehow (maybe you ignore all the multirune characters), that's fine. A "runes" library forked from "bytes" could easily be maintained as a go get-able package outside the standard library. Note that generics are not going to help here, because the encoding stored in the underlying data is different between []byte and []rune. This is a likely decline. Leaving open for a week for final comments. |
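The compose-sequence point can be checked with the standard library alone. The sketch below compares precomposed é (U+00E9) with e followed by U+0301 COMBINING ACUTE ACCENT; the helper name `runeAndByteLen` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAndByteLen (hypothetical helper) returns the number of code points and
// the number of bytes in the UTF-8 string s.
func runeAndByteLen(s string) (runes, bytes int) {
	return utf8.RuneCountInString(s), len(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	decomposed := "e\u0301" // e followed by a combining acute accent

	fmt.Println(runeAndByteLen(precomposed)) // 1 2
	fmt.Println(runeAndByteLen(decomposed))  // 2 3
	fmt.Println(precomposed == decomposed)   // false
}
```

Both strings render as the same character, yet they differ in rune count, byte count, and equality, so indexing by rune does not amount to indexing by character.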
Hopefully my comment won't be interpreted as cultural bias. I'm opposed to this for linguistic reasons. Rune is used in Plan 9, and also appears in Golang. The suggested use diverges excessively from the word's original use in the North Germanic languages. D. Mendeleev used एक (eka) and द्वि (dvi) for certain postulated elements. экаалюминій, экаборъ, экасилицій |
There have been no comments objecting to declining this issue. Declined. |
Working with and manipulating non-English data requires us to use runes slices. If we want to do operations like comparing two rune slices, replacing, indexing etc, we have to cast to string, do those operations and cast back or write custom functions.
I would therefore like to propose creating a package runes, mirroring the package bytes, with functionality to work directly with rune slices rather than bytes, to support international-language use cases.
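For context, the round trip the proposal describes might look like the sketch below, which searches one rune slice for another by converting both to strings. The function name `indexRunes` is hypothetical and is not the API of any existing package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// indexRunes (hypothetical) returns the rune index of the first occurrence
// of sep in s, or -1 if sep is not present. It converts to string, searches,
// and maps the resulting byte offset back to a rune index.
func indexRunes(s, sep []rune) int {
	str := string(s) // allocates and encodes to UTF-8
	i := strings.Index(str, string(sep))
	if i < 0 {
		return -1
	}
	// strings.Index returns a byte offset, so count the runes before it.
	return utf8.RuneCountInString(str[:i])
}

func main() {
	text := []rune("h\u00e9llo w\u00f6rld")
	fmt.Println(indexRunes(text, []rune("w\u00f6rld")))
}
```

One caveat with this conversion: `string([]rune)` replaces any invalid code point (for example a surrogate half) with U+FFFD, so the round trip is lossless only for valid Unicode.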