RFC: Rename StrBuf to String #60
I can't think of a reason to disagree with this, but it seems to go against the grain of existing languages, so I would like to better understand the drawbacks. In C++, Java and most other 'static'-feeling languages, the default and ubiquitous string type is fixed-length, and the more flexible, mutable string type has a longer/more complex name and is secondary. Here we propose the opposite. The only advantages I can see are saving a word per string, and history. If that is it, then +1 to this proposal. Are there other reasons to prefer fixed-length strings?
Wouldn't DST allow us to have
By fixed length you mean immutable, right? Not "in the type"? Also note that in Java
Another option could be to have a much higher-level persistent string data structure, such as a variant of https://en.wikipedia.org/wiki/Rope_%28computer_science%29 made with Arcs, where nodes could be an enum (discriminating on unused pointer bits) of a tree node, &'static str, 0-7 inline characters, ~str or Arc. This would basically allow us to clone, share, send and concatenate strings freely with constant memory cost and O(1) or O(log n) time cost. The main issue with this kind of data structure in general is that accessing the kth element requires O(log n) time instead of O(1), but unlike with arrays, where it is sometimes the prevalent operation, this is very rare with strings (also because, due to UTF-8, it is mostly meaningless), so it could be a good fit. EDIT: to be clear, the API-level difference is that a solution like this allows using the same type for both "work in progress" and "finished" strings without catastrophic effects like using 2x memory (using StrBuf for finished strings) or requiring quadratic time for repeated concatenation (using string for work-in-progress strings).
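A minimal sketch of the rope idea in today's Rust, under simplifying assumptions: `Rope` is a hypothetical type (not a std API), leaves hold an `Arc<str>`, and interior nodes share their children through `Arc`. The pointer-bit tagging, inline short strings and rebalancing described above are omitted, so `byte_at` costs O(depth) rather than the O(log n) a balanced rope would guarantee.

```rust
use std::sync::Arc;

/// A hypothetical persistent rope. Leaves hold shared text; interior nodes
/// share their children, so cloning and concatenating never copy the text.
#[derive(Clone, Debug)]
pub enum Rope {
    Leaf(Arc<str>),
    Concat {
        left: Arc<Rope>,
        right: Arc<Rope>,
        len: usize, // total length in bytes, cached so `len()` is O(1)
    },
}

impl Rope {
    pub fn new(s: &str) -> Rope {
        Rope::Leaf(Arc::from(s))
    }

    pub fn len(&self) -> usize {
        match self {
            Rope::Leaf(s) => s.len(),
            Rope::Concat { len, .. } => *len,
        }
    }

    /// O(1): allocates one new node and bumps reference counts; no text is copied.
    pub fn concat(&self, other: &Rope) -> Rope {
        Rope::Concat {
            left: Arc::new(self.clone()),
            right: Arc::new(other.clone()),
            len: self.len() + other.len(),
        }
    }

    /// Byte at offset `k`, found by walking down the tree (no O(1) indexing).
    pub fn byte_at(&self, k: usize) -> Option<u8> {
        match self {
            Rope::Leaf(s) => s.as_bytes().get(k).copied(),
            Rope::Concat { left, right, .. } => {
                let left_len = left.len();
                if k < left_len {
                    left.byte_at(k)
                } else {
                    right.byte_at(k - left_len)
                }
            }
        }
    }

    /// Copies the whole rope into one contiguous `String` (O(n)).
    pub fn to_flat_string(&self) -> String {
        let mut out = String::with_capacity(self.len());
        self.write_into(&mut out);
        out
    }

    fn write_into(&self, out: &mut String) {
        match self {
            Rope::Leaf(s) => out.push_str(s),
            Rope::Concat { left, right, .. } => {
                left.write_into(out);
                right.write_into(out);
            }
        }
    }
}
```

Because `concat` only shares existing nodes, the operands remain valid and unchanged afterwards, which is what makes the same type usable for both "work in progress" and "finished" strings.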
I would think we still need a contiguous-memory string builder type at some level, but a rope is certainly desirable as a library.
@pcwalton yes, I mean immutable rather than length in the type. Java's String is in the library, but it is also (iirc) privileged by the language in certain ways (string literals, concatenation syntax, etc.) that
Java Strings are immutable, but I think a big reason for that is so that methods like trim() can just return a slice of the original String without worrying about it getting changed by another operation. Since Rust can freeze the original string when you take a slice, it doesn't have to worry about this particular case.
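A small sketch of the "freeze" behaviour in today's Rust (`String`/`&str` rather than 2014's `StrBuf`): `str::trim` returns a `&str` pointing into the original buffer, and the borrow checker rejects mutation of the owner while that view is live. The helper names are hypothetical, and the rejected mutation is described in a comment because it is a compile-time error, not something that can run.

```rust
/// Returns true if `inner` lies entirely within the bytes of `outer`,
/// i.e. it is a borrowed view rather than a freshly allocated copy.
pub fn is_subslice_of(inner: &str, outer: &str) -> bool {
    let outer_start = outer.as_ptr() as usize;
    let outer_end = outer_start + outer.len();
    let inner_start = inner.as_ptr() as usize;
    inner_start >= outer_start && inner_start + inner.len() <= outer_end
}

/// Trims a string, checks that the result shares the original buffer, then
/// mutates the original once the borrowed view is no longer in use.
pub fn trim_then_grow() -> (String, bool, usize) {
    let mut owned = String::from("  padded  ");
    let view: &str = owned.trim(); // borrowed view into `owned`; nothing is copied
    let shared = is_subslice_of(view, &owned);
    let trimmed_len = view.len(); // last use of `view`
    // Moving `owned.push('!')` above the previous line would not compile:
    // `owned` cannot be mutated while the `view` borrow is still live.
    owned.push('!'); // allowed: the borrow has ended
    (owned, shared, trimmed_len)
}
```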
I don't agree; lang items are really nice for letting us streamline the compiler and language by allowing us to reuse most of the rules for user-defined things in language-defined things. For example, operator overloading used to not be done with lang items, but instead with magic compiler rules; it was a lot simpler when we moved them to be lang item traits.
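For illustration, this is what trait-based operator overloading looks like in current Rust; `Meters` and `greet` are hypothetical. The same `std::ops::Add` trait is what makes `String + &str` work: it is an ordinary `impl Add<&str> for String` in the standard library rather than a special case in the compiler.

```rust
use std::ops::Add;

/// A hypothetical user-defined type that opts into `+` via the `Add` trait.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meters(pub f64);

impl Add for Meters {
    type Output = Meters;

    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

/// `String + &str` resolves to the standard library's `impl Add<&str> for String`,
/// the same mechanism used for `Meters` above.
pub fn greet(name: &str) -> String {
    String::from("hello, ") + name
}
```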
I think we are in violent agreement and that either I didn't phrase things very well or I don't know my own mind. The lang items concept works very well I think. To be concrete about something in Rust that I don't like: In particular with Java Strings, (and I am totally ignorant about the implementation, so I might be wrong) it seems that Strings are special cases rather than having general purpose stuff that uses a lang item type thing. For example, there is no operator overloading for any class except String which gets to overload
There are going to be other string types like one doing small string optimization. It's a bad thing to have one more closely tied to slices than others, because it encourages people to ignore choosing the one that's actually the right tool for the job.
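A rough sketch of one such alternative string type, using small string optimization. `SmallString` and its 22-byte inline capacity are hypothetical choices for illustration; a production SSO type would pack both representations into the same three words as `String`, which this simple enum does not attempt. It still hands out a `&str`, so code written against slices works with it unchanged.

```rust
/// Hypothetical inline capacity, in bytes.
const INLINE_CAP: usize = 22;

/// A hypothetical small-string-optimized type (not part of std):
/// short strings live inline, longer ones spill to a heap `String`.
#[derive(Clone, Debug)]
pub enum SmallString {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(String),
}

impl SmallString {
    pub fn new(s: &str) -> SmallString {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallString::Inline { len: s.len() as u8, buf }
        } else {
            SmallString::Heap(s.to_string())
        }
    }

    /// Borrow the contents as a slice, whichever representation is in use.
    pub fn as_str(&self) -> &str {
        match self {
            // The inline bytes were copied from a valid &str, so they are valid UTF-8.
            SmallString::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a valid &str"),
            SmallString::Heap(s) => s.as_str(),
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallString::Inline { .. })
    }
}
```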
Oh yeah, +1 for
There is some danger that this change would encourage people to use
@zkamsler where would StrSlice be more appropriate? (Genuine question: I have the same gut feeling myself, but can't actually come up with any examples.)
I think the naming we've chosen for arrays is somewhat arbitrary already (array:
@nick29581 In my opinion, a
I do not mind the spirit that drives this ticket, but I would not r+ the RFC myself until we find a better name for whatever replaces (On mozilla/rust#13717 I suggested the paired renaming @thestinger's suggestion of Has the paired renaming Worst case scenario, of course, is that I will just do
Some names that came up during lunch today. We could call the string type

    let foo: &'static chars = "test";
    let bar: &chars = foo.slice_to(2);
    let mut baz: Str = Str::from_slice("test");
    baz.push("bar");
    let foo: &chars = baz.as_slice();
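For comparison, the same operations written with the names Rust eventually adopted (`String` for the owned buffer, `str` for slices). The method spellings follow current std rather than the 2014 API: the snippet's `slice_to(2)` becomes range slicing, and appending a string slice is `push_str` (`push` appends a single `char`). `lunch_example` is a hypothetical name.

```rust
/// The lunch-time snippet, rewritten with today's `String`/`&str` names.
pub fn lunch_example() -> (&'static str, String) {
    let foo: &'static str = "test";
    let bar: &'static str = &foo[..2]; // the snippet's `foo.slice_to(2)`
    let mut baz: String = String::from("test"); // the snippet's `Str::from_slice("test")`
    baz.push_str("bar"); // appends a &str; `push` would append a single char
    let view: &str = baz.as_str(); // borrowed view of the owned buffer (the snippet's `as_slice()`)
    assert_eq!(view, "testbar");
    (bar, baz)
}
```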
I figured it'd be
@alexcrichton: I don't really like
It is a collection of chars, just encoded properly. cf.
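A small illustration of "chars, just encoded" in current Rust: a `String`/`str` stores UTF-8 bytes and `chars()` decodes them on the fly, so byte length and character count can differ, and finding the k-th character is a linear scan. The helper names are hypothetical.

```rust
/// Returns (UTF-8 byte length, number of `char`s) for a string slice.
pub fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Finds the k-th `char`. This is a linear scan, because a `char`
/// occupies 1 to 4 bytes in UTF-8, so there is no O(1) offset formula.
pub fn nth_char(s: &str, k: usize) -> Option<char> {
    s.chars().nth(k)
}
```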
I don't see this as an indictment against
If you want people to use a string slice where possible and not a growable string, would you apply the same rationale you are applying with the sigils (less typing, less cost) and use the smaller, convenient name for the lighter string slice, and keep a longer name for the growable buffer? So would that suggest plain Str for the immutable string slice (the thing impatient people will prefer to write), and StrBuf for the dynamically resizing buffer? I've always been happy with the name 'buffer' for something growable (and I don't carry any sentimental attachment to C++'s string type either).
The goal should be to encourage using borrowed slices whenever possible, and I think either
Yes, I realise you'd pass a pointer, not an actual allocation, around as a temporary. (&Str?)
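A sketch of the "borrowed slice" convention as it works in current Rust; `count_words` is a hypothetical helper. Taking `&str` lets callers pass a literal, a borrowed `String` (via deref coercion) or a sub-slice, with no allocation in any case.

```rust
/// Counts whitespace-separated words. Accepting `&str` (not `&String`)
/// lets callers pass literals, borrowed `String`s or sub-slices alike.
pub fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}
```

For example, `count_words(&owned)` works for an owned `String` because `&String` coerces to `&str` through `Deref`.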
A string is just a vector of bytes, enforcing the UTF-8 invariant. The current
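In current Rust this is essentially how `String` behaves: it wraps a byte vector and validates the UTF-8 invariant at the boundary. `bytes_to_string` is a hypothetical wrapper around the real `String::from_utf8`, which returns an error for invalid input; `into_bytes` goes the other way without copying.

```rust
use std::string::FromUtf8Error;

/// Converts raw bytes into a `String`, checking the UTF-8 invariant.
pub fn bytes_to_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}
```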
I like
+1 for
Update: Currently in Rust, I believe the plan of record is to remove the impl of If I am mistaken and this is not the plan, then I retract my support for
I have a strong dislike for Basically, I think having a word and an abbreviation of that word as two things is as bad as (or even worse than) using the same word. Note that the First off, I think that the fact that there are similar concepts does not mean we should have familiar names. There is liable to be confusion as to when to use the two different kinds of string, since they both represent an abstract idea of a string of characters. But they are different and have different use cases, and programmers should think of them differently. Therefore, I think it is important that they have different names. Since Some concrete examples of why this is a problem: Learning Rust, a beginner reads a tutorial that says such-and-such string does such-and-such. They alt-tab to their editor and need to type out a string type: "do I need str or String?" Back to the tutorial. And then it's forgotten the next day, because this kind of thing does not stick well in the mind. IRL conversation between devs: every single time you say string, you will have to disambiguate. Even if you personally maintain a convention that you say "stur" and not string for Since I feel like I shouldn't be entirely negative, I would like to vote for (
Sounds like a bad tutorial. The best naming in the world won't help a tutorial that doesn't teach the difference between owned strings and string slices. If the user ever questions "do I need str or String" then they haven't properly learned the difference between the two types. This holds true if it's called
Why would anyone ask that? If the second speaker really needs to know the difference, the question is "owned string or slice?"
Again, why? In most conversational contexts, it doesn't matter. And if it does matter, you'll have to disambiguate even with your proposed
But these names don’t tell which is which. It could be owned
If there are going to be multiple string types with multiple purposes, maybe having
@SimonSapin I'm not saying they are perfect, just that they are better than str/string. In the chars/utf8 case you have to memorise that two different words have two different meanings (and you are right, there is no indication of the word->meaning mapping). In the str case you have to memorise that the same word has two different meanings depending on how it is spelt (and again, there is no indication of word->meaning).
Note that there's precedent for this: Java has kballard and SimonSapin have also brought up good points. I'd like to add that
I don't think anyone is saying that
@nick29581 IMO it's very easy to understand which type stands for what in
@kballard I assume the tutorial _does_ explain the difference. The reader has to understand and memorise the difference. I posit that understanding and memorising that difference is easier if there are different words, not just different to write, but different to think. That is, I believe life is easier if it is StrBuf/str or zaphod/ford, and harder if it is String/str. I disagree on the conversational point. People (generally) use the language they program in as the language when discussing the program, even where the word can't be pronounced: I hear "twiddle T" more often than "unique pointer to T", and so forth. I think you are probably right that people will prefer 'string' to 'you tee eff ate'. So I guess I should prefer chars, and maybe people will use that. I think that there is a level of formality between conversational (where you would use 'string' and 'number') and totally precise (where you would use 'you 32' and 'you tee eff ate'). I think it is worthwhile having (mostly) unambiguous language for that.
That's true no matter what the names are. I believe that naming one after the word "string" and the other after anything else will lead to confusion. For example, as @Valloric says, if the slice is called
As I stated above, regardless of what you name them, people will still think of both as strings. One is an "owned string" and the other is a "string slice". Just as how
How do you pronounce I certainly don't pronounce it "amp left-bracket tee right-bracket". I usually just say "vector", and if I feel like I need to distinguish ownership then it's "vector slice" or just "slice". Similarly, I pronounce
Do you disagree that the important distinction at this point between the two types is one of ownership? Because once you've proceeded past the informal "call them both 'string'" stage, what you need to do is distinguish which is the owned value and which is the borrowed value. We already have the proper terminology for that distinction, and as I talked about earlier, that terminology ("owned" vs "slice") is what I use to talk about both strings and vectors in any situation where it matters. If you do agree that the important distinction at this level is ownership, then the actual type names shouldn't be important for this level of discourse, as you're still not actually using them.
@kballard I fear we will not be able to reconcile our points of view. The trouble is that we both have valid points, but we disagree on which has the higher priority, and I think that is very difficult to argue. I agree To answer your questions. I pronounce I think that ownership is one difference. It is also important to know that StrBuf is growable and str is not, and that str has the same physical size as its logical size, whilst StrBuf might include spare capacity. Also, you can have a borrowed reference to a StrBuf, so relying purely on ownership to disambiguate is not enough (of course a borrowed reference to a StrBuf is not a slice, but I think that is more reason to disambiguate them).
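These two distinctions (growable with spare capacity, and borrowed `&StrBuf` vs slice) can be made concrete in current Rust, where `String` plays the role of `StrBuf`. The helpers are hypothetical. `String::with_capacity` guarantees at least the requested capacity; a `&String` is a thin pointer to the `String` value (which itself tracks pointer, length and capacity), while a `&str` is a fat pointer carrying just pointer and length.

```rust
use std::mem::size_of;

/// Builds a growable string with spare room and reports (len, capacity).
pub fn len_and_capacity() -> (usize, usize) {
    let mut s = String::with_capacity(16); // capacity is at least 16
    s.push_str("abc"); // len is 3; the rest is spare capacity
    (s.len(), s.capacity())
}

/// Sizes of a borrowed reference to an owned string vs a string slice.
pub fn reference_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&str>())
}
```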
I was not expecting that. I think this explains a lot about why we disagree.
Only if there's an owned, non-growable string type that is in common usage.
Sure, but that's an uncommon type. If I really need to talk about it, I'll just be precise ("amp stir buff", or "reference to stir buff", etc). In general, when talking conversationally about types, the context is the types that are in common usage in the code you're talking about. Once we've finished transitioning from I think you are right that we will never reconcile our points of view. So obviously one of us has to give (or be overruled). I would urge you to consider backing down. If you read this entire thread and all of the discussion in other places, I think you'll find that you are in the significant minority in your opinion. The most common opinion here is that
@kballard I didn't mean to compare
I'm not sure what you mean. If you expect people to understand that the phrase "owned string" is a reference to
If the two camps boil down to the following:
Then the only hope of reconciliation is to agree to give one of the types a compound name. Something like
I vote +1 for &str / String.
It makes sense to me for strings to be named similarly to arrays/vectors: Pronunciation is something I’m unsure about, but I pronounce (IMO, in a perfect world string literals would also mirror arrays/vectors; TL;DR:
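The parallel being drawn can be sketched with current std types: `Vec<T>`/`&[T]` and `String`/`&str` share the owned-buffer/borrowed-view shape and similar method names. The function names below are hypothetical.

```rust
/// Borrowed view of a vector: `&[T]`, the counterpart of `&str`.
pub fn first_item(items: &[u32]) -> Option<u32> {
    items.first().copied()
}

/// Borrowed view of a string: `&str`, the counterpart of `&[T]`.
pub fn first_char(text: &str) -> Option<char> {
    text.chars().next()
}

/// Grows an owned vector and an owned string with parallel operations.
pub fn build_both() -> (Vec<u32>, String) {
    let mut v: Vec<u32> = Vec::new(); // like `String::new()`
    v.push(7); // like `String::push` with one char
    v.extend_from_slice(&[8, 9]); // like `push_str`: append a borrowed slice
    let mut s = String::new();
    s.push('x');
    s.push_str("yz");
    (v, s)
}
```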
The most commonly used string type should be named using the word "string",
A string contains text. Consider renaming the string slice type to Otherwise, +1 for
(The title was changed from "RFC: Rename StrBuf to Str and remove &str from the language" to "RFC: Rename StrBuf to String".)
We decided in today's meeting to accept this modified version of this RFC. |