Reconsider pointer syntax #1454

Closed
nigeltao opened this issue Jul 20, 2022 · 11 comments
Labels
leads question A question for the leads team

Comments

@nigeltao

nigeltao commented Jul 20, 2022

https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/README.md#pointer-types says "The type of pointers-to-values-of-type-T is written T*".

I get that T* looks similar to C++'s int* x syntax, but I suggest ptr T or ptr(T) instead, a short form of Carbon.Pointer(T) the way i32 is short for Carbon.Int(32).

The point being that the syntax reads left-to-right just like the prose does: "pointer to T". The implication is that const T* is no longer ambiguous as (const T)* or const (T*) and you don't need a precedence rule. It's either const(ptr(T)) or ptr(const(T)) and simply reads left-to-right. Carbon's syntax for "a function that takes etc and returns a ReturnType" also reads left-to-right, unlike C++: the ReturnType is on the right.
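To make the `const` point concrete in today's C++, the sketch below defines a hypothetical alias template `ptr` (not part of C++ or Carbon) and uses `static_assert` to check which type each spelling produces. C++ uses angle brackets where the proposal uses parentheses, but the left-to-right reading is the same.

```cpp
#include <type_traits>

// Hypothetical prefix spelling of a pointer type: ptr<T> means "pointer to T".
template <typename T>
using ptr = T*;

// C++'s own spelling needs a precedence rule: in `const int*` the const binds
// to the pointee, and a pointer that is itself const is written `int* const`.
static_assert(std::is_same_v<const int*, ptr<const int>>);  // pointer to const int
static_assert(std::is_same_v<int* const, const ptr<int>>);  // const pointer to int

// Nesting reads left to right: "pointer to pointer to const char".
static_assert(std::is_same_v<const char**, ptr<ptr<const char>>>);
```

Because `const ptr<int>` applies `const` to the whole alias, it yields a const pointer with no precedence rule to remember.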

It also makes pointeriness, as a type decorator, consistent with Slice(T) or Stack(T) meaning "slice of T" or "stack of T", both of which also read left-to-right. See also the idea of writing Array(N, T) instead of the Rust-style [T; N]. Again, Array(OuterN, Array(InnerN, T)) reads left-to-right, the way you'd pronounce the type name aloud.

If optional and result types, as type decorators, become idiomatic (and promoted into the prelude), you could have opt(T) or res(T) being short for Carbon.Optional(T) or Carbon.Result(T). Again, consistent left-to-right syntax makes it easy to understand opt(ptr(T)) versus ptr(opt(T)).

It also opens the door to a consistent syntax with other flavors of pointers, such as uptr(T) or rcptr(T) being short for Carbon.UniquePointer(T) or Carbon.RefCountedPointer(T), if you wanted smart pointers.
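The decorator names above can be approximated in C++ with alias templates over existing standard types. In this minimal sketch `ptr`, `opt`, `uptr`, `rcptr` and `Array` are hypothetical aliases and `make_maybe_owned` is an illustrative helper; none of this is proposed Carbon API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <type_traits>

// Hypothetical short names mirroring the ones suggested above; each is only a
// C++ alias for an existing standard-library type.
template <typename T> using ptr = T*;
template <typename T> using opt = std::optional<T>;
template <typename T> using uptr = std::unique_ptr<T>;
template <typename T> using rcptr = std::shared_ptr<T>;
template <std::size_t N, typename T> using Array = std::array<T, N>;

// Order matters and reads left to right:
//   opt<ptr<int>>  is an optional that may hold a pointer to int;
//   ptr<opt<int>>  is a pointer to an optional int.
static_assert(!std::is_same_v<opt<ptr<int>>, ptr<opt<int>>>);

// Array<2, Array<3, int>> is "array of 2 arrays of 3 ints", the same shape as
// the C declaration `int a[2][3]`.
using Grid = Array<2, Array<3, int>>;
static_assert(Grid{}.size() == 2);
static_assert(Grid::value_type{}.size() == 3);

// Owning and shared pointers slot into the same prefix position.
static_assert(std::is_same_v<rcptr<const int>, std::shared_ptr<const int>>);

inline opt<uptr<int>> make_maybe_owned(bool present) {
  if (!present) return std::nullopt;
  return std::make_unique<int>(42);
}
```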

I skimmed #523, but AFAICT it mostly discusses T* versus *T and doesn't consider dropping the star altogether.

Binary De-Reference Operator

Somewhat related, but coming back to C++ after some years swimming in Go, I often stumble on p->x versus v.x when Go would just use . for both. The compiler knows whether or not the left hand operand is a pointer, so why make the programmer sweat the difference in Carbon?

There admittedly may be a subtle distinction for overloaded -> operators.
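In C++, `p->x` on a raw pointer is defined as `(*p).x`, which is why a compiler could accept `.` for both. The overloaded-`->` subtlety shows up with smart pointers, where `.` and `->` reach different objects. `Point`, `x_via_arrow`, `x_via_deref` and `smart_pointer_demo` are illustrative names.

```cpp
#include <cassert>
#include <memory>

struct Point {
  int x = 0;
  int y = 0;
};

// For a raw pointer, p->x means exactly (*p).x, so a compiler could in
// principle accept p.x and insert the dereference itself, as Go does.
inline int x_via_arrow(const Point* p) { return p->x; }
inline int x_via_deref(const Point* p) { return (*p).x; }

// With a smart pointer, `.` reaches members of the pointer object itself,
// while the overloaded `->` forwards to the pointee. A single
// auto-dereferencing `.` would have to pick one of the two.
inline bool smart_pointer_demo() {
  auto up = std::make_unique<Point>(Point{1, 2});
  const Point* raw = up.get();  // `.`: member function of std::unique_ptr
  int x = up->x;                // `->`: data member of the pointee Point
  return raw == &*up && x == 1;
}
```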

Unary De-Reference and Reference Operators

Somewhat related, but more 'out there' in terms of not-C++-syntax, is spelling *p and &v as deref p and ref v (or addr v), a bit like how C++'s ! becomes Carbon's not.

Having fewer meanings for the * and & symbols means that code is easier to read, understand, and write.
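For reference, this C++ snippet shows the several jobs `*` and `&` already do. The suggested `deref p` and `ref v` (or `addr v`) spellings would take over only the dereference and address-of uses. `star_and_amp_demo` is an illustrative name.

```cpp
#include <cassert>

// In C++ the `*` and `&` tokens each carry several unrelated meanings,
// distinguished only by context. The comments name the meaning in play.
inline int star_and_amp_demo() {
  int v = 6;
  int* p = &v;                 // `*` in a type: pointer; prefix `&`: address-of
  int& r = v;                  // `&` in a type: lvalue reference
  int product = *p * 7;        // prefix `*`: dereference; binary `*`: multiply
  int low_bit = r & 1;         // binary `&`: bitwise AND
  bool ok = (product > 0) && (low_bit == 0);  // `&&`: logical AND
  return ok ? product : -1;
}
```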

@Turbine1991

Turbine1991 commented Jul 20, 2022

I totally feel for you with the dereference operators. It often feels a little nasty using a mix of both. Keeping this consistent between references and pointers, and having the backend detect access to an unassigned object, would bring us more in line with 2022.

Though I see a flaw. Assigning a value to a pointer can change the address it holds, but assigning a value to a reference calls the overloaded = operator. The real question is how important overloading the = operator is compared to the simplicity of this improvement. Perhaps a new language construct could be used to assign a value to a pointer.

Or perhaps I'm overthinking this. A new optional pointer implementation could be introduced that allows such syntax, instead of the overly verbose pointer.get().
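The assignment flaw described above can be seen in plain C++: assigning to a pointer rebinds it, while assigning to a reference writes into the referent (for class types, through its `operator=`). This is a minimal sketch; `AssignResult` and `assign_demo` are illustrative names.

```cpp
#include <cassert>

struct AssignResult {
  int a;
  int b;
  bool pointer_now_at_b;
  bool reference_still_at_a;
};

inline AssignResult assign_demo() {
  int a = 1;
  int b = 2;

  int* p = &a;
  p = &b;   // assigning to the pointer changes which object it points at
  *p = 20;  // writing through it needs an explicit dereference: b becomes 20

  int& r = a;
  r = b;    // assigning to the reference copies b's value (20) into a;
            // r keeps referring to a and cannot be re-seated

  return AssignResult{a, b, p == &b, &r == &a};
}
```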

@chandlerc
Contributor

While I'm sympathetic to the desire to further improve the pointer syntax here, I think the evaluation of *T and ptr T really ends up in the same place. We have to make a tradeoff between preserving familiarity vs. getting to a more uniform & principled grammar. When we evaluated this, we picked familiarity, and I don't think that ptr changes things. The points you bring up around left-to-right reading and other things all apply to both spellings and were all things we considered.

It is definitely a hard tradeoff to make, we totally get that. But familiarity is also valuable, and we feel like the problems created by T* aren't large enough to balance out the very sharp hit to familiarity we expect.

That doesn't mean we can't ever revisit this decision, but I don't think we have new information to motivate that at this point. I think what would help significantly more to motivate revisiting this is getting Carbon complete enough to run some user studies and get data on how people respond to the two different syntaxes. Until then, I think we should stick with our decision.

The other two are really separate questions. I think there has already been extensive discussion of postfix dereference that would be good to look at first for that one. For the second one, that probably should just be a fresh question for the leads. I would think about the familiarity tradeoff for those as well.

@nigeltao
Author

Fair enough on 'not now, maybe later'. Nonetheless...

On familiarity, Carbon already diverges from C++ on array syntax, on function syntax and on basic b: bool versus bool b syntax. Whatever syntax Carbon will have to represent C++'s std::unique_ptr<T>, I'm guessing that it won't be exactly the same as C++ (and hence, unfamiliar to some degree).

As for "the evaluation of *T and ptr T really ends up in the same place", I will repeat that the former doesn't have an obvious analogous uptr T syntax, something not considered in #523. In my modern (C++11 or later) C++ APIs, I prefer to use std::unique_ptr<T> over T* to clearly show (in code, not just in comments) ownership: whether the caller or callee is responsible for deleting an argument or return value. As better memory safety (including avoiding double-free, use-after-free and memory leaks) is a Carbon goal, I hope that idiomatic, hand-written Carbon APIs would do similarly, especially if passing around std::unique_ptr can be made as performant as passing around raw pointers.
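A minimal C++ sketch of the ownership convention described above, where `std::unique_ptr` in a signature documents who owns the object and a raw pointer parameter merely borrows. `Document`, `Library`, `make_document` and `title_length` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Document {
  std::string title;
};

// Returning std::unique_ptr states in the signature that the caller owns the result.
inline std::unique_ptr<Document> make_document(std::string title) {
  return std::make_unique<Document>(Document{std::move(title)});
}

// Taking std::unique_ptr by value states that the callee takes ownership; the
// caller must std::move the pointer in, which leaves the caller's copy empty.
class Library {
 public:
  void adopt(std::unique_ptr<Document> doc) { docs_.push_back(std::move(doc)); }
  std::size_t size() const { return docs_.size(); }

 private:
  std::vector<std::unique_ptr<Document>> docs_;
};

// By convention, a raw pointer parameter borrows without taking ownership.
inline std::size_t title_length(const Document* doc) { return doc->title.size(); }
```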

@nigeltao
Author

Separately, Carbon's T* might be a false friend. Syntactically, it resembles a C++ pointer but, semantically, it acts like a C++ reference, due to non-nullability.

Tangential idea: if we're keeping T* syntax, consider T& instead. If we flip to ptr T, consider ref T instead. Centering Carbon on references (instead of pointers, now best thought of as 'optional references') also mitigates my "Binary De-Reference Operator" question, as v.x should then become the common thing and p->x a rare 'this exists for C++ historical reasons' thing.

@chandlerc
Contributor

> Fair enough on 'not now, maybe later'. Nonetheless...
>
> On familiarity, Carbon already diverges from C++ on array syntax, on function syntax and on basic b: bool versus bool b syntax. Whatever syntax Carbon will have to represent C++'s std::unique_ptr<T>, I'm guessing that it won't be exactly the same as C++ (and hence, unfamiliar to some degree).

Certainly, we can't be 100% familiar and make the improvements we hope to make. =] You mention many changes that seemed reasonably positive on the familiarity-cost vs. improvement-benefit ratio.

IMO, we also shouldn't give up on familiarity because we're not at 100% or even because we're below some %. I think every bit of familiarity we can provide has value. The question is more -- where does the value of familiarity fall below the benefit of changes? That's where we start considering a tradeoff towards a break in syntax.

So we end up having to draw lines somewhere. In the long term, I would actually be interested in seeing real usability studies that examine how much different kinds of familiarity matter, to see whether we're making the right tradeoffs. But those will be very difficult to run and are a long way away (they'll almost certainly need a working toolchain). Until then, I think we'll have to accept judgement calls on where to draw the line. I've flagged this for a fresh call; I just wanted to write up some thoughts on why I think these are tough calls, but ones we need to make so that we can draw the line somewhere and move on.

> As for "the evaluation of *T and ptr T really ends up in the same place", I will repeat that the former doesn't have an obvious analogous uptr T syntax, something not considered in #523. In my modern (C++11 or later) C++ APIs, I prefer to use std::unique_ptr<T> over T* to clearly show (in code, not just in comments) ownership: whether the caller or callee is responsible for deleting an argument or return value. As better memory safety (including avoiding double-free, use-after-free and memory leaks) is a Carbon goal, I hope that idiomatic, hand-written Carbon APIs would do similarly, especially if passing around std::unique_ptr can be made as performant as passing around raw pointers.

Definitely agree that an owning pointer will likely end up needing some new syntax (or library type). But non-owning pointers are much more widespread in code and APIs. So I think the familiarity argument is primarily around non-owning cases.

While the concept of pointer we're currently gravitating toward is non-nullable, it is still distinctly indirect -- you can refer to the pointer distinctly from the pointee. I think the erasure of this distinction is the primary aspect of a reference. As long as we are preserving this distinction, I think pointers are a better term to anchor around (even if imperfect).
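The pointer/pointee distinction can be illustrated in C++ (this does not model Carbon's non-nullable pointers): a pointer has its own address, separate from the object it points to, while taking the address of a reference just yields the referent's address. `IdentityResult` and `identity_demo` are illustrative names.

```cpp
#include <cassert>

struct IdentityResult {
  bool pointer_has_own_address;
  bool reference_is_alias;
};

inline IdentityResult identity_demo() {
  int v = 0;
  int* p = &v;
  int& r = v;
  return IdentityResult{
      // &p is the address of the pointer object itself, distinct from &v.
      static_cast<void*>(&p) != static_cast<void*>(&v),
      // &r is simply &v: the reference has no identity of its own.
      &r == &v,
  };
}
```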

@jimspr

jimspr commented Jul 22, 2022

IMO, the issues are more apparent with larger examples involving pointers to functions. Since C/C++ is inside-out, it gets complicated for types involving functions. I looked around briefly, but I didn't see how function pointers are handled in Carbon. For instance, this is painful in C/C++: "char * ( * ( * a[N])())()". What would that look like in Carbon? Also, given what looks to be the array syntax for types in Carbon, it looks like arrays will be inside-out as well.
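As one way to see the inside-out problem, the sketch below declares the quoted C++ type and checks with `static_assert` that it matches a left-to-right spelling built from hypothetical alias templates (`ptr`, `fn`, `c_array`). `N = 4` is an arbitrary choice, and none of this says how Carbon will spell function pointers.

```cpp
#include <cstddef>
#include <type_traits>

constexpr std::size_t N = 4;  // arbitrary array length for the example

// The declaration from the comment, read inside-out: `a` is an array of N
// pointers to functions returning a pointer to a function returning char*.
char* (*(*a[N])())();

// Hypothetical left-to-right aliases spelling the same type.
template <typename T> using ptr = T*;
template <typename R> using fn = R();  // function taking no arguments, returning R
template <std::size_t Size, typename T> using c_array = T[Size];

using Spelled = c_array<N, ptr<fn<ptr<fn<ptr<char>>>>>>;
static_assert(std::is_same_v<decltype(a), Spelled>);
```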

@nigeltao
Author

nigeltao commented Jul 22, 2022

> But non-owning pointers are much more widespread in code and APIs.

Today, yes, and necessarily so in any C/C++ code that cannot assume C++11 as a minimum. And there's definitely a trade-off between the short term migration state and the long term end state. But if Carbon can assume that C++11 features are always available, I'd hope for the long term state (especially at API boundaries) to involve many more owning pointers (or shared pointers or lifetime-constrained-but-otherwise-raw pointers).

@chandlerc
Contributor

> > But non-owning pointers are much more widespread in code and APIs.
>
> Today, yes, and necessarily so in any C/C++ code that cannot assume C++11 as a minimum. And there's definitely a trade-off between the short term migration state and the long term end state. But if Carbon can assume that C++11 features are always available, I'd hope for the long term state (especially at API boundaries) to involve many more owning pointers (or shared pointers or lifetime-constrained-but-otherwise-raw pointers).

I hope for our pointers to grow smoothly into lifetime-constrained non-owning pointers.

@chandlerc
Contributor

> IMO, the issues are more apparent with larger examples involving pointers to functions. Since C/C++ is inside-out, it gets complicated for types involving functions.

I'm hopeful that Carbon will focus more on a callable interface and generic pointers to it which shouldn't have this complicating problem.
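A rough C++ analogue of the callable-interface idea, assuming `std::function` stands in for whatever Carbon ends up with: nesting type-erased callables expresses the same shape as the declaration quoted above (with `std::string` in place of `char*`) without any declarator syntax to unwind. `Producer`, `ProducerFactory`, `FactoryTable` and `make_table` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Each layer is an ordinary template argument, read from the outside in.
using Producer = std::function<std::string()>;     // callable returning a string
using ProducerFactory = std::function<Producer()>; // callable returning a Producer

constexpr std::size_t kCount = 2;  // arbitrary size for the example
using FactoryTable = std::array<ProducerFactory, kCount>;

inline FactoryTable make_table() {
  return FactoryTable{
      [] { return Producer([] { return std::string("first"); }); },
      [] { return Producer([] { return std::string("second"); }); },
  };
}
```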

@chandlerc
Contributor

The leads talked about this, and I think we all remain open to an all-prefix notation here.

However, at this point we don't really have any new information from when we made the decision, and there remains a familiarity benefit to the postfix-*.

We want to get some experience with postfix-* to understand in practice the cost of retaining this familiarity before we change it to something else. We'll work to keep prefix syntaxes open however when evaluating other designs.

@chandlerc closed this as not planned Jul 30, 2022
@jonmeow added the leads question label Aug 10, 2022
@nigeltao
Author

> we don't really have any new information from when we made the decision

FWIW, it's Sutter's (freshly publicized) cppfront and not Carbon, but https://github.com/hsutter/cppfront/blob/8140ab920a38e9cc66d1862c9b823a98a9f25e3b/regression-tests/pure2-lifetime-safety-reject-null.cpp2#L10-L14 uses prefix-star for a pointer type (and postfix-star for de-reference):

    p: *int = nullptr;
    // ... more code ...
    print_and_decorate( p* );
