redesign constructors to allow uninitialized fields #30
Wouldn't it be possible to generate code to check for uninitialized references only after an "inner" constructor returns? Some checks could of course be eliminated since the compiler can reason that they must be set (like Of course, the trouble with that would be when you have something like this:
Here the init function gets a partially initialized But what if we required a special type for uninitialized objects? So that The uninitialized types would always be immediate shadow parents of their initialized concrete counterparts. So the declaration that
That's a little tricky, but does seem to solve the problem. |
This is clever, but I don't think we should do it. This type would have to be hacked in as a special case in various little places in the compiler and it could get ugly. It would also introduce (1) a case where the type of a value changes, currently impossible, and (2) a concrete type that has subtypes (Uninitialized is concrete because there can be a value of that type). |
Ok, these are all fair points. It's a cute idea though. It does feel a little too clever, admittedly — and a little too much like how Ruby handles the object specific parent class, which is confusing but clever. And it's a good point about testing a word that's already loaded not being likely to be all that expensive. I like the idea of using a |
Oh darn, I forgot: using SEGV would not give us the behavior we have now, which is to throw an error as soon as a null reference arises, not when it is used. For example, assuming o.f is uninitialized:
In general, you'd get the error at some random point in the future. Probably not ideal. On the other hand, when accessing an uninitialized element of an unboxed array you don't get any error, so perhaps this is "consistent". |
If we're going to allow uninitialized references to hang out as long as one feels like letting them hang out, then this strikes me as the most consistent behavior, honestly. It kind of sucks though because then you can have uninitialized values potentially propagating all over the place from a single accidentally uninitialized field that gets used, but I feel like that's kind of the corollary of allowing them to persist beyond object construction at all. This is one of those decisions that it's really hard to predict which choice will in retrospect have been the right one. It feels very much like we could easily go one way and later really regret it as it becomes clear that the choice was either too lax or too strict. If we go lax, we can always get stricter, but we'd end up potentially breaking people's code. If we go strict, we can always relax things later without breaking anything. But implementing strict behavior is probably harder. |
It's a tough one. But the lax behavior is the one that everybody has done before, and that the designers of Algol said they regret IIRC. A good rule of thumb is to raise errors as soon as possible (i.e. closer to the root cause). I just tried a cell array access microbenchmark, and the null checks seem to cost about 2%. |
Honestly, 2% is not bad. Nobody really cares if something is 2% slower (except in benchmark competitions). These days, no one cares that Hadoop is absurdly inefficient, because it can scale, which is far more important. So maybe we should go strict and fix the error that the Algol folks feel that they made. Then the question becomes how strict to be. Is letting null references exist but checking for them on usage and throwing an exception the right trade-off? Can you come up with an example of a case where letting an object be constructed with uninitialized references is really handy? If we allow uninitialized references to persist beyond construction time, we will need to provide intrinsics to check whether a field is initialized, if for no other reason than that we would need them to display objects with uninitialized references without barfing. If there are a lot of other situations where checks like that would have to be made, it would seem like a fairly compelling argument for not allowing uninitialized fields to escape constructors. Think about how much of a mistake allowing the |
This also interacts with the immutable issue: obviously you never want to construct an immutable object with uninitialized fields, because then it will be permanently uninitialized. One idea would be to disallow overriding the default inner constructor for immutable types; then you can only construct immutable objects by giving all field values to the default inner constructor. On the other hand, if inner constructors were never allowed to return objects with uninitialized fields, then you could safely allow the definition of non-default inner constructors for immutable objects. However, that implies that the |
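A minimal sketch of the "give every field its final value at construction" idea for immutable types, written in present-day Julia syntax (which postdates this thread). The type name `Frac` and its normalization rule are hypothetical, and `new` is the primitive of released Julia, not necessarily the mechanism being proposed in this comment.

```julia
# Hypothetical immutable type: every field receives its final value
# inside the inner constructor, so no field can stay uninitialized.
struct Frac
    num::Int
    den::Int
    function Frac(num::Integer, den::Integer)
        den == 0 && throw(ArgumentError("denominator must be nonzero"))
        g = gcd(num, den)          # non-negative, and >= 1 because den != 0
        g = den < 0 ? -g : g       # keep the stored denominator positive
        new(div(num, g), div(den, g))
    end
end

f = Frac(6, -8)
(f.num, f.den)    # (-3, 4)
```

A non-default inner constructor is safe here only because it supplies all fields in one `new` call; nothing can be patched in afterwards.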
Actually, looking at the |
Exactly; the only way to get rid of "null check hell" is to raise an error as soon as a null reference is accessed. Checking at the end of a constructor is neither here nor there since you can still call functions during the constructor. I have thought about the ability to check whether a reference is uninitialized, and I have hesitated. A good example is HashTable, which uses an Array to store the keys and values. Any value can be a hash key, so unused spots must be uninitialized. I use a bitvector to keep track of which spots are used. Sure it would be more efficient to have isassigned(a, i). But then whether something is uninitialized becomes part of your program's behavior, and the next thing you want is the ability to clear a reference back to uninitialized. I guess that's kind of a hybrid approach where you still need null checks, but when you don't do null checks errors are thrown sooner. What I'm shooting for is to have no null pointers, period. In fact we currently have this with our struct types. But this comes at a cost of too little flexibility in constructing objects. My new proposed contract with the user is that there still may not be any null pointers, but you're on your own to make that happen since the language can no longer guarantee it. From that perspective it might make sense to check at the end of a constructor, but there is no analogous thing you can do for Arrays. I don't care what happens when printing objects. Printing something as null is kind of dishonest because you can't do anything with that knowledge; accessing the element is still an error. The contract is that an uninitialized object is not allowed to exist, whatever happens is your own fault, but we only check at well-defined points. This is like how you can have prohibited appliances in freshman dorm rooms at Harvard as long as you hide them during vacations when room inspections take place. |
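The hash-table bookkeeping described above can be sketched roughly as follows in present-day Julia. `SlotStore` and `setslot!` are hypothetical names chosen for illustration, not the actual HashTable code; `isassigned` is the array query that released Julia provides, shown next to the bit vector for comparison.

```julia
# Hypothetical slot storage for an open-addressing table.
struct SlotStore
    keys::Vector{Any}   # unused slots stay uninitialized
    used::BitVector     # side table recording which slots hold a key
end

# Allocate n uninitialized slots, all marked unused.
SlotStore(n::Integer) = SlotStore(Vector{Any}(undef, n), falses(n))

# Store a key and mark its slot as used.
function setslot!(s::SlotStore, i::Integer, key)
    s.keys[i] = key
    s.used[i] = true
    return s
end

s = SlotStore(4)
setslot!(s, 2, :apple)

s.used[2]               # true, tracked explicitly by the program
isassigned(s.keys, 2)   # true, the same fact as reported by the array
isassigned(s.keys, 3)   # false, slot 3 was never written
```

With the bit vector, "is this slot in use" stays ordinary program state. With `isassigned`, initialization status itself becomes observable behavior, which is the concern raised in the comment.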
For immutable objects |
My objection to not being able to print or otherwise check for uninitialized fields is that without that ability, you'll have to completely guess where you have an initialized value. If they exist and you can't check for them, then you're really just sticking your head in the sand and pretending they're not there. And you can always just trap UndefRefErrors and then do something else in the catch block. How is that different from having an |
I actually really dislike having |
It's true that you can implement A construct for "changing" a field of an immutable object is pretty common; even Haskell has it. I feel like people will want to do things like assign the fields of a complex number, and do |
I'm having trouble with the constructor for |
Another possibility: reassign
Or a keyword argument?
though that one is a little leaky. |
I'm suspicious of hoping that the compiler is going to eliminate the creation of temporary objects. Even if it does manage to do so, imo, it puts too much distance between what's conceptually going on and what is actually going on. If a constructor can call another constructor to do its work, then it seems to me like these are the options:
I'm not really sure which is best :-| |
Here's a weird one... have |
Can you explain this bit:
Other than that it seems reasonable enough, but I'm a bit hesitant to add magical behaviors... |
The first one can use So we can have recursive constructors with no redundant allocations and allow uninitialized fields with very little overhead. As per your recent email on hash tables, |
So this is the basic plan:
This seems like a pragmatic combination. The last point is really important because it prevents "null check hell" and prevents undefined values from silently propagating through a program, thereby also avoiding many null checks, since anything that's been used cannot possibly be null. |
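A small illustration of the error-on-access behavior, assuming the last point refers to raising an error as soon as an uninitialized reference is read. It uses present-day Julia, where this surfaces as `UndefRefError`; the type `Holder` is hypothetical.

```julia
# Hypothetical mutable type whose single field may start out uninitialized.
mutable struct Holder
    item::Any
    Holder() = new()    # `item` is deliberately left unset
end

h = Holder()

isdefined(h, :item)     # false: nothing has been stored yet

# Reading the field raises immediately instead of handing back a null
# that could travel further into the program.
err = try
    h.item
    nothing
catch e
    e
end

err isa UndefRefError   # true

h.item = 42
isdefined(h, :item)     # true
```

Because the read itself fails, any value that has been successfully read and passed along is known not to be null, so downstream code needs no checks of its own.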
A couple weeks ago Stephan B. observed that you might have a constructor like:
So we actually need to convert every occurrence of Is the design settled? It seems a bit crazy but I feel like we really want recursive constructors. It's a clever feature. In other languages you have to do things like |
This whole thing still feels fishy to me, but I don't have anything better :-( |
You know, I'm thinking that having a |
Using this approach, the default constructor would just be this:
This would be provided by default, of course, so the above definition would be unnecessary. Here are various examples to try this on and see how it feels...
A little verbose but pretty clear.
Clean, clear, short. I like it.
Also simple, clear and concise. |
This would also jibe well with immutable objects: you create them in their final form using |
It's a good point that My only remaining problem is that I want to rename the |
For posterity, it should be noted that half of this discussion took place in this email thread. |
I agree that |
It's not supposed to be a special constructor for arrays. It's part of the mutable container interface. It lets you write generic container manipulations like
How about |
branch merged in commit 587f36b. fixed. |
Currently we need to do ugly things to make circular references:
That is really bad, both ugly and compiler-unfriendly. With the change, we'd have
For constructors inside the type block, `this` is supplied, and the type's static parameters will be visible as well. `this` will also be returned automatically, and using `return` in a constructor will be an error.

Here's what `Rational` would look like:

Notice the constructor inside the type block is only usable given an instantiated version of the type, `Rational{T}`. An outside generic function `Rational()` is defined to do this for you. If there are no user-defined constructors, we can still provide the same default constructors we have now.

One downside to this is that we'll have to check each field access for uninitialized references, as we currently do for elements of pointer arrays.