-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OrderedDict to Base (for Julia 1.6?) #37761
Comments
Why not use OrderedCollections.jl? DataStructures.jl just re-exports OrderedCollections.jl: Importing OrderedCollections seems to be quite fast: $ julia --startup-file=no -e '@time using DataStructures'
0.155198 seconds (107.88 k allocations: 8.267 MiB)
$ julia --startup-file=no -e '@time using OrderedCollections'
0.034274 seconds (29.91 k allocations: 2.282 MiB)
$ julia --startup-file=no -e '@show VERSION'
VERSION = v"1.6.0-DEV.1045" |
You can apply that to basically every popular package so that's not a good argument for anything. Unless it's absolutely needed in Base, it should not be included. OTOH, including it in the stdlib would be OK. |
What can happen is that Julia's dict becomes ordered at some point (#10116) but new very similar Dict types will not be put in. |
Ok, that's pretty much what I meant, include an ordered variant, just not exported (why I mentioned Base), to keep compatibility. I'm not up-to-speed, but I'm not sure there's any "the" stdlib, only stdlibS (#27197) and I think adding OrderedCollection.jl as one would be great.
The "at some point" argument, means we've been missing out on this container as "batteries-included" for 5+ years (I know the arguments for and against inclusion of any code in general, and specifically for and against ordered). I think it's really important to have it in, even as the default, prevents bugs, and unordered should be opt-in. It's true I overlooked OrderedCollections.jl until I now when I started looking into locating the OrderedDict code to copy it for a PR. The speed argument of loading isn't the best, and Revise3 made loading 4x slower, however, when you look into the packages, you need to decide between 3+ alternatives: OrderedDict, OrderedRobinDict, LittleDict (SwissDict?) and for new users, to exclude false DefaultOrderedDict, OrderedSet, MultiDict (FrozenLittleDict, UnfrozenLittleDict?). I think an argument can be made for users to have one reasonably good default (ordered) option included, then advanced users look for options, e.g. [Unordered]Dict if it's actually faster (LittleDict is often faster). |
Huh? Not even close. Fresh Julia 1.6 session, Revise 2.7.6: tim@diva:~/.julia/dev/GridLayoutBase$ time julia-master --startup-file=no -q -e 'using Revise; yield(); using Images; yield()'
real 0m7.234s
user 0m7.201s
sys 0m0.328s Now with Revise 3.1.2: tim@diva:~/.julia/dev/GridLayoutBase$ time julia-master --startup-file=no -q -e 'using Revise; yield(); using Images; yield()'
real 0m4.376s
user 0m4.336s
sys 0m0.316s That's almost 2x faster, not 4x slower. |
[We should follow up on this off-topic at: https://github.com/timholy/Revise.jl/issues/542] You've comparing Revise2 to Revise3, I meant just loading without Revise vs. with Revise3 (even disregarding the startup cost of Revise itself):
|
There are several issues here. I'll get back to the issue of "would it help to move OrderedDict to a stdlib?" (answer: no) in a sec, let's first tackle how you're thinking about latency. You seem to imagine all factors are multiplicative. You're better off thinking of a model where the hit is tim@diva:~/Info/papers$ time julia-master --startup-file=no -e 'using Flux; yield()'
real 0m11.028s
user 0m10.769s
sys 0m0.532s
tim@diva:~/Info/papers$ time julia-master --startup-file=no -e 'using Revise; yield(); using Flux; yield()'
real 0m12.416s
user 0m12.073s
sys 0m0.604s Together (a small package like DataStructures and a large package like Flux), these two suggest that the only statement that is even remotely accurate is to say that Revise has a ~1s constant latency ( Back to the OrderedCollections issue. You are right that much of Revise's extra latency is due to inferring and compiling Dict and OrderedDict methods. But the problem is that currently precompilation can't cache even the inference results of Dict methods (as shown in #31466) due to the issues mentioned in #32705. So the fact that Revise needs to create type that are not in Base means that this wouldn't help. The only thing that would help would be to have an ordered Dict in Base and move all of Revise's key data types to Base, so that it could be built into the system image. But you can also do that now with PackageCompiler, and given your regular interest in latency I recommend you give it a try and see if you like it. |
I would like to propose OrderedDict added (I guess unexported, at least for now), to Julia 1.6. When It or later version will be LTS, and Julia 1 dropped as LTS, I feel it important it is readily available in Base, to be available in all used Julia versions from the (faster) system image.
Why? You can use DataStructures.jl, but it's a hassle to add it to each library that needs this one important container (I just fixed a bug in Pandas, that would have worked by just renaming Dict to OrderedDict had it been available, or Base.OrderedDict in this proposal), and will slow down (on my recent master):
This seems like a small PR, that I could even do. That code is available and fully baked, and think it's still Jeff's code, code I trust.
I'm NOT proposing changing the default of DIct to OrderedDict, at least for now, as I think it's still slower, and we can't really go back with that API decision. I note however that's what Python did:
So, at a minimum I propose just A.:
A.
Add OrderedDict, I think it needs to be NOT exported to avoid conflict in case DataStructures.jl will also be used.
B.
I think we could make a "standard library" named Ordered, so that:
will change the meaning of Dict to OrderedDict.
C.
Maybe at some point in the future we can do B. by default, and add UnorderedDict.
The text was updated successfully, but these errors were encountered: