Rewrite Data.IntMap to be faster and use less memory #340
Conversation
This also partially fixes #327 - it doesn't have all the tactics and instances or …
Data/IntMap.hs
Outdated
@@ -10,6 +10,7 @@
-- Module      :  Data.IntMap
-- Copyright   :  (c) Daan Leijen 2002
--                (c) Andriy Palamarchuk 2008
--                (c) Jonathan S. 2016
I really have no clue what the proper thing to do here is. To be clear, I'm pretty sure that all the actual code is mine - I looked at the existing source a few times to clarify what exactly some functions did and to put things in a nice, consistent order, but otherwise everything was built from scratch. However, the documentation is lifted straight from the existing codebase.
Documentation is also subject to copyright. I'm generally in favor of being expansive about attribution; it's almost always safer (legally, academically, and interpersonally) to give unnecessary credit than to fail to give necessary credit. That said, you should probably mention somewhere near the top that you wrote the code while previous authors designed the interface and wrote the documentation.
Yeah, I figured so, and in general I wanted to be conservative both to properly credit the work done on documentation and to avoid legal issues since I know basically nothing real about copyright law. I'll add some clarification, but leave both existing lines.
This will be primarily up to @wrengr to decide. Are there operations or contexts where this new implementation is known or believed to be inherently worse?
No. Everything has the same asymptotics (except …).
Also, I think that most of the existing regressions can be fixed, especially the large ones - I haven't done the work to investigate …
Can you get this to build with older GHC by conditionally using your own …
Force-pushed from 32cfb0c to 8a35111 (compare).
I removed the requirements of …
The bounded-intmap README indicates regressions in intersection and …
So, I ran the …
In particular, …
So it seems that the worst of the …
Now the worst regression is 45%.
If the worst regressions remain that bad, it may make sense to add your …
I just pushed a hacky change to …
The current IntMap uses `Bin` and a "heavy" `Tip` everywhere but the top, where a `Nil` is permissible. You use `data Node t a = Bin !Key a !(Node L a) !(Node R a) | Tip`. There is a non-trivial cost to these empty `Tip`s. In particular, every leaf of the tree ends up looking like (to use the current constructor names) `Bin k a Nil Nil`, which is two words larger than `Tip k a`. I'm not saying you're wrong, but it may be something to consider. One alternative might be `data Node t a = Bin !Key a !(Node L a) !(Node R a) | LOnly !Key a !(Node L a) | ROnly !(Node R a)`.

Apropos of nothing, I do not like your definitions of `R` and `L`. Would it hurt anyone to use `data L = L` and `data R = R`?
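For concreteness, here is a small sketch of the leaf-size point above. It is an illustration, not code from the branch: the `Node` shape follows the comment (with the phantom parameter dropped for brevity), the word counts assume 64-bit GHC heap objects (one header word plus one word per field), and `emptyTips` is a hypothetical helper.

```haskell
type Key = Int

-- New shape from the comment above: Tip is nullary, so every leaf is
-- "Bin k a Tip Tip" (1 header + 4 fields = 5 words), two words larger
-- than a dedicated "Tip k a" leaf constructor would be (3 words).
data Node a = Bin !Key a (Node a) (Node a) | Tip

-- Illustrative only: count the empty Tips such a tree carries; a tree
-- holding n keys always has n + 1 of them.
emptyTips :: Node a -> Int
emptyTips Tip           = 1
emptyTips (Bin _ _ l r) = emptyTips l + emptyTips r

main :: IO ()
main = print (emptyTips (Bin 1 'a' Tip (Bin 2 'b' Tip Tip)))
-- prints 3: two keys, three empty Tips
```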
Data/IntMap.hs
Outdated
-- Workshop on ML, September 1998, pages 77-86,
-- <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452>
--
-- * D.R. Morrison, \"/PATRICIA -- Practical Algorithm To Retrieve
Isn't this data structure still based on ideas from the PATRICIA paper? If so, the citation should remain. It may be that nothing remains here of Okasaki and Gill, in which case it would be reasonable to delete that reference; I'll leave that up to you.
The vast number of changes to …
I can copy some files around if you wish, but since this really is a rewrite, not an edit, I'd just recommend looking at the new code not in diff format.
(Additionally, I don't think a separate file would work, since GitHub limits diffs to 1500 lines.)
I'm not sure if there's a way to make it happy. The annoying thing is that …
I actually chose to use …
This could work, and certainly would save more memory (only …).
I'd be willing to try the more compact variation if you think the space savings are important enough, but I'd prefer to do that as a separate PR if at all.
I'll have to read more deeply into this, because that doesn't make sense to me yet. The current code stores a key and a value in its …
I fear you may be right about this. How hard would it be for you to use pattern synonyms temporarily, just to see how performance compares? I think these should all be "simple bidirectional" synonyms, which are generally rather straightforward.
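To illustrate what a "simple bidirectional" synonym looks like here, a hedged sketch: `Node`, `Bin`, and `Tip` follow the thread (phantom parameter dropped for brevity), while `Leaf` and `describe` are hypothetical names, not from the branch.

```haskell
{-# LANGUAGE PatternSynonyms #-}

type Key = Int
data Node a = Bin !Key a (Node a) (Node a) | Tip

-- A simple bidirectional synonym: the same equation serves both to
-- build (Leaf 3 'x' constructs Bin 3 'x' Tip Tip) and to match.
pattern Leaf :: Key -> a -> Node a
pattern Leaf k v = Bin k v Tip Tip

describe :: Node a -> String
describe (Leaf _ _)    = "leaf"
describe (Bin _ _ _ _) = "interior"
describe Tip           = "empty"

main :: IO ()
main = putStrLn (describe (Leaf 3 'x'))
-- prints "leaf"
```

Because the synonym is bidirectional, existing code could swap constructor names for synonym names without other changes, which is what would make a quick performance comparison feasible.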
Being smaller is tremendously helpful all by itself. But wait! There's more! A pointer to an already-evaluated node will generally be tagged (in low bits) to indicate which constructor that node was formed from. I don't know all the details of how this helps, but it helps quite tremendously.
How about something like this?

```haskell
#if __GLASGOW_HASKELL__ >= whatever
data Hand = L' | R'
type L = 'L'
type R = 'R'
#else
data L = L
data R = R
#endif
```
Actually, I guess the pattern synonyms may not be quite as simple as I …
Another place to look for potential performance improvements is at the top …
Argh... I started looking into this some more, and I don't think pattern synonyms can help. However, I noticed that … Another question: is there an efficiency benefit to writing …
Don't worry about using throwaway types like … By the way: I should have already said that I greatly appreciate the work you've done on this tremendous rewrite already. Aside from the fact that it speeds up a lot of critical operations, I'm seeing now that its code is much easier to understand than that of the current …
No. I really have no clue why certain folds are slower with this implementation. The only thing I can think of is the fact that we need to use 3x the stack as the previous implementation since we need to store a node, a key, and a value instead of just a node at certain recursive calls.
Even if I fail to reduce the …
Although this structure stores the keys and values in different locations, the spine of the tree is the same. If the existing implementation looks like

```haskell
Bin _ _ (Bin _ _ (Tip _ _) (Bin _ _ (Tip _ _) (Tip _ _))) (Tip _ _)
```

then my implementation will look like

```haskell
NonEmpty _ _ (Bin _ _ (Bin _ _ Tip (Bin _ _ Tip Tip)) Tip)
```

See also the …
This is true. However, …
I'm not saying that space reduction is useless or unfavorable. I just think that it won't produce enough time benefit to outweigh the additional complexity. In general, I think this may be worth it, solely because it saves memory even if it slows things down. I just want to put that discussion off until later, after the basic implementation is in.
I don't think this would actually help, though. The pointer tags already allow you to distinguish between …
Yeah, upon further thought, I think it would be a significant, ugly project even with pattern synonyms.
I could... but that seems like a ridiculous amount of complexity for basically no gain. If you truly feel passionate about this, I can just use …
It's not quite an extra indirection, since the wrapper is not useless - I did have one idea here that I didn't try yet, which was to inline one level of the tree into the wrapper:

```haskell
data IntMap_ a = Empty | Singleton Key a | Branched Key a Key a (Node L a) (Node R a)
```

This would decrease indirection by one level and make things more symmetrical. I didn't really implement this because I believe that the top level is rarely the bottleneck - the only reason it showed up in …
Yeah, I'm not too worried about that. The one type I do want to get rid of is …
Thank you. One of the things I really like about this design is the fact that everything is just simple …
Hrmm... stack space is usually pretty cheap, so I'm a bit surprised. I'll …
That's fine. When I thought it could be benchmarked quickly using a couple …
I personally strongly prefer the nullary-constructor datatypes over the …
Apropos of not much: do you think there's some recursion-schemes-style …
How did you like my …
In terms of efficiency, I doubt there is any difference at the moment (I haven't actually benchmarked), though the current code counts … I wrote it manually for a few reasons: …
This is somewhat related to the case of … Basically, yes, I like your idea, but I may not implement it immediately.
I just pushed optimized versions of …
I believe I matched this behavior, but it possibly should be changed.
This implicitly checks that the returned maps are valid, since `fromList` is separately checked for validity.
There isn't much speed to be gained from optimization, since the tests are just calling functions. The compile+test total time seems to be ~3.5x smaller after this change.
We can't just do this. With `-O0`, all the rewrite rules are skipped. We need to move all tests that are supposed to trip rewrite rules into another module.
Ah, didn't think about that. It is a really unimportant change; I was just trying to reduce my iteration time. I'll revert. I do have a few less-effective techniques. Though: is … Now that I made the long-form …
I need to rerun the full benchmarks (which take a while) to give you exact numbers. Roughly, the worst-case regressions are near 50%, but most of the cases are mild improvements or break even.
@gereeter, I'm hoping to make a release in the next week or three. Do you think you can get this beaten the rest of the way into shape?
Is there anything left but updating benchmark numbers?
Yeah, definitely. Sorry for dropping off the radar; I think at some point I was expecting some code review and didn't communicate properly while you were waiting on me, and then I forgot to check in further or make incremental improvements.
Not that I know of? I thought I had updated benchmark numbers recently even, but I can't remember exactly when. I'll regenerate them tomorrow. #698 lists diffing Haddocks (which I did, but felt like someone else should double-check), checking the performance of GHC (which we never found the root cause of, but since the overall performance seemed improved even if there were more allocations, it didn't seem bad; also, I never figured out how to check that myself), and trying to improve … Documentation can always be better, and maybe with a fresh look I can write some better and more extensive explanations, but I don't think it should be a blocker.
I also apologize for dropping the ball here! I finally asked for advice on measuring GHC's performance and received good tips, particularly from Ben Gamari and Andreas Klebinger. I haven't actually done much benchmarking yet though. One difficulty here is that to take reliable timings you need a quiet machine, and my desktop usually isn't quiet. Allocations are somewhat easier to measure reliably, but so far they don't indicate any improvements due to this patch. In order to somewhat cancel out the noise when taking timings, I've given … Before:
This branch:
So we see a small regression in compile times, although I can't confidently say by how much. While it would be good to have more data on the performance implications for GHC, I'm currently skeptical that we'll see GHC benefit from this change. IMHO we should finally invite other people to review this patch and to give this branch a try in their applications. Maybe we just haven't found the applications that will profit from this yet.
BTW if you want to try your own hand at benchmarking GHC, you can use my branch: https://gitlab.haskell.org/sjakobi/ghc/-/tree/sjakobi-new-intmap If you need any help, I may be able to answer questions, but #ghc on freenode will be much more competent. This should be a good start regarding building GHC: https://ghc.dev/
After all the work that went into this rewrite I feel a bit bad that it hasn't been merged simply because I'm not fully convinced that the performance changes will pay off in practice. How about advertising this code a bit, e.g. on the libraries list or r/haskell? Publishing it in a separate library would help make it more accessible, of course…
For reference: @doyougnu has done an interesting performance analysis on how this branch would affect GHC. See https://gitlab.haskell.org/ghc/ghc/-/issues/19820#note_364086.
One thing I didn't dig into is the root cause behind Demand Analysis being universally worse across the packages I tested with. If it wasn't for that effect these IntMaps would be a pretty clear win.
Very interesting and surprising....
I'd welcome that. If there's a trade-off between these implementations (efficient single-element operations vs. efficient bulk operations?) then, as a user, I'd like to have that choice.
What is the status of this branch? Is there any work we can do to get it toward merge status? Are there functions left to optimize?
IMHO the best thing that can be done for this new … The main reasons why I don't think that this work should be merged (yet) are: …
I still sort of think of this as a work in progress, but I've been sitting for over 2 years now on this code, and all the tests pass with significant improvements on many benchmarks. The general approach is detailed at gereeter/bounded-intmap - the README is a bit outdated and I really should incorporate it into comments, but it is the best reference for how the structure works. As for compatibility, I think I'm missing some instances, some Safe Haskell stuff, and anything required to get things working on compilers that aren't the latest GHC.
If this is a large enough change, I wouldn't mind sending mail to the libraries mailing list, but it should be entirely backwards compatible.
cc @ekmett - This is a (heavily modified) version of an idea you came up with.
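Piecing together the types quoted earlier in this thread (the phantom-indexed `Node` and the `Empty`/`NonEmpty` top level), the overall shape is roughly as below. This is a hedged reconstruction, not the branch's actual code: the precise meaning of the `L`/`R` tags is documented in the bounded-intmap README, and the `size` helper is an illustration I added.

```haskell
type Key = Int

-- Phantom tags distinguishing left from right subtrees; their exact
-- role (which bound the stored key represents) is described in the
-- bounded-intmap README.
data L
data R

-- The Node type as quoted in the thread, plus an assumed top level
-- with the Empty/NonEmpty split mentioned above.
data Node t a = Bin !Key a !(Node L a) !(Node R a) | Tip
data IntMap a = Empty | NonEmpty !Key a !(Node L a)

-- Illustrative traversal: every Bin and the NonEmpty root each carry
-- exactly one key/value pair, so counting constructors counts entries.
size :: IntMap a -> Int
size Empty               = 0
size (NonEmpty _ _ node) = 1 + go node
  where
    go :: Node t a -> Int
    go Tip           = 0
    go (Bin _ _ l r) = 1 + go l + go r

main :: IO ()
main = print (size (NonEmpty 0 'a' (Bin 5 'b' Tip Tip)))
-- prints 2
```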
Memory usage
The old implementation had `6n-5` words of overhead (i.e. discounting storage of the keys and the pointers to values). The new implementation has only `3n-1` words of overhead.

Large runtime regressions
- `fromAscList` and `fromDistinctAscList` are currently just aliases for `fromList`
- `foldlWithKey'`
- `keysSet`, `fromSet`, `restrictKeys`, and `withoutKeys` are currently implemented naively and are probably much slower

Benchmarks
Merges
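As a quick sanity check on the memory-overhead figures in the description (`6n-5` words old vs `3n-1` words new), a hedged back-of-the-envelope calculation; the 8-bytes-per-word figure assumes a 64-bit machine:

```haskell
-- Overhead formulas quoted in the PR description: for large n the new
-- representation carries roughly half the overhead of the old one.
oldOverhead, newOverhead :: Int -> Int
oldOverhead n = 6 * n - 5
newOverhead n = 3 * n - 1

main :: IO ()
main = do
  let n = 1000000  -- one million entries
  -- 8 bytes per word on a 64-bit machine
  print (oldOverhead n * 8)  -- 47999960 bytes, roughly 48 MB
  print (newOverhead n * 8)  -- 23999992 bytes, roughly 24 MB
```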