-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elaborate on asymptotics of IntMap #957
Conversation
containers/src/Data/IntMap.hs
Outdated
-- * even for extremely unbalanced tree the depth cannot be larger than | ||
-- the number of elements \(n\), | ||
-- * each level of a Patricia tree determines at least one more bit | ||
-- shared by all subelements, so there could not be more | ||
-- than \(W\) levels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this information is useful for someone looking to use an IntMap. The next paragraph might be useful however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it helpful to bring a quick exposition without forcing people to look into the paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way people often read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it helpful to bring a quick exposition without forcing people to look into the paper.
I disagree... I think documentation should either explain well or point to something which does. The two points about the tree only raise more questions. But I'll leave it to treeowl.
It's important to explain that it's more like "it's normally more like log N, which is capped by W, but in unbalanced edge cases can grow as fast as n, but again capped by W"
This is a fair summary that could be documented.
containers/src/Data/IntMap.hs
Outdated
-- shared by all subelements, so there could not be more | ||
-- than \(W\) levels. | ||
-- | ||
-- If all \(n\) keys in the tree are less than \(N\), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, but can be expanded as "If all n keys in the tree are in a contiguous range of size N...", since it is not about the absolute value.
But this is also not the full picture, there are other ways to get O(log N) where N is the number of potential keys. For instance if we only store elements that are k*i for i in [0..N], the statement above suggests it's O(log kN), but it's still O(log N).
The key to get O(log N) is that for every branch in the tree that divides the range, half of the potential elements should fall on either side. But this probably can't be presented nicely to the user without explaining the structure of the tree...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(log kN) and O(log N) are the same as long as k does not grow with the growth of N.
The point is not about being precise, but to explain that in a typical scenario IntMap
is roughly logarithmic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'm only seeing if the situation can be generalized to give the reader more information. But it might be good enough to address only the typical scenario of [0..N].
@treeowl any opinion on this? |
I don't really think there's anything very logarithmic going on here in general. The bounds we give are conservative approximations. If we can give tighter, but still conservative, approximations, all the better. But vague "probably this good unless your data aren't what we like" doesn't seem so valuable. |
The case where your keys are all in a small interval seems like a common enough situation that it's worth pointing out the logarithmic behavior in that case. min(n,64) doesn't actually tell you much because in practice maps contain less than 2^64 elements, so log(n) < 64. There's still an order of magnitude between a balanced tree of depth 10 and a Patricia tree of depth 64. So it's good to point out that cases where an |
Agreed. In many cases, we can give bounds in terms of the maximum minus the minimum, right? But that is also pretty conservative, I think, since a map with keys like |
@treeowl I'll gladly take specific suggestions, but I'm almost over my time budget for this. I'd say that "each level of a Patricia tree determines at least one more bit shared by all subelements" is descriptive enough for an alert reader. |
@treeowl I rebased and extended the description, how does it look now? It's purely a documentation change, I'd love to get it decided on one way or another. |
That looks pretty good to me! |
@treeowl just another reminder to take a look at the proposed documentation change. |
I'll try to have a look tonight! |
@treeowl one more ping. |
@treeowl just a gentle reminder to review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry it took so long to review this. I have just a couple questions below.
-- If all \(n\) keys in the tree are between 0 and \(N\), | ||
-- the estimate can be refined to \(O(\min(n, \log N))\). If the set of keys | ||
-- is sufficiently "dense", this becomes \(O(\min(n, \log n))\) or simply | ||
-- the familiar \(O(\log n)\), matching balanced binary trees. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are negative keys? Can we give a similarly concise refinement in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can probably improve further, but we don't have to do it now.
containers/src/Data/IntMap.hs
Outdated
-- is sufficiently "dense", this becomes \(O(\min(n, \log n))\) or simply | ||
-- the familiar \(O(\log n)\), matching balanced binary trees. | ||
-- | ||
-- The most performant scenario for 'IntMap' are keys from a continuous subset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "continuous" the right word here? Maybe "contiguous"?
@treeowl thanks, updated and rebased. Good to go? |
Thanks, and thanks for your patience. |
Recently I had a long discussion elsewhere comparing asymptotics of
IntMap
to balanced binary trees. I think it's worth writing down in the documentation.