Elaborate on asymptotics of IntMap #957

Bodigrim · 2023-07-06T22:23:04Z

Recently I had a long discussion elsewhere comparing asymptotics of IntMap to balanced binary trees. I think it's worth writing down in the documentation.

containers/src/Data/IntMap.hs

meooow25 · 2023-07-07T15:35:21Z

containers/src/Data/IntMap.hs

+-- * even for extremely unbalanced tree the depth cannot be larger than
+--   the number of elements \(n\),
+-- * each level of a Patricia tree determines at least one more bit
+--   shared by all subelements, so there could not be more
+--   than \(W\) levels.


I'm not sure this information is useful for someone looking to use an IntMap. The next paragraph might be useful however.

I find it helpful to bring a quick exposition without forcing people to look into the paper.

People often read $O(\min(n, W))$ as "it grows linearly while $n < W$, then remains constant", which is honestly quite a natural reading. It's important to explain that it's more like "it's normally more like $\log N$, which is capped by $W$, but in unbalanced edge cases can grow as fast as $n$, but again capped by $W$", and this is because the depth of the tree is determined by these factors.

> I find it helpful to bring a quick exposition without forcing people to look into the paper.

I disagree... I think documentation should either explain well or point to something which does. The two points about the tree only raise more questions. But I'll leave it to treeowl.

> It's important to explain that it's more like "it's normally more like log N, which is capped by W, but in unbalanced edge cases can grow as fast as n, but again capped by W"

This is a fair summary that could be documented.
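The two readings of $O(\min(n, W))$ discussed in this thread can be compared on a toy model. The sketch below is not the `containers` implementation: `trieDepth`, `denseKeys` and `sparseKeys` are hypothetical names, and the model only assumes what the proposed documentation states, namely that each level branches on the highest bit in which the remaining keys still differ. It also assumes the keys are distinct.

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, testBit, xor, (.|.))
import Data.List (partition)

-- | Depth of an idealised Patricia trie holding the given keys.
-- Hypothetical helper: a model of the structure, not the containers code.
-- Assumes the keys are distinct.
trieDepth :: [Int] -> Int
trieDepth []  = 0
trieDepth [_] = 0
trieDepth ks@(k0 : _) = 1 + max (trieDepth zeros) (trieDepth ones)
  where
    -- Bits in which at least one key differs from the first key.
    diff = foldr (\k acc -> acc .|. (k `xor` k0)) 0 ks
    -- Highest such bit: the model branches on it at this level.
    branchBit = finiteBitSize diff - 1 - countLeadingZeros diff
    (ones, zeros) = partition (`testBit` branchBit) ks

-- | A contiguous range: 1024 keys.
denseKeys :: [Int]
denseKeys = [0 .. 1023]

-- | Zero plus twenty powers of two: 21 keys, one new bit per key.
sparseKeys :: [Int]
sparseKeys = 0 : [2 ^ i | i <- [0 .. 19 :: Int]]
```

In this model `trieDepth denseKeys` is 10, that is $\log_2 n$, while `trieDepth sparseKeys` is 20, that is $n - 1$: the linear growth shows up only for the deliberately unbalanced key set.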

meooow25 · 2023-07-07T15:39:22Z

containers/src/Data/IntMap.hs

+--   shared by all subelements, so there could not be more
+--   than \(W\) levels.
+--
+-- If all \(n\) keys in the tree are less than \(N\),


This is true, but can be expanded as "If all n keys in the tree are in a contiguous range of size N...", since it is not about the absolute value.

But this is also not the full picture: there are other ways to get O(log N), where N is the number of potential keys. For instance, if we only store elements that are k*i for i in [0..N], the statement above suggests it's O(log kN), but it's still O(log N).
The key to getting O(log N) is that for every branch in the tree that divides the range, half of the potential elements should fall on either side. But this probably can't be presented nicely to the user without explaining the structure of the tree...

O(log kN) and O(log N) are the same as long as k does not grow with the growth of N.

The point is not about being precise, but to explain that in a typical scenario IntMap is roughly logarithmic.

Fair enough, I'm only seeing if the situation can be generalized to give the reader more information. But it might be good enough to address only the typical scenario of [0..N].
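The bound behind this exchange can be spelled out. This is only a sketch, assuming a fixed stride $k \ge 1$, some $N \ge 1$, and the keys $k \cdot i$ for $i \in [0..N]$. All keys lie in $[0, kN]$, so they fit in $\lceil \log_2(kN + 1) \rceil$ bits, and because each level determines at least one more bit, the depth $d$ satisfies

$$
d \le \lceil \log_2(kN + 1) \rceil \le \log_2 k + \log_2 N + 2 = O(\log N) \quad \text{for fixed } k.
$$

The middle step uses $kN + 1 \le 2kN$. The estimate stops being logarithmic in $N$ only if $k$ is allowed to grow with $N$.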

Bodigrim · 2023-07-11T21:56:42Z

@treeowl any opinion on this?

treeowl · 2023-07-11T22:30:26Z

I don't really think there's anything very logarithmic going on here in general. The bounds we give are conservative approximations. If we can give tighter, but still conservative, approximations, all the better. But vague "probably this good unless your data aren't what we like" doesn't seem so valuable.

Lysxia · 2023-07-14T18:56:55Z

The case where your keys are all in a small interval seems like a common enough situation that it's worth pointing out the logarithmic behavior in that case. min(n,64) doesn't actually tell you much because in practice maps contain fewer than 2^64 elements, so log(n) < 64. There's still an order of magnitude between a balanced tree of depth 10 and a Patricia tree of depth 64. So it's good to point out that cases where an IntMap has depth 64 are actually quite specific.

treeowl · 2023-07-14T19:07:33Z

Agreed. In many cases, we can give bounds in terms of the maximum minus the minimum, right? But that is also pretty conservative, I think, since a map with keys like [minBound .. minBound + 10000] ++ [maxBound - 10000 .. maxBound] should be relatively efficient too, if I'm not mistaken. Is there a way we can talk about tree depth that's tighter while still being tractable?
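The two-cluster example can be checked on a toy model. This is a sketch, not the `containers` code: `spanBits`, `modelDepth` and `twoClusters` are hypothetical names, the keys are assumed distinct, and the model assumes only that each level branches on the highest bit in which the remaining keys differ (for `Int` keys that is the highest bit in which the minimum and the maximum differ).

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, testBit, xor)
import Data.List (partition)

-- | Number of low bits in which the smallest and largest key can differ.
-- This is the conservative depth bound derived from the extremes alone.
spanBits :: [Int] -> Int
spanBits [] = 0
spanBits ks = finiteBitSize d - countLeadingZeros d
  where
    d = minimum ks `xor` maximum ks

-- | Depth of an idealised Patricia trie on distinct keys (a model only).
modelDepth :: [Int] -> Int
modelDepth []  = 0
modelDepth [_] = 0
modelDepth ks  = 1 + max (modelDepth ones) (modelDepth zeros)
  where
    -- Branch on the highest bit in which minimum and maximum differ.
    (ones, zeros) = partition (`testBit` (spanBits ks - 1)) ks

-- | The key set from the comment above: two small clusters far apart.
twoClusters :: [Int]
twoClusters = [minBound .. minBound + 10000] ++ [maxBound - 10000 .. maxBound]
```

Here `spanBits twoClusters` equals the full word size, because the extremes already differ in the sign bit, while `modelDepth twoClusters` is 15: one level separates the clusters and each cluster of 10001 contiguous keys has depth 14. In this model, at least, a bound stated in terms of maximum minus minimum is indeed very loose for such data.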

Bodigrim · 2023-07-21T19:17:20Z

> Is there a way we can talk about tree depth that's tighter while still being tractable?

@treeowl I'll gladly take specific suggestions, but I'm almost over my time budget for this. I'd say that "each level of a Patricia tree determines at least one more bit shared by all subelements" is descriptive enough for an alert reader.

Bodigrim · 2024-01-28T19:02:46Z

@treeowl I rebased and extended the description, how does it look now? It's purely a documentation change, I'd love to get it decided on one way or another.

containers/src/Data/IntMap.hs

Lysxia · 2024-01-30T08:03:16Z

That looks pretty good to me!

Bodigrim · 2024-02-01T21:36:42Z

@treeowl just another reminder to take a look at the proposed documentation change.

treeowl · 2024-02-01T22:14:30Z

I'll try to have a look tonight!

Bodigrim · 2024-02-10T23:07:27Z

@treeowl one more ping.

Bodigrim · 2024-02-29T20:32:30Z

@treeowl just a gentle reminder to review this PR.

treeowl

I'm sorry it took so long to review this. I have just a couple questions below.

treeowl · 2024-03-08T02:46:51Z

containers/src/Data/IntMap.hs

+-- If all \(n\) keys in the tree are between 0 and \(N\),
+-- the estimate can be refined to \(O(\min(n, \log N))\). If the set of keys
+-- is sufficiently "dense", this becomes \(O(\min(n, \log n))\) or simply
+-- the familiar \(O(\log n)\), matching balanced binary trees.


What if there are negative keys? Can we give a similarly concise refinement in that case?

I think we can probably improve further, but we don't have to do it now.

treeowl · 2024-03-08T02:48:52Z

containers/src/Data/IntMap.hs

+-- is sufficiently "dense", this becomes \(O(\min(n, \log n))\) or simply
+-- the familiar \(O(\log n)\), matching balanced binary trees.
+--
+-- The most performant scenario for 'IntMap' are keys from a continuous subset,


Is "continuous" the right word here? Maybe "contiguous"?

Bodigrim · 2024-03-09T19:13:57Z

@treeowl thanks, updated and rebased. Good to go?

treeowl · 2024-03-09T19:35:38Z

Thanks, and thanks for your patience.

konsumlamm reviewed Jul 7, 2023

meooow25 reviewed Jul 7, 2023

konsumlamm reviewed Jan 29, 2024

Lysxia requested a review from treeowl February 1, 2024 22:12

treeowl reviewed Mar 8, 2024

Elaborate on asymptotics of IntMap

6de94fa

treeowl approved these changes Mar 9, 2024

treeowl merged commit 8f6ef9a into haskell:master Mar 9, 2024
10 checks passed

Bodigrim deleted the intmap-docs branch March 9, 2024 19:36
