Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Use Bitmapped Vector Tries for ImmutableList #25775

Closed
stevenguh opened this issue Apr 5, 2018 · 11 comments
Closed

[Proposal] Use Bitmapped Vector Tries for ImmutableList #25775

stevenguh opened this issue Apr 5, 2018 · 11 comments
Labels
area-System.Collections enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@stevenguh
Copy link

I'm wondering why aren't we using bitmapped Vector Tries for ImmutableList? Forgive me if this question has been asked.

As far as I know, this data structure is used in Closure's PersistentVector, and ImmutableJS's List. I have a preliminary implementation and it shows that it's faster than the current implementation of the current ImmutableList. For getting sequential or random elements in the list, the trie is about 4 times faster, and for adding elements, it about 3 times faster. Of course, these numbers are the preliminary findings.

For those who don't know what Bitmapped Vector Tries is:
Understanding Clojure's Persistent Vectors
Immutable.js

@danmoseley
Copy link
Member

Interesting. @AArnott are you familiar with these?

@AArnott
Copy link
Contributor

AArnott commented Apr 5, 2018

I'm not familiar with the proposed data structure myself, but we've heard feedback around trie's being significantly better performers than AVL trees a few times. https://github.com/dotnet/corefx/issues/15413 tracks one such discussion. I can't find it now, but there was another discussion where someone had implemented and given a lot of metrics about their rewrite that looks really promising.
One area that I'm always interested in is GC pressure in mutating scenarios. The AVL tree we have is tuned to minimize GC pressure. And in particular our Builders have absolute 0 GC pressure by reusing all objects. I no virtually nothing about these trie data structures so I can't comment whether something similar can be accomplished with them. Given their perf wins, I certainly hope they are also good in these other areas or can be sufficiently good that the added perf makes up the rest of it.

@stevenguh
Copy link
Author

The bitmapped array trie can have a builder too (I am thinking the use of transient). It doesn't have 0 GC pressure, but it reuses the objects when it can while maintaining the data structure's integrity when using transient.

The sets and gets on the trie will be around O(log32 N), and O(1) for push and pop operation.

I haven't done an extensive profiling yet, but I have a feeling the trie will have faster access time for most cases, and slightly higher GC.

@svick
Copy link
Contributor

svick commented Apr 6, 2018

@stevenguh Do I understand it correctly that with that implementation, list.RemoveAt(0) would effectively have to reallocate the whole tree (in other words, it would allocate O(n) memory)? That doesn't seem acceptable to me, considering that, AFAICT, the current implementation would allocate only O(log n) memory for that operation.

@sharwell
Copy link
Member

sharwell commented Apr 6, 2018

See #14477 for an alternate proposal, and tunnelvisionlabs/dotnet-trees#58 for the implementation basis. The baseline implementation is not yet optimized; performance (time) is roughly equivalent to the current AVL trees but memory overhead is dramatically reduced. Aside from avoiding the large object heap, AVL trees are extremely memory-inefficient for these collections.

@stevenguh
Copy link
Author

@svick Yes, you are correct that insert/remove in general will re-index and re-reallocate the tree. However, there can be some tricks for shifting.
Maybe this structure should not be replacing the current ImmutableList implementation, but instead it should be a separate structure like tunnelvisionlabs/dotnet-trees#58.

@AArnott
Copy link
Contributor

AArnott commented Apr 6, 2018

Yes, you are correct that insert/remove in general will re-index and re-reallocate the tree.

That may be a non-starter then. Incremental updates of the immutable structures are O(log n) right now in CPU and GC pressure. Changing that to O(n) would likely have significant negative impact on scenarios already shipping that depend on this.

@danmoseley
Copy link
Member

@AArnott "Changing that to O(log n)" did you mean "O(n)" ?

@AArnott
Copy link
Contributor

AArnott commented Apr 6, 2018

Yes, thanks. Corrected above.

@stevenguh
Copy link
Author

Yeah, I think this data structure cannot replace the existing implementation because of its poor insert and remove. I have an implementation here that provide an API compatible alternative to ImmutableList in case anyone need that extra performance boost and if the penalty for Insert and Remove is not important.

Do you guys think this issue should be close?

@AArnott
Copy link
Contributor

AArnott commented May 1, 2018

Yes, thanks.

@AArnott AArnott closed this as completed May 1, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Collections enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

6 participants