Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try out 2-3-4 tree #32

Closed
6 of 8 tasks
dadhi opened this issue Jun 9, 2020 · 4 comments · Fixed by #33
Closed
6 of 8 tasks

Try out 2-3-4 tree #32

dadhi opened this issue Jun 9, 2020 · 4 comments · Fixed by #33
Assignees
Milestone

Comments

@dadhi
Copy link
Owner

dadhi commented Jun 9, 2020

todo

  • proof of concept and tests
  • add Leaf 4, 5
  • add right-leaning branch3 and fast lookups
  • polish and improve the perf plus benchmarks
  • add Enumerate
  • add Soft-delete in branches and hard delete in leafs - no need for rebalancing
  • Add under Experimental.ImMap234 name
  • Add bucketed map implementation

why

2-3-4 tree is a self-balanced tree with the much simplier and straightforward balancing comparing to AVL rotations (IMHO).

The plain 2, 3 leaf nodes and 3 branch node compared to AVL binary only branches promises the menory savings.

In addition I may apply some of the technics I did in AVL Experimental.ImMap for further savings and speedup:

  1. Prevent unnecessary temp node creation and construct the final structure. Compared to AVL, t234 has only split-3-leaf/branch case to consider.

  2. Flatten the lower level structure to save space, e.g. Branch2 with the Leaf branches can flattened to the special 4, 5, 6, 7 leafs nodes. We can start with 4 and 5 leaf nodes because the addition won't split them, so we keep current split approach intact.
    Moreover we can hold a single shape for 4 leafs - right-leaning, and hold the centralized shape for 5 leafs. This way the nodes will be locally rebalanced minimizing the braching split!

Here is the current wip tree structure, then I will describe the adjustments for the point 2.

class ImMap
{ 
    static ImMap Empty = new ImMap();
    class Entry : ImMap {}
    class Leaf2 : ImMap {}
    class Leaf3 : Leaf2 {}
    class Branch2 : ImMap {}
    class Branch3 : Branch2 {}
}

Adjustments:

class ImMap
{ 
    static ImMap Empty = new ImMap();
    class Entry : ImMap {}
    class Leaf2 : ImMap {}
    // maybe not need for virtual Leaf lookup *
    class Leaf3 : Leaf2 {} 

    // don't inherit Leaf3 because it is checked for split
    class Branch2Leaf4 : ImMap {}
    class Branch2Leaf5 : Branch2Leaf4 {}

    class Branch2 : ImMap {}
    class Branch3 : Branch2 {}
}

* - Regarding Lookup we may combine iteratition for branches and virtual call (polymorphism) for Leafs and BranchLeafs. But it should be proved by benchmark. So the Leaf3 : Leaf2 won't be needed to speedup lookup, likely it would be faster if the inheritance for Leaf3 is cut to ImMap.

dadhi added a commit that referenced this issue Jun 11, 2020
@dadhi
Copy link
Owner Author

dadhi commented Jun 17, 2020

Let's see how to construct a tree from 1 to 20:

[1 2 3] -> [1 2 3 4 5] ->   
 
         [3]               
[1 2]        [4 5 6 7 8]
->
         [3 6]
[1 2]  [4 5] [7 8 9 10 11]
->
             [6]
       [3]           [9]
[1 2]  [4 5]  [7 8]   [10 11 12 13 14]
->
             [6]
       [3]             [9, 12]
[1 2]  [4 5]  [7 8]   [10 11]  [13 14 15 16 17]

->
                      [6, 12]
       [3]               [9]                       [15]
[1 2]  [4 5]  [7 8]   [10 11]  [13 14] [16 17 18 19 20]

@dadhi dadhi closed this as completed Jun 17, 2020
@dadhi dadhi reopened this Jun 17, 2020
@dadhi
Copy link
Owner Author

dadhi commented Jun 18, 2020

Another idea is similar to where Right Leaning Red-Black Trees are proposing - replace Branch3 with two Branch2s.

There are two approaches to it:

  1. Always use right leaning Branch2 -> Branch2
  2. Use Left and Right leaning branches depending on ditection of split.

Pros:

  1. Lookup is faster and simplier:
    For both 1st and 2nd approach the Lookup may be simplified because now we have only Branch2 and Leafs - no need for addition cast / checks on each iteration.
  2. Approach 2 only - Addition may be simplified so there is no need in destructive variant for Leaf3 and Branch3.

Cons:

  1. The memory consumed by Branch3 is one pointer more.

The memory increase in this approach may be compensated a bit by Leaf4 and 5 approach described above.

@dadhi
Copy link
Owner Author

dadhi commented Jun 22, 2020

Another idea is similar to where Right Leaning Red-Black Trees are proposing - replace Branch3 with two Branch2s

Update

It did not work at the end. Complexity sky-rocketed with still even slower Lookups and slower Addition.

So I am back to virtual recursive Lookup and simple flat Brakch3.

New ideas how to improve it further:

  1. [no effect] Remove Branch3 : Branch2 inheritance.
    It should remove call to the base constructor and speed-up casts a bit.
  2. [no or negative effect] Try out making the base ImMap abstract and seal all descentant plus add ImMapEmpty type. Bencmark the addition of single element with reference equality to Empty replaced by the type check.~
  3. [significant effect - memory and the perf] Split Leaf5 so to keep the leaf with newly added item smaller than the other one. It should speedup further addition to the same side by less copying when creating a smaller leafs and with postponing the next split.
  4. Try to add a distinct nodes representing the branch with leafs. This may be in conflict with the point 3 because we want to minimize number of shapes / types of such nodes. The benefit is a less casting when adding to the leaf branches and moreover we may devirtualiaze and inline the Lookup for the leafs. The possible types are: Leaf2...Leaf3, Leaf2...Leaf4, Leaf2...Leaf5, etc.
    This also can help with 6.
  5. Can help with 4 or be a separate point. Rebalance branch with leafs on addition when adding to the internal edge of biggest leaf - make the newly added item a new branch entry and add an old entry to the smaller leaf, those making less copying and tree is more normally distributed.
  6. Spare some memory by adding keys directly to the branches and speedup the key comparison so no need to reach for the Entry.Key.
  7. Inline the addition to Leaf5 in Branch3.
  8. Make AddOrUpdateSplit a static call.

@dadhi dadhi self-assigned this Jun 22, 2020
@dadhi
Copy link
Owner Author

dadhi commented Jun 25, 2020

The results

Addition

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.301
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT


|                                 Method | Count |            Mean |         Error |        StdDev | Ratio | RatioSD |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|--------------------------------------- |------ |----------------:|--------------:|--------------:|------:|--------:|---------:|---------:|---------:|----------:|
|         Experimental_ImMap_AddOrUpdate |     1 |        19.46 ns |      0.093 ns |      0.087 ns |  1.00 |    0.00 |   0.0068 |        - |        - |      32 B |
|    Experimental_ImMapSlots_AddOrUpdate |     1 |       121.06 ns |      1.529 ns |      1.277 ns |  6.22 |    0.05 |   0.0663 |        - |        - |     312 B |
|      Experimental_ImMap234_AddOrUpdate |     1 |        19.43 ns |      0.074 ns |      0.069 ns |  1.00 |    0.01 |   0.0068 |        - |        - |      32 B |
| Experimental_ImMap234Slots_AddOrUpdate |     1 |       117.34 ns |      0.665 ns |      0.622 ns |  6.03 |    0.04 |   0.0663 |        - |        - |     312 B |
|                  ConcurrentDict_TryAdd |     1 |       175.07 ns |      0.980 ns |      0.917 ns |  9.00 |    0.04 |   0.1853 |   0.0012 |        - |     872 B |
|              ImmutableDict_Builder_Add |     1 |       177.48 ns |      0.879 ns |      0.779 ns |  9.12 |    0.07 |   0.0339 |        - |        - |     160 B |
|                                        |       |                 |               |               |       |         |          |          |          |           |
|         Experimental_ImMap_AddOrUpdate |    10 |       457.21 ns |      1.403 ns |      1.313 ns |  1.00 |    0.00 |   0.2651 |   0.0005 |        - |    1248 B |
|    Experimental_ImMapSlots_AddOrUpdate |    10 |       356.58 ns |      0.910 ns |      0.851 ns |  0.78 |    0.00 |   0.1273 |   0.0005 |        - |     600 B |
|      Experimental_ImMap234_AddOrUpdate |    10 |       404.97 ns |      1.185 ns |      1.108 ns |  0.89 |    0.00 |   0.2122 |   0.0005 |        - |    1000 B |
| Experimental_ImMap234Slots_AddOrUpdate |    10 |       343.20 ns |      1.229 ns |      1.149 ns |  0.75 |    0.00 |   0.1273 |   0.0005 |        - |     600 B |
|                  ConcurrentDict_TryAdd |    10 |       703.77 ns |      1.315 ns |      1.165 ns |  1.54 |    0.00 |   0.2613 |   0.0019 |        - |    1232 B |
|              ImmutableDict_Builder_Add |    10 |     2,327.55 ns |      5.936 ns |      5.552 ns |  5.09 |    0.02 |   0.1564 |        - |        - |     736 B |
|                                        |       |                 |               |               |       |         |          |          |          |           |
|         Experimental_ImMap_AddOrUpdate |   100 |    10,180.90 ns |     57.651 ns |     53.927 ns |  1.00 |    0.00 |   6.4545 |   0.2899 |        - |   30432 B |
|    Experimental_ImMapSlots_AddOrUpdate |   100 |     3,970.42 ns |     12.280 ns |     10.886 ns |  0.39 |    0.00 |   1.9608 |   0.1144 |        - |    9240 B |
|      Experimental_ImMap234_AddOrUpdate |   100 |    10,143.91 ns |     45.716 ns |     40.526 ns |  1.00 |    0.01 |   5.4779 |   0.2289 |        - |   25800 B |
| Experimental_ImMap234Slots_AddOrUpdate |   100 |     3,937.81 ns |     17.458 ns |     16.330 ns |  0.39 |    0.00 |   1.8768 |   0.0992 |        - |    8856 B |
|                  ConcurrentDict_TryAdd |   100 |    12,139.06 ns |     45.494 ns |     42.555 ns |  1.19 |    0.01 |   4.8370 |   0.4730 |        - |   22768 B |
|              ImmutableDict_Builder_Add |   100 |    35,251.17 ns |    113.558 ns |    106.223 ns |  3.46 |    0.02 |   1.9531 |   0.1221 |        - |    9376 B |
|                                        |       |                 |               |               |       |         |          |          |          |           |
|         Experimental_ImMap_AddOrUpdate |  1000 |   178,430.30 ns |    638.426 ns |    565.948 ns |  1.00 |    0.00 |  98.1445 |   0.2441 |        - |  462624 B |
|    Experimental_ImMapSlots_AddOrUpdate |  1000 |   102,538.52 ns |    480.403 ns |    425.865 ns |  0.57 |    0.00 |  47.8516 |   0.1221 |        - |  225496 B |
|      Experimental_ImMap234_AddOrUpdate |  1000 |   180,436.96 ns |  1,130.151 ns |  1,057.144 ns |  1.01 |    0.01 |  87.8906 |   1.4648 |        - |  414480 B |
| Experimental_ImMap234Slots_AddOrUpdate |  1000 |    84,713.35 ns |    337.963 ns |    299.596 ns |  0.47 |    0.00 |  40.4053 |   0.1221 |        - |  190232 B |
|                  ConcurrentDict_TryAdd |  1000 |   124,306.71 ns |  1,671.098 ns |  1,563.146 ns |  0.70 |    0.01 |  43.2129 |   0.9766 |        - |  205328 B |
|              ImmutableDict_Builder_Add |  1000 |   474,727.06 ns |    991.683 ns |    879.101 ns |  2.66 |    0.01 |  20.0195 |   0.4883 |        - |   95780 B |
|                                        |       |                 |               |               |       |         |          |          |          |           |
|         Experimental_ImMap_AddOrUpdate | 10000 | 4,418,104.64 ns | 27,677.872 ns | 25,889.899 ns |  1.00 |    0.00 | 992.1875 | 328.1250 | 109.3750 | 6253388 B |
|    Experimental_ImMapSlots_AddOrUpdate | 10000 | 4,291,301.34 ns | 33,092.427 ns | 29,335.574 ns |  0.97 |    0.01 | 609.3750 | 210.9375 |  70.3125 | 3856354 B |
|      Experimental_ImMap234_AddOrUpdate | 10000 | 4,549,840.78 ns | 42,349.737 ns | 39,613.971 ns |  1.03 |    0.01 | 914.0625 | 328.1250 | 140.6250 | 5739218 B |
| Experimental_ImMap234Slots_AddOrUpdate | 10000 | 3,826,270.52 ns | 27,039.210 ns | 25,292.494 ns |  0.87 |    0.01 | 531.2500 | 242.1875 |  39.0625 | 3362691 B |
|                  ConcurrentDict_TryAdd | 10000 | 2,992,046.02 ns | 25,592.476 ns | 23,939.218 ns |  0.68 |    0.01 | 273.4375 | 121.0938 |  42.9688 | 1645240 B |
|              ImmutableDict_Builder_Add | 10000 | 6,083,897.19 ns | 15,221.664 ns | 14,238.354 ns |  1.38 |    0.01 | 148.4375 |  70.3125 |        - |  959776 B |

@dadhi dadhi linked a pull request Jun 25, 2020 that will close this issue
@dadhi dadhi closed this as completed in #33 Jun 25, 2020
@dadhi dadhi added this to the v2.1.0 milestone Jun 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant