Only use one SlotMap #312

Closed

@DasLixou DasLixou commented Jan 3, 2023

Objective

Change SlotMap for children and parents to SecondaryMap

Changes that will affect external library users must update RELEASES.md before they will be merged.

Context

Lines such as 204 and 205 in node.rs may be unnecessary now and could perhaps be removed.

* Change `SlotMap` for `children` and `parents` to `SecondaryMap`
@alice-i-cecile added the code quality (Make the code cleaner or prettier.) and performance (Layout go brr) labels Jan 3, 2023
@alice-i-cecile
Collaborator

Neat. Reading the docs for SecondaryMap suggests that this is primarily a performance optimization. I think this makes sense, but have you tested out the benchmarks before and after?

@nicoburns
Collaborator

> A SecondaryMap does not leak memory even if you never remove elements. In return, when you remove a key from the primary slot map, after any insert the space associated with the removed element may be reclaimed. Don’t expect the values associated with a removed key to stick around after an insertion has happened!

I think this is the only benefit. And I think how it works is like this:

  • Each slot in a slotmap has a generation. This is stored in the key.
  • When a value is removed from the slotmap, that slot is marked as free and can be reused.
  • When a new value is inserted, the slotmap will reuse free slots, incrementing the generation.
  • SecondaryMaps only allow you to insert data using a key from the primary SlotMap.
  • When a SecondaryMap gets an insert using a key with the same index as a filled slot but a newer generation, it overwrites the existing slot rather than writing a new one.

I guess that might give a slight performance boost, as it would save the SecondaryMap from looking for a free slot on insert. It also means we don't have to manually remove keys from the secondary maps (only from the primary one).

Seems like a win, but a small one? Although measuring would of course be a good idea.
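
To make the generation mechanism described above concrete, here is a minimal, std-only sketch. It is a toy model written for illustration, not the slotmap crate's actual implementation: the names `ToySlotMap`, `ToySecondaryMap`, and `Key` are hypothetical, and the real crate differs in layout and detail.

```rust
// Toy model of generational keys. NOT the slotmap crate's real implementation;
// all type names here are hypothetical and exist only for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Key {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Primary map: hands out keys and reuses freed slots.
struct ToySlotMap<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> ToySlotMap<T> {
    fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            // Reuse a free slot and bump its generation so old keys go stale.
            let slot = &mut self.slots[index];
            slot.generation += 1;
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index: self.slots.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.generation != key.generation {
            return None; // stale key
        }
        let value = slot.value.take()?;
        self.free.push(key.index);
        Some(value)
    }
}

/// Secondary map: stores extra data under keys issued by the primary map.
struct ToySecondaryMap<V> {
    slots: Vec<Option<(u32, V)>>,
}

impl<V> ToySecondaryMap<V> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    fn insert(&mut self, key: Key, value: V) {
        if self.slots.len() <= key.index {
            self.slots.resize_with(key.index + 1, || None);
        }
        // An entry written under a newer generation must not be clobbered by an old key.
        let is_stale = match &self.slots[key.index] {
            Some((generation, _)) => *generation > key.generation,
            None => false,
        };
        if !is_stale {
            // Same index, same or newer generation: overwrite in place.
            self.slots[key.index] = Some((key.generation, value));
        }
    }

    fn get(&self, key: Key) -> Option<&V> {
        match self.slots.get(key.index)? {
            Some((generation, value)) if *generation == key.generation => Some(value),
            _ => None,
        }
    }
}

fn main() {
    let mut nodes = ToySlotMap::new();
    let mut children: ToySecondaryMap<Vec<Key>> = ToySecondaryMap::new();

    let a = nodes.insert("a");
    children.insert(a, Vec::new());

    // Remove only from the primary map; the secondary entry is left behind for now.
    nodes.remove(a);

    // The freed slot is reused with a bumped generation.
    let b = nodes.insert("b");
    assert_eq!(b.index, a.index);
    assert_eq!(b.generation, a.generation + 1);

    // Inserting under the new key overwrites the leftover entry for `a`.
    children.insert(b, vec![b]);
    assert!(children.get(a).is_none());
    assert_eq!(children.get(b), Some(&vec![b]));
}
```

In this toy version the leftover secondary entry for a removed key is only reclaimed once the slot is reused, which matches the caveat in the quoted documentation about not relying on values for removed keys.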

@DasLixou
Author

DasLixou commented Jan 4, 2023

> Neat. Reading the docs for SecondaryMap suggests that this is primarily a performance optimization. I think this makes sense, but have you tested out the benchmarks before and after?

I don’t really have much experience with benchmarking 😅 Is there an easy command for that?

@nicoburns
Collaborator

> Is there an easy command for that?

There is: `cargo bench`. You should run it first on the main branch. Then check out your branch and run it again (and ideally post the output of the second run here).

@DasLixou
Author

DasLixou commented Jan 4, 2023

> > Is there an easy command for that?
>
> There is: `cargo bench`. You should run it first on the main branch. Then check out your branch and run it again (and ideally post the output of the second run here).

OK, thanks, I’ll do it later.

@DasLixou
Author

DasLixou commented Jan 4, 2023

(I ran the new changes first, so don't be confused by the `change:` lines in the first run.)

Main branch:

Running benches/big_tree.rs (target/release/deps/big_tree-c3274880c73435c8)
yoga benchmarks/10 nodes
                        time:   [3.7467 µs 3.7663 µs 3.8068 µs]
                        change: [-12.163% -7.6785% -2.8881%] (p = 0.01 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
yoga benchmarks/100 nodes
                        time:   [61.810 µs 63.254 µs 65.479 µs]
                        change: [-3.3460% -0.6702% +2.3812%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
yoga benchmarks/1_000 nodes
                        time:   [547.73 µs 549.79 µs 552.44 µs]
                        change: [-4.7231% -2.4128% +0.9725%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
yoga benchmarks/10_000 nodes
                        time:   [6.4154 ms 6.6022 ms 6.7302 ms]
                        change: [-12.138% -5.8416% +0.0426%] (p = 0.11 > 0.05)
                        No change in performance detected.
Benchmarking yoga benchmarks/100_000 nodes: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.0s or enable flat sampling.
yoga benchmarks/100_000 nodes
                        time:   [64.583 ms 65.505 ms 67.521 ms]
                        change: [-4.6037% -1.2794% +2.7353%] (p = 0.55 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking yoga benchmarks/1_000_000 nodes: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 9.2s.
yoga benchmarks/1_000_000 nodes
                        time:   [660.67 ms 675.59 ms 693.82 ms]
                        change: [-3.1889% -0.6838% +2.7028%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

big trees (wide)/10 nodes (2-level hierarchy)
                        time:   [6.7790 µs 6.8115 µs 6.8448 µs]
                        change: [-3.3728% -2.1418% -0.6639%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
big trees (wide)/100 nodes (2-level hierarchy)
                        time:   [70.430 µs 70.725 µs 71.326 µs]
                        change: [-3.4528% -1.5146% +1.0743%] (p = 0.24 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
big trees (wide)/1_000 nodes (2-level hierarchy)
                        time:   [736.14 µs 762.79 µs 788.81 µs]
                        change: [-13.515% -11.227% -8.7170%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
big trees (wide)/10_000 nodes (2-level hierarchy)
                        time:   [11.027 ms 11.234 ms 11.421 ms]
                        change: [+0.9311% +4.0876% +7.9068%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
big trees (wide)/100_000 nodes (2-level hierarchy)
                        time:   [268.77 ms 272.07 ms 276.25 ms]
                        change: [-2.4531% -1.1787% +0.2290%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking big trees (wide)/100_000 nodes (7-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.1s or enable flat sampling.
big trees (wide)/100_000 nodes (7-level hierarchy)
                        time:   [87.510 ms 88.222 ms 89.108 ms]
                        change: [-2.5946% -0.4142% +2.0500%] (p = 0.75 > 0.05)
                        No change in performance detected.

big trees (deep)/4000 nodes (12-level hierarchy)
                        time:   [3.4314 ms 3.4438 ms 3.4597 ms]
                        change: [-4.5181% -2.6306% -1.3397%] (p = 0.01 < 0.05)
                        Performance has improved.
big trees (deep)/10_000 nodes (14-level hierarchy)
                        time:   [9.1692 ms 9.2062 ms 9.2447 ms]
                        change: [-12.316% -7.4009% -3.3635%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking big trees (deep)/100_000 nodes (17-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.1s or enable flat sampling.
big trees (deep)/100_000 nodes (17-level hierarchy)
                        time:   [121.64 ms 122.19 ms 122.93 ms]
                        change: [-7.5771% -3.9943% -0.9266%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
Benchmarking big trees (deep)/1_000_000 nodes (20-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 11.8s.
big trees (deep)/1_000_000 nodes (20-level hierarchy)
                        time:   [978.77 ms 982.04 ms 985.01 ms]
                        change: [-1.0994% -0.5164% -0.0249%] (p = 0.10 > 0.05)
                        No change in performance detected.

super deep trees/100 nodes (100-level hierarchy)
                        time:   [121.53 µs 124.10 µs 128.11 µs]
                        change: [-5.2969% -2.3819% +0.6303%] (p = 0.15 > 0.05)
                        No change in performance detected.
super deep trees/1_000 nodes (1000-level hierarchy)
                        time:   [1.6208 ms 1.6336 ms 1.6464 ms]
                        change: [-16.023% -12.590% -8.7120%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

     Running benches/complex.rs (target/release/deps/complex-530763ac3f1de3fd)
deep hierarchy/build    time:   [1.4819 µs 1.4878 µs 1.4940 µs]
                        change: [-1.5044% -0.3083% +0.8159%] (p = 0.61 > 0.05)
                        No change in performance detected.
Found 17 outliers among 200 measurements (8.50%)
  7 (3.50%) high mild
  10 (5.00%) high severe
deep hierarchy/single   time:   [10.342 µs 10.377 µs 10.416 µs]
                        change: [-2.9090% -1.9813% -0.7496%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 22 outliers among 200 measurements (11.00%)
  10 (5.00%) high mild
  12 (6.00%) high severe
deep hierarchy/relayout time:   [684.84 ns 686.42 ns 688.54 ns]
                        change: [-3.0510% -2.4288% -1.8392%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 200 measurements (7.00%)
  5 (2.50%) high mild
  9 (4.50%) high severe

     Running benches/generated/mod.rs (target/release/deps/generated-13df0a1a5ff28b7e)
Benchmarking generated benchmarks: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
generated benchmarks    time:   [1.1147 ms 1.1161 ms 1.1176 ms]
                        change: [-1.5664% -0.7790% -0.0485%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

DasLixou's PR:

     Running benches/big_tree.rs (target/release/deps/big_tree-c3274880c73435c8)
yoga benchmarks/10 nodes
                        time:   [3.8384 µs 3.9592 µs 4.2327 µs]
yoga benchmarks/100 nodes
                        time:   [62.085 µs 62.675 µs 63.992 µs]
yoga benchmarks/1_000 nodes
                        time:   [563.86 µs 566.83 µs 571.33 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
yoga benchmarks/10_000 nodes
                        time:   [6.4474 ms 6.5433 ms 6.7936 ms]
yoga benchmarks/100_000 nodes
                        time:   [67.736 ms 68.666 ms 69.945 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
Benchmarking yoga benchmarks/1_000_000 nodes: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.7s.
yoga benchmarks/1_000_000 nodes
                        time:   [672.96 ms 680.24 ms 688.46 ms]

big trees (wide)/10 nodes (2-level hierarchy)
                        time:   [6.9490 µs 6.9881 µs 7.0260 µs]
big trees (wide)/100 nodes (2-level hierarchy)
                        time:   [72.536 µs 72.735 µs 73.001 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
big trees (wide)/1_000 nodes (2-level hierarchy)
                        time:   [830.08 µs 839.06 µs 844.16 µs]
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low severe
  1 (10.00%) low mild
  1 (10.00%) high mild
big trees (wide)/10_000 nodes (2-level hierarchy)
                        time:   [10.761 ms 10.803 ms 10.871 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
big trees (wide)/100_000 nodes (2-level hierarchy)
                        time:   [274.13 ms 275.31 ms 276.51 ms]
Benchmarking big trees (wide)/100_000 nodes (7-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.7s or enable flat sampling.
big trees (wide)/100_000 nodes (7-level hierarchy)
                        time:   [88.638 ms 90.006 ms 92.189 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

big trees (deep)/4000 nodes (12-level hierarchy)
                        time:   [3.4972 ms 3.5592 ms 3.6727 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
big trees (deep)/10_000 nodes (14-level hierarchy)
                        time:   [9.4407 ms 9.7421 ms 10.211 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking big trees (deep)/100_000 nodes (17-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.0s or enable flat sampling.
big trees (deep)/100_000 nodes (17-level hierarchy)
                        time:   [122.81 ms 127.24 ms 134.27 ms]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
Benchmarking big trees (deep)/1_000_000 nodes (20-level hierarchy): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 12.0s.
big trees (deep)/1_000_000 nodes (20-level hierarchy)
                        time:   [983.16 ms 987.13 ms 992.06 ms]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

super deep trees/100 nodes (100-level hierarchy)
                        time:   [125.18 µs 128.72 µs 131.80 µs]
super deep trees/1_000 nodes (1000-level hierarchy)
                        time:   [1.7458 ms 1.8420 ms 1.9232 ms]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild

     Running benches/complex.rs (target/release/deps/complex-530763ac3f1de3fd)
deep hierarchy/build    time:   [1.4755 µs 1.4801 µs 1.4855 µs]
Found 34 outliers among 200 measurements (17.00%)
  11 (5.50%) high mild
  23 (11.50%) high severe
deep hierarchy/single   time:   [10.621 µs 10.645 µs 10.671 µs]
Found 25 outliers among 200 measurements (12.50%)
  10 (5.00%) high mild
  15 (7.50%) high severe
deep hierarchy/relayout time:   [702.93 ns 706.14 ns 710.11 ns]
Found 9 outliers among 200 measurements (4.50%)
  4 (2.00%) high mild
  5 (2.50%) high severe

     Running benches/generated/mod.rs (target/release/deps/generated-13df0a1a5ff28b7e)
Benchmarking generated benchmarks: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
generated benchmarks    time:   [1.1147 ms 1.1167 ms 1.1190 ms]
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

Looks like my changes actually decrease performance. I ran my change again after benchmarking the main branch:

Running benches/big_tree.rs (target/release/deps/big_tree-c3274880c73435c8)
yoga benchmarks/10 nodes
                        time:   [4.0729 µs 4.3312 µs 4.6612 µs]
                        change: [+7.6997% +12.603% +17.638%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking yoga benchmarks/100 nodes: Collecting 10 samples in estimated 5.0022 s (58k iterations)^C

@nicoburns
Collaborator

Hmm... it basically looks like performance isn't affected much at all. That benchmark uses only 10 nodes and completes in microseconds, so there tends to be a bit of variance in it. This is actually what I would expect, especially as our benchmarks currently only test layout computation and don't benchmark tree creation at all, whereas tree creation is the only thing I would expect this to affect (if anything).
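
For a rough sense of scale, the following sketch computes the relative difference between the middle values of a few `time:` triples from the two full runs posted above. The helper `relative_change_percent` is a hypothetical name introduced here, and this simple ratio ignores the confidence intervals and outlier handling that the benchmark harness reports, so treat it as back-of-the-envelope arithmetic only.

```rust
/// Relative change of `new` versus `baseline`, in percent.
/// Hypothetical helper for illustration; not part of any benchmarking library.
fn relative_change_percent(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

fn main() {
    // (benchmark, main-branch middle value, PR middle value), copied from the logs above.
    let runs = [
        ("yoga benchmarks/10 nodes (µs)", 3.7663, 3.9592),
        ("big trees (wide)/1_000 nodes (µs)", 762.79, 839.06),
        ("big trees (deep)/1_000_000 nodes (ms)", 982.04, 987.13),
    ];

    for (name, main_branch, pr) in runs {
        println!("{}: {:+.2}%", name, relative_change_percent(main_branch, pr));
    }
}
```

With only 10 samples per benchmark and several flagged outliers, differences of a few percent like these are hard to separate from run-to-run noise.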

@alice-i-cecile
Collaborator

> Hmm... it basically looks like performance isn't affected much at all. That benchmark uses only 10 nodes and completes in microseconds, so there tends to be a bit of variance in it. This is actually what I would expect, especially as our benchmarks currently only test layout computation and don't benchmark tree creation at all, whereas tree creation is the only thing I would expect this to affect (if anything).

Precisely. We should probably have dedicated benches for that.
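
As a rough idea of what timing tree creation separately from layout could look like, here is a std-only sketch. It assumes a hypothetical `ToyTree` with index-based parent and child links rather than this library's real node API, and it uses `std::time::Instant` instead of the project's benchmark harness, so the numbers it prints are only indicative.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a layout tree: parent and child links stored by node index.
struct ToyTree {
    parents: Vec<Option<usize>>,
    children: Vec<Vec<usize>>,
}

impl ToyTree {
    fn new() -> Self {
        Self { parents: Vec::new(), children: Vec::new() }
    }

    /// Adds a node under `parent` (or a root if `None`) and returns its index.
    fn new_node(&mut self, parent: Option<usize>) -> usize {
        let id = self.parents.len();
        self.parents.push(parent);
        self.children.push(Vec::new());
        if let Some(p) = parent {
            self.children[p].push(id);
        }
        id
    }
}

/// Builds a two-level tree: one root with `n - 1` direct children (expects `n >= 1`).
fn build_wide_tree(n: usize) -> ToyTree {
    let mut tree = ToyTree::new();
    let root = tree.new_node(None);
    for _ in 1..n {
        tree.new_node(Some(root));
    }
    tree
}

fn main() {
    for n in [10, 1_000, 100_000] {
        // Time only construction; no layout is computed here.
        let start = Instant::now();
        let tree = build_wide_tree(n);
        let elapsed = start.elapsed();

        // Use the result so the work cannot be optimized away entirely.
        assert_eq!(tree.parents.len(), n);
        println!("built {} nodes in {:?}", n, elapsed);
    }
}
```

A real benchmark would build the library's own tree type inside the existing harness so that results are comparable with the layout benchmarks.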

@alice-i-cecile added the blocked (Cannot be advanced until something else changes) label Jan 5, 2023
@alice-i-cecile
Collaborator

Going to block on #322 here, or other compelling evidence that this helps :) I think this is right, but perf changes are notoriously hard to do by gut feel.

@nicoburns
Collaborator

Cherry picking this on top of #401 confirms that this PR doesn't make much difference to perf even if we measure tree creation. So a decision on this should be made purely on the basis of code style.

@alice-i-cecile
Collaborator

I think this is somewhat more complex; closing.
