
Bag improvements. #354

Merged
merged 2 commits into from
May 2, 2017
Conversation

andersio
Member

@andersio andersio commented Apr 28, 2017

  1. Rename `RemovalToken` to `Bag.Token`. This could be source breaking, since `Bag` now has to be parameterised.

  2. Replace the token object with a monotonic `UInt64` timestamp. In other words, `Bag` itself no longer incurs any ARC overhead beyond its `ContiguousArray` storage. This improves the performance of iteration and copying.

  3. Store the elements and the tokens in SoA (structure-of-arrays) form, improving cache locality during iteration and during the token search on removal. That said, best-case removals appear slightly slower after this change.

  4. Exploit the monotonicity of the timestamps to accelerate average-case removals via binary search.
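The storage layout and removal path described above could be sketched roughly like this (a simplified illustration under my own naming, e.g. `SketchBag`; not the actual ReactiveSwift source):

```swift
/// A minimal sketch of the structure-of-arrays layout described above.
/// Illustrative only; the real `Bag` differs in API surface and details.
struct SketchBag<Element> {
    typealias Token = UInt64

    // SoA storage: elements and tokens live in parallel arrays.
    private var elements: ContiguousArray<Element> = []
    private var tokens: ContiguousArray<Token> = []
    private var nextToken: Token = 0

    @discardableResult
    mutating func insert(_ value: Element) -> Token {
        let token = nextToken
        nextToken += 1
        elements.append(value)
        tokens.append(token)
        return token
    }

    mutating func remove(using token: Token) {
        // Tokens are handed out monotonically, so `tokens` is sorted and
        // a binary search locates the entry in O(log n).
        var low = 0
        var high = tokens.count - 1
        while low <= high {
            let mid = (low + high) / 2
            if tokens[mid] == token {
                elements.remove(at: mid)
                tokens.remove(at: mid)
                return
            } else if tokens[mid] < token {
                low = mid + 1
            } else {
                high = mid - 1
            }
        }
    }
}

extension SketchBag: Sequence {
    func makeIterator() -> ContiguousArray<Element>.Iterator {
        // Iteration touches only the densely packed `elements` array:
        // the cache-locality win of the SoA layout.
        return elements.makeIterator()
    }
}
```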

@NachoSoto
Member

Do you have a benchmark to show what this accomplishes?

@andersio
Member Author

andersio commented Apr 29, 2017

| Item | bag-refactor | master |
| --- | --- | --- |
| Copying, 32768 elements | 0.034 ms | ~0.626 ms |
| Iteration, 1 element | 10 ms | 13 ms |
| Iteration, 128 elements | 407 ms | 794 ms |
| Insertion, 100000 elements | 19 ms | 30 ms |
| Fast Path Removal, 32768 elements | 30 ms | 20 ms |
| Worst Case Removal, 32768 elements | 304 ms | 24258 ms |
| Signal Delivery, 16 observers, 100000 events | 119 ms | 195 ms |
| Worst Case Observer Detaching, 128 observers | ~0.7 ms | >1 ms |

Benchmark Source
Intel Core i5-6360U, Xcode 8.3.2, compiled with `-Owholemodule`.

@mdiep
Contributor

mdiep commented May 1, 2017

Those benchmarks don't seem like real-world usage. I would expect most bags to have <10 elements and nearly all to have <100 elements. Do you agree? If so, could you re-run your benchmark with realistic sizes?


```swift
lastTimestamp = timestamp

let token = Token(timestamp)
```
Contributor

At this point, there doesn't seem to be any reason to use a timestamp. This could just be a counter? `Bag` requires mutual exclusion for cross-thread use anyway.

Member Author

@andersio andersio May 1, 2017

The point of a timestamp here is that a 64-bit monotonic clock never overflows in practice; the processor is going to break down way, way before that happens...

I avoided a monotonic counter because that was the original `Bag` implementation, ICYMI. There was a worry that the monotonic counter would overflow, so a reference-type `RemovalToken` was born to allow "reindexing" when it did.

That said, like the 64-bit monotonic clock, the overflow practically can never happen: it would take 100% uptime for 100+ years, ticking once per clock cycle at 4 GHz, while a typical load-increment-store takes more than 3 clock cycles. So it is a choice between doing it on our own, or relying on the platform libraries, libdispatch in this case.
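For the record, a back-of-the-envelope check of that overflow claim (my own arithmetic, not from the PR):

```swift
// How long would it take a UInt64 counter to overflow if it were
// incremented once per clock cycle at 4 GHz?
let ticksPerSecond = 4.0e9
let secondsToOverflow = Double(UInt64.max) / ticksPerSecond
let yearsToOverflow = secondsToOverflow / (365.25 * 24 * 3600)
// Roughly 146 years of 100% uptime; a real load-increment-store takes
// several cycles, so the practical figure is several times larger still.
print(yearsToOverflow)
```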

Contributor

Sure… but if we still used a UInt64, then this would have the same domain. Using a counter would start at an earlier value. It would also only increment by 1 for each insertion—instead of skipping values in the case of a timestamp. So using a counter would be less likely to overflow.

Member Author

Yeah, true.

@andersio
Member Author

andersio commented May 1, 2017

@mdiep The rewritten benchmark. Each test performs operations on 1 or 8 reference-counted elements/observers, running for 2 million samples. Compiled with `-Owholemodule`.

bag-refactor:

```
@testAverageRemoval(): avg 9863 ns; min 8663 ns
@testCopying(): avg 1683 ns; min 1452 ns
@testDisposableDetachment(): avg 33346 ns; min 29684 ns
@testFastPathRemoval(): avg 8831 ns; min 7653 ns
@testInsertion(): avg 6118 ns; min 5299 ns
@testIteration(): avg 452 ns; min 393 ns
@testIterationOfOne(): avg 155 ns; min 122 ns
@testSignalDelivery(): avg 749 ns; min 651 ns
```

master:

```
@testAverageRemoval(): avg 9476 ns; min 8332 ns
@testCopying(): avg 1315 ns; min 1145 ns
@testDisposableDetachment(): avg 27795 ns; min 24664 ns
@testFastPathRemoval(): avg 7781 ns; min 6696 ns
@testInsertion(): avg 6199 ns; min 5234 ns
@testIteration(): avg 906 ns; min 797 ns
@testIterationOfOne(): avg 189 ns; min 159 ns
@testSignalDelivery(): avg 1151 ns; min 988 ns
```

Iterations are no doubt faster even at small element counts; this alone is enough to justify the PR, IMO. Insertions do not lead significantly here, despite showing a huge gap in the original large-count benchmark above.

Removals and copying are slower; my assumption is that d6e525f adds work to these two operations. That said, these are a one-time occurrence for every observation, and far less important than iteration performance.

@mdiep
Contributor

mdiep commented May 1, 2017

Without a real world benchmark, I don't know how we can evaluate the performance of those two implementations and make an informed choice.

@andersio
Member Author

andersio commented May 1, 2017

Iteration alone is already fairly relevant, if one considers the multiplicative effect in composed signal graphs. Edit: it is certainly negligible in low-frequency paths, but during high-speed scrolling it becomes increasingly relevant given the number of signals pulled by the collection/scroll view.

@mdiep
Contributor

mdiep commented May 1, 2017

But these seem like very low overhead. Will going from 906 ns to 452 ns really make a difference when scrolling a collection/scroll view?

If not, we should consider the conciseness/maintainability of the code.

@andersio
Member Author

andersio commented May 1, 2017

I consider it to qualify on both conciseness and speed, after removing the opaque token type and the binary search. Pretty clean and straightforward, isn't it? :)

Setting aside any performance claims, the core of the PR is replacing the object token with a `UInt64` token. Could we at least agree that getting rid of a bunch of retains and releases per element is a plus?

> But these seem like very low overhead. Will going from 906 ns to 452 ns really make a difference when scrolling a collection/scroll view?

I can't say it makes a difference on iOS, given the practical amount of information expected to fit on screen at a reasonable scrolling speed. But together with cuts here and there, these savings could stack up to a considerable portion of CPU time, say on the Mac with a dense collection view. Moreover, it isn't too bad to save some joules, is it?

P.S. I am not sure how common it is to use `Signal.merge` with hundreds of inner signals. But if inner signals come and go quickly in a random pattern, a binary search in `remove(using:)` beyond a certain threshold might help.

Contributor

@mdiep mdiep left a comment

Overall the approach looks good now. Just a couple of things need revision.

```swift
fileprivate var elements: ContiguousArray<BagElement<Element>> = []
/// A uniquely identifying token for removing a value that was inserted into a
/// Bag.
public typealias Token = UInt64
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this a struct, not just a typealias. A type alias won't prevent someone from passing a random `UInt64` to `remove(using:)`. Tokens shouldn't be publicly instantiable.
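A minimal sketch of what the suggested struct could look like (illustrative only; not the actual ReactiveSwift source):

```swift
/// A sketch of the struct-based token suggested above.
public struct Token: Equatable {
    fileprivate let value: UInt64

    // The initialiser is deliberately not public, so only Bag (within the
    // same file) can mint tokens; callers cannot fabricate a Token from an
    // arbitrary UInt64 and pass it to remove(using:).
    fileprivate init(_ value: UInt64) {
        self.value = value
    }

    public static func == (lhs: Token, rhs: Token) -> Bool {
        return lhs.value == rhs.value
    }
}
```

The wrapper adds no storage or runtime cost over a bare `UInt64`; it only tightens the API so the compiler rejects hand-rolled tokens.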

```swift
fileprivate var elements: ContiguousArray<Element> = []
fileprivate var tokens: ContiguousArray<Token> = []

private var identifier: UInt64 = 0
```
Contributor

Maybe call this nextToken?

@mdiep mdiep merged commit 7526ed0 into master May 2, 2017
@mdiep mdiep deleted the bag-refactor branch May 2, 2017 11:41