[stdlib] ContiguousArray & ArraySlice: Stop swapping self in wUMBP #38898

Merged: lorentey merged 1 commit into main from no-swaps-the-sequel on Aug 17, 2021

lorentey
Member

Implements #38867 for ContiguousArray and ArraySlice -- these have the same unnecessary swapping logic.

@lorentey
Member Author

@swift-ci test

@lorentey
Member Author

@swift-ci benchmark

@swift-ci
Contributor

Performance (x86_64): -O

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| DictionaryOfAnyHashableStrings_insert | 2772 | 5026 | +81.3% | 0.55x |
| Set.isDisjoint.Box25 | 319 | 454 | +42.3% | 0.70x (?) |
| Set.isDisjoint.Int25 | 235 | 307 | +30.6% | 0.77x (?) |
| StringRemoveDupes | 234 | 256 | +9.4% | 0.91x (?) |
| ArrayInClass | 1665 | 1820 | +9.3% | 0.91x |
| DistinctClassFieldAccesses | 343 | 374 | +9.0% | 0.92x |
| Array2D | 6480 | 7008 | +8.1% | 0.92x |
| ArrayPlusEqualFiveElementCollection | 7474 | 8066 | +7.9% | 0.93x (?) |
| Set.subtracting.Empty.Box | 13 | 14 | +7.7% | 0.93x (?) |
| XorLoop | 1688 | 1816 | +7.6% | 0.93x (?) |
| RandomTree.insert.Unmanaged.fast | 198 | 213 | +7.6% | 0.93x (?) |
 
| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| FlattenListFlatMap | 5689 | 3676 | -35.4% | 1.55x (?) |
| Breadcrumbs.MutatedIdxToUTF16.ASCII | 4 | 3 | -25.0% | 1.33x |
| FlattenListLoop | 4524 | 4132 | -8.7% | 1.09x (?) |
| LessSubstringSubstring | 38 | 35 | -7.9% | 1.09x |
| EqualStringSubstring | 38 | 35 | -7.9% | 1.09x (?) |
| EqualSubstringSubstringGenericEquatable | 38 | 35 | -7.9% | 1.09x (?) |
| EqualSubstringString | 38 | 35 | -7.9% | 1.09x (?) |
| LessSubstringSubstringGenericComparable | 38 | 35 | -7.9% | 1.09x |
| SortStringsUnicode | 2785 | 2590 | -7.0% | 1.08x |

Code size: -O

Performance (x86_64): -Osize

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeStubFromNSDate | 5630 | 6550 | +16.3% | 0.86x (?) |
| DistinctClassFieldAccesses | 330 | 361 | +9.4% | 0.91x |
| ArrayInClass | 1645 | 1795 | +9.1% | 0.92x |
| ArrayPlusEqualSingleElementCollection | 1598 | 1739 | +8.8% | 0.92x (?) |
| Set.isDisjoint.Box.Empty | 138 | 150 | +8.7% | 0.92x (?) |
| Array2D | 6224 | 6736 | +8.2% | 0.92x (?) |
| XorLoop | 1560 | 1688 | +8.2% | 0.92x (?) |
| ArrayAppendReserved | 1230 | 1330 | +8.1% | 0.92x (?) |
| ArrayPlusEqualFiveElementCollection | 6845 | 7400 | +8.1% | 0.93x (?) |
| ArrayAppend | 1400 | 1510 | +7.9% | 0.93x (?) |
 
| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Breadcrumbs.MutatedUTF16ToIdx.ASCII | 4 | 3 | -25.0% | 1.33x |
| Breadcrumbs.MutatedIdxToUTF16.ASCII | 4 | 3 | -25.0% | 1.33x |
| LessSubstringSubstring | 39 | 35 | -10.3% | 1.11x |
| FlattenListLoop | 4288 | 3889 | -9.3% | 1.10x (?) |
| EqualSubstringSubstring | 38 | 35 | -7.9% | 1.09x |
| EqualStringSubstring | 38 | 35 | -7.9% | 1.09x (?) |
| EqualSubstringSubstringGenericEquatable | 38 | 35 | -7.9% | 1.09x |
| EqualSubstringString | 38 | 35 | -7.9% | 1.09x |
| LessSubstringSubstringGenericComparable | 38 | 35 | -7.9% | 1.09x |
| SortSortedStrings | 68 | 63 | -7.4% | 1.08x (?) |
| SortStringsUnicode | 2815 | 2615 | -7.1% | 1.08x (?) |
| NSStringConversion.Rebridge.Long | 162 | 151 | -6.8% | 1.07x (?) |

Code size: -Osize

Performance (x86_64): -Onone

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| DataToStringSmall | 3650 | 4350 | +19.2% | 0.84x (?) |
| String.data.Empty | 67 | 78 | +16.4% | 0.86x (?) |
| ArrayInClass | 4460 | 4910 | +10.1% | 0.91x |
| ErrorHandling | 3480 | 3750 | +7.8% | 0.93x (?) |
 
| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeStubFromArrayOfNSString2 | 2700 | 2480 | -8.1% | 1.09x (?) |
| DataReplaceSmallBuffer | 3900 | 3600 | -7.7% | 1.08x (?) |

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

lorentey added a commit to lorentey/swift-collections that referenced this pull request Aug 17, 2021
This reintroduces retain/release operations for the empty array until swiftlang/swift#38898 lands.

Mitigate the performance costs of this by refactoring code to reduce the size of the inlined bubbleUpMin/Max invocations.
Contributor

@glessard glessard left a comment

Fewer instructions to do the same thing. LGTM

@lorentey
Member Author

Surprisingly, swapping self with [] has a measurable effect sometimes: in the case of apple/swift-collections#78, the compiler isn't able to eliminate retains/releases of the empty array singleton in Heap.insert benchmarks, which makes the withUnsafeMutableBufferPointer variant slower than the regular Array one!
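
For readers unfamiliar with the pattern being removed, here is a minimal sketch of the two shapes a `withUnsafeMutableBufferPointer`-style method can take. `Storage`, `withBufferSwapping`, and `withBufferInPlace` are hypothetical names made up for illustration; this is not the standard library's code, only an approximation of the idea of swapping the contents out for `[]` versus handing out a pointer in place.

```swift
// Hypothetical wrapper type, used only to illustrate the two patterns.
struct Storage {
  var elements: [Int]

  // Swapping pattern: move the contents into a temporary so that `elements`
  // holds the empty array while the closure runs, then swap back.
  mutating func withBufferSwapping<R>(
    _ body: (inout UnsafeMutableBufferPointer<Int>) -> R
  ) -> R {
    var work: [Int] = []
    swap(&work, &elements)            // `elements` is now []
    defer { swap(&work, &elements) }  // restore the contents on exit
    return work.withUnsafeMutableBufferPointer { body(&$0) }
  }

  // Direct pattern: expose the existing storage without touching [].
  mutating func withBufferInPlace<R>(
    _ body: (inout UnsafeMutableBufferPointer<Int>) -> R
  ) -> R {
    elements.withUnsafeMutableBufferPointer { body(&$0) }
  }
}

var storage = Storage(elements: [3, 1, 2])
storage.withBufferSwapping { $0[0] = 10 }
storage.withBufferInPlace { $0[1] = 20 }
```

Both variants leave the caller with the same contents; the difference discussed in this PR is only in the extra traffic on the empty array that the swapping form can introduce when the optimizer cannot remove it.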

@lorentey lorentey merged commit 8b8fdfa into main Aug 17, 2021
@lorentey lorentey deleted the no-swaps-the-sequel branch August 17, 2021 04:04
lorentey added a commit to apple/swift-collections that referenced this pull request Aug 17, 2021
* [Heap] Don't always inline large functions

* [docs] Use a stable link to the Atkinson article

(The original link was pointing to course materials at a random university.)

* Remove stray import

* Revert "[Heap] Don't always inline large functions"

This reverts commit 07ffd65.

* [Heap] Enable code coverage collection in Xcode scheme

* [Heap] Speed up invariant checking

* [Heap] Precalculate levels for each offset

`_Node` is a new struct that consists of a storage offset (the old index) along with its level in the tree. The level can be incrementally calculated, saving some time vs counting bits whenever it's needed.

* [Heap] Switch to using unsafe buffer pointers

Introduce `Heap._UnsafeHandle` (a thin wrapper around an unsafe buffer pointer) and rebase most heap algorithms on top of that instead of array operations.

This simplifies things by reducing (hopefully) unnecessary index validation, resulting in some measurable performance improvements.

* [Heap] Stop force-inlining bubbleUp; mark it releasenone

Not inlining such a large function speeds things up by leaving some headroom for the optimizer to make better inlining decisions elsewhere. (Force inlining this resulted in the compiler not inlining the closure passed to `_update` instead, which isn't a great tradeoff.)

To speed things up, mark `bubbleUp` with `@_effects(releasenone)`. This may be questionable (because it calls `Element.<`), but it results in better codegen, making `insert` match the performance of `std::priority_queue`.

* [Heap] Rework removals

* [Heap] Remove dead code

* [Heap] Switch to using _ContiguousArray as storage

* [Heap] insert<S>(contentsOf:): add fast path for count == 0 case

* [Heap] Perf pass on trickleDown code paths

This improves popMin/popMax (and the sequence initializer) by reviewing trickleDown and optimizing things:

- Replace swapAt with a scheme where we keep a hole in the storage buffer
- Slightly shorten min/max dependency chain in primary sink loop

In exchange, we get even less readable code.

* [Heap] bubbleUp: remove @_effects attribute

This reintroduces retain/release operations for the empty array until swiftlang/swift#38898 lands.

Mitigate the performance costs of this by refactoring code to reduce the size of the inlined bubbleUpMin/Max invocations.

* [Heap] Finetune Heap.insert
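
The swapAt replacement described in the trickleDown item of the commit message above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from swift-collections: it uses a plain binary min-heap of `Int` rather than the min-max heap that `Heap` implements, and `siftDownWithHole` is a hypothetical name. The idea shown is that the sinking element is lifted out once, children are moved up into the vacated slot, and the element is written back a single time at the end.

```swift
// Illustrative only: a plain binary min-heap, not the min-max heap used by Heap.
func siftDownWithHole(_ buffer: UnsafeMutableBufferPointer<Int>, from start: Int) {
  let count = buffer.count
  guard start < count else { return }
  let item = buffer[start]  // lift the element out, leaving a hole at `start`
  var hole = start
  while true {
    var child = 2 * hole + 1            // left child of the hole
    if child >= count { break }         // the hole is at a leaf
    let right = child + 1
    if right < count && buffer[right] < buffer[child] { child = right }
    if buffer[child] >= item { break }  // heap property holds; stop here
    buffer[hole] = buffer[child]        // one write per level, versus two for swapAt
    hole = child
  }
  buffer[hole] = item                   // fill the hole with the lifted element
}

var values = [9, 1, 2, 3, 4, 5, 6]
values.withUnsafeMutableBufferPointer { siftDownWithHole($0, from: 0) }
```

With trivially copyable elements such as `Int`, plain assignment is enough; a generic version would presumably need to move elements in and out of the buffer explicitly.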