[stdlib] ContiguousArray & ArraySlice: Stop swapping self in wUMBP #38898

lorentey · 2021-08-16T21:27:53Z

Implements #38867 for ContiguousArray and ArraySlice: their withUnsafeMutableBufferPointer (wUMBP) implementations have the same unnecessary swapping logic.

lorentey · 2021-08-16T21:28:00Z

@swift-ci test

lorentey · 2021-08-16T21:28:09Z

@swift-ci benchmark

swift-ci · 2021-08-16T22:03:04Z

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
DictionaryOfAnyHashableStrings_insert	2772	5026	+81.3%	0.55x
Set.isDisjoint.Box25	319	454	+42.3%	0.70x (?)
Set.isDisjoint.Int25	235	307	+30.6%	0.77x (?)
StringRemoveDupes	234	256	+9.4%	0.91x (?)
ArrayInClass	1665	1820	+9.3%	0.91x
DistinctClassFieldAccesses	343	374	+9.0%	0.92x
Array2D	6480	7008	+8.1%	0.92x
ArrayPlusEqualFiveElementCollection	7474	8066	+7.9%	0.93x (?)
Set.subtracting.Empty.Box	13	14	+7.7%	0.93x (?)
XorLoop	1688	1816	+7.6%	0.93x (?)
RandomTree.insert.Unmanaged.fast	198	213	+7.6%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	5689	3676	-35.4%	1.55x (?)
Breadcrumbs.MutatedIdxToUTF16.ASCII	4	3	-25.0%	1.33x
FlattenListLoop	4524	4132	-8.7%	1.09x (?)
LessSubstringSubstring	38	35	-7.9%	1.09x
EqualStringSubstring	38	35	-7.9%	1.09x (?)
EqualSubstringSubstringGenericEquatable	38	35	-7.9%	1.09x (?)
EqualSubstringString	38	35	-7.9%	1.09x (?)
LessSubstringSubstringGenericComparable	38	35	-7.9%	1.09x
SortStringsUnicode	2785	2590	-7.0%	1.08x

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubFromNSDate	5630	6550	+16.3%	0.86x (?)
DistinctClassFieldAccesses	330	361	+9.4%	0.91x
ArrayInClass	1645	1795	+9.1%	0.92x
ArrayPlusEqualSingleElementCollection	1598	1739	+8.8%	0.92x (?)
Set.isDisjoint.Box.Empty	138	150	+8.7%	0.92x (?)
Array2D	6224	6736	+8.2%	0.92x (?)
XorLoop	1560	1688	+8.2%	0.92x (?)
ArrayAppendReserved	1230	1330	+8.1%	0.92x (?)
ArrayPlusEqualFiveElementCollection	6845	7400	+8.1%	0.93x (?)
ArrayAppend	1400	1510	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
Breadcrumbs.MutatedUTF16ToIdx.ASCII	4	3	-25.0%	1.33x
Breadcrumbs.MutatedIdxToUTF16.ASCII	4	3	-25.0%	1.33x
LessSubstringSubstring	39	35	-10.3%	1.11x
FlattenListLoop	4288	3889	-9.3%	1.10x (?)
EqualSubstringSubstring	38	35	-7.9%	1.09x
EqualStringSubstring	38	35	-7.9%	1.09x (?)
EqualSubstringSubstringGenericEquatable	38	35	-7.9%	1.09x
EqualSubstringString	38	35	-7.9%	1.09x
LessSubstringSubstringGenericComparable	38	35	-7.9%	1.09x
SortSortedStrings	68	63	-7.4%	1.08x (?)
SortStringsUnicode	2815	2615	-7.1%	1.08x (?)
NSStringConversion.Rebridge.Long	162	151	-6.8%	1.07x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
DataToStringSmall	3650	4350	+19.2%	0.84x (?)
String.data.Empty	67	78	+16.4%	0.86x (?)
ArrayInClass	4460	4910	+10.1%	0.91x
ErrorHandling	3480	3750	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubFromArrayOfNSString2	2700	2480	-8.1%	1.09x (?)
DataReplaceSmallBuffer	3900	3600	-7.7%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

glessard

Fewer instructions to do the same thing. LGTM

lorentey · 2021-08-17T04:04:25Z

Surprisingly, swapping self with [] has a measurable effect sometimes: in the case of apple/swift-collections#78, the compiler isn't able to eliminate retains/releases of the empty array singleton in Heap.insert benchmarks, which makes the withUnsafeMutableBufferPointer variant slower than the regular Array one!
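To make the pattern under discussion concrete, here is a minimal sketch. It is not the standard library's implementation: `Wrapper`, `updateSwapping`, and `updateInPlace` are hypothetical names, and the sketch only mimics the shape of "move the storage into a temporary, leave an empty array behind, and swap back afterwards" next to a variant that hands out the buffer directly.

```swift
// Hypothetical wrapper type, used only to illustrate the two shapes.
struct Wrapper {
    var storage: ContiguousArray<Int>

    // Swapping variant: `storage` temporarily holds an empty array while
    // `body` runs. The empty array is a shared singleton, and the compiler
    // may not always be able to remove the retain/release traffic on it.
    mutating func updateSwapping<R>(
        _ body: (inout UnsafeMutableBufferPointer<Int>) throws -> R
    ) rethrows -> R {
        var work = ContiguousArray<Int>()
        swap(&work, &storage)            // storage is now empty
        defer { swap(&work, &storage) }  // put the real contents back
        return try work.withUnsafeMutableBufferPointer(body)
    }

    // Non-swapping variant: no temporary empty array is involved.
    mutating func updateInPlace<R>(
        _ body: (inout UnsafeMutableBufferPointer<Int>) throws -> R
    ) rethrows -> R {
        return try storage.withUnsafeMutableBufferPointer(body)
    }
}

var wrapper = Wrapper(storage: [1, 2, 3])
wrapper.updateSwapping { buffer in
    for i in buffer.indices { buffer[i] *= 2 }
}
wrapper.updateInPlace { buffer in
    for i in buffer.indices { buffer[i] += 1 }
}
```

Both variants leave the same contents in `storage`; any difference is in the generated code, which is what the benchmarks in this thread are measuring.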

* [Heap] Don't always inline large functions
* [docs] Use a stable link to the Atkinson article
  (The original link was pointing to course materials at a random university.)
* Remove stray import
* Revert "[Heap] Don't always inline large functions"
  This reverts commit 07ffd65.
* [Heap] Enable code coverage collection in Xcode scheme
* [Heap] Speed up invariant checking
* [Heap] Precalculate levels for each offset
  `_Node` is a new struct that consists of a storage offset (the old index) along with its level in the tree. The level can be incrementally calculated, saving some time vs counting bits whenever it's needed.
* [Heap] Switch to using unsafe buffer pointers
  Introduce `Heap._UnsafeHandle` (a thin wrapper around an unsafe buffer pointer) and rebase most heap algorithms on top of that instead of array operations. This simplifies things by reducing (hopefully) unnecessary index validation, resulting in some measurable performance improvements.
* [Heap] Stop force-inlining bubbleUp; mark it releasenone
  Not inlining such a large function speeds things up by leaving some headroom for the optimizer to make better inlining decisions elsewhere. (Force inlining this resulted in the compiler not inlining the closure passed to `_update` instead, which isn't a great tradeoff.)
  To speed things up, mark `bubbleUp` with `@_effects(releasenone)`. This may be questionable (because it calls `Element.<`), but it results in better codegen, making `insert` match the performance of `std::priority_queue`.
* [Heap] Rework removals
* [Heap] Remove dead code
* [Heap] Switch to using _ContiguousArray as storage
* [Heap] insert<S>(contentsOf:): add fast path for count == 0 case
* [Heap] Perf pass on trickleDown code paths
  This improves popMin/popMax (and the sequence initializer) by reviewing trickleDown and optimizing things:
  - Replace swapAt with a scheme where we keep a hole in the storage buffer
  - Slightly shorten min/max dependency chain in primary sink loop
  In exchange, we get even less readable code.
* [Heap] bubbleUp: remove @_effects attribute
  This reintroduces retain/release operations for the empty array until swiftlang/swift#38898 lands. Mitigate the performance costs of this by refactoring code to reduce the size of the inlined bubbleUpMin/Max invocations.
* [Heap] Finetune Heap.insert
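The trickleDown item above replaces `swapAt` with a scheme that leaves a gap (a "hole") in the storage buffer. As a rough illustration of that general idea only, the sketch below uses a plain binary min-heap over `[Int]`, not the min-max heap or the unsafe buffer handle from swift-collections; `siftDown` is a hypothetical helper written for this example.

```swift
// Sift the element at `start` down a binary min-heap stored in `a`.
// Instead of swapping at every level, the element is lifted out once,
// smaller children are moved up into the hole, and the element is
// written back once at its final position.
func siftDown(_ a: inout [Int], from start: Int) {
    let count = a.count
    guard start < count else { return }
    let item = a[start]          // lift the element out; `start` is now the hole
    var hole = start
    while true {
        var child = 2 * hole + 1             // left child
        if child >= count { break }          // hole is a leaf
        if child + 1 < count && a[child + 1] < a[child] {
            child += 1                       // pick the smaller child
        }
        if a[child] >= item { break }        // heap order holds here
        a[hole] = a[child]                   // one write per level, no swap
        hole = child
    }
    a[hole] = item               // fill the hole with the lifted element
}

var heap = [9, 1, 2, 3, 4]
siftDown(&heap, from: 0)
// heap is now [1, 3, 2, 9, 4]
```

With `Int` elements the saving is just one write per level; the technique matters more for element types where moving a value is not free.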

[stdlib] ContiguousArray & ArraySlice: Stop swapping self in wUMBP

48fa06b

glessard approved these changes Aug 17, 2021

lorentey merged commit 8b8fdfa into main Aug 17, 2021

lorentey deleted the no-swaps-the-sequel branch August 17, 2021 04:04

lorentey mentioned this pull request Aug 17, 2021

[Heap] Performance tweaks apple/swift-collections#78

Merged

7 tasks

lorentey mentioned this pull request Aug 17, 2021

[Heap] Express heap operations on UnsafeMutableBufferPointer apple/swift-collections#75

Closed
