Adds initial benchmarking for axom::Array vs. std::vector #1469

kennyweiss · 2024-11-08T06:55:39Z

Summary

This PR makes progress toward "Benchmark Axom Array vs C++ array and std::vector" (#287); see also "Benchmark axom::Array performance" (#922).
It adds initial benchmarking of axom::Array vs. std::vector:
- Specifically, it compares construction times, the performance of push_back and emplace_back, and some iterator operations.
- I decided to stop here in this first attempt, before the testing got too involved.
This PR also updates BLT to pick up a bugfix related to running benchmarks ("Prevent general tests from running while running benchmarks custom target", blt#698) and fixes our existing gbenchmark tests for slic and slam.
It also adds ENABLE_BENCHMARKS to a debug and a release CI config and runs the benchmarks as part of the testing.

Details

Here are the initial results for a Release Clang config on an LC CTS-2 cluster comparing axom::Array vs. std::vector templates on several types:

int
std::pair<int,int>
Wrapper<int> -- a struct that wraps an int
std::string

For simplicity, these results use a single array size, $2^{16} = 65{,}536$ elements, although it's easy to test other sizes and/or types.

>./tests/core_benchmark_array 

Running ./tests/core_benchmark_array
Run on (224 X 3800.68 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 107520 KiB (x2)
Load Average: 0.00, 0.00, 0.33
---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
Array::ctor<int>/65536                                               4558 ns         4553 ns       153089
vector::ctor<int>/65536                                              4297 ns         4293 ns       163069
Array::push_back_startEmpty<int>/65536                             243490 ns       243157 ns         3180
Array::emplace_back_startEmpty<int>/65536                          247124 ns       246788 ns         2834
Array::push_back_initialReserve<int>/65536                         129280 ns       129155 ns         5419
Array::emplace_back_initialReserve<int>/65536                      136120 ns       135991 ns         5137
vector::push_back_startEmpty<int>/65536                            143477 ns       143213 ns         4886
vector::emplace_back_startEmpty<int>/65536                         143364 ns       143085 ns         4892
vector::push_back_initialReserve<int>/65536                         40123 ns        40085 ns        17463
vector::emplace_back_initialReserve<int>/65536                      40122 ns        40087 ns        17461
Array::iterate_range<int>/65536                                     31360 ns        31328 ns        22596
Array::iterate_direct<int>/65536                                    32914 ns        32881 ns        22270
vector::iterate_range<int>/65536                                    34642 ns        34611 ns        20225
vector::iterate_direct<int>/65536                                   34669 ns        34636 ns        20224
Array::ctor<std::pair<int, int>>/65536                               8809 ns         8800 ns        79545
vector::ctor<std::pair<int, int>>/65536                              8554 ns         8545 ns        81923
Array::push_back_startEmpty<std::pair<int, int>>/65536             390602 ns       389792 ns         1794
Array::emplace_back_startEmpty<std::pair<int, int>>/65536          385204 ns       384419 ns         1820
Array::push_back_initialReserve<std::pair<int, int>>/65536         135640 ns       135499 ns         5165
Array::emplace_back_initialReserve<std::pair<int, int>>/65536      130796 ns       130653 ns         5355
vector::push_back_startEmpty<std::pair<int, int>>/65536            350294 ns       349552 ns         2004
vector::emplace_back_startEmpty<std::pair<int, int>>/65536         355547 ns       354832 ns         1972
vector::push_back_initialReserve<std::pair<int, int>>/65536         66364 ns        66296 ns        10087
vector::emplace_back_initialReserve<std::pair<int, int>>/65536      63207 ns        63139 ns        11517
Array::iterate_range<std::pair<int, int>>/65536                     51925 ns        51877 ns        13492
Array::iterate_direct<std::pair<int, int>>/65536                    31522 ns        31494 ns        22959
vector::iterate_range<std::pair<int, int>>/65536                    34649 ns        34616 ns        20223
vector::iterate_direct<std::pair<int, int>>/65536                   34650 ns        34614 ns        20222
Array::ctor<Wrapper<int>>/65536                                      4548 ns         4543 ns       154093
vector::ctor<Wrapper<int>>/65536                                    17349 ns        17333 ns        40383
Array::push_back_startEmpty<Wrapper<int>>/65536                    146199 ns       146058 ns         4796
Array::emplace_back_startEmpty<Wrapper<int>>/65536                 146684 ns       146531 ns         4762
Array::push_back_initialReserve<Wrapper<int>>/65536                129391 ns       129263 ns         5412
Array::emplace_back_initialReserve<Wrapper<int>>/65536             135634 ns       135504 ns         5193
vector::push_back_startEmpty<Wrapper<int>>/65536                    46341 ns        46297 ns        15124
vector::emplace_back_startEmpty<Wrapper<int>>/65536                 46343 ns        46297 ns        15120
vector::push_back_initialReserve<Wrapper<int>>/65536                40126 ns        40086 ns        17463
vector::emplace_back_initialReserve<Wrapper<int>>/65536             40123 ns        40087 ns        17461
Array::iterate_range<Wrapper<int>>/65536                            35463 ns        35428 ns        20013
Array::iterate_direct<Wrapper<int>>/65536                           30262 ns        30233 ns        21990
vector::iterate_range<Wrapper<int>>/65536                           34650 ns        34616 ns        20222
vector::iterate_direct<Wrapper<int>>/65536                          34644 ns        34612 ns        20224
Array::ctor<std::string>/65536                                      88505 ns        88357 ns         7923
vector::ctor<std::string>/65536                                     82147 ns        82016 ns         8534
Array::push_back_startEmpty<std::string>/65536                    4404390 ns      4395382 ns          159
Array::emplace_back_startEmpty<std::string>/65536                 4472433 ns      4463544 ns          157
Array::push_back_initialReserve<std::string>/65536                4277672 ns      4269434 ns          164
Array::emplace_back_initialReserve<std::string>/65536             4290957 ns      4283060 ns          164
vector::push_back_startEmpty<std::string>/65536                   4237623 ns      4229098 ns          165
vector::emplace_back_startEmpty<std::string>/65536                4220593 ns      4211613 ns          166
vector::push_back_initialReserve<std::string>/65536               4175780 ns      4167776 ns          168
vector::emplace_back_initialReserve<std::string>/65536            4164180 ns      4156289 ns          168
Array::iterate_range<std::string>/65536                            212227 ns       211870 ns         3301
Array::iterate_direct<std::string>/65536                           212656 ns       212243 ns         3292
vector::iterate_range<std::string>/65536                           217987 ns       217601 ns         3209
vector::iterate_direct<std::string>/65536                          220336 ns       219948 ns         3192

My quick read is that std::vector can be up to about 3x faster than axom::Array for push_back and emplace_back on simple types, even when we reserve storage ahead of time (compare e.g. the push_back_initialReserve lines: roughly 129 µs for Array<int> vs. 40 µs for vector<int>).

rhornung67

Overall, looks good. I commented with a couple of questions.

publixsubfan · 2024-11-11T22:45:42Z

Oof, those numbers for primitive types are not pretty. Did you see anything obvious popping out that might’ve been the cause @kennyweiss?

kennyweiss · 2024-11-11T23:06:21Z

> Oof, those numbers for primitive types are not pretty. Did you see anything obvious popping out that might’ve been the cause @kennyweiss?

I haven't had a chance to dig too deeply -- this PR is mostly shining a light on the problem.
<speculation> I'm guessing that std::vector is able to inline a bunch of calls, and that the overhead in axom::Array comes from explicit function calls and std::forward. </speculation>

BradWhitlock · 2024-11-11T23:10:57Z

src/axom/core/tests/core_benchmark_array.cpp

+
+// Custom fmt formatter for ArrayFeatureBenchmarks
+template <>
+struct axom::fmt::formatter<ArrayFeatureBenchmarks>


I'm surprised fmt requires this much code to print an enum.

kennyweiss · 2024-11-12

Agreed -- it's because I'm using the enum for flags and wanted to print all enabled features, e.g.

>./tests/core_benchmark_array -f constructors insertion
[INFO] Parsed and processed command line arguments:
[INFO] - Array sizes: 65536
[INFO] - Array features to test: Constructors|Insertion   # <-- Output for two flags
...
>./tests/core_benchmark_array -f constructors insertion -f all
[INFO] Parsed and processed command line arguments:
[INFO] - Array sizes: 65536
[INFO] - Array features to test: All   # <-- Output for "all" flags

Happy to improve it if you know of a better way.

BradWhitlock · 2024-11-11T23:14:40Z

src/axom/core/tests/core_benchmark_array.cpp

+  // clang-format off
+  if((args_benchmark_features & ArrayFeatureBenchmarks::Constructors) != ArrayFeatureBenchmarks::None)
+  {
+    benchmark::RegisterBenchmark(tname("Array::ctor"), &ctor<axom::Array<T>>)->Apply(CustomArgs);


Taking the address of the functions surprised me here.

kennyweiss · 2024-11-12

I think it's because the benchmark is registered at runtime, so it needs a function pointer rather than a macro (?)

BradWhitlock · 2024-11-11T23:18:05Z

src/axom/quest/Shaper.cpp

@@ -161,6 +161,7 @@ void Shaper::loadShapeInternal(const klee::Shape& shape,
                               double& revolvedVolume)
 {
  using axom::utilities::string::endsWith;
+  AXOM_UNUSED_VAR(percentError);  // only currently used with C2C


Thanks for fixing - I think I caused that warning.

BradWhitlock

Looks good to me.

kennyweiss added labels Core (issues related to Axom's 'core' component), Testing (issues related to testing Axom), and Performance (issues related to code performance) Nov 8, 2024

kennyweiss self-assigned this Nov 8, 2024

rhornung67 reviewed Nov 8, 2024

rhornung67 reviewed Nov 8, 2024

rhornung67 approved these changes Nov 8, 2024

kennyweiss force-pushed the feature/kweiss/benchmark-array branch from bda418b to 9bf3442 Compare November 11, 2024 22:33

kennyweiss requested review from white238, cyrush, BradWhitlock, rhornung67, nselliott, publixsubfan, bmhan12 and Arlie-Capps November 11, 2024 22:36

kennyweiss commented Nov 11, 2024

rhornung67 approved these changes Nov 11, 2024

BradWhitlock reviewed Nov 11, 2024

BradWhitlock reviewed Nov 11, 2024

BradWhitlock approved these changes Nov 11, 2024

kennyweiss added 5 commits November 12, 2024 11:53

Updates blt submodule to include fixes for running benchmarks

8524768

Updates and fixes benchmark tests for slic macros/asserts

a9f5abf

Updates slam benchmarks to current BLT gbenchmark

3dd4ee7

Fixes warning about unused variable

be9ee9e

Initial benchmarking of axom::Array vs. std::vector

ff3240e

kennyweiss and others added 19 commits November 12, 2024 11:53

Removes unnecessary comment

822a9d7

Fixes some release config warnings for clang

7a45b56

Adds additional types to the array benchmarks

eaac8c3

Cleanup asserts

b430d9f

Adds functions to compare iterator access times

c80eaec

Cleans up and unifies push_back and emplace_back tests

358768b

Bugfix for triggered asserts in debug config

744f597

Limit time for slam benchmarks when run through run_benchmarks target

2453003

Enables benchmarks in our release CI plan

b006b46

Fixes some warnings for configs that do not use C2C

1a89720

Pass USE_BENCHMARKS to docker in azure-pipelines

178c9ab

Adds benchmarks to a debug config in our azure pipelines CI

20a90a4

Use #if defined() instead of #ifdef per PR suggestion

b365e90

Adds command line args to array benchmarks

37ccb00

Allows users to control which array features to test as well as the sizes of the arrays.

Reduce (default) size of slam_array benchmarks

d811122

It was taking about 2 minutes before and is now taking less than a second

Fix type

1c55719

Fixes slic benchmarks in debug builds and reduces output verbosity

5fd3728

Fixes a warning about an unused variable when not using Umpire

9755964

Bugfix in array benchmarks

f1f9325

Use Array::reserve rather than Array::resize for initial allocation

kennyweiss force-pushed the feature/kweiss/benchmark-array branch from 4b89107 to f7f6a25 Compare November 12, 2024 20:29

Fixes some warnings in sina tests and examples

3d0fbf2

Avoided calling `typeid` on a pointer type to avoid `-Wpotentially-evaluated-expression`.

kennyweiss force-pushed the feature/kweiss/benchmark-array branch from f7f6a25 to 3d0fbf2 Compare November 12, 2024 20:53

Arlie-Capps approved these changes Nov 12, 2024

kennyweiss merged commit 54bb1c1 into develop Nov 12, 2024
13 checks passed

kennyweiss deleted the feature/kweiss/benchmark-array branch November 12, 2024 23:57

kennyweiss mentioned this pull request Nov 13, 2024

Improve performance of axom::Array::push_back #1471

Open
