Adds initial benchmarking for axom::Array vs. std::vector #1469

Merged (25 commits) on Nov 12, 2024


@kennyweiss (Member) commented on Nov 8, 2024

Summary

Details

Here are the initial results for a Release build with Clang on an LC CTS-2 cluster, comparing axom::Array and std::vector instantiated on several types:

  • int
  • std::pair<int,int>
  • Wrapper<int>: a struct that wraps an int
  • std::string

For simplicity, these results use a single array size, $2^{16} = 65{,}536$ elements, although it's easy to test other sizes and/or types.

>./tests/core_benchmark_array 

Running ./tests/core_benchmark_array
Run on (224 X 3800.68 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 107520 KiB (x2)
Load Average: 0.00, 0.00, 0.33
---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
Array::ctor<int>/65536                                               4558 ns         4553 ns       153089
vector::ctor<int>/65536                                              4297 ns         4293 ns       163069
Array::push_back_startEmpty<int>/65536                             243490 ns       243157 ns         3180
Array::emplace_back_startEmpty<int>/65536                          247124 ns       246788 ns         2834
Array::push_back_initialReserve<int>/65536                         129280 ns       129155 ns         5419
Array::emplace_back_initialReserve<int>/65536                      136120 ns       135991 ns         5137
vector::push_back_startEmpty<int>/65536                            143477 ns       143213 ns         4886
vector::emplace_back_startEmpty<int>/65536                         143364 ns       143085 ns         4892
vector::push_back_initialReserve<int>/65536                         40123 ns        40085 ns        17463
vector::emplace_back_initialReserve<int>/65536                      40122 ns        40087 ns        17461
Array::iterate_range<int>/65536                                     31360 ns        31328 ns        22596
Array::iterate_direct<int>/65536                                    32914 ns        32881 ns        22270
vector::iterate_range<int>/65536                                    34642 ns        34611 ns        20225
vector::iterate_direct<int>/65536                                   34669 ns        34636 ns        20224
Array::ctor<std::pair<int, int>>/65536                               8809 ns         8800 ns        79545
vector::ctor<std::pair<int, int>>/65536                              8554 ns         8545 ns        81923
Array::push_back_startEmpty<std::pair<int, int>>/65536             390602 ns       389792 ns         1794
Array::emplace_back_startEmpty<std::pair<int, int>>/65536          385204 ns       384419 ns         1820
Array::push_back_initialReserve<std::pair<int, int>>/65536         135640 ns       135499 ns         5165
Array::emplace_back_initialReserve<std::pair<int, int>>/65536      130796 ns       130653 ns         5355
vector::push_back_startEmpty<std::pair<int, int>>/65536            350294 ns       349552 ns         2004
vector::emplace_back_startEmpty<std::pair<int, int>>/65536         355547 ns       354832 ns         1972
vector::push_back_initialReserve<std::pair<int, int>>/65536         66364 ns        66296 ns        10087
vector::emplace_back_initialReserve<std::pair<int, int>>/65536      63207 ns        63139 ns        11517
Array::iterate_range<std::pair<int, int>>/65536                     51925 ns        51877 ns        13492
Array::iterate_direct<std::pair<int, int>>/65536                    31522 ns        31494 ns        22959
vector::iterate_range<std::pair<int, int>>/65536                    34649 ns        34616 ns        20223
vector::iterate_direct<std::pair<int, int>>/65536                   34650 ns        34614 ns        20222
Array::ctor<Wrapper<int>>/65536                                      4548 ns         4543 ns       154093
vector::ctor<Wrapper<int>>/65536                                    17349 ns        17333 ns        40383
Array::push_back_startEmpty<Wrapper<int>>/65536                    146199 ns       146058 ns         4796
Array::emplace_back_startEmpty<Wrapper<int>>/65536                 146684 ns       146531 ns         4762
Array::push_back_initialReserve<Wrapper<int>>/65536                129391 ns       129263 ns         5412
Array::emplace_back_initialReserve<Wrapper<int>>/65536             135634 ns       135504 ns         5193
vector::push_back_startEmpty<Wrapper<int>>/65536                    46341 ns        46297 ns        15124
vector::emplace_back_startEmpty<Wrapper<int>>/65536                 46343 ns        46297 ns        15120
vector::push_back_initialReserve<Wrapper<int>>/65536                40126 ns        40086 ns        17463
vector::emplace_back_initialReserve<Wrapper<int>>/65536             40123 ns        40087 ns        17461
Array::iterate_range<Wrapper<int>>/65536                            35463 ns        35428 ns        20013
Array::iterate_direct<Wrapper<int>>/65536                           30262 ns        30233 ns        21990
vector::iterate_range<Wrapper<int>>/65536                           34650 ns        34616 ns        20222
vector::iterate_direct<Wrapper<int>>/65536                          34644 ns        34612 ns        20224
Array::ctor<std::string>/65536                                      88505 ns        88357 ns         7923
vector::ctor<std::string>/65536                                     82147 ns        82016 ns         8534
Array::push_back_startEmpty<std::string>/65536                    4404390 ns      4395382 ns          159
Array::emplace_back_startEmpty<std::string>/65536                 4472433 ns      4463544 ns          157
Array::push_back_initialReserve<std::string>/65536                4277672 ns      4269434 ns          164
Array::emplace_back_initialReserve<std::string>/65536             4290957 ns      4283060 ns          164
vector::push_back_startEmpty<std::string>/65536                   4237623 ns      4229098 ns          165
vector::emplace_back_startEmpty<std::string>/65536                4220593 ns      4211613 ns          166
vector::push_back_initialReserve<std::string>/65536               4175780 ns      4167776 ns          168
vector::emplace_back_initialReserve<std::string>/65536            4164180 ns      4156289 ns          168
Array::iterate_range<std::string>/65536                            212227 ns       211870 ns         3301
Array::iterate_direct<std::string>/65536                           212656 ns       212243 ns         3292
vector::iterate_range<std::string>/65536                           217987 ns       217601 ns         3209
vector::iterate_direct<std::string>/65536                          220336 ns       219948 ns         3192

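My quick read is that std::vector can be roughly 2-3x faster than axom::Array for push_back and emplace_back on simple types (int, std::pair<int,int>, Wrapper<int>), even when we reserve storage ahead of time; compare, e.g., push_back_initialReserve<int>: 129,280 ns for Array vs. 40,123 ns for vector. For std::string, the insertion timings are within about 6% of each other.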

@kennyweiss added the Core, Testing, and Performance labels on Nov 8, 2024
@kennyweiss self-assigned this on Nov 8, 2024
@rhornung67 (Member) left a review:

Overall, looks good. I commented with a couple of questions.

@publixsubfan (Contributor) commented:

Oof, those numbers for primitive types are not pretty. Did you see anything obvious popping out that might’ve been the cause, @kennyweiss?

@kennyweiss (Member, Author) replied:

I haven't had a chance to dig too deeply; this PR is mostly shining a light on the problem.
<speculation> I'm guessing that std::vector is able to inline a bunch of calls, and that the overhead in axom::Array is due to explicit function calls and std::forward. </speculation>


// Custom fmt formatter for ArrayFeatureBenchmarks
template <>
struct axom::fmt::formatter<ArrayFeatureBenchmarks>

Review comment (Member):

I'm surprised fmt requires this much code to print an enum.

Reply (Member, Author):

Agreed. It's because I'm using the enum as a set of flags and wanted to print all enabled features, e.g.:

>./tests/core_benchmark_array -f constructors insertion
[INFO] Parsed and processed command line arguments: 
[INFO] - Array sizes: 65536 
[INFO] - Array features to test: Constructors|Insertion   # <-- Output for two flags

...
>./tests/core_benchmark_array -f constructors insertion -f all
[INFO] Parsed and processed command line arguments: 
[INFO] - Array sizes: 65536 
[INFO] - Array features to test: All                      # <-- Output for "all" flags

Happy to improve it if you know of a better way.

// clang-format off
if((args_benchmark_features & ArrayFeatureBenchmarks::Constructors) != ArrayFeatureBenchmarks::None)
{
benchmark::RegisterBenchmark(tname("Array::ctor"), &ctor<axom::Array<T>>)->Apply(CustomArgs);

Review comment (Member):

Taking the address of the functions surprised me here.

Reply (Member, Author):

I think it's because the benchmark is registered at runtime, so it needs a function pointer rather than using a macro (?)

@@ -161,6 +161,7 @@ void Shaper::loadShapeInternal(const klee::Shape& shape,
double& revolvedVolume)
{
using axom::utilities::string::endsWith;
AXOM_UNUSED_VAR(percentError); // only currently used with C2C

Review comment (Member):

Thanks for fixing - I think I caused that warning.

@BradWhitlock (Member) left a review:

Looks good to me.

@kennyweiss force-pushed the feature/kweiss/benchmark-array branch from 4b89107 to f7f6a25 on November 12, 2024 20:29
Avoided calling `typeid` on a pointer type to avoid `-Wpotentially-evaluated-expression`.
@kennyweiss force-pushed the feature/kweiss/benchmark-array branch from f7f6a25 to 3d0fbf2 on November 12, 2024 20:53
@kennyweiss merged commit 54bb1c1 into develop on Nov 12, 2024
13 checks passed
@kennyweiss deleted the feature/kweiss/benchmark-array branch on November 12, 2024 23:57
5 participants