Use better maps to store visitor state #4459

asl · 2024-02-23T06:21:49Z

This PR imports a modern hash map implementation from https://github.com/greg7mdp/parallel-hashmap. Essentially, it is a hash map extracted from Abseil plus some additional stuff on top of that. Note that while it is a drop-in replacement for std::unordered_map, it has different guarantees wrt iterator invalidation because it uses an open-addressing scheme.

The advantages of this map are:

Much faster lookups
Much less malloc traffic
Better memory usage

On our downstream codebase, this switch alone yields a ~2.1x compile-time reduction on some large P4C apps.

The PR contains several commits so that the effect of each step can be measured:

Use of the map itself (gtestp4c-phmap below)
Preallocation of 16 slots (we know that in the majority of cases this is enough) (gtestp4c-phmap-16 below)
Switch from std::hash to Utils::Hash (gtestp4c-phmap-16-hash below)

For comparison, I also benchmarked hvec_map (gtestp4c-hmap). The benchmarking results are:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `gtestp4c-baseline --gtest_filter=P4CParserUnroll.switch_20160512` | 8.375 ± 0.177 | 8.118 | 8.648 | 1.25 ± 0.04 |
| `gtestp4c-phmap --gtest_filter=P4CParserUnroll.switch_20160512` | 6.886 ± 0.173 | 6.563 | 7.061 | 1.03 ± 0.03 |
| `gtestp4c-phmap-16 --gtest_filter=P4CParserUnroll.switch_20160512` | 6.722 ± 0.118 | 6.539 | 6.900 | 1.00 ± 0.03 |
| `gtestp4c-phmap-16-hash --gtest_filter=P4CParserUnroll.switch_20160512` | 6.705 ± 0.147 | 6.546 | 7.038 | 1.00 |
| `gtestp4c-hmap --gtest_filter=P4CParserUnroll.switch_20160512` | 7.547 ± 0.099 | 7.356 | 7.689 | 1.13 ± 0.03 |

To summarize: going by the Relative column, the new map is about 25% faster than the baseline std::unordered_map and about 13% faster than hvec_map. As I already said, for some of our downstream apps the difference is more than two-fold: the compilation time dropped from 5m 14s to 2m 29s (sadly, hvec_map was not that attractive there). I also saw a marginal reduction of peak RSS, from 15.3 GiB down to 14.87 GiB.
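
The percentages in the summary are ratios of mean run times, so it may help to spell out the arithmetic. The following is only a sketch using the means from the first table; the helper names (`relative`, `timeSaved`, `printSummary`) are made up for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <cstdio>

// Mean wall-clock times in seconds, copied from the first table above.
const double kBaseline = 8.375;     // std::unordered_map
const double kHvecMap = 7.547;      // hvec_map
const double kPhmap16Hash = 6.705;  // phmap + 16 slots + Utils::Hash

// How many times longer `slow` runs than `fast` (the "Relative" column).
inline double relative(double slow, double fast) {
    assert(slow > 0 && fast > 0);
    return slow / fast;
}

// Fraction of the wall-clock time saved by going from `slow` to `fast`.
inline double timeSaved(double slow, double fast) {
    assert(slow > 0 && fast > 0);
    return 1.0 - fast / slow;
}

// Prints the ratios behind the summary figures.
inline void printSummary() {
    std::printf("baseline / phmap: %.3f\n", relative(kBaseline, kPhmap16Hash));
    std::printf("hvec_map / phmap: %.3f\n", relative(kHvecMap, kPhmap16Hash));
    std::printf("time saved vs baseline: %.1f%%\n",
                100 * timeSaved(kBaseline, kPhmap16Hash));
    // Downstream app: 5m 14s = 314 s before, 2m 29s = 149 s after.
    std::printf("downstream ratio: %.2f\n", relative(314.0, 149.0));
}
```

Read this way, a ratio of about 1.25 corresponds to roughly a 20% cut in wall-clock time for this test, and the downstream numbers give a ratio of about 2.1.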

To give more benchmarking results, here are the timings for the whole gtest suite (where Visitor code accounts for a much smaller share of the run time):

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `gtestp4c-baseline` | 14.398 ± 0.310 | 13.920 | 14.923 | 1.17 ± 0.03 |
| `gtestp4c-hmap` | 13.083 ± 0.280 | 12.683 | 13.496 | 1.06 ± 0.03 |
| `gtestp4c-phmap-16-hash` | 12.338 ± 0.256 | 12.003 | 12.877 | 1.00 |

asl · 2024-02-23T06:32:41Z

For the record, I also tried https://github.com/Tessil/robin-map. It was slower than phmap for visitors, and it is not a drop-in replacement, so code changes are required.

fruffy · 2024-02-23T13:22:25Z

Just two notes:

We may soon get Abseil "for free" because the compiler's Protobuf dependency will pull it in with Protobuf 22.x: https://protobuf.dev/news/2022-08-03/#abseil-dep. We simply haven't pulled the trigger on an upgrade.
P4Testgen heavily depends on the visitors' multiple dispatch so it should be affected by these changes. I will try to add a benchmark. We could also write up independent visitor microbenchmarks.

grg · 2024-02-23T15:15:39Z

> We may soon get Abseil "for free" because the compiler's Protobuf dependency will pull it in with Protobuf 22.x: https://protobuf.dev/news/2022-08-03/#abseil-dep. We simply haven't pulled the trigger on an upgrade.

Is there an ETA for when the Protobuf 22.x upgrade is likely to occur? I am wondering whether it makes sense for this change to wait, so that it can avoid adding and later removing the Abseil files.

Edit: Rereading the PR comment again ("Essentially it's a hash map extracted from Abseil plus some additional stuff on top of that"), perhaps waiting for Abseil as part of a Protobuf upgrade is insufficient.

fruffy · 2024-02-23T15:46:30Z

> > We may soon get Abseil "for free" because the compiler's Protobuf dependency will pull it in with Protobuf 22.x: https://protobuf.dev/news/2022-08-03/#abseil-dep. We simply haven't pulled the trigger on an upgrade.
>
> Is there an ETA for when Protobuf 22.x upgrade is likely to occur? Wondering whether it makes sense for this change to wait so that it can avoid adding and later removing the Abseil files?
>
> Edit: Rereading the PR comment again ("Essentially it's a hash map extracted from Abseil plus some additional stuff on top of that"), perhaps waiting for Abseil as part of a Protobuf upgrade is insufficient.

It might be as simple as bumping the version here: https://github.com/p4lang/p4c/blob/main/cmake/Protobuf.cmake#L35

Main reason I have not done that is because, well, we have a lot of boost code flying around and adding another dependency to a utility framework before cleaning the other one up seemed excessive.

I am tracking this in an issue (#3898), the hard part is replacing boost::format with std::format.

fruffy · 2024-02-23T15:51:00Z

Another thing to do is to break down our boost dependencies into the independent modules which are now available. For example:
https://github.com/boostorg/format
https://github.com/boostorg/multiprecision

That would simplify things instead of pulling in libboost-all-dev always.

asl · 2024-02-23T16:34:30Z

> We may soon get Abseil "for free" because the compiler's Protobuf dependency will pull it in with Protobuf 22.x: https://protobuf.dev/news/2022-08-03/#abseil-dep. We simply haven't pulled the trigger on an upgrade.

Note that phmap started as a library extracted from there. Since then it has had some changes here and there, and there are some notable differences. It would certainly be worth checking and then either keeping it or replacing it with abseil maps.

> P4Testgen heavily depends on the visitors' multiple dispatch so it should be affected by these changes. I will try to add a benchmark. We could also write up independent visitor microbenchmarks.

Nothing wrt multiple dispatch here :) We are reducing malloc / GC traffic here and speeding up basic operations (insert / find).

grg · 2024-02-23T17:05:48Z

ir/visitor.cpp

    visited_t visited;

 public:
+    Tracker() : visited(16) {}


Please add a very brief comment explaining the number 16 🙂 E.g., "pre-allocating 16 slots yields a performance improvement and is sufficient for most cases".

I want to avoid a "huh?" response when people see this code later on.

Yeah! We will likely switch to inlined pre-allocated storage in the future, but one step at a time.
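
For readers who have not seen the full diff, here is a minimal sketch of the idea behind `visited(16)`. It is not the code from ir/visitor.cpp: `Node`, `VisitState`, `visit`, `visits` and `bucketCount` are hypothetical, and std::unordered_map stands in for the swiss-table map so that the example has no dependencies (absl::flat_hash_map also accepts an initial bucket count and provides `try_emplace`). C++17 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an IR node; only its address is used as the key.
struct Node {};

// Hypothetical per-node bookkeeping, not the real visitor state.
struct VisitState {
    int visitCount = 0;
};

class Tracker {
    // std::unordered_map keeps the sketch dependency-free; the PR uses a
    // swiss-table map in this position.
    using visited_t = std::unordered_map<const Node *, VisitState>;
    visited_t visited;

 public:
    // Pre-allocating 16 slots avoids early rehashes and is enough in most cases.
    Tracker() : visited(16) {}

    // Records a visit with a single hash lookup: try_emplace either inserts a
    // default state or returns the existing entry. Returns true on first visit.
    bool visit(const Node *n) {
        auto result = visited.try_emplace(n);
        ++result.first->second.visitCount;
        return result.second;
    }

    // Number of recorded visits for `n`, or 0 if it was never seen.
    int visits(const Node *n) const {
        auto it = visited.find(n);
        return it == visited.end() ? 0 : it->second.visitCount;
    }

    std::size_t bucketCount() const { return visited.bucket_count(); }
};

// Small usage example.
inline void trackerExample() {
    Tracker t;
    Node a;
    assert(t.visit(&a));   // first visit inserts the entry
    assert(!t.visit(&a));  // second visit finds the existing entry
    assert(t.visits(&a) == 2);
}
```

The single `try_emplace` call replaces a find followed by an insert, which is the same kind of saving as the "Do not do lookup twice" commit in this PR.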

asl · 2024-02-23T17:08:30Z

All right. I went ahead and checked with abseil maps. Looks like they added some improvements there since phmap was forked. I would definitely say we'd need to upgrade ASAP:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `gtestp4c-baseline --gtest_filter=P4CParserUnroll.switch_20160512` | 8.381 ± 0.187 | 8.155 | 8.652 | 1.41 ± 0.05 |
| `gtestp4c-phmap-16-hash --gtest_filter=P4CParserUnroll.switch_20160512` | 6.611 ± 0.107 | 6.490 | 6.840 | 1.11 ± 0.04 |
| `gtestp4c-abseil --gtest_filter=P4CParserUnroll.switch_20160512` | 5.935 ± 0.160 | 5.712 | 6.278 | 1.00 |
| `test/gtestp4c-hmap --gtest_filter=P4CParserUnroll.switch_20160512` | 7.541 ± 0.195 | 7.233 | 7.907 | 1.27 ± 0.05 |

I have some other PRs that replace maps on hot code paths, so I'd prefer that we choose a map before I submit them.

asl · 2024-02-23T17:09:09Z

> I am tracking this in an issue (#3898), the hard part is replacing boost::format with std::format.

If abseil is a dependency, then you could use absl::format as well :)

fruffy · 2024-02-23T18:15:18Z

> Nothing wrt multiple dispatch here :) We are reducing malloc / GC traffic here and speeding up basic operations (insert / find).

Doesn't this change speed up traversal for Transform and Inspector, too? I'd assume it would help with interpreters like P4Testgen.

> All right. I went ahead and checked with abseil maps. Looks like they added some improvements there since phmap was forked. I would definitely say we'd need to upgrade ASAP:

Well, one thing we could try is to bump the Protobuf version and see what happens. If nothing breaks we could add a PR for it, then rebase this one on top and link against the absl map. In parallel, we can try to clean up the other dependencies.

asl · 2024-02-23T18:17:02Z

> Doesn't this change speed up traversal for Transform and Inspector, too? I'd assume it would help with interpreters like P4Testgen.

Yes, it speeds up the traversal a lot, but not the multiple dispatch itself.

> Well, one thing we could try is to bump the Protobuf version and see what happens. If nothing breaks we could add a PR for it, then rebase this one on top and link against the absl map. In parallel, we can try to clean up the other dependencies.

Let me give it a try and see what will happen :)

fruffy · 2024-02-23T18:22:35Z

> If abseil is a dependency, then you could use absl::format as well :)

Iirc the problem is the difference in format specifiers. Boost uses %1%, %2%,... which may not be supported in the other frameworks. The migration is finicky.

asl · 2024-02-23T18:23:59Z

> Iirc the problem is the difference in format specifiers. Boost uses %1%, %2%,... which may not be supported in the other frameworks. The migration is finicky.

Yes. I do not recall off-hand how well std::format is supported across compilers. Migrating the specifiers is a pain, I can confirm (I once migrated a large codebase to https://github.com/fmtlib/fmt).
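
To make the specifier mismatch concrete: Boost.Format placeholders are one-based (`%1%`), while std::format and fmt use zero-based brace indices (`{0}`). The function below is a hypothetical helper written for this illustration, not something from p4c or from either library. It handles only plain `%N%` placeholders and the `%%` escape; printf-style directives and the other Boost.Format forms are copied through unchanged, which is part of why a real migration is finicky.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Rewrites Boost.Format positional placeholders (%1%, %2%, ...) into the
// zero-based {0}, {1}, ... form used by std::format and fmt.
// Only plain %N% and the %% escape are handled; anything else is copied as-is.
inline std::string boostToBraceFormat(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '{' || c == '}') {
            // Literal braces must be doubled in brace-style format strings.
            out += c;
            out += c;
            continue;
        }
        if (c != '%') {
            out += c;
            continue;
        }
        if (i + 1 < in.size() && in[i + 1] == '%') {
            out += '%';  // %% is a literal percent sign in Boost.Format
            ++i;
            continue;
        }
        // Try to parse digits followed by a closing '%'.
        std::size_t j = i + 1;
        unsigned index = 0;
        while (j < in.size() && std::isdigit(static_cast<unsigned char>(in[j]))) {
            index = index * 10 + static_cast<unsigned>(in[j] - '0');
            ++j;
        }
        if (j > i + 1 && j < in.size() && in[j] == '%' && index > 0) {
            out += '{' + std::to_string(index - 1) + '}';
            i = j;  // skip past the closing '%'
        } else {
            out += c;  // not a positional placeholder; leave it untouched
        }
    }
    return out;
}

// Small usage example.
inline void formatExample() {
    assert(boostToBraceFormat("%1% and %2%") == "{0} and {1}");
}
```

Even with such a helper, every call site still has to be checked by hand, because the argument-passing syntax differs as well.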

ChrisDodd · 2024-02-26T00:43:05Z

I'd be interested in seeing why hvec_map is so much slower than this, given that it uses essentially the same algorithms. You could try giving it an initial capacity of 16 as well, and maybe a different hash function (what is the default hash function for pointers -- just identity? That would seem to be pretty good.)

asl · 2024-02-26T00:46:37Z

> You could try giving it an initial capacity of 16 as well, and maybe a different hash function (what is the default hash function for pointers -- just identity? That would seem to be pretty good.)

The benchmarks above already do this (pre-allocated capacity plus a better hash function; we are comparing apples to apples). I tried hvec_map in other cases as well (PRs pending) and the overall outcome is the same: hvec_map might be faster than unordered_map, but it is always slower than these "swiss hash maps" (from abseil or the separate phmap).

asl · 2024-02-27T00:20:01Z

Rebased on top of abseil. Here are the updated results (the difference from the tables above is that this is a non-unity build):

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `gtestp4c-baseline --gtest_filter=P4CParserUnroll.switch_20160512` | 8.467 ± 0.164 | 8.150 | 8.682 | 1.38 ± 0.06 |
| `gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512` | 6.134 ± 0.214 | 5.861 | 6.446 | 1.00 |

grg

LGTM, but please please give the other reviewers a chance to provide feedback before merging.

(The speedups sound great!)

asl · 2024-02-27T01:53:03Z

> LGTM, but please please give the other reviewers a chance to provide feedback before merging.
>
> (The speedups sound great!)

Absolutely!

grg · 2024-02-27T02:01:51Z

One follow-up: it would be good to include a small section in the developer docs (e.g., docs/README.md -- maybe a "performance considerations" section?) as to why some collections are now Abseil. You could also consider adding guidance on when code might benefit from Abseil collections and how to switch.

Feel free to do this as a separate PR or part of #4473.

asl · 2024-02-27T02:05:05Z

> One follow-up: it would be good to include a small section in the developer docs (e.g., docs/README.md -- maybe a "performance considerations" section?) as to why some collections are now Abseil. You could also consider adding guidance on when code might benefit from Abseil collections and how to switch.

Good idea. I think I'd also mention InlinedVector

vlstill

The speedup does look very interesting, thank you.

We could probably get even more by replacing standard maps sprinkled around the code base, but that would be significantly more work (and we would also need a better ordered_map/ordered_set).

asl · 2024-02-27T13:30:20Z

> We could probably get even more by replacing standard maps sprinkled around the code base, but that would be significantly more work (and we would also need a better ordered_map/ordered_set).

See #4473 just in case :) Some care should be taken here, as these maps have slightly different semantics regarding iterator invalidation and pointer stability, so we cannot just automatically replace one with the other. However, it is definitely beneficial in a few hot code paths.
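
The pointer-stability difference can be shown with a toy example. The sketch below is an assumption-laden illustration, not Abseil's implementation: `FlatIntMap` is a hypothetical open-addressing map with linear probing that stores entries directly in its slot array, so growing the array moves every entry. std::unordered_map, by contrast, is node-based, and the standard guarantees that pointers and references to its elements survive a rehash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy open-addressing map (linear probing, int keys); far simpler than a
// swiss table. Entries live directly inside the slot array.
class FlatIntMap {
    using Slot = std::optional<std::pair<int, int>>;
    std::vector<Slot> slots;
    std::size_t used = 0;

    // Index of the slot holding `key`, or of the first empty slot on its path.
    std::size_t probe(int key) const {
        std::size_t i = static_cast<std::uint32_t>(key) % slots.size();
        while (slots[i] && slots[i]->first != key) i = (i + 1) % slots.size();
        return i;
    }

    // Doubles the slot array and re-inserts every entry at its new position.
    void grow() {
        std::vector<Slot> previous(slots.size() * 2);
        slots.swap(previous);  // `previous` now holds the old entries
        used = 0;
        for (auto &s : previous)
            if (s) insert(s->first, s->second);
    }

 public:
    FlatIntMap() : slots(16) {}

    // Returns a pointer to the stored value; valid only until the next growth.
    int *insert(int key, int value) {
        if ((used + 1) * 2 > slots.size()) grow();  // keep load factor <= 1/2
        const std::size_t i = probe(key);
        if (!slots[i]) {
            slots[i].emplace(key, value);
            ++used;
        }
        assert(slots[i].has_value());
        return &slots[i]->second;
    }

    int *find(int key) {
        const std::size_t i = probe(key);
        return slots[i] ? &slots[i]->second : nullptr;
    }
};

// True if the value for key 0 moved to a new address after one growth.
inline bool flatMapRelocates() {
    FlatIntMap m;
    const auto before = reinterpret_cast<std::uintptr_t>(m.insert(0, 42));
    for (int k = 1; k < 16; ++k) m.insert(k, k);  // crosses the growth threshold once
    int *after = m.find(0);
    return after && *after == 42 && reinterpret_cast<std::uintptr_t>(after) != before;
}

// True if the node-based map kept the value for key 0 at the same address.
inline bool nodeMapKeepsAddress() {
    std::unordered_map<int, int> m;
    int *before = &m[0];
    *before = 42;
    for (int k = 1; k < 1000; ++k) m[k] = k;  // triggers several rehashes
    return &m[0] == before && *before == 42;
}
```

Code that caches a pointer or reference to a map value across later insertions works with the node-based map but would dangle with the flat one, which is why the replacement cannot be fully mechanical.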

fruffy · 2024-02-27T15:26:03Z

ir/CMakeLists.txt

@@ -64,6 +64,7 @@ set (BASE_IR_DEF_FILES
 set (IR_DEF_FILES ${IR_DEF_FILES} ${BASE_IR_DEF_FILES} PARENT_SCOPE)

 add_library (ir STATIC ${IR_SRCS})
+target_link_libraries(ir absl::flat_hash_map)


Not sure how we want to manage linking... Should each dependent target link absl::flat_hash_map, or should ir export it? It is also unclear to me what the best approach is to keep binary size small.
I guess we should make PUBLIC explicit here.

This is a good question. Abseil is quite fine-grained: every small piece is exposed as its own small library.

What I can see from the present CMake files:

Order dependencies (add_dependencies) are used instead of proper use dependencies (target_link_libraries). Therefore, interface dependencies (include paths, etc.) are not properly exposed. In some cases this is alleviated by adding include paths explicitly.

Some dependencies are clearly missing. E.g., backends / midend definitely pull pieces from the frontend. However, this is only addressed at link time, not at build time. This is likely why the explicit order dependencies mentioned above were required to fix highly parallel builds.

I would not worry about "binary size" here: these are libraries, so all unused code would be removed by the linker. I see a few possibilities:

Make the abseil dependencies explicit and global, e.g. via P4_LIB_DEPS. Still, it is not used consistently, and usually only for final binaries, not for libraries.

Go for a fine-grained approach.

I'd probably try the second one:

If abseil is exposed via headers, make it a PUBLIC dependency.

If abseil is only used in .cpp files as an implementation detail, fine, it's PRIVATE.

I'll revise this PR wrt dependencies.

add_dependencies was used because historically all the libs were built independently and linked together by each back end. However, to ensure proper build order in parallel builds, add_dependencies was needed.

The fine-grained approach works for me. #4474 implements some of this as I ran into some issues downstream.

> add_dependencies was used because historically all the libs were built independently and linked together by each back end. However, to ensure proper build order in parallel builds, add_dependencies was needed.

Yes. But this is not how the dependencies should be done. Things are pulling headers, so this is a build-time dependency, not a link-time one... #4473 also has some... we'd need to unify all of this somehow.

With binary size I meant the final compiler binaries, which can be huge. I believe this is primarily caused by all the templating and the ir-generated headers, but also by the chaotic linking, which also seems to affect linking time. It can be excruciatingly slow.

> Yes. But this is not how the dependencies should be done. Things are pulling headers, so this is a build-time dependency, not a link-time one... #4473 also has some... we'd need to unify all of this somehow.

Yeah, I agree with you. The compiler infrastructure has been going through various phases of structuring, with each back end doing its own thing.

Co-authored-by: Glen Gibb <glen.gibb@alumni.stanford.edu>

fruffy · 2024-02-27T17:33:16Z

@asl I removed this from the merge queue because @vlstill and @grg should still give their final OK. I believe @grg already gave the go.

asl · 2024-02-27T17:33:47Z

> @asl I removed this from the merge queue because @vlstill and @grg should still give their final OK.

Ok, no problem

asl requested review from fruffy, grg, vlstill and ChrisDodd February 23, 2024 06:21

asl force-pushed the visitor-improve-maps branch from 8c17523 to e1b2302 Compare February 23, 2024 06:35

fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Feb 23, 2024

grg reviewed Feb 23, 2024

asl mentioned this pull request Feb 26, 2024

Bump protobuf version and add Abseil as compiler dependency. #4463

Merged

asl force-pushed the visitor-improve-maps branch from 6ae6c0d to b0f076a Compare February 27, 2024 00:19

asl requested a review from grg February 27, 2024 00:20

asl mentioned this pull request Feb 27, 2024

Use abseil maps even more #4473

Merged

grg approved these changes Feb 27, 2024

asl force-pushed the visitor-improve-maps branch from 56dd486 to 4a152b3 Compare February 27, 2024 01:55

asl added the run-validation Use this tag to trigger a Validation CI run. label Feb 27, 2024

asl force-pushed the visitor-improve-maps branch from 4a152b3 to e910044 Compare February 27, 2024 01:55

vlstill reviewed Feb 27, 2024

fruffy reviewed Feb 27, 2024

asl and others added 8 commits February 27, 2024 08:17

Make use of abseil flat_hash_map

7048f4f

Preallocate at least 16 slots

89324b1

Use better hash

ff3bff1

Do not do lookup twice to get final result

567f2fc

Clarify comment

432ef91

Clarify docstring

eb0a8a2

Co-authored-by: Glen Gibb <glen.gibb@alumni.stanford.edu>

Link abseil privately to IR

5e10ce2

Clarify iterator usage

491fe28

asl force-pushed the visitor-improve-maps branch from e910044 to 16f06a3 Compare February 27, 2024 16:33

asl requested review from vlstill and fruffy February 27, 2024 16:33

Refined finalResult semantics

def2fed

asl force-pushed the visitor-improve-maps branch from 16f06a3 to def2fed Compare February 27, 2024 16:34

fruffy approved these changes Feb 27, 2024

asl enabled auto-merge February 27, 2024 16:55

asl added this pull request to the merge queue Feb 27, 2024

fruffy removed this pull request from the merge queue due to a manual request Feb 27, 2024

vlstill approved these changes Feb 27, 2024

fruffy added this pull request to the merge queue Feb 27, 2024

Merged via the queue into p4lang:main with commit bfd65e1 Feb 27, 2024
16 checks passed

asl deleted the visitor-improve-maps branch February 27, 2024 20:14
