Reduce p4c compile time #4674
Comments
Thanks for this! I haven't looked into this for some time, but from my time at Tofino I remember some investigations (the deeper ones mostly led nowhere because of what happened to Tofino) and some conclusions.
Yes, that matches our observations. We discussed whether precompiled headers could help with this, but I don't think we ever gave it a serious try. Might be worth investigating. On top of that, I was concerned with the build process and linking, as that was low-hanging fruit for us. This will likely work for others too (but a lot of them probably use it already):
Maybe we should isolate the P4-14 input support and make it optional. Not all backends need it.
Now we have actually a lot of control over many of these. We can add explicit instantiation declarations for selected instances (e.g. |
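To make the idea of explicit instantiation declarations concrete, here is a minimal sketch. `NodeVector` is a hypothetical stand-in, not an actual p4c class, and everything is shown in one file for brevity; in a real build the `extern template` lines would presumably live in the header and the instantiation definitions in exactly one .cpp file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical template standing in for a heavily used container.
template <typename T>
class NodeVector {
 public:
    void push(const T &value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }

 private:
    std::vector<T> items_;
};

// Header part: explicit instantiation declarations. They tell every
// translation unit that includes the header not to instantiate these
// specializations implicitly.
extern template class NodeVector<int>;
extern template class NodeVector<std::string>;

// Single .cpp part: explicit instantiation definitions. The code for the
// selected specializations is generated once, here.
template class NodeVector<int>;
template class NodeVector<std::string>;
```

Other specializations (say, `NodeVector<double>`) would still be instantiated implicitly wherever they are used, so this only pays off for the instances that show up in many translation units.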
I think it's mostly because we have 200+ IR::Node descendants with lots of virtual methods and wide inheritance. Lots of small things that add up...
Yeah. Though this was
Actually, they could be as both |
And this one is interesting:
Actually, every |
A lot of the error handling code is from C++11 times. We now have more ways of handling variadics, so it might be worth refactoring it. I can even see a way where we just take the argument pack and project it either directly to a string, or to a type-erased object (more likely, as we still need to handle source info). Then, instead of doing the recursion (if we can't get rid of it) based on all the type combinations, we just instantiate it based on all the lengths of the error parametrization. Then we could again apply explicit instantiations if needed.
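A rough sketch of what such a type-erased projection might look like. All names here (`ErasedArg`, `formatErased`, `formatMessage`) and the `{}` placeholder syntax are hypothetical and are not the existing p4c error-handling API; source info is left out to keep it short.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical type-erased argument: the value is rendered to text up
// front, so the formatter below never sees the original type.
struct ErasedArg {
    std::string text;

    template <typename T>
    static ErasedArg of(const T &value) {
        std::ostringstream os;
        os << value;
        return ErasedArg{os.str()};
    }
};

// The actual formatting work. It is instantiated once per argument count N,
// not once per combination of argument types.
template <std::size_t N>
std::string formatErased(const std::string &fmt, const std::array<ErasedArg, N> &args) {
    std::string out;
    std::size_t next = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        // Replace each "{}" with the next argument, while any remain.
        if (fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}' && next < N) {
            out += args[next++].text;
            ++i;
        } else {
            out += fmt[i];
        }
    }
    return out;
}

// Thin variadic front end: it only erases the pack and forwards it.
template <typename... Args>
std::string formatMessage(const std::string &fmt, const Args &...args) {
    const std::array<ErasedArg, sizeof...(Args)> erased{ErasedArg::of(args)...};
    return formatErased(fmt, erased);
}
```

With this shape, `formatMessage("{} has width {}", "bit", 8)` yields `bit has width 8`. The front end is still instantiated per type combination, but it is small; the heavier `formatErased` depends only on the length, so it could be explicitly instantiated for the handful of lengths that occur in practice.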
Thanks for the analysis. Yes, compile time has been a constant pain point. The Tofino compiler is much worse because it adds a lot more IR nodes which blows up compile time and linking even further. The only other open-source issue I remember is #3980, which is concerned with splitting Other than that I have periodically been running IWYU to get some better sense what is being included/used in some classes: #3767
We could move the
I tried some refactoring in #3774 once. The challenge is to preserve the source location accurately, which may require some C++20 features. If you wanted to get rid of the macro setup.. |
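For illustration, a minimal sketch of capturing the caller's location without a macro. The C++20 feature alluded to is presumably `std::source_location`; to stay compilable in pre-C++20 modes, this sketch swaps in the `__builtin_FILE()` / `__builtin_LINE()` compiler builtins available in GCC and Clang, which is a different (non-standard) mechanism. `Located` and `makeBug` are hypothetical names, not p4c functions.

```cpp
#include <string>

// Hypothetical result type: a message plus the location it was raised from.
struct Located {
    std::string message;
    const char *file;
    int line;
};

// Default arguments are evaluated at the call site, so the builtins report
// the caller's file and line rather than this function's own location.
// __builtin_FILE / __builtin_LINE are GCC/Clang extensions, not standard C++.
inline Located makeBug(const std::string &message,
                       const char *file = __builtin_FILE(),
                       int line = __builtin_LINE()) {
    return Located{message, file, line};
}
```

A caller would simply write `makeBug("unexpected node")` and get its own file and line attached, which is the property the macro-based setup currently provides.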
The problem is the Boost header dependency. So moving it out will not help a lot :) |
That would require not only C++20, but more importantly GCC 11... Maybe once Ubuntu 20 goes out of support we can consider updating? Other old popular systems (RHEL 8) already have to depend on non-default GCC now. EDIT: to get |
Not sure I follow. If we move out |
Right. But |
I am thinking it is not that widely used. Usually only in places where we work with constants. It might be possible to factor it out. Second, we used to have GMP support actually: #3485 But it was removed because of licensing concerns/CI waste. It could be brought back, if there is really a need. |
Ok, maybe good to go then, yes. |
Building a debug version of a downstream compiler with original:
With explicit instantiations of
... so something like almost 7 % in this case. Not a huge difference, but noticeable for such an easy change I'd say. I'll try a few more template-instantiation tricks and then open a PR with this. |
@vlstill One step at a time. There is no single place that contributes to the compile time, but 7% improvement is already a lot. |
For release the story is similar. Also I forgot to mention this is a non-unity build. A release build this time, otherwise the same comparison:
So little over 9 % for release. |
Well, the thing is: it is a part of |
Currently, p4c compile times are quite long compared to other compilers / projects, given the codebase size. This question / issue was likely raised before (otherwise, why would there be unity builds here?), but I was unable to find the corresponding issue.
I tried to check whether there is any low-hanging fruit here. It looks like there is not, but a few cases are still interesting. Attached is the time trace report as generated by clang. It can be visualized via the Chrome tracing backend or, better, via Speedscope or Perfetto. The file in question is helpers.cpp from the gtest testsuite, though similar patterns are everywhere. It has the longest frontend parse time across the codebase (13 seconds for me).
Interesting observations:
- big_int.h, which is included via big_int_utils.h, which is included via stringify.h almost everywhere (as it is subsequently included via error_handler.h, used by exceptions.h; the latter provides BUG_CHECK, etc). Maybe this could be refactored a bit better. Though, big ints are everywhere, so it will still be here.
- ir-generated.h (actually more due to later template instantiations)
- frontends/common/parseInput.h, with the majority of the time spent in the v1.0 converters. This likely could be refactored.
- There are some expensive template instantiations here
Template sets that took longest to instantiate:
Here are the stats across the whole codebase:
Templates that took longest to instantiate:
Template sets that took longest to instantiate:
The file with time profile:
helpers.cpp.json.gz