Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline some IR methods and constructors #5030

Merged
merged 1 commit into from
Nov 27, 2024
Merged

Inline some IR methods and constructors #5030

merged 1 commit into from
Nov 27, 2024

Conversation

asl
Copy link
Contributor

@asl asl commented Nov 24, 2024

It does not really make sense to emit them in ir-generated.cpp as they are mostly one-liners and could be further simplified.

@asl asl requested review from vlstill, ChrisDodd and fruffy November 24, 2024 09:46
@asl asl added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Nov 24, 2024
Copy link
Contributor

@ChrisDodd ChrisDodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were originally all made non-inline to try to speed up the build. Not sure how much effect it has one way or the other.

@asl
Copy link
Contributor Author

asl commented Nov 24, 2024

@fruffy Looks like mac runner image is again broken. But why do we need pkg-config there?

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of this change? For me it makes sense to have all the implementations in one place. Otherwise you end up jumping between files when reading up on a class.

@fruffy Looks like mac runner image is again broken. But why do we need pkg-config there?

Looks like a new runner is deployed, there is already a fix here: #5028

@@ -530,7 +530,7 @@ int IrClass::generateConstructor(const ctor_args_t &arglist, const IrMethod *use
}

if (kind == NodeKind::Abstract) ctor->access = IrElement::Protected;
ctor->inImpl = true;
ctor->inImpl = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have major effect on the size of the ir-generated.h header. Is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, currently all simple the constructors are outlined. This case covers those that are not in defined explicitly and just initialize the fields. As a result lots of function calls are emitted as compiler is unable to inline those obviously.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some deeply nested inheritance trees. Wouldn't this lead to a blowup if every constructor can be inlined?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also adding ~4k LoC to the header in the worst case (with Tofino)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some deeply nested inheritance trees. Wouldn't this lead to a blowup if every constructor can be inlined?

Not quite. Remember that inlining decision is done by compiler. Not every function declared inline will be inlined. Also, explicitly defined constructors w/o inline will still be outlined, there is a way to control that. Implicit ones are the ones that initialize the fields just forwarding the arguments.

@asl
Copy link
Contributor Author

asl commented Nov 24, 2024

What is the benefit of this change? For me it makes sense to have all the implementations in one place. Otherwise you end up jumping between files when reading up on a class.

The main purpose of this is to ensure that simple methods are properly inlined. E.g. does not make any sense to outline one-liners like

    inline bool isBuiltin() const { return name == ParserState::accept || name == ParserState::reject; }

We just pay extra function call for no reason. Defining things inline ensures compiler is able to optimize them further on.

@fruffy
Copy link
Collaborator

fruffy commented Nov 24, 2024

Is there a measurable effect on performance? We should be mindful not to trade of too much usability/readability and binary size for this.

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we merge it would be nice to see some numbers on performance gain versus compile time increase. If compile time increases significantly and there is no measurable performance difference I do not think we should merge this.

If, on the other hand, compile time doesn't measurably change we can merge.

@vlstill
Copy link
Contributor

vlstill commented Nov 25, 2024

In case the compile-time impact is high, how does this change compare to enabling LTO in the build? What I remember from Tofino days, we had rather significant speedup by enabling LTO, presumably exactly because it allows inlining more functions. The advantage is that LTO can be enabled only for release builds so normal development build speeds are not affected.

On a slight tangent, there was an idea of speeding up compilation by using PCH (precompiled header in GCC/clang) for ir-generated.h. As far as I know, it was never tried and it was not completely clear if it was doable, but maybe it is worth investigating for the compilation speed (I don't think C++ modules will "save" us in in any reasonably close future giving their current state and more importantly the tool support state and requirements for P4C build).

@asl
Copy link
Contributor Author

asl commented Nov 25, 2024

In case the compile-time impact is high, how does this change compare to enabling LTO in the build? What I remember from Tofino days, we had rather significant speedup by enabling LTO, presumably exactly because it allows inlining more functions. The advantage is that LTO can be enabled only for release builds so normal development build speeds are not affected.

I believe LTO contributes a lot to devirtualization and further optimization opportunities uncovered due to this.

On a slight tangent, there was an idea of speeding up compilation by using PCH (precompiled header in GCC/clang) for ir-generated.h. As far as I know, it was never tried and it was not completely clear if it was doable, but maybe it is worth investigating for the compilation speed

This is good idea, yes, that is worth trying. Modules are likely unfeasible due to circular dependencies here and there. But PCH / PTH might give a decent speedup.

@asl
Copy link
Contributor Author

asl commented Nov 25, 2024

So, the performance results are interesting.

Here are the build time for ir-generated.cpp:

main:

Executed in   37.11 secs    fish           external
   usr time   36.05 secs    0.14 millis   36.05 secs
   sys time    0.87 secs    3.76 millis    0.87 secs

ir-inline:

Executed in   31.97 secs    fish           external
   usr time   30.93 secs    0.09 millis   30.93 secs
   sys time    0.76 secs    4.82 millis    0.76 secs

These numbers are pretty stable, so for me it's ~38 second on average on main and ~32 seconds on ir-inline.

The compile time for "ordinary" C++ files looks similar to me, I'm showing individual files here, but averages from multiple runs are pretty much the same:

ir/base.cpp:
main:

Executed in    3.01 secs    fish           external
   usr time    3.63 secs    0.12 millis    3.63 secs
   sys time    0.86 secs    2.33 millis    0.86 secs

ir-inline:

Executed in    3.09 secs    fish           external
   usr time    3.69 secs    0.12 millis    3.69 secs
   sys time    0.84 secs    2.25 millis    0.84 secs

And here is frontend/def_use.cpp:
main:

Executed in    5.13 secs    fish           external
   usr time    5.65 secs    0.25 millis    5.65 secs
   sys time    0.91 secs    1.63 millis    0.91 secs

ir-inline:

Executed in    5.19 secs    fish           external
   usr time    5.76 secs    0.14 millis    5.76 secs
   sys time    0.93 secs    2.52 millis    0.92 secs

The overall compile time for the whole p4c is pretty noisy. I tried to replicate it several times, but overall the differences are below the noise level.

The performance of the p4c itself improved slightly. Which is even surprising for pretty flat profile dominated by memory allocation, copies and constant clones :) Sadly, I cannot run larger apps with GC off, but GC itself introduced lots of variation to catch ~1-2% difference.

Command Mean [s] Min [s] Max [s] Relative
test/gtestp4c-main --gtest_filter=P4CParserUnroll.switch_20160512 2.945 ± 0.032 2.889 3.211 1.01 ± 0.01
test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512 2.909 ± 0.027 2.857 2.977 1.00

@fruffy Please let me know if there are any other checks that you'd want me to run

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange results indeed. Thanks for the detailed analysis!
It looks like build time is not adversely affected but it is hard to measure.

to outline them as they are mostly one-liners and could be further simplified.

Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
@asl asl added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Nov 27, 2024
@asl asl added this pull request to the merge queue Nov 27, 2024
Merged via the queue into p4lang:main with commit c348f07 Nov 27, 2024
19 checks passed
@asl asl deleted the ir-inline branch November 27, 2024 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants