Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize optimizer step profiler #15506

Merged
merged 4 commits into from
Oct 15, 2024
Merged

Generalize optimizer step profiler #15506

merged 4 commits into from
Oct 15, 2024

Conversation

cameel
Copy link
Member

@cameel cameel commented Oct 11, 2024

This PR moves the simple profiling setup we had in Suite.cpp and moves it into a separate class that can be easily used in random places.

It also switches to static storage for the metrics, which means that we get a single report with total running time over all the Yul objects rather than one per object. Originally the profiler was used with small contracts from test/benchmarks/ where it was probably manageable, but when profiling big projects like Uniswap or OpenZeppelin, the aggregate data is much more useful.

Using the profiler is now as easy as dropping a PROFILER_PROBE() macro in the scope that is meant to be measured. The default probes are still placed only around the optimizer steps, but new ones can be added ad-hoc, locally.

The PR also tweaks the output to make it easy to post it in comments in a readable form. It also tracks the number of calls in addition to the timing.

CMakeLists.txt Outdated Show resolved Hide resolved
@cameel
Copy link
Member Author

cameel commented Oct 11, 2024

Here's what the output looks like when profiling a simple empty contract.

Before

Performance metrics of optimizer steps
======================================
  0.000% (0 s): VarDeclInitializer
  0.000% (0 s): ForLoopConditionOutOfBody
  0.098% (1e-06 s): SSAReverser
  0.098% (1e-06 s): Rematerialiser
  0.098% (1e-06 s): ExpressionInliner
  0.197% (2e-06 s): UnusedFunctionParameterPruner
  0.197% (2e-06 s): FunctionSpecializer
  0.295% (3e-06 s): ExpressionJoiner
  0.295% (3e-06 s): ForLoopConditionIntoBody
  0.394% (4e-06 s): CircularReferencesPruner
  0.394% (4e-06 s): StructuralSimplifier
  0.394% (4e-06 s): ForLoopInitRewriter
  0.394% (4e-06 s): BlockFlattener
  0.394% (4e-06 s): ControlFlowSimplifier
  0.492% (5e-06 s): ConditionalSimplifier
  0.492% (5e-06 s): FunctionGrouper
  0.689% (7e-06 s): ConditionalUnsimplifier
  0.689% (7e-06 s): FunctionHoister
  0.984% (1e-05 s): SSATransform
  1.083% (1.1e-05 s): LoopInvariantCodeMotion
  1.083% (1.1e-05 s): LiteralRematerialiser
  1.083% (1.1e-05 s): EqualStoreEliminator
  1.575% (1.6e-05 s): EquivalentFunctionCombiner
  1.673% (1.7e-05 s): ExpressionSplitter
  1.969% (2e-05 s): FullInliner
  2.264% (2.3e-05 s): UnusedStoreEliminator
  2.461% (2.5e-05 s): DeadCodeEliminator
  2.657% (2.7e-05 s): LoadResolver
  2.756% (2.8e-05 s): UnusedAssignEliminator
  3.937% (4e-05 s): UnusedPruner
  6.004% (6.1e-05 s): CommonSubexpressionEliminator
 64.862% (0.000659 s): ExpressionSimplifier
--------------------------------------
    100% (0.001 s)
Performance metrics of optimizer steps
======================================
  0.000% (0 s): BlockFlattener
  0.000% (0 s): StructuralSimplifier
  0.000% (0 s): ControlFlowSimplifier
  0.000% (0 s): ForLoopConditionIntoBody
  0.000% (0 s): ForLoopConditionOutOfBody
  0.000% (0 s): ForLoopInitRewriter
  0.131% (1e-06 s): VarDeclInitializer
  0.131% (1e-06 s): FunctionGrouper
  0.262% (2e-06 s): UnusedFunctionParameterPruner
  0.262% (2e-06 s): SSAReverser
  0.262% (2e-06 s): FunctionHoister
  0.394% (3e-06 s): ExpressionInliner
  0.525% (4e-06 s): FunctionSpecializer
  0.919% (7e-06 s): ExpressionJoiner
  1.181% (9e-06 s): CircularReferencesPruner
  1.181% (9e-06 s): Rematerialiser
  1.706% (1.3e-05 s): SSATransform
  1.837% (1.4e-05 s): EquivalentFunctionCombiner
  3.018% (2.3e-05 s): ConditionalSimplifier
  3.150% (2.4e-05 s): EqualStoreEliminator
  3.543% (2.7e-05 s): DeadCodeEliminator
  3.543% (2.7e-05 s): ConditionalUnsimplifier
  3.806% (2.9e-05 s): ExpressionSplitter
  4.068% (3.1e-05 s): LoopInvariantCodeMotion
  4.199% (3.2e-05 s): UnusedStoreEliminator
  4.462% (3.4e-05 s): FullInliner
  5.774% (4.4e-05 s): LiteralRematerialiser
  6.168% (4.7e-05 s): ExpressionSimplifier
  6.430% (4.9e-05 s): LoadResolver
  6.955% (5.3e-05 s): UnusedAssignEliminator
 14.436% (0.00011 s): UnusedPruner
 21.654% (0.000165 s): CommonSubexpressionEliminator
--------------------------------------
    100% (0.001 s)

After

PERFORMANCE METRICS FOR PROFILED SCOPES

| Time % | Time       | Calls   | Scope                          |
|-------:|-----------:|--------:|--------------------------------|
|   0.0% |    0.000 s |      10 | ForLoopConditionOutOfBody      |
|   0.0% |    0.000 s |       4 | ForLoopInitRewriter            |
|   0.2% |    0.000 s |      20 | BlockFlattener                 |
|   0.2% |    0.000 s |       4 | SSAReverser                    |
|   0.2% |    0.000 s |       6 | ForLoopConditionIntoBody       |
|   0.2% |    0.000 s |       2 | UnusedFunctionParameterPruner  |
|   0.2% |    0.000 s |      11 | StructuralSimplifier           |
|   0.2% |    0.000 s |       2 | ExpressionInliner              |
|   0.3% |    0.000 s |       2 | VarDeclInitializer             |
|   0.3% |    0.000 s |      12 | ControlFlowSimplifier          |
|   0.3% |    0.000 s |       2 | FunctionSpecializer            |
|   0.4% |    0.000 s |      14 | FunctionGrouper                |
|   0.5% |    0.000 s |       8 | ExpressionJoiner               |
|   0.5% |    0.000 s |       6 | Rematerialiser                 |
|   0.5% |    0.000 s |       4 | FunctionHoister                |
|   0.7% |    0.000 s |      18 | CircularReferencesPruner       |
|   1.6% |    0.000 s |       7 | ConditionalSimplifier          |
|   1.8% |    0.000 s |       8 | EquivalentFunctionCombiner     |
|   1.9% |    0.000 s |       9 | ConditionalUnsimplifier        |
|   2.1% |    0.000 s |       2 | EqualStoreEliminator           |
|   2.5% |    0.000 s |       8 | LoopInvariantCodeMotion        |
|   2.5% |    0.000 s |       8 | SSATransform                   |
|   2.7% |    0.000 s |       6 | ExpressionSplitter             |
|   3.0% |    0.000 s |       8 | DeadCodeEliminator             |
|   3.3% |    0.000 s |       6 | FullInliner                    |
|   3.3% |    0.000 s |       2 | UnusedStoreEliminator          |
|   4.1% |    0.000 s |      19 | LiteralRematerialiser          |
|   4.9% |    0.000 s |      10 | UnusedAssignEliminator         |
|   5.9% |    0.000 s |       6 | LoadResolver                   |
|   9.0% |    0.000 s |      20 | UnusedPruner                   |
|  14.7% |    0.000 s |      23 | CommonSubexpressionEliminator  |
|  32.2% |    0.001 s |       9 | ExpressionSimplifier           |
| 100.0% |    0.002 s |     276 | **TOTAL**                      |

@cameel
Copy link
Member Author

cameel commented Oct 11, 2024

So, as can be seen in run 36438, the code does compile on all platforms.

The three CLI test jobs fail, but that was already the case before due to the extra output that the profiler prints unconditionally. Perhaps we should add a proper flag for it, but in this PR I'm just leaving it as is.

@cameel cameel marked this pull request as ready for review October 11, 2024 17:16
libsolutil/Profiler.h Outdated Show resolved Hide resolved
libsolutil/Profiler.cpp Outdated Show resolved Hide resolved
libsolutil/Profiler.h Outdated Show resolved Hide resolved
libsolutil/Profiler.h Outdated Show resolved Hide resolved
libsolutil/Profiler.h Outdated Show resolved Hide resolved
@cameel cameel merged commit 7264b47 into develop Oct 15, 2024
72 of 73 checks passed
@cameel cameel deleted the profiler-probe branch October 15, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants