PGO: update the existing benchmarks workflow to enable PGO builds #13884

Merged: 55 commits, Oct 3, 2024

@1pkg (Member) commented Aug 15, 2024

Motivation/summary

This PR implements the changes outlined in #13859. It updates the existing benchmarks workflow to run a standalone APM Server instance that produces a representative CPU profile for PGO. The workflow then copies and uploads the obtained CPU profile and injects it into a PR (see example).

Benchmarks

The existing benchmark results turned out to be too unreliable to base PGO on. Because of the underlying dependency on Elasticsearch, throughput results could differ by more than 10% from one workflow run to the next. The table below shows a sample of the existing benchmark results.

[image: sample of the existing ES-based benchmark results]

This makes incremental PGO performance gains hard to observe and measure. This PR therefore introduces a new benchmark mode that replaces Elasticsearch with a stubbed HTTP API server (Moxy), which better isolates APM Server's own performance within the benchmarks. The table below shows a sample of the new isolated benchmark results.

[image: sample of the new isolated benchmark results]

The sample data shows that the deviation of the new benchmark mode's results is an order of magnitude lower than that of the existing ES-based benchmarks, so PGO performance improvements can now be observed reliably.

The standalone APM Server benchmark mode runs three separate EC2 instances in a VPC, one each for apmbench, apm-server, and moxy. The existing benchmark_executor and standalone_apm_server Terraform modules are reused, and a similar new Terraform module, moxy, is added.

Results

PGO-enabled builds show an average performance gain of 5% across the standalone APM Server benchmarks workflow.
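As a worked illustration of an "average gain", the Go sketch below averages per-benchmark relative throughput changes between a baseline build and a PGO build. Averaging per-benchmark ratios is an assumption, because the PR does not say how the 5% figure was aggregated. `meanRelativeGain` is a hypothetical helper.

```go
package main

import "errors"

// meanRelativeGain averages per-benchmark relative throughput changes,
// (pgo[i] / baseline[i]) - 1, so a result of 0.05 means a 5% gain.
func meanRelativeGain(baseline, pgo []float64) (float64, error) {
	if len(baseline) == 0 || len(baseline) != len(pgo) {
		return 0, errors.New("need equal, non-empty baseline and PGO sample sets")
	}
	var sum float64
	for i := range baseline {
		if baseline[i] <= 0 {
			return 0, errors.New("baseline throughput must be positive")
		}
		sum += pgo[i]/baseline[i] - 1
	}
	return sum / float64(len(baseline)), nil
}
```

For instance, hypothetical baselines of 100 and 200 events/s with PGO results of 105 and 210 yield approximately 0.05.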

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

To observe and validate the changes, refer to the indexed PGO benchmark results.

Related issues

#13859

@1pkg self-assigned this Aug 15, 2024

mergify bot (Contributor) commented Aug 15, 2024

This pull request does not have a backport label. Could you fix it @1pkg? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) Aug 15, 2024
@1pkg force-pushed the inject-build-pgo-profile branch 2 times, most recently from 576dbb8 to b710c57, August 22, 2024 21:21
@1pkg force-pushed the inject-build-pgo-profile branch 7 times, most recently from 1727304 to a6abd96, August 22, 2024 21:56
@v1v added the backport-8.x label (Automated backport to the 8.x branch with mergify) Sep 10, 2024
mergify bot removed the backport-skip label (Skip notification from the automated backport with mergify) Sep 10, 2024
@1pkg requested a review from v1v October 1, 2024 22:42
@1pkg (Member, Author) commented Oct 1, 2024

Final results, after feedback from @v1v on properly setting the GitHub access token:

The PGO standalone benchmark workflow run (link) resulted in the following PGO update PR (link).

Meanwhile, the old benchmark against ES Cloud still works as expected, without regression (link).

kruskall previously approved these changes Oct 2, 2024
v1v previously approved these changes Oct 2, 2024
axw previously approved these changes Oct 2, 2024

@axw (Member) left a comment

Nice work - thank you for all the cleanups along the way!

testing/benchmark/variables.tf (outdated, resolved)
testing/benchmark/outputs.tf (outdated, resolved)
testing/benchmark/main.tf (outdated, resolved)
testing/benchmark/Makefile (outdated, resolved)
.github/workflows/benchmarks.yml (resolved)
Comment on lines 228 to 235
```yaml
- name: Open PGO PR
  if: ${{ env.RUN_STANDALONE == 'true' && github.ref == 'refs/heads/main' }}
  run: make push-pgo-pr
  env:
    WORKSPACE_PATH: ${{ github.workspace }}
    PROFILE_PATH: ${{ env.WORKING_DIRECTORY }}/${{ env.BENCHMARK_CPU_OUT }}
    GITHUB_TOKEN: ${{ steps.get_token.outputs.token }}
    WORKFLOW: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}/attempts/${{ github.run_attempt }}
```
Member:

I wonder if, instead of creating a new PR on every benchmark run, we could just push a commit to the branch?

Member Author:

I think it's slightly risky to enable automatic pushes to the main branch right away. I'd prefer to start with a more controlled, PR-based approach so we can build confidence that this pipeline works well. Afterwards we can simplify it and enable direct merges.

Member Author:

I also made a small update to the push-pgo-pr script so that it enables auto-merge on the PRs it opens. This way, a PR only needs one approval and passing pipeline tests to be merged.

Member:

Fair enough.

@1pkg dismissed stale reviews from kruskall, v1v, and axw via 77af1fa October 2, 2024 17:30
@1pkg requested reviews from axw and v1v October 2, 2024 17:47
@1pkg force-pushed the inject-build-pgo-profile branch 2 times, most recently from c3f4efd to 0d68b85, October 2, 2024 18:24
@1pkg requested a review from kruskall October 2, 2024 18:54
@axw (Member) left a comment

LGTM, thank you!

@1pkg merged commit 5af8cf4 into main Oct 3, 2024 (16 checks passed)
@1pkg deleted the inject-build-pgo-profile branch October 3, 2024 02:19
mergify bot pushed a commit that referenced this pull request Oct 3, 2024
Add a benchmark workflow mode with automation to collect, preserve, and inject CPU profiles, enabling PGO builds.

The new workflow will run on a schedule and raise a special pull request that includes the most recent representative CPU profile, which will be inserted as the `default.pgo` file into the main package and automatically used in the build pipeline. The actual schedule and the model for raising pull requests with updated profiles are subject to further revisions. This new workflow mode uses a lightweight output destination - a mock proxy (Moxy) from apm-perf to better isolate the performance component of the APM Server.

(cherry picked from commit 5af8cf4)
mergify bot added a commit that referenced this pull request Oct 11, 2024
…) (#14245)

Co-authored-by: Kostiantyn Masliuk <1pkg@protonmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@1pkg mentioned this pull request Oct 16, 2024
Labels: backport-8.x (Automated backport to the 8.x branch with mergify)

6 participants