First try on benchmark CI #315
Conversation
For experimentation reasons we can make a …
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
##           benchx     #315   +/-   ##
=======================================
  Coverage   97.26%   97.26%
=======================================
  Files         115      115
  Lines        6795     6795
=======================================
  Hits         6609     6609
  Misses        186      186
☔ View full report in Codecov by Sentry.
Thank you so much for taking this on! The general setup seems okay to me; I didn't review the benchmark contents in detail because that's not the point of this PR: let's first get it working.
suite = BenchmarkGroup()
include("core.jl")
serialbenchmarks = [
    "serial/core.jl",
use joinpath?
Ah yeah, with benchdir = dirname(@__FILE__) and then joinpath(benchdir, ...). Yeah, that's the way it should be.
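A minimal sketch of that pattern (the file list is taken from the snippet above; the rest is an assumed layout, not the PR's exact code):

```julia
# Resolve benchmark scripts relative to this file instead of the current
# working directory, so the suite loads correctly from CI or the REPL.
benchdir = dirname(@__FILE__)

serialbenchmarks = [
    "serial/core.jl",
]

for bm in serialbenchmarks
    include(joinpath(benchdir, bm))
end
```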
benchmark/tune.json
@@ -0,0 +1 @@
[{"Julia":"1.9.4","BenchmarkTools":"1.0.0"},[["BenchmarkGroup",{"data":{"centrality":["BenchmarkGroup",{"data":{"graphs":["BenchmarkGroup",{"data":{"closeness_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"betweenness_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"katz_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"degree_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":15,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"digraphs":["BenchmarkGroup",{"data":{"closeness_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"betweenness_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"katz_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"degree_centrality":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":10,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"pagerank":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"parallel":["BenchmarkGroup",{"data":{"egonet":["BenchmarkGroup",{"data":{"vertexfunction":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"twohop":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"core":["BenchmarkGroup",{"data":{"nv":["BenchmarkGroup",{"data":{"graphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":723,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"digraphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":784,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"edges":["BenchmarkGroup",{"data":{"graphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":7,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"digraphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":8,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"has_edge":["BenchmarkGroup",{"data":{"graphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":7,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"digraphs":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":8,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"connectivity":["BenchmarkGroup",{"data":{"graphs":["BenchmarkGroup",{"data":{"connected_components":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"digraphs":["BenchmarkGroup",{"data":{"strongly_connected_components":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"insertions":["BenchmarkGroup",{"data":{"SG(n,e) Generation":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"traversals":["BenchmarkGroup",{"data":{"graphs":["BenchmarkGroup",{"data":{"bfs_tree":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"dfs_tree":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}],"digraphs":["BenchmarkGroup",{"data":{"bfs_tree":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"dfs_tree":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"serial":["BenchmarkGroup",{"data":{"egonet":["BenchmarkGroup",{"data":{"vertexfunction":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"twohop":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}],"edges":["BenchmarkGroup",{"data":{"fillp":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"fille":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"tsume":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}],"tsump":["Parameters",{"gctrial":true,"time_tolerance":0.05,"samples":10000,"evals":1,"gcsample":false,"seconds":5.0,"overhead":0.0,"memory_tolerance":0.01}]},"tags":[]}]},"tags":[]}]]]
do we need this?
Yes. The why is described in the "Caching Parameters" section of the BenchmarkTools.jl docs. Basically we win twofold:
- tuning is slow, and it doesn't have to happen every time we benchmark;
- better consistency, since the same evaluations/samples are used in every benchmark run.
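Concretely, the caching pattern from the BenchmarkTools.jl docs looks like this (the file path matches the tune.json above; the suite variable is assumed to be the one defined in the benchmark script):

```julia
using BenchmarkTools

# One-off, whenever the suite changes: tune and cache the parameters on disk.
tune!(suite)
BenchmarkTools.save("benchmark/tune.json", params(suite))

# On every benchmark run: load the cached parameters instead of re-tuning.
loadparams!(suite, BenchmarkTools.load("benchmark/tune.json")[1], :evals, :samples)
```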
Can we find a way to automate the re-generation of the json every time the suite object is updated? Otherwise people will forget
"complete100" => complete_digraph(100), "path500" => path_digraph(500) | ||
) | ||
|
||
GRAPHS = Dict{String,Graph}( | ||
const GRAPHS = Dict{String,Graph}( |
IIUC benchmarking is always done on all of these, and the total time is reported?
Yes. The benchmark results are accumulated across all Graphs and DiGraphs. I find it too granular to get benchmarks for each one of the graphs; for me, it's better to add them up.
Now the const is not really needed, I think, since the graph is always interpolated in the benchmark, but it also doesn't harm.
Agreed
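For reference, a minimal sketch of what accumulation plus interpolation can look like (the suite keys are borrowed from the tune.json above; the exact code in the PR may differ):

```julia
using BenchmarkTools, Graphs

GRAPHS = Dict{String,Graph}(
    "complete100" => complete_graph(100), "path500" => path_graph(500)
)

SUITE = BenchmarkGroup()
SUITE["core"] = BenchmarkGroup()
SUITE["core"]["nv"] = BenchmarkGroup()

# The whole dict is interpolated with $, so the benchmark body never reads a
# non-const global; that is why `const` is not strictly needed. One accumulated
# result is reported per operation rather than one entry per graph.
SUITE["core"]["nv"]["graphs"] = @benchmarkable sum(nv, values($GRAPHS))
```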
It would be useful to track the performance of the main branch over longer periods of time. For pull requests, however, we don't really care. Where are the benchmarks for Julia itself stored, for example?
Can we base this on a GitHub label, like
Knock yourself out!
I'd be curious to have @simonschoelly's opinion on the general approach.
I believe that is technically possible. Maybe the label should be more like
The PR is failing because the benchmark baseline (i.e. the master branch) doesn't follow the requirements. I believe this is okay, and I wouldn't want to introduce complicated if statements just for this first PR. From the second PR on, once the master branch is properly structured, there should be no problem.
In a separate GitHub repo
See JuliaGraphs/JuliaGraphs-meta#12 for the roadmap
(WIP) TODO: store results and PR label triggering
I merged into the branch with the following TODOs pending before pushing to master:
* First try on benchmark CI (#315): (WIP) TODO: store results and PR label triggering
* Test consecutive PR (#346): Merge all new advancements on the `benchx` branch in order to do a follow-up consecutive PR to test GitHub Actions
* Trivial change, test PR (#347):
  * Trivial change, test PR
  * Add PR write permission
  * Test full permissions
  * Granular permissions
  * try write-all
  * try label event
  * labeled in pull_request
  * Store results as artifact and trigger a consecutive workflow_run
  * Workflow chain almost complete (WIP)
  * correct .benchmarkci path
  * Add benchx support branch
  * Just try pull_request_target instead
  * Remove target
  * Get rid of s to get the chain going
  ---------
  Co-authored-by: Guillaume Dalle <22795598+gdalle@users.noreply.github.com>
* Simplify for master merge
* Fixed formatting
* Integrate review comments
---------
Co-authored-by: Guillaume Dalle <22795598+gdalle@users.noreply.github.com>
#310
Options
As mentioned in the Discourse thread https://discourse.julialang.org/t/recommendation-for-ci-benchmark-to-catch-regression/104698, the available options for CI benchmarks are:
Implementation
In this approach, PkgBenchmark.jl and BenchmarkCI.jl are used.
Benchmarks
Regarding the benchmarks in the ./benchmark/* directory, I did nothing but translate them to a compatible formulation. This means that the SUITE::BenchmarkGroup must be defined in ./benchmark/benchmark.jl.
All options above need the same form (with minor changes for benchmark-action/github-action-benchmark). This means that we can continue authoring more benchmarks with the security that the code is reusable even if the CI approach changes in the future.
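For illustration, a minimal sketch of that required shape (the benchmark contents here are invented placeholders, not the PR's actual suite):

```julia
# ./benchmark/benchmark.jl -- minimal sketch of the required shape
using BenchmarkTools
using Graphs

SUITE = BenchmarkGroup()
SUITE["core"] = BenchmarkGroup()

g = path_graph(500)
SUITE["core"]["nv"] = @benchmarkable nv($g)
SUITE["core"]["edges"] = @benchmarkable collect(edges($g))
```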
CI
This is described in the single file /.github/workflows/benchmark.yaml.
There are 2 paths: either the event is a pull_request or it is a push.
If the event is a pull_request, a comparison is made (using BenchmarkCI.judge) between the base and the target refs. This will also post a summary in the PR as a comment. The base and target benchmarks are uploaded as artifacts in the workflow.
If the event is a push (or a merge), the benchmarks from the commit are run (using PkgBenchmark.benchmarkpkg). A median estimation of the benchmarks is uploaded as an artifact in the workflow.
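Roughly, the two paths correspond to the following Julia calls (a sketch only: BenchmarkCI and PkgBenchmark are the real packages, but the exact invocation and file names here are assumptions, not the PR's workflow verbatim):

```julia
# Sketch of the two CI paths; the event-name check mirrors what the workflow
# YAML dispatches on (GITHUB_EVENT_NAME is set by GitHub Actions).
using PkgBenchmark, BenchmarkCI

if get(ENV, "GITHUB_EVENT_NAME", "") == "pull_request"
    # Judge the PR head against the base ref and post a summary comment.
    BenchmarkCI.judge()
    BenchmarkCI.postjudge()
else
    # push / merge: run the suite once and save results for artifact upload.
    results = PkgBenchmark.benchmarkpkg("Graphs")
    PkgBenchmark.writeresults("benchmark-results.json", results)
end
```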
Further discussion
Strategy
Benchmarks are running just on ubuntu-latest and julia-v1.
I don't see any reason to populate the workflow matrix with more combinations.
Artifact uploads
Artifact uploads remain available for only 90 days.
After that, the only things that will remain are the comments posted in the PRs.
If we want to store all benchmarks, we can discuss some alternatives.
I would like that to be part of the repo, since it's way easier to manage, but it's also completely useless for the users to get.
Permissions and approval
Right now the PR benchmark is run every time a new update of the branch is pushed.
That might not be necessary.
I think it would be better to have some kind of manual trigger to start the machinery once other, more obvious issues are already resolved.
Also, it is insecure to allow everyone to run or modify the workflow, but I think that is already solved at the organization level.
Single and multithread
Right now both the single-threaded and the multithreaded benchmarking code are given 2 threads.
We can separate that, as sketched below.
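One possible way to separate them, using PkgBenchmark's BenchmarkConfig (a real type; the thread counts and package name here are just example values):

```julia
using PkgBenchmark

# Run the suite twice, with different thread counts set through the
# environment of the spawned benchmark process.
serial_config   = BenchmarkConfig(env = Dict("JULIA_NUM_THREADS" => "1"))
parallel_config = BenchmarkConfig(env = Dict("JULIA_NUM_THREADS" => "2"))

benchmarkpkg("Graphs", serial_config)
benchmarkpkg("Graphs", parallel_config)
```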
Example
See how a PR can look: filchristou#3, which triggers the workflow https://github.com/filchristou/Graphs.jl/actions/runs/6889672830.
After a PR is merged, a benchmark will run again: https://github.com/filchristou/Graphs.jl/actions/runs/6889854479
Overall, I cannot promise it will work perfectly right away, since I was testing things in the fork and not on the main or a protected branch. Also, I am not very familiar with the Graphs.jl way of doing things, so I guess we can tune several preferences in the future.