[RFC/WIP] Tools for measuring cycles and cpu_times and tricking out LLVM #92
end

"""
getProcessTime()
This isn't a very Julian name, both in that it starts with "get" and that it's in camel case. I'd just call it processtime(). Likewise for getThreadTime, I'd call that threadtime().
Cycles spent is an extremely relevant metric in itself, often far more relevant than time. So I'd say: measure and report both, as well as the implied measured frequency. Converting cycles to nanoseconds is bad; if any conversion makes sense, it is nanoseconds -> cycles. Reporting the measured frequency also empowers the user to spot problems like a frequency drop due to AVX2 (some CPUs scale down their frequency when certain vector instructions are used).
Do you know of any way to measure cycles in a platform-portable way, e.g. something that works for ARM and PPC? Originally I went forward with #94 since cputime is an important measure as well (how much time did we actually spend in the program and not sleeping/in the kernel). Anyway, I won't have time to work on either, so I would be happy if someone could pick this up and bring it to a conclusion.
So one of the things that has me come back to this PR is that https://perf.rust-lang.org/ defaults to But maybe the better pathway is to use LinuxPerf.jl to build that infrastructure.
I recently started exploring options for more precise and low-level benchmarking tools.
As it is, this PR is not ready to be included in BenchmarkTools, but it should provide a starting point for discussion.
clobber() and escape()
Two methods to prevent certain compiler optimisations at the LLVM level (see https://youtu.be/nXaxk27zwlk?t=2441).
clobber() is a memory barrier that forces the compiler to flush all writes to memory, and escape() is a method to prevent LLVM from optimising a value away, since we are faking a store of it.
escape() is not quite done, since it can't handle boxed values, and it would be easier to write if we could depend on LLVM.jl.
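As a rough illustration of the clobber() idea described above, here is a minimal sketch. It is an assumption about how such a barrier can be written with Base.llvmcall, not necessarily the code in this PR: an empty inline-assembly statement that declares a "memory" clobber, so LLVM has to assume memory may be read or written at that point. The helper fill_and_clobber! is hypothetical and only shows where the barrier would sit.

```julia
# Compiler-level memory barrier (sketch). The asm string is empty, so no
# machine instruction is emitted; the "~{memory}" clobber only constrains
# what the optimiser may assume about memory at this point.
@inline function clobber()
    Base.llvmcall("""
        call void asm sideeffect "", "~{memory}"()
        ret void
        """, Cvoid, Tuple{})
end

# Hypothetical usage: keep the stores into `buf` from being treated as dead.
function fill_and_clobber!(buf::Vector{Float64})
    fill!(buf, 1.0)
    clobber()
    return nothing
end

fill_and_clobber!(zeros(16))
```

This only covers clobber(); escape() additionally needs a pointer to the value being kept alive, which is where the boxed-value problem mentioned above comes in.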
bench_start() and bench_end()
Inspired by https://github.com/dterei/gotsc and https://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html
Since CPUs can do speculative execution, reordering, and a bunch of other shenanigans, this is a very careful series of instructions that tries to prevent as much of that as possible and thus should give as precise an estimate as possible of the number of cycles it takes for a block of code to run. These instructions are not completely noise-free, since we are still running in user space, and the current implementation is x86_64 only (and requires a series of processor features). It is also tricky to convert cycles to time spent. If we use this method it should be opt-in, and we need to measure variance and overhead.
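One way to approach the "measure variance and overhead" point is to read the timer twice in a row, many times, and look at the distribution of the differences. The sketch below is only an illustration: timer_overhead is a hypothetical helper, and it uses Base's time_ns as a stand-in for the cycle counter, since any zero-argument function returning an unsigned tick count could be passed instead.

```julia
using Statistics

# Back-to-back timer reads: each difference approximates the cost of one
# measurement, and the spread across samples approximates its noise.
function timer_overhead(timer = time_ns; samples::Int = 10_000)
    deltas = Vector{Float64}(undef, samples)
    for i in 1:samples
        t0 = timer()
        t1 = timer()
        deltas[i] = Float64(t1 - t0)
    end
    return (min = minimum(deltas), median = median(deltas), std = std(deltas))
end

overhead = timer_overhead()
println("timer overhead in ticks: ", overhead)
```

The minimum is usually the most stable estimate of the fixed overhead, while the standard deviation gives a feel for how much noise user-space measurement adds.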
getProcessTime() and getThreadTime()
I got curious and looked into what google/benchmark is using for time measurement, and it turns out they actually measure two things: run time and CPU time, where the latter is the time that a process actually spends being run. The current implementation is Linux only but can be extended to all platforms we care about. For run-time measurement they use http://en.cppreference.com/w/cpp/chrono/high_resolution_clock. Currently we are using uv_hrtime from libuv. Both uv_hrtime and the C++ timer will, under Unix, fall back to clock_gettime(CLOCK_MONOTONIC, ...), similar to my implementation of getProcessTime.
What should we do?
I think taking a lead from google/benchmark and also measuring CPU time vs just run time would be a good first actionable item. I am much less sure about what to do with 1. and 2. (clobber()/escape() and bench_start()/bench_end()) and whether they are useful for BenchmarkTools.jl; that needs further evaluation, and for that I currently don't have time.
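For the CPU-time item, a minimal Linux-only sketch of what such a measurement could look like is shown below. It is not the code from this PR: the names processtime and threadtime follow the naming suggested in review, cputime_ns and TimeSpec are hypothetical helpers, the clock ids are the Linux values from time.h, and time_t is assumed to be a C long (true on 64-bit Linux).

```julia
# Linux values from <time.h>; other platforms use different ids.
const CLOCK_PROCESS_CPUTIME_ID = Cint(2)
const CLOCK_THREAD_CPUTIME_ID  = Cint(3)

# Mirrors C's `struct timespec`; field types are an assumption (64-bit Linux).
struct TimeSpec
    tv_sec::Clong
    tv_nsec::Clong
end

# Read the given clock and return its value in nanoseconds.
function cputime_ns(clock_id::Cint)
    ts = Ref(TimeSpec(0, 0))
    ret = ccall(:clock_gettime, Cint, (Cint, Ref{TimeSpec}), clock_id, ts)
    ret == 0 || error("clock_gettime failed")
    return UInt64(ts[].tv_sec) * UInt64(1_000_000_000) + UInt64(ts[].tv_nsec)
end

processtime() = cputime_ns(CLOCK_PROCESS_CPUTIME_ID)  # CPU time of the whole process
threadtime()  = cputime_ns(CLOCK_THREAD_CPUTIME_ID)   # CPU time of the calling thread

# Wall-clock time keeps running while the process sleeps; CPU time mostly does not.
wall0, cpu0 = time_ns(), processtime()
sleep(0.1)
wall1, cpu1 = time_ns(), processtime()
println("wall: ", (wall1 - wall0) / 1e6, " ms, cpu: ", (cpu1 - cpu0) / 1e6, " ms")
```

Reporting both numbers side by side, as google/benchmark does, makes it visible when a benchmark spends its time waiting rather than computing.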