Move compile timing to inside @time's main timing block. Fixes >100% compilation time reports #41286
Conversation
base/timing.jl (Outdated)
@@ -207,6 +207,12 @@ macro time(ex)
local stats = gc_num()
local compile_elapsedtime = cumulative_compile_time_ns_before()
I think the order of this and the next line should be swapped, since compile time should be a subset of total time. (same at the end of the block)
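For illustration, here is a minimal sketch of that ordering, reusing the sampler names from the diff above. The shape of the @time expansion is assumed and abbreviated, not the verbatim Base source, and the Base. qualification is an assumption since these helpers are unexported:

function timed_sketch(f)
    elapsedtime = time_ns()                                          # wall clock starts first
    compile_elapsedtime = Base.cumulative_compile_time_ns_before()   # compile sampling starts inside the wall-clock window
    val = f()                                                        # f stands in for the timed expression
    compile_elapsedtime = Base.cumulative_compile_time_ns_after() - compile_elapsedtime
    elapsedtime = time_ns() - elapsedtime                            # wall clock stops last, so the compile window is a subset
    return val, elapsedtime, compile_elapsedtime
end

With this ordering, any overhead of the compile-time samplers themselves is counted inside elapsedtime rather than outside it, so the reported ratio cannot exceed 100% for that reason alone.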
I agree, but I didn't do that originally because adding the compile time lookups would add a small amount of time to the time report, and would make small-duration things look slower. Maybe it's insignificant and people should be using @btime for very fast things?
I've made this change. It does seem like the right thing to do, and the overhead seems too small to register:
Master
julia> @time rand();
0.000000 seconds
This PR
julia> @time rand();
0.000000 seconds
base/timing.jl (Outdated)
@@ -207,6 +207,12 @@ macro time(ex)
local stats = gc_num()
local compile_elapsedtime = cumulative_compile_time_ns_before()
local elapsedtime = time_ns()
## ensure time samplers are compiled
compile_elapsedtime = cumulative_compile_time_ns_after() - compile_elapsedtime
Would it be enough just to call cumulative_compile_time_ns_after? Looks like that is the only thing that might not have been compiled yet.
Good point. That does fix it.
I went with precompile instead of just calling them, because cumulative_compile_time_ns_after disables timing on the thread, which I worried could complicate nested @time's.
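For context, a hedged sketch of what that precompile warm-up might look like (the exact call sites in the commit may differ, and the Base. qualification is again an assumption):

# Compile the samplers up front without calling them: calling
# cumulative_compile_time_ns_after() would disable compile timing on the
# thread and could interfere with an enclosing @time.
precompile(Base.cumulative_compile_time_ns_before, ())
precompile(Base.cumulative_compile_time_ns_after, ())

As noted further down in the thread, this warm-up turned out to be unnecessary once the sampling order was swapped, and it was removed.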
This is after the latest commit
julia> @time map(x -> 2x, 1:3);
0.054205 seconds (48.70 k allocations: 2.691 MiB, 99.65% compilation time)
With the other ordering, is precompiling even necessary?
You're right, I just tested and it's not. Removed!
Force-pushed from ec31de2 to 1441e28
Here's an example of >100% compile time:
julia> using Distributed
julia> split = Sys.free_memory() / ((1<<30) * Sys.CPU_THREADS) < 2
false
julia> add_per = split ? Sys.CPU_THREADS÷2 : Sys.CPU_THREADS
36
julia> addprocs(add_per, exeflags="--project=$(Base.active_project())");
julia> @everywhere begin
@time using DiffEqFlux, StaticArrays, ProgressMeter, UnPack
using DiffEqFlux: initial_params, sciml_train
end
0.000295 seconds (1.37 k allocations: 107.141 KiB, 826.61% compilation time)
From worker 7: 0.000266 seconds (1.35 k allocations: 104.953 KiB, 938.51% compilation time)
From worker 17: 0.000272 seconds (1.35 k allocations: 104.953 KiB, 913.14% compilation time)
From worker 19: 0.000262 seconds (1.35 k allocations: 104.953 KiB, 936.60% compilation time)
From worker 24: 0.000261 seconds (1.35 k allocations: 104.953 KiB, 941.09% compilation time)
From worker 26: 0.000255 seconds (1.35 k allocations: 104.953 KiB, 962.57% compilation time)
From worker 9: 0.000258 seconds (1.35 k allocations: 104.953 KiB, 1004.29% compilation time)
From worker 29: 0.000255 seconds (1.35 k allocations: 104.953 KiB, 966.03% compilation time)
From worker 31: 0.000265 seconds (1.35 k allocations: 104.953 KiB, 930.41% compilation time)
From worker 23: 0.000265 seconds (1.35 k allocations: 104.953 KiB, 929.11% compilation time)
From worker 15: 0.000262 seconds (1.35 k allocations: 104.953 KiB, 947.68% compilation time)
From worker 14: 0.000262 seconds (1.35 k allocations: 104.953 KiB, 951.89% compilation time)
From worker 22: 0.000264 seconds (1.35 k allocations: 104.953 KiB, 938.09% compilation time)
From worker 13: 0.000266 seconds (1.35 k allocations: 104.953 KiB, 935.63% compilation time)
From worker 8: 0.000241 seconds (1.35 k allocations: 104.953 KiB, 1043.94% compilation time)
From worker 21: 0.000240 seconds (1.35 k allocations: 104.953 KiB, 1040.41% compilation time)
From worker 10: 0.000232 seconds (1.35 k allocations: 104.953 KiB, 1073.77% compilation time)
From worker 2: 0.000252 seconds (1.35 k allocations: 104.953 KiB, 1023.88% compilation time)
From worker 3: 0.000258 seconds (1.35 k allocations: 104.953 KiB, 990.59% compilation time)
From worker 25: 0.000251 seconds (1.35 k allocations: 104.953 KiB, 1014.71% compilation time)
From worker 30: 0.000246 seconds (1.35 k allocations: 104.953 KiB, 1032.69% compilation time)
From worker 5: 0.000272 seconds (1.35 k allocations: 104.953 KiB, 914.95% compilation time)
From worker 20: 0.000263 seconds (1.35 k allocations: 104.953 KiB, 968.01% compilation time)
From worker 11: 0.000261 seconds (1.35 k allocations: 104.953 KiB, 956.71% compilation time)
From worker 18: 0.000262 seconds (1.35 k allocations: 104.953 KiB, 953.40% compilation time)
From worker 4: 0.000272 seconds (1.35 k allocations: 104.953 KiB, 914.89% compilation time)
From worker 33: 0.000241 seconds (1.35 k allocations: 104.953 KiB, 1043.96% compilation time)
From worker 27: 0.000244 seconds (1.35 k allocations: 104.953 KiB, 1075.19% compilation time)
From worker 34: 0.000239 seconds (1.35 k allocations: 104.953 KiB, 1072.98% compilation time)
From worker 12: 0.000260 seconds (1.35 k allocations: 104.953 KiB, 984.52% compilation time)
From worker 32: 0.000248 seconds (1.35 k allocations: 104.953 KiB, 1015.97% compilation time)
From worker 36: 0.000234 seconds (1.35 k allocations: 104.953 KiB, 1133.23% compilation time)
From worker 28: 0.000228 seconds (1.35 k allocations: 104.953 KiB, 1149.49% compilation time)
From worker 35: 0.000234 seconds (1.35 k allocations: 104.953 KiB, 1149.36% compilation time)
From worker 16: 0.000255 seconds (1.35 k allocations: 104.953 KiB, 1025.12% compilation time)
From worker 6: 0.000249 seconds (1.35 k allocations: 104.953 KiB, 1031.25% compilation time)
From worker 37: 0.000243 seconds (1.35 k allocations: 104.953 KiB, 1058.20% compilation time)
julia> versioninfo()
Julia Version 1.6.2-pre.2
Commit ff1827d117* (2021-05-02 02:37 UTC)
Platform Info:
OS: Linux (x86_64-generic-linux)
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
Environment:
JULIA_NUM_THREADS = 36
Force-pushed from 1441e28 to 6da3aac
@chriselrod that seems like a different issue, perhaps Distributed-related. Could different processes be using the same thread-specific compile timers? I didn't think so, but I don't have another explanation. This PR is more about the slightly >100% reports.
@JeffBezanson I think this is RTM?
Bump
The PR title was changed from "…@time's inner timing operations are compiled. Fixes >100% compilation time reports" to "Move compile timing to inside @time's main timing block. Fixes >100% compilation time reports".
Fixes #41281
It seems like the case in #41281 that results in @time reporting >100% compilation time is fixed by pre-running the timing sampling lines within @time to ensure they're compiled(?)

Perhaps instead of this PR, I wondered if there's some type instability/invalidation going on that could be fixed, given that @time 1+1 is already baked into the sysimage? Or perhaps this PR is a failsafe solution?
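To make the failure mode concrete, here is a purely illustrative calculation (made-up numbers) of how the reported ratio can exceed 100% when compilation work is captured outside the wall-clock window:

# Illustrative only: compile work charged to compile_ns but done before the
# wall-clock window opened pushes the ratio past 100%.
compile_ns = 2_500_000   # hypothetical compile time, including work done before time_ns() was first sampled
elapsed_ns = 2_000_000   # hypothetical wall-clock window, started after the compile sampler
println(100 * compile_ns / elapsed_ns, "% compilation time")   # prints 125.0% compilation time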
Master
This PR
If anyone has any other @time calls that result in >100%, I'd happily test them with this.