20-30x slowdown in GC for a long running program #50115

Open
ptiede opened this issue Jun 9, 2023 · 3 comments
Labels
GC (Garbage collector), performance (Must go faster)

Comments


ptiede commented Jun 9, 2023

I originally posted this on the Julia Slack and was told I should open an issue about it.
I am running some MCMC analysis that requires me to evaluate a function and its derivative millions of times, and I am seeing a massive performance hit (20-30x) after a few hours, even though nothing about the function has changed.

At the beginning of the Julia session, I get the following benchmark results:

julia> @benchmark Zygote.gradient($ℓ ,$x)
BenchmarkTools.Trial: 4202 samples with 1 evaluation.
 Range (min … max):  820.410 μs …   4.862 ms  ┊ GC (min … max):  0.00% … 76.75%
 Time  (median):     880.962 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.186 ms ± 964.736 μs  ┊ GC (mean ± σ):  24.69% ± 21.94%

  ██▃▁                                                    ▁▃▂▂  ▁
  ████▅▅▄▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃█████ █
  820 μs        Histogram: log(frequency) by time       4.39 ms <

 Memory estimate: 3.91 MiB, allocs estimate: 1256.

After evaluating the same function for a few hours, I get the following:

julia> @benchmark Zygote.gradient($ℓ ,$x)
BenchmarkTools.Trial: 202 samples with 1 evaluation.
 Range (min … max):  926.079 μs … 287.831 ms  ┊ GC (min … max):  0.00% … 99.40%
 Time  (median):     971.196 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):    26.149 ms ±  80.649 ms  ┊ GC (mean ± σ):  96.03% ± 28.41%

  █                                                              
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▇ ▅
  926 μs        Histogram: log(frequency) by time        287 ms <

 Memory estimate: 3.91 MiB, allocs estimate: 1256.

The major difference is the GC time, which is now 99% of the runtime. The function does allocate and has some type instabilities due to Zygote, but there is no internal state that I am modifying (e.g., appending to an array). Additionally, I am not out of RAM; my RAM usage sits around 20% the entire time. I should also note that the GC pause time doesn't seem to increase monotonically: at some point something happens, the pause time drastically increases, and it never gets better.

Chatting with @oscardssmith, I found out that the GC is locked into full sweeps, and every time the GC pauses I see something like

GC: pause 312.46ms. collected 6.487096MB. full 

even if I trigger it with `GC.gc(false)`. I don't have an MWE yet since the code is pretty lengthy and it takes a few hours on my machine to trigger the problem, but I'll keep trying to find something that reproduces it. If it is useful, I can set up a repo with the code I am using to trigger this, but it is pretty large.
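
For reference, here is a minimal sketch of how the GC logging and the manual collection above can be reproduced; the ℓ and x below are only stand-ins for the actual log-density and parameter vector from my setup:

using Zygote

# Stand-ins for the real log-density and parameter vector (not the actual model).
ℓ(x) = sum(abs2, x)
x = randn(100)

# Print a line for every GC pause, like the "GC: pause ... full" lines above.
# (GC.enable_logging is available since Julia 1.8.)
GC.enable_logging(true)

# Repeatedly take gradients, as the MCMC run does.
for _ in 1:1_000
    Zygote.gradient(ℓ, x)
end

# Request an incremental (non-full) collection; in the degraded session the
# log line still reports a full sweep.
GC.gc(false)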

Version Info

Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 1 on 32 virtual cores

The project environment is:

Status `~/Dropbox (Smithsonian External)/InnerShadowMSRI/Project.toml`
GC: pause 297.28ms. collected 43.146137MB. full 
  [336ed68f] CSV v0.10.11
  [13f3f980] CairoMakie v0.10.5
  [99d987ce] Comrade v0.8.0 `~/.julia/dev/Comrade`
  [a4336a5c] ComradeAHMC v0.3.1 `~/.julia/dev/Comrade/lib/ComradeAHMC`
  [26988f03] ComradeOptimization v0.1.3 `https://github.com/ptiede/Comrade.jl.git:lib/ComradeOptimization#main`
  [992eb4ea] CondaPkg v0.2.18
  [a93c6f00] DataFrames v1.5.0
  [31c24e10] Distributions v0.25.95
  [ced4e74d] DistributionsAD v0.6.47
⌅ [7da242da] Enzyme v0.11.0
  [033835bb] JLD2 v0.4.31
  [996a588d] LogDensityProblemsAD v1.5.0
  [872c559c] NNlib v0.8.20
  [d9ec5142] NamedTupleTools v0.14.3
  [3e6eede4] OptimizationBBO v0.1.4
  [4e6fcdb7] OptimizationNLopt v0.1.5
  [36348300] OptimizationOptimJL v0.1.8
  [43a3c2be] PairPlots v1.1.1
  [91a5bcdd] Plots v1.38.15
⌃ [3d61700d] Pyehtim v0.1.1
⌅ [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.5
  [9d95f2ec] TypedTables v1.4.3
  [b1ba175b] VLBIImagePriors v0.2.1 `~/.julia/dev/VLBIImagePriors`
  [e88e6eb3] Zygote v0.6.62
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
oscardssmith added the performance (Must go faster) and GC (Garbage collector) labels on Jun 9, 2023
d-netto (Member) commented Jun 9, 2023

Is this reproducible on 1.8?

ptiede (Author) commented Jun 9, 2023

I am checking that now, but I also noticed another strange thing. When I just evaluate the function

# Repeatedly compute the gradient and immediately drop the results,
# so no state accumulates across iterations.
function test(l, x)
    for _ in 1:10_000
        arrs = map(_ -> Zygote.gradient(l, x), 1:1_000)
        arrs = nothing
    end
end

my memory usage increases monotonically and very quickly, and I trigger an OOM on 1.9. When I enable GC logging, I typically see something like

GC: pause 2.67ms. collected 48.309920MB. incr 

but as far as I can tell, a full sweep is never triggered.
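
For what it's worth, here is a sketch of how that growth can be watched from within the same loop, using the standard Base.gc_live_bytes and Sys.maxrss; this instrumentation is not part of the numbers reported above:

using Zygote

# Same loop as test(l, x) above, but reporting the GC's estimate of live heap
# bytes and the process's peak resident set size every 100 outer iterations.
function test_logged(l, x)
    for i in 1:10_000
        arrs = map(_ -> Zygote.gradient(l, x), 1:1_000)
        arrs = nothing
        if i % 100 == 0
            live_mb = Base.gc_live_bytes() / 2^20
            rss_mb  = Sys.maxrss() / 2^20
            println("iter $i: live heap ≈ $(round(live_mb; digits=1)) MiB, maxrss ≈ $(round(rss_mb; digits=1)) MiB")
        end
    end
end

If the live-heap number keeps climbing between iterations even though nothing is retained, that would match the monotonic growth described above.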

On 1.8, when I run the exact same program, my memory definitely doesn't increase as fast, and the GC logging typically reports

GC: pause 10.51ms. collected 799.020600MB. incr 

and every once in a while, a full sweep is triggered:

GC: pause 102.53ms. collected 1338.146312MB. full 

I am not sure whether what I am seeing here is related to the original issue, but something definitely changed between 1.8 and 1.9 for me.

KristofferC (Member) commented:
Looks a bit like #49545 (comment)
