
Add bit to the GC tag to turn GC in image search O(1) and also increase gc interval when encountering many pointers #49185

Merged (2 commits) Mar 31, 2023

Conversation

@gbaraldi (Member) commented Mar 29, 2023

OmniPackage is currently broken, so I can't really compare to current master.

@vtjnash (Sponsor Member) left a comment

LGTM

@d-netto added the GC (Garbage collector) and pkgimage labels Mar 29, 2023
@vchuravy (Member)

That was way less work than I expected. In the future we should convert the bit to a "permanent" generation. That might be beneficial for things like Symbols?

@gbaraldi (Member, Author)

I churned quite a bit on trying to add a bit to the gc field, and then Jameson and I decided it wasn't worth it.
But yeah, maybe we could set that bit for other permalloc'd things and try to do smarter things.

@vchuravy (Member)

@nanosoldier runtests(configuration = (julia_flags=["--pkgimages=yes"], buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],), vs_configuration = (julia_flags=["--pkgimages=yes"], buildflags = ["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],))

@vchuravy requested a review from d-netto March 29, 2023 22:15
@vchuravy added the backport 1.9 label Mar 29, 2023
@d-netto (Member) left a comment

Would be nice to do some cleanup to delete the Eytzinger tree functions, and also to benchmark this on an OmniPackage-like workload.

Overall, SGTM.

@gbaraldi (Member, Author)

The Eytzinger tree functions are still used in staticdata, but I guess I can move them there.

@d-netto (Member) commented Mar 29, 2023

On a slightly tweaked version of OmniPackage:

julia> @time include("src/OmniPackage.jl")
 33.007272 seconds (54.86 M allocations: 2.926 GiB, 16.88% gc time, 13.39% compilation time: 81% of which was recompilation)
Main.OmniPackage
julia> @time include("src/OmniPackage.jl")
 32.287114 seconds (54.80 M allocations: 2.935 GiB, 14.86% gc time, 13.74% compilation time: 80% of which was recompilation)
Main.OmniPackage

@d-netto (Member) commented Mar 29, 2023

diff --git a/src/OmniPackage.jl b/src/OmniPackage.jl
index f671cb6..9b4f8fd 100644
--- a/src/OmniPackage.jl
+++ b/src/OmniPackage.jl
@@ -20,7 +20,7 @@ using AbstractMCMC,
   FFTW,
   FillArrays,
   FiniteDiff,
-  Flux,
+  # Flux,
   ForwardDiff,
   GLM,
   GlobalSensitivity,
@@ -36,13 +36,13 @@ using AbstractMCMC,
   Loess,
   MacroTools,
   Markdown,
-  Makie,
+  # Makie,
   MCMCChains,
   MCMCDiagnosticTools,
   MLJ,
   MLJBase,
-  MLJFlux,
-  MLJXGBoostInterface,
+  # MLJFlux,
+  # MLJXGBoostInterface,
   MacroTools,
   Missings,
   ModelingToolkit,

@d-netto (Member) commented Mar 29, 2023

(These packages were causing some version incompatibilities on my machine, so I'm just commenting them out for now.)

@JeffBezanson (Sponsor Member)

That's...disappointing?

@vchuravy (Member)

> That's...disappointing?

I think it means that the binary tree search recovered most of the performance, but I like this approach better :)

@gbaraldi (Member, Author)

It might be interesting to see where we are spending the time now, because it seems GC is now just a component instead of the main thing.

@oscardssmith (Member)

I think part of that might be that the modifications removed the GPU/ML part of the stack, which is a lot of packages.

@pchintalapudi (Member)

> That's...disappointing?

So I suspect (with no evidence) that the reason for this is cache misses. Using a bit inside the object requires a load through the pointer to check the bit, which can thrash the cache when there are lots of objects scattered throughout memory. On the other hand, the Eytzinger tree uses a relatively small side table with an icache- and dcache-friendly layout, and compares only on the pointer's integer representation, which doesn't load the object and therefore won't thrash the cache in the same way.

If this is the case, a more bits-friendly benchmark that would make the Eytzinger tree look worse is one with many super-tiny packages of one object each (since the Eytzinger tree stores 2 pointers per package, this makes the side table less cache friendly). Conversely, a more tree-friendly benchmark would be a single giant sysimage with many, many objects (since the Eytzinger tree won't kill the cache, while the bit check needs an extra load from every object).
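For concreteness, here is a minimal sketch of the two checks being contrasted above. This is an editor's illustration, not the actual Julia code: the names GC_IN_IMAGE, in_image_bit, img_range_t, and in_image_search are hypothetical, and the real implementation lays out its side table in Eytzinger order rather than as the plain sorted array searched here.

/* Editor's sketch -- hypothetical names, not the actual Julia implementation. */
#include <stddef.h>
#include <stdint.h>

#define GC_IN_IMAGE ((uintptr_t)0x4)  /* hypothetical "allocated in a package image" header bit */

/* Bit check: one load of the object's header word, then a mask test.
   This has to touch the object's own cache line. */
static inline int in_image_bit(const uintptr_t *header)
{
    return (*header & GC_IN_IMAGE) != 0;
}

/* Range search: look up the pointer's integer value in a small sorted
   side table of [start, end) image address ranges. The object itself is
   never dereferenced. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} img_range_t;

static int in_image_search(uintptr_t p, const img_range_t *ranges, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p < ranges[mid].start)
            hi = mid;
        else if (p >= ranges[mid].end)
            lo = mid + 1;
        else
            return 1;  /* p falls inside an image range */
    }
    return 0;
}

The first check pays for a potentially cold load of the object's header; the second only touches a compact table keyed by the pointer value and never dereferences the object.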

@KristofferC mentioned this pull request Mar 30, 2023
@nanosoldier (Collaborator)

Your job failed. Consult the server logs for more details (cc @maleadt).

@KristofferC (Sponsor Member)

With full Omnipackage:

# 1.8.5
julia> @time include("src/OmniPackage.jl")
 97.392319 seconds (405.39 M allocations: 23.826 GiB, 4.93% gc time, 8.59% compilation time: 58% of which was recompilation)

# 1.9.0-rc1 with binary search
julia> @time include("src/OmniPackage.jl")
 50.408919 seconds (67.12 M allocations: 4.045 GiB, 3.31% gc time, 3.76% compilation time: 64% of which was recompilation)

# 1.9.0-rc1 with GC tag
julia> @time include("src/OmniPackage.jl")
 49.604748 seconds (67.59 M allocations: 4.092 GiB, 3.22% gc time, 3.76% compilation time: 64% of which was recompilation)

It's not really fair to compare GC time on 1.8 vs 1.9 because the allocation counts on 1.9 are so much better, but looking only at the total time, 1.9 is doing pretty well.

@gbaraldi (Member, Author)

@pchintalapudi we do check the same pointer a couple of lines above, so if we are messing up the caches maybe we can be a little more cache friendly here; but since we don't do too much in between, I kind of doubt the object isn't in the cache. What might happen is that it messes with the branch prediction. I will play with VTune a bit and see if it says something.

@gbaraldi (Member, Author) commented Mar 30, 2023

While playing with this I saw a mark phase in the profiler that looked quite odd.

[profiler screenshot of the mark phase]
That big slab is gc_try_claim_and_push.
In particular the lines

    *nptr |= 1;
    if (gc_try_setmark_tag(o, GC_MARKED))

with the big one being *nptr |= 1;.
The call above them in the stack is gc_mark_objarray, which kind of makes sense; this is probably the mfills array from the code in the issue, but I wonder if all those dereferences aren't destroying the cache and whether there's some way of doing this better.
It's kind of the worst case, being about 25 million boxes.

@d-netto (Member) commented Mar 30, 2023

It wouldn't be surprising if we had a lot of cache misses coming from gc_try_claim_and_push, since we need to fetch the object tag when enqueuing it.

In the lines:

if (!gc_old(o->header) && nptr)
    *nptr |= 1;
if (gc_try_setmark_tag(o, GC_MARKED))
    gc_markqueue_push(mq, obj);

the load of o->header introduces a data dependency, so I think the profile could be showing that we are simply stalled in the subsequent instructions, waiting for that cache line to be read from memory.
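As a purely illustrative sketch of the prefetch-ahead idea touched on in the next comment: this is not the actual gc_mark_objarray; mark_one, PF_DIST, and mark_objarray_sketch are hypothetical names, and a plain walk over an array of boxed objects is assumed.

/* Editor's sketch, not the actual gc_mark_objarray: a generic prefetch-ahead
   pattern for walking an array of boxed objects. mark_one() and PF_DIST are
   hypothetical. */
#include <stddef.h>

void mark_one(void *obj);  /* stand-in for "load header, set mark bit, enqueue" */

void mark_objarray_sketch(void **objs, size_t len)
{
    const size_t PF_DIST = 8;  /* how many elements ahead to prefetch */
    for (size_t i = 0; i < len; i++) {
        /* Request the header we will need a few iterations from now, so the
           dependent load of its tag is less likely to stall the loop. */
        if (i + PF_DIST < len && objs[i + PF_DIST] != NULL)
            __builtin_prefetch(objs[i + PF_DIST], 0, 3);
        if (objs[i] != NULL)
            mark_one(objs[i]);
    }
}

The idea is to issue the header load several iterations early so the dependent tag check is less likely to stall; whether this helps in practice depends on how much other work hides the latency.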

@gbaraldi (Member, Author)

Yeah, not sure if there's something we could do here? You played a bit with prefetching and other things, so maybe. Also, about those numbers I was showing: I was able to get better performance by tinkering a bit, but we then started doing very large collections, which might increase the memory footprint as a whole. The current version gets us to a place similar to 1.8.

@gbaraldi (Member, Author)

@d-netto could I bother you to run the GC benchmarks on this vs master?

@gbaraldi (Member, Author)

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@gbaraldi (Member, Author)

The only remaining question I have is that the second GC.gc() call after running the test below takes way too long, and it's all in marking. Not sure what could cause that behaviour. It's all in the mark_objarray call, but I'm not sure why other versions don't show it.

using Random: seed!
seed!(1)

abstract type Cell end

struct CellA<:Cell
    a::Int
end

struct CellB<:Cell
    b::String
end

function fillcells!(mc::Array{Cell})
    for ind in eachindex(mc)
        mc[ind] = ifelse(rand() > 0.5, CellA(ind), CellB(string(ind)))
    end
    return mc
end

mcells = Array{Cell}(undef, 4000, 4000)
t1 = @elapsed fillcells!(mcells)
t2 = @elapsed fillcells!(mcells)

println("filling: $t1 s\nfilling again: $t2 s")

@time GC.gc()
@time GC.gc()

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi (Member, Author)

Just for reference, this also fixes the bad behaviour in #49120.

@KristofferC added the merge me label Mar 31, 2023
@gbaraldi changed the title from "Add bit to the GC tag to turn GC in image search O(1)" to "Add bit to the GC tag to turn GC in image search O(1) and also increase gc interval when encountering many pointers" Mar 31, 2023
@KristofferC merged commit 8b19f5f into JuliaLang:master Mar 31, 2023
@oscardssmith removed the merge me label Mar 31, 2023
@KristofferC removed the backport 1.9 label Mar 31, 2023
@topolarity (Member)

Did the second commit in this PR actually get reviewed, or was that tacked on at the end?

@KristofferC (Sponsor Member)

> Did the second commit in this PR actually get reviewed, or was that tacked on at the end?

It was confirmed to fix #49120 by making sure that the behavior in #49120 (comment) was fixed.

Why, is something going wrong somewhere?

@topolarity (Member)

> Why, is something going wrong somewhere?

Nope, I was just curious whether it was included in the benchmarks that had already been run or not.

oscar-system/Oscar.jl#2187 is a bit strange, but @gbaraldi's already investigating and the root cause isn't clear yet.

Labels: don't squash (Don't squash merge), GC (Garbage collector), pkgimage