Fix for quadratic batching in #99 (#100)

Closed · wants to merge 5 commits

Conversation

@tclements (Contributor) commented on Jan 12, 2022:

This PR greatly reduces the time and memory used when batching large numbers of graphs, as described in #99. The key is using the optimized reduce(hcat, ...) and reduce(vcat, ...) to concatenate a large number of N-d arrays in a single pass instead of pairwise. Timings and memory usage:
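
For context, a minimal sketch (not from the PR) of why this matters: Base has a specialized reduce method for hcat/vcat over a vector of arrays that allocates the output once, whereas folding hcat pairwise re-copies the growing accumulator at every step, which is quadratic in the number of arrays:

using BenchmarkTools

xs = [ones(8, 4) for _ in 1:1024]
fast = reduce(hcat, xs)                 # optimized method: one output allocation
slow = foldl((a, b) -> hcat(a, b), xs)  # pairwise: copies the accumulator 1023 times
fast == slow                            # true; only the cost differs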

Old version:

using BenchmarkTools
using GraphNeuralNetworks

for ngraphs in 2 .^ (10:12)
    gs = [rand_graph(4, 6, ndata=ones(8, 4)) for _ in 1:ngraphs]
    println("\n=======================\nBatchsize = $ngraphs graphs\n=======================\n")
    b = @benchmark GraphNeuralNetworks.blockdiag($gs...)
    display(b)
end

=======================
Batchsize = 1024 graphs
=======================

BenchmarkTools.Trial: 76 samples with 1 evaluation.
 Range (min … max):  51.236 ms … 117.125 ms  ┊ GC (min … max): 16.32% … 24.58%
 Time  (median):     63.494 ms               ┊ GC (median):    21.49%
 Time  (mean ± σ):   66.366 ms ±  10.085 ms  ┊ GC (mean ± σ):  20.39% ±  3.10%

        ▂▂ ▅▅ █ ▅█ █ █▅     ▂        ▂
  ▅▁▁▁▁▁██▅███████▅█▁████▅▁▁█▅▅▅▅█▅▁▁█▁█▁▁█▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▁▁▅ ▁
  51.2 ms         Histogram: frequency by time         93.7 ms <

 Memory estimate: 196.20 MiB, allocs estimate: 73941.

=======================
Batchsize = 2048 graphs
=======================

BenchmarkTools.Trial: 20 samples with 1 evaluation.
 Range (min … max):  225.263 ms … 300.921 ms  ┊ GC (min … max): 18.52% … 17.75%
 Time  (median):     247.739 ms               ┊ GC (median):    17.96%
 Time  (mean ± σ):   256.413 ms ±  23.805 ms  ┊ GC (mean ± σ):  17.74% ±  0.95%

  ▁    █▁   ▁▁ ▁▁ ▁ █▁   ▁        ▁       ▁       ▁ ▁▁   ▁    ▁
  █▁▁▁▁██▁▁▁██▁██▁█▁██▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁██▁▁▁█▁▁▁▁█ ▁
  225 ms           Histogram: frequency by time          301 ms <

 Memory estimate: 776.31 MiB, allocs estimate: 149717.

=======================
Batchsize = 4096 graphs
=======================

BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  922.417 ms … 1.126 s  ┊ GC (min … max): 16.82% … 18.22%
 Time  (median):     976.010 ms              ┊ GC (median):    16.93%
 Time  (mean ± σ):      1.009 s ± 80.602 ms  ┊ GC (mean ± σ):  17.26% ±  0.59%

  █            █ █                      █                    █
  █▁▁▁▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  922 ms          Histogram: frequency by time          1.13 s <

 Memory estimate: 3.02 GiB, allocs estimate: 301269.

New version:

using BenchmarkTools
using GraphNeuralNetworks

for ngraphs in 2 .^ (10:12)
    gs = [rand_graph(4, 6, ndata=ones(8, 4)) for _ in 1:ngraphs]
    println("\n=======================\nBatchsize = $ngraphs graphs\n=======================\n")
    b = @benchmark GraphNeuralNetworks.batch($gs)
    display(b)
end

=======================
Batchsize = 1024 graphs
=======================

BenchmarkTools.Trial: 7919 samples with 1 evaluation.
 Range (min … max):  359.600 μs … 15.122 ms  ┊ GC (min … max):  0.00% … 88.54%
 Time  (median):     483.500 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   627.847 μs ± 906.415 μs  ┊ GC (mean ± σ):  17.96% ± 11.83%

  ██▅▂▂▁▁                                                       ▂
  █████████▇▇▅▆▅▅▅▃▄▁▄▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▅▇▆▇ █
  360 μs        Histogram: log(frequency) by time        6.7 ms <

 Memory estimate: 1.38 MiB, allocs estimate: 13389.

=======================
Batchsize = 2048 graphs
=======================

BenchmarkTools.Trial: 3931 samples with 1 evaluation.
 Range (min … max):  727.700 μs … 13.003 ms  ┊ GC (min … max):  0.00% … 82.01%
 Time  (median):       1.033 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.268 ms ±  1.190 ms  ┊ GC (mean ± σ):  16.59% ± 15.12%

  ▅█▇█▆▃▁▁▁                                                    ▁
  ██████████▇▆▃▅▅▃▄▁▁▃▃▃▃▁▃▁▁▁▁▁▁▁▃▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▃▆▆▆▇█▇█▇ █
  728 μs        Histogram: log(frequency) by time      7.43 ms <

 Memory estimate: 2.75 MiB, allocs estimate: 26705.

=======================
Batchsize = 4096 graphs
=======================

BenchmarkTools.Trial: 1308 samples with 1 evaluation.
 Range (min … max):  1.877 ms … 36.357 ms  ┊ GC (min … max):  0.00% … 87.78%
 Time  (median):     2.747 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.809 ms ±  2.920 ms  ┊ GC (mean ± σ):  15.78% ± 18.05%

  ▂▃▆█▇▆▄▂▂▂ ▂▂▂▁▁
  ████████████████▇█▅▆▇▆▆▇▆▅▅▅▆▅▅▁▅▅▅▄▅▆▁▆▆▄▇▆█▇▇▆▅▅▇▆▅▄▅▅▅▄ █
  1.88 ms      Histogram: log(frequency) by time     13.3 ms <

 Memory estimate: 5.50 MiB, allocs estimate: 53343.
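
For reference, collecting the medians above: the new version is two to three orders of magnitude faster and allocates far less memory.

Batch size   Old median    Old memory    New median   New memory
1024         63.494 ms     196.20 MiB    483.500 μs   1.38 MiB
2048         247.739 ms    776.31 MiB    1.033 ms     2.75 MiB
4096         976.010 ms    3.02 GiB      2.747 ms     5.50 MiB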

@tclements (Contributor, Author) commented:

Just tested: my fix will fail on 3-d arrays because it uses hcat. Coming up with a solution for that now.
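
One way to see the problem (a sketch, assuming node features are stored with nodes along the last dimension): for arrays with more than two dimensions, hcat falls back to cat(...; dims=2), so the concatenation lands on the wrong axis:

a, b = ones(3, 4, 2), ones(3, 4, 2)
size(hcat(a, b))         # (3, 8, 2): joined along dims=2, not the node dimension
size(cat(a, b; dims=3))  # (3, 4, 4): what batching node features actually needs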

@tclements (Contributor, Author) commented:

Fast N-d array concatenation (for N > 2) seems to be a common problem that still needs to be solved in Base (JuliaLang/julia#21672). We might need to table this PR until that is implemented, or use an explicit solution for 3-d arrays.
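
One explicit workaround in the meantime (a sketch, not the code that landed in this PR): preallocate the output once and copy each array into its slice along the last dimension, avoiding the pairwise copies entirely:

function cat3(xs::AbstractVector{<:AbstractArray{T,3}}) where {T}
    # assumes every array shares the first two dimensions
    d1, d2, _ = size(first(xs))
    out = similar(first(xs), d1, d2, sum(size(x, 3) for x in xs))
    i = 1
    for x in xs
        k = size(x, 3)
        out[:, :, i:i+k-1] = x   # copy into the preallocated slice
        i += k
    end
    return out
end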

@CarloLucibello (Member) commented:

I think we can introduce something like this:

cat_features(xs::AbstractVector{<:AbstractVector}) = reduce(vcat, xs)
cat_features(xs::AbstractVector{<:AbstractMatrix}) = reduce(hcat, xs)
cat_features(xs::AbstractVector{<:AbstractArray{T,N}}) where {T,N} = reduce((x1, x2) -> cat(x1, x2, dims=N), xs)
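
Hypothetical usage of the helper sketched above (note the N-d branch still reduces pairwise, so it remains quadratic for N > 2 until Base grows an optimized method):

cat_features([rand(4) for _ in 1:3])        # length-12 Vector via reduce(vcat, ...)
cat_features([rand(8, 4) for _ in 1:3])     # 8×12 Matrix via reduce(hcat, ...)
cat_features([rand(2, 3, 4) for _ in 1:3])  # 2×3×12 Array via cat(...; dims=3)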

Review threads on src/GNNGraphs/transform.jl and src/GNNGraphs/utils.jl (outdated, now resolved). Quoted context from src/GNNGraphs/transform.jl:
graph = (s, t, w)
graph_indicator = vcat([ones_like(ei[1],Int,nodes[ii]) .+ (ii - 1) for (ii,ei) in enumerate(edge_indices)]...)
elseif all(y -> isa(y, ADJMAT_T), [g.graph for g in gs])
graph = blockdiag([g.graph for g in gs]...)
@CarloLucibello (Member) commented on Jan 13, 2022:

This blockdiag method does not exist for dense matrices. Maybe we can use dispatch as follows:

Flux.batch(gs::Vector{<:GNNGraph{<:COO_T}}) = ...     # specialized method
Flux.batch(gs::Vector{<:GNNGraph{<:SPARSE_T}}) = ...  # specialized method
Flux.batch(gs::Vector{<:GNNGraph}) = blockdiag(gs...) # old slow fallback

@tclements (Contributor, Author) replied:

Thanks for the comments! I'll work on the dispatching.

tclements and others added 4 commits on January 13, 2022, each co-authored by Carlo Lucibello <carlo.lucibello@gmail.com>.
@CarloLucibello mentioned this pull request on Jan 29, 2022.
@CarloLucibello (Member) commented:

Thanks @tclements, I finished this up and merged it in #122.
