Tidy up Maxout #1794

Merged: 4 commits merged into FluxML:master from the maxout branch on Dec 13, 2021
Conversation

mcabbott (Member)

Maxout is from #698. This PR:

  • adds pretty printing
  • changes the explicit constructor to Maxout(layer, layer, layer), rather than taking a tuple, to match other layers (with a deprecation for the old form)
  • adds more examples to the docstring, and combines the two docstrings
  • changes the forward pass not to use mapreduce (see the sketch after this list). I see now this was a performance choice at the time, discussed in Add MaxOut layer #647 (comment), but with Zygote it is much slower.
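For context, a rough sketch of the mapreduce change in the last bullet. This is illustrative, not the PR's actual diff: `layers` stands in for Maxout's stored tuple of layers, and the function names here are made up.

# Before (sketch): pairwise reduction, broadcasting max once per extra layer.
forward_old(layers, x) = mapreduce(f -> f(x), (acc, out) -> max.(acc, out), layers)

# After (sketch): run every layer, then one n-ary broadcast of max,
# which Zygote differentiates more cheaply than the nested pairwise form.
function forward_new(layers, x)
    outs = map(f -> f(x), layers)
    return max.(outs...)
end

Both return the same array for, say, layers = (Dense(5, 7, tanh), Dense(5, 7, tanh)); only the gradient cost differs.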

Before:

julia> using Flux, BenchmarkTools

julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout{Tuple{Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}}}((Dense(5, 7, tanh), Dense(5, 7, tanh), Dense(5, 7, tanh)))

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 112.792 μs, mean 123.774 μs (930 allocations, 49.09 KiB. GC mean 3.71%)

After:

julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout(
  Dense(5, 7, tanh),                    # 42 parameters
  Dense(5, 7, tanh),                    # 42 parameters
  Dense(5, 7, tanh),                    # 42 parameters
)                   # Total: 6 arrays, 126 parameters, 888 bytes.

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 34.541 μs, mean 38.448 μs (493 allocations, 32.48 KiB. GC mean 6.63%)

mcabbott (Member, Author)

In fact, the improvement is less clear at larger sizes. But I think we can optimise the gradient of max.(...) more easily than pairwise mapreduce?

julia> m = Maxout(() -> Dense(50, 70, tanh), 5);

julia> x = rand(Float32, 50, 110);

julia> @btime gradient(sum∘m, $x);
  min 806.917 μs, mean 1.248 ms (1268 allocations, 1.61 MiB. GC mean 7.54%)   # before
  min 782.125 μs, mean 919.414 μs (751 allocations, 1.24 MiB. GC mean 6.52%)  # after

julia> @btime m($x); # forwards
  min 303.500 μs, mean 353.118 μs (28 allocations, 422.41 KiB. GC mean 5.61%)  # before
  min 324.584 μs, mean 367.592 μs (22 allocations, 331.89 KiB. GC mean 4.31%)  # after

ToucheSir (Member)

The stranger result to me is the forward pass. It's not clear why the before version would be faster, and it appears both versions are unrolling the per-element comparisons.

mcabbott (Member, Author) commented Dec 1, 2021

I have been surprised by max before; I think it's very careful about NaN etc. It still seems strange here, though.

julia> x, y, z = (rand(100,100) for _ in 1:99);

julia> @btime $y .= max.($x, 0.0, $z);
  min 8.361 μs, mean 8.406 μs (0 allocations)

julia> @btime $y .= clamp.($x, clamp.(0.0, $z, Inf), Inf);  # just ifelse
  min 2.403 μs, mean 2.443 μs (0 allocations)
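
For illustration (my example, not from the thread), part of what makes Base's max careful is IEEE edge cases that a bare comparison skips:

julia> max(0.0, -0.0)                 # max resolves signed zeros to +0.0
0.0

julia> ifelse(0.0 > -0.0, 0.0, -0.0)  # a plain comparison does not
-0.0

julia> max(1.0, NaN)                  # max propagates NaN
NaN

julia> clamp(1.0, NaN, Inf)           # the ifelse-based clamp does not
1.0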

ToucheSir (Member)

Come to think of it though, max.(...) might not be such a bad idea for the GPU path. I don't have time to run benchmarks just now, but if that pans out, we could make a case for eating the slightly slower forward pass on CPU in exchange for fewer allocations and a faster backward pass.

mcabbott (Member, Author) commented Dec 2, 2021

I was going to suggest reverting to mapreduce for now; this PR can just be the minor tidying-up, and the next one can look more at performance.

I wrote some gradients for max.(x,y,z); it seems possible to make it roughly twice as quick on CPU. But it's more code than I thought it would be, and how many high-performance hand-written kernels do we want in Zygote?
https://gist.github.com/mcabbott/48aa0f4e0ec730d39995ad9917cad8f1
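
The gist has the real code; as a minimal sketch of the underlying idea (my illustration using ChainRulesCore, with the hypothetical name maxb, and only two arguments), the gradient of an elementwise max flows only to the argument that won each comparison:

using ChainRulesCore

# Sketch of a hand-written gradient for max.(x, y); the gist covers
# the three-argument case and CPU-specific optimisations.
maxb(x, y) = max.(x, y)

function ChainRulesCore.rrule(::typeof(maxb), x, y)
    xwins = x .>= y                    # mask of positions where x won (ties go to x)
    function maxb_pullback(Δ)
        Δ = unthunk(Δ)
        dx = ifelse.(xwins, Δ, zero(eltype(Δ)))   # cotangent only where x won
        dy = ifelse.(xwins, zero(eltype(Δ)), Δ)   # y gets the rest
        return (NoTangent(), dx, dy)
    end
    return max.(x, y), maxb_pullback
end

Zygote should pick this rule up via ChainRules, so gradient(sum∘maxb, x, y) bypasses the generic broadcast machinery on the backward pass.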

The last commit is unrelated, but it makes the CPU tests pass on Julia 1.7. It could be cherry-picked if someone wants to make a separate PR. I have not investigated why the result is slightly different.

Two review threads on src/layers/basic.jl (outdated, resolved)
mcabbott and others added 2 commits December 13, 2021 12:04
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
mcabbott (Member, Author)

bors r+

bors bot (Contributor) commented Dec 13, 2021

Build succeeded.

bors bot merged commit fe803a1 into FluxML:master on Dec 13, 2021.
mcabbott deleted the maxout branch on Dec 13, 2021 at 19:02.