make Maxout trainable #698

Merged Mar 25, 2019 (1 commit)

Conversation

oxinabox
Member

Fix what I failed to do in #647

@MikeInnes MikeInnes merged commit 8a55969 into FluxML:master Mar 25, 2019
@MikeInnes
Member

We never added a news item for this either; can you add a quick line there?

BerenMillidge pushed a commit to BerenMillidge/Flux.jl that referenced this pull request Dec 20, 2019
@mcabbott mcabbott mentioned this pull request Nov 30, 2021
bors bot added a commit that referenced this pull request Dec 13, 2021
1794: Tidy up `Maxout` r=mcabbott a=mcabbott

Maxout is from #698. This:

* adds pretty printing
* changes the explicit signature to `Maxout(layer, layer, layer)`, rather than providing a tuple, to be more like other layers (with deprecation)
* adds more examples to the docstring, and combines the two
* changes the implementation not to use `mapreduce`. I see now that this was a performance choice at the time, discussed in #647 (comment), but with Zygote it is much slower.
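
For readers unfamiliar with the layer, here is a rough plain-Julia sketch of what a maxout computes: the elementwise maximum over the outputs of several inner layers. It contrasts a pairwise `mapreduce` reduction with a single broadcast over all outputs. The names `toy_dense`, `maxout_mapreduce`, and `maxout_broadcast` are hypothetical and only for illustration; this is not Flux's code, and the broadcast version is just one possible alternative, not necessarily what this change adopted.

```julia
# Plain-Julia sketch of a maxout over several inner layers; Flux is not required.
# All function names below are hypothetical, chosen for illustration only.

# A toy dense layer: returns a closure computing tanh.(W * x .+ b).
toy_dense(W, b) = x -> tanh.(W * x .+ b)

# Pairwise reduction: apply each layer, then fold with an elementwise max.
maxout_mapreduce(layers, x) = mapreduce(f -> f(x), (a, b) -> max.(a, b), layers)

# Single broadcast: apply each layer, then take the max over all outputs at once.
maxout_broadcast(layers, x) = max.(map(f -> f(x), layers)...)

# Three independent 5 => 7 layers and a batch of 11 inputs,
# mirroring the shapes used in the benchmark below.
layers = ntuple(_ -> toy_dense(randn(7, 5), randn(7)), 3)
x = rand(5, 11)

y1 = maxout_mapreduce(layers, x)
y2 = maxout_broadcast(layers, x)

@assert size(y1) == (7, 11)   # one output column per input column
@assert y1 == y2              # both formulations give the same values
```

Both versions return the same array; they differ only in how the reduction is expressed, which is what can change the cost of differentiating through it.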

Before:
```
julia> using Flux

julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout{Tuple{Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}}}((Dense(5, 7, tanh), Dense(5, 7, tanh), Dense(5, 7, tanh)))

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 112.792 μs, mean 123.774 μs (930 allocations, 49.09 KiB. GC mean 3.71%)
```
After:
```
julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout(
  Dense(5, 7, tanh),                    # 42 parameters
  Dense(5, 7, tanh),                    # 42 parameters
  Dense(5, 7, tanh),                    # 42 parameters
)                   # Total: 6 arrays, 126 parameters, 888 bytes.

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 34.541 μs, mean 38.448 μs (493 allocations, 32.48 KiB. GC mean 6.63%)
```

Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>