
Added GroupNorm Layer #696

Merged 9 commits into FluxML:master on Mar 29, 2019

Conversation

shreyas-kowshik (Contributor):

The previous PR was closed due to some silly git errors on my side. Here is the original code again.

@MikeInnes requested a review from staticfloat on March 25, 2019 14:12
src/layers/normalise.jl (resolved)

function(gn::GroupNorm)(x)
size(x,ndims(x)-1) == length(gn.β) || error("Group Norm expected $(length(gn.β)) channels, but got $(size(x,ndims(x)-1)) channels")
ndims(x) > 2 || error("Need to pass atleast 3 channels for Group Norm to work")
Contributor:

atleast -> at least


mutable struct GroupNorm{F,V,W,N,T}
G::T # number of groups
N::T # Batch Size
Contributor:

I don't think we need to bake N into the GroupNorm; we should be able to allow variable batch sizes with this operation.

Contributor Author:

@staticfloat The size of the mean and variance matrices depends on N and is (Channels/Groups, Batch Size). So shouldn't that require N in the implementation?
I apologize if it's a silly question, but this is what I understand so far.

Contributor:

You're right that the size of μ and σ² will change depending on N; but we don't need to know it beforehand. We can just broadcast μ and σ² up to the proper size when we get a new batch. (You are correct that we cannot have fully variable batch sizes; e.g. running with N=8 and then N=16 won't work; we will need to reset the μ and σ² before doing that).

You can initialize μ and σ² as size (G, 1) where G is the number of groups, then when you do this line:

gn.μ = (1 - mtm) .* gn.μ .+ mtm .* reshape(data(μ), (groups,batches))

gn.μ will be automatically broadcast up to the proper size.
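
A minimal sketch of that broadcasting behaviour, in plain Julia with made-up sizes (not the PR's code):

G = 4                               # number of groups
mtm = 0.1f0                         # momentum
μ = zeros(Float32, G, 1)            # running mean, initialised once as (G, 1)
batch_μ = rand(Float32, G, 8)       # per-batch mean for a batch of 8

μ = (1 - mtm) .* μ .+ mtm .* batch_μ   # μ is broadcast from (G, 1) up to (G, 8)
size(μ)                                # (4, 8)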

size(x,ndims(x)-1) == length(gn.β) || error("Group Norm expected $(length(gn.β)) channels, but got $(size(x,ndims(x)-1)) channels")
ndims(x) > 2 || error("Need to pass atleast 3 channels for Group Norm to work")
(size(x,ndims(x) -1))%gn.G == 0 || error("The number of groups ($(gn.G)) must divide the number of channels ($(size(x,ndims(x) -1)))")
(size(x,ndims(x)) == gn.N) || error("Number of samples in batch not equal to that passed")
Contributor:
I don't think we need this check; we should be able to deal with variable batch sizes.

(size(x,ndims(x) -1))%gn.G == 0 || error("The number of groups ($(gn.G)) must divide the number of channels ($(size(x,ndims(x) -1)))")
(size(x,ndims(x)) == gn.N) || error("Number of samples in batch not equal to that passed")
# γ : (1,1...,C,1)
# β : (1,1...,C,1)
Contributor:
I'm not sure what these comments mean. Can you expand them or remove them?
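
For what it's worth, those shape comments appear to describe reshaping γ and β so they broadcast over everything except the channel dimension; a small illustration with made-up sizes:

x  = rand(Float32, 5, 5, 3, 2)     # (W, H, C, N) input
γ  = ones(Float32, 3)              # one scale per channel
γr = reshape(γ, 1, 1, 3, 1)        # (1, 1, ..., C, 1) as in the comment
size(x .* γr)                      # (5, 5, 3, 2): scales broadcast across W, H and N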

groups = gn.G
channels = size(x, dims-1)
batches = size(x,dims)
channels_per_group = convert(Int32,div(channels,groups))
Contributor:
Is there a reason you explicitly want channels_per_group to be an Int32? div() should already give you an integral type.
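
For reference, div on two Ints already returns an Int, so the convert appears redundant:

channels, groups = 32, 16
channels_per_group = div(channels, groups)   # 2
typeof(channels_per_group)                   # Int64 (on a 64-bit machine)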

else
T = eltype(x)
og_shape = size(x)
x = reshape(x,((size(x))[1:end-2]...,channels_per_group,groups,batches))
Contributor:
Clever use of reshape(), I suggest that you name this something other than x though, as it makes looking at things like ndims(x) below needlessly confusing.
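
A quick illustration of that reshape with hypothetical sizes (4 channels split into 2 groups), using a new name for the result as suggested:

x = rand(Float32, 5, 5, 4, 3)        # width, height, channels, batch
channels_per_group, groups, batches = 2, 2, 3
y = reshape(x, size(x)[1:end-2]..., channels_per_group, groups, batches)
size(y)                              # (5, 5, 2, 2, 3)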

@shreyas-kowshik (Contributor Author):

@staticfloat Thank you for your feedback. I have made the requested changes.

@staticfloat (Contributor):

Add some tests as well; I think there might be some problems with variable names and multiple code branches, so it will be good to do tests (both with an active layer and a !active layer). Take a look at the instance normalization tests for inspiration.
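
A rough sketch of the kind of test being asked for, modelled on the InstanceNorm tests; the constructor call, field access and expected values here are illustrative, not taken from the final PR:

using Flux, Test
using Flux: testmode!

let m = GroupNorm(4, 2), sizes = (3, 4, 2)
    x = reshape(collect(Float32, 1:prod(sizes)), sizes)
    @test m.β.data == zeros(Float32, 4)   # β starts at zero
    @test size(m(x)) == size(x)           # active (training) branch
    testmode!(m)                          # switch the layer to test mode
    @test size(m(x)) == size(x)           # !active branch
end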

@shreyas-kowshik (Contributor Author):

@staticfloat I have added the tests for Group Normalization. They are passing on my machine. Could you please review them?

initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
ϵ = 1f-5, momentum = 0.1f0)

chs is the numebr of channels, the channeld dimension of your input.
Contributor:
numebr, channeld both look to be typos.


"""
Group Normalization.
Known to improve the overall accuracy in case of classification and segmentation tasks.
Contributor:
Let's make the second line mention that this is a normalization layer that can perform better than Batch or Instance normalization.

For an array of N dimensions, the (N-1)th index is the channel dimension.

G is the number of groups along which the statistics would be computed.
The number of groups must divide the number of channels for this to work.
Contributor:
I think a better way of saying this is that the number of channels must be an integer multiple of the number of groups.

else
T = eltype(x)
og_shape = size(x)
y = reshape(x,((size(x))[1:end-2]...,channels_per_group,groups,batches))
Contributor:
Since you're calculating the same thing for y in each branch, you can just pull that out of the if statement and do it once before.
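
A sketch of the refactor being suggested, with the shared reshape hoisted above the branch; `active` is a hypothetical stand-in for the layer's training flag, and the branch bodies are elided:

function grouped_forward(x, channels_per_group, groups, batches, active)
    y = reshape(x, size(x)[1:end-2]..., channels_per_group, groups, batches)
    if active
        # training: compute per-batch statistics from y and update the running μ / σ²
    else
        # test mode: normalise y with the stored running statistics
    end
    return y
end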

@@ -1,5 +1,5 @@
using Flux: testmode!
using Flux.Tracker: data
using Flux.Tracker: data
Contributor:
extra whitespace

test/layers/normalisation.jl (resolved)
@staticfloat (Contributor):

It's coming together! I have more minor comments this time around, once these are addressed I think we'll be ready to merge!

@shreyas-kowshik (Contributor Author):

@staticfloat Thank you for the feedback. I have made the requested changes.

@MikeInnes (Member):

Looks like the tests are failing; if it's unrelated it might just need a merge of master.

@@ -6,10 +6,8 @@ using Base: tail
using MacroTools, Juno, Requires, Reexport, Statistics, Random
using MacroTools: @forward

export Chain, Dense, Maxout,
Contributor:
Maxout is accidentally deleted here.

Contributor Author:
@johnnychen94 Thanks!

@johnnychen94 (Contributor) left a comment:

I didn't go into the code details; I just added some comments on your docstring. Since this is your first PR, it's better to read the Julia documentation style guide and check against it.

https://docs.julialang.org/en/v1/manual/documentation/

initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
ϵ = 1f-5, momentum = 0.1f0)

chs is the number of channels, the channel dimension of your input.
Contributor:
Best practice for writing a variable name is to add `` around it -- the REPL can recognize it:

  • chs --> `chs`
  • G --> `G`

Group Normalization.
This layer can outperform Batch-Normalization and Instance-Normalization.

GroupNorm(chs::Integer, G::Integer, λ = identity;
Contributor:
According to https://docs.julialang.org/en/v1/manual/documentation/

Always show the signature of a function at the top of the documentation, with a four-space indent so that it is printed as Julia code.
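
An illustrative docstring in that style (signature indented by four spaces, variable names in backticks); the wording is an example only, not the PR's final docstring:

"""
    GroupNorm(chs::Integer, G::Integer, λ = identity;
              initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
              ϵ = 1f-5, momentum = 0.1f0)

Group Normalization layer. `chs` is the number of channels and `G` the number of
groups; `chs` must be an integer multiple of `G`.
"""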

GroupNorm(32,16)) # 32 channels, 16 groups (G = 16), thus 2 channels per group used
```

Link : https://arxiv.org/pdf/1803.08494.pdf
Contributor:
Personally, I prefer to add a title to this link to inform users what it points to, e.g.
"""
References:
[1] Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19). https://arxiv.org/abs/1803.08494
"""

print(io, "GroupNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
print(io, ")")
end
Contributor:
usually, it's best practice to add one newline at EOF

end

GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
Contributor:
What you need is not to add Float32 here. Instead, you need to do the type conversion in the implementation, i.e. in function(gn::GroupNorm)(x).
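
A minimal sketch of that suggestion: convert scalar hyper-parameters to the input's element type inside the forward pass instead of hard-coding Float32 (the function name and body here are hypothetical):

function normalise_sketch(x, ϵ)
    T  = eltype(x)
    ϵT = convert(T, ϵ)            # ϵ follows the input's precision
    μ  = sum(x) / length(x)
    return (x .- μ) ./ sqrt(ϵT + sum(abs2, x .- μ) / length(x))
end

normalise_sketch(rand(Float64, 4), 1f-5)   # ϵ is promoted to Float64 here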

Contributor Author:

Thanks! I'll have a look at these and incorporate them in my commit.

Link : https://arxiv.org/pdf/1803.08494.pdf
"""

mutable struct GroupNorm{F,V,W,N,T}
Contributor:
I prefer to add constraints GroupNorm{F<:Function, V<:Number, W<:Number, T<:Integer} and do the type conversions with new constructors.
Also, do we really need N here? Could it be absorbed by V or W?
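
A sketch of the constrained-parameter idea; the type name and field set here are hypothetical and much smaller than the PR's struct:

mutable struct NormSketch{F<:Function, W<:AbstractArray, T<:Integer}
    λ::F     # activation
    β::W     # shift, one per channel
    γ::W     # scale, one per channel
    G::T     # number of groups
end

gn = NormSketch(identity, zeros(Float32, 4), ones(Float32, 4), 2)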

@shreyas-kowshik (Contributor Author):

@MikeInnes The tests related to GroupNorm have passed. The issue was due to the missing export of Maxout, as pointed out by @johnnychen94.

@staticfloat merged commit 7418a2d into FluxML:master on Mar 29, 2019
@staticfloat (Contributor):

Thanks @shreyas-kowshik!

@MikeInnes (Member):

Can you add this layer to the docs, and also an entry to NEWS.md?

@johnnychen94 (Contributor):

I'm still doubtful about the default initializer; we can't just let some type (in this case Float32) be the default type.

GroupNorm(chs::Integer, G::Integer, λ = identity;
          initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0)

But since there'll be deprecation on these init keywords, it doesn't matter much. #671
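
An illustration of the concern, using the default initβ from the signature above (the size is arbitrary):

initβ = (i) -> zeros(Float32, i)
β = initβ(4)
eltype(β)    # Float32, regardless of the element type the model is later run with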

@shreyas-kowshik (Contributor Author):

@MikeInnes #728 adds GroupNorm to docs and NEWS.md
