
deprecate DepthwiseConv once we have groups in standard conv #1667

Closed
CarloLucibello opened this issue Jul 15, 2021 · 17 comments · Fixed by #1921

Comments

@CarloLucibello
Member

DepthwiseConv is a grouped convolution with groups = channels_in
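For illustration, a minimal sketch of the equivalence (assuming Flux's Conv accepts the groups keyword this issue is about):

using Flux

x = rand(Float32, 28, 28, 8, 1)        # W×H×C×N input with 8 channels
dw = DepthwiseConv((3, 3), 8 => 16)    # channel multiplier 2: one group per input channel
gc = Conv((3, 3), 8 => 16; groups=8)   # the same operation as a grouped Conv

size(dw(x)) == size(gc(x))             # true: both give a 26×26×16×1 output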

@DhairyaLGandhi
Member

TensorFlow has both, and it seems like that would be a nice distinction to maintain.

@CarloLucibello
Member Author

We could keep the layer/function, but just as a wrapper around Conv/conv. This way we would also fix #459.
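Something like this hypothetical sketch, where the name survives but the separate layer type does not:

# Hypothetical wrapper: DepthwiseConv as a convenience constructor returning a grouped Conv
function DepthwiseConv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
                       kws...) where N
    cin, cout = ch
    cout % cin == 0 || throw(ArgumentError("output channels must be a multiple of input channels"))
    return Conv(k, ch, σ; groups = cin, kws...)
end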

@DhairyaLGandhi
Member

Why not add the required groups support to the dims object now that we have the infrastructure to do so? Wrapping it over Conv sounds inelegant. We have a better depthwise kernel for handling that case too.

@darsnack
Member

I think the suggestion is that both Conv and DepthwiseConv call NNlib.conv with the groups specified in dims accordingly.
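Roughly, both forward passes would then bottom out in the same call; a sketch assuming the groups-aware DenseConvDims from FluxML/NNlib.jl#289:

using NNlib

x = rand(Float32, 28, 28, 8, 1)   # input: W×H×C×N
w = rand(Float32, 3, 3, 1, 16)    # weight: kernel W×H × (cin ÷ groups) × cout

# groups = 1 gives a standard conv; groups = channels_in gives a depthwise conv
cdims = DenseConvDims(x, w; groups = 8)
y = NNlib.conv(x, w, cdims)       # size(y) == (26, 26, 16, 1)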

ToucheSir transferred this issue from FluxML/NNlib.jl on Jul 16, 2021
@ToucheSir
Member

Am I correct to think that Conv and DepthwiseConv can have the exact same forward pass now that FluxML/NNlib.jl#289 is merged? Then DepthwiseConv could become a constructor or convenience function instead of its own layer.

@darsnack
Member

You are correct; I think they could be tweaked to share the same code. I was going to suggest a convenience constructor at first, but keeping it as its own layer might be nice for two reasons: a) future optimizations (minor), and b) seeing DepthwiseConv printed out when examining model structure is useful. Keeping the type also makes dispatch easier. We could still share code while maintaining DepthwiseConv as a distinct layer.

@DhairyaLGandhi
Member

How does cuDNN handle the depthwise conv?

@CarloLucibello
Member Author

It doesn't; one uses groups = channels_in in the standard convolution (the only one cuDNN exposes).

@ToucheSir
Member

Which is covered by FluxML/NNlibCUDA.jl#9, no additional machinery required :)

My main issue with having a separate DepthwiseConv is that it gives 2 ways to represent the same thing. Since we can dispatch on the number of groups now, is there a need for a separate named type? Flux has generally erred on the side of not having layers that are a more specific version of another layer, so this would be an exception.

@CarloLucibello
Member Author

CarloLucibello commented Jul 16, 2021

We may want a separate layer for printing, discoverability, and to avoid breakage.

I would also add DepthwiseSeparableConv as its own layer, or as an option to DepthwiseConv. A DepthwiseSeparableConv is a DepthwiseConv followed by a Conv with a 1×1 kernel.
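As a sketch, the composite could just be a Chain (helper name hypothetical):

using Flux

# Hypothetical: depthwise separable conv = depthwise Conv + 1×1 pointwise Conv
function DepthwiseSeparableConv(k::NTuple{N,Integer}, ch::Pair, σ = identity; kws...) where N
    cin, cout = ch
    Chain(
        Conv(k, cin => cin, σ; groups = cin, kws...),  # depthwise step
        Conv(ntuple(_ -> 1, N), cin => cout, σ),       # pointwise 1×1 step
    )
end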

@ToucheSir
Member

So if someone creates a Conv with groups == input channels, do we still print it out as a Conv? Certainly we still want a way for users to write DepthwiseConv and get a layer back; my point is that the distinction is not so clear-cut.

@darsnack
Member

I agree that it is not straightforward, but I am inclined to think that if someone creates a Conv with groups == input channels, then either
a) they aren't mentally thinking of it as a DepthwiseConv, otherwise they would have just used that, or
b) they already know of the equivalence, and what prints out will not surprise them.

I think what's more likely is that if someone who doesn't know the equivalence calls DepthwiseConv and a Conv prints out, then they will be confused.

@DhairyaLGandhi
Member

Having a depthwise conv seems strictly better. People who don't know, or don't want to know, the groups setting can use DepthwiseConv and forget about it. Those who understand the equivalence can choose between the two, and probably should still prefer DepthwiseConv for clarity of code. We should add a note to the DepthwiseConv docstring stating that it's a special case of Conv + groups, which helps both categories of users. The Conv docs already reference the depthwise case, which means we are doing a decent job of making the feature discoverable.

@ToucheSir
Member

I think what's more likely is that if someone who doesn't know the equivalence calls DepthwiseConv and a Conv prints out, then they will be confused.

My thought was to actually leverage the type system here. Since Julia is smart enough to print out aliases when they're defined, creating a Conv with the correct number of groups could show up as a DepthwiseConv if we either parameterize the Conv type with the channels and groups, or add a bit of logic to show.

This assumes the only reason to have a separate type over a constructor is for better display of course. If so, I'm of the opinion that big compound names generally hide an abstraction waiting to be let out. For example, if we add DepthwiseSeparableConv do we add SpatialSeparableConv as well?

@DhairyaLGandhi
Member

creating a Conv with the correct number of groups could show up as a DepthwiseConv

The value of having it isn't in the printing, it's in making code readable. Adding checks to return (or print) mismatched types would also needlessly complicate Flux, which I am strictly against.

We already have overloads of conv for CUDA; I don't see why we shouldn't extend the same courtesy to depthwiseconv.

We haven't heard complaints over its existence, and I doubt anyone is going to be mad either, especially if it's made GPU friendly.

@ToucheSir
Member

The value of having it isn't in the printing, it's in making code readable.

Are you referring to user code or library code? Because for user code, nothing changes. You call DepthwiseConv, and it gives you back something that does a DepthwiseConv. With printing support, it'd even show as a DepthwiseConv instead of a plain Conv.

If this is about internal complexity, which I assume is what you mean by

Adding checks to return (or print) mismatched types would also needlessly complicate Flux, which I am strictly against.

Then I think it's worth looking at what changes would have to be made. This:

struct Conv{N,M,F,A,V}
...
end

Becomes

struct Conv{InC,OutC,Groups,N,M,F,A,V} # type params left unabbreviated for clarity
...
end

const DepthwiseConv{InC,OutC} = Conv{InC,OutC,InC} # alias: groups == input channels

Since we don't presently have an easy way of querying the number of channels or groups from a Conv layer, this could kill two birds with one stone.
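Under that parameterization the queries would fall out of dispatch for free, e.g. (accessor names hypothetical):

# Hypothetical accessors, assuming the Conv{InC,OutC,Groups,...} parameterization above
channels_in(::Conv{InC}) where {InC} = InC
channels_out(::Conv{InC,OutC}) where {InC,OutC} = OutC
groupcount(::Conv{InC,OutC,G}) where {InC,OutC,G} = G

isdepthwise(c::Conv) = groupcount(c) == channels_in(c)  # the alias condition Groups == InC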

Here are all of the potential hangups I can think of:

  1. We don't want to add more type params to Conv. I have at best a hazy grasp of what the consequences would be, but I assume more unbound type params are not ideal.
  2. This defines one type that can be printed two different ways. There is precedent for this with Vector, but one could imagine programmatically constructing a CNN and being confused as to why no Convs show up when the model is printed.
  3. Having a dedicated DepthwiseConv implementation makes it easier to support DepthwiseSeparableConv, e.g. using dispatch, or using another (Conv) layer as a "transform" as has been proposed in Fix Norm Layers, Again #1509 (comment).

@DhairyaLGandhi
Member

Also ref FluxML/NNlibCUDA.jl#22 (comment)
