This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

implemented bottleneck separable convolutions #855

Closed

shreydesai
Contributor

Summary: Adds bottleneck layers for separable convolutions. Projecting the channels down to a lower dimension, convolving, and then projecting back up significantly cuts the number of parameters with minimal loss in performance. This diff is a variant of the traditional bottleneck method: instead of projecting to the target dimension directly in the pointwise convolution, we split the pointwise convolution into two pieces. The first convolution downsamples the channel space into a (sufficiently small) low dimension, and the second convolution upsamples it into the target (higher) dimension.
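For readers who want to see the shape of the idea, here is a minimal PyTorch sketch of the layer described in the summary. It is an illustration under assumptions, not the code added by this diff: the class name `BottleneckSeparableConv1d`, its constructor arguments, the use of 1D convolutions, and the "same" padding are all hypothetical choices.

```python
import torch
import torch.nn as nn


class BottleneckSeparableConv1d(nn.Module):
    """Hypothetical sketch of a bottleneck separable convolution.

    Not the class from this pull request; names and arguments are assumed.
    """

    def __init__(self, in_channels, out_channels, kernel_size, bottleneck):
        super().__init__()
        # Depthwise step: one spatial filter per input channel
        # (groups == in_channels), so the channel count is unchanged.
        self.depthwise = nn.Conv1d(
            in_channels,
            in_channels,
            kernel_size,
            padding=kernel_size // 2,  # keeps the length for odd kernel sizes
            groups=in_channels,
        )
        # First pointwise (kernel size 1) convolution: project down.
        self.down = nn.Conv1d(in_channels, bottleneck, kernel_size=1)
        # Second pointwise (kernel size 1) convolution: project up to the target.
        self.up = nn.Conv1d(bottleneck, out_channels, kernel_size=1)

    def forward(self, x):
        # x has shape (batch, channels, sequence_length).
        return self.up(self.down(self.depthwise(x)))


def count_parameters(module):
    """Total number of trainable and non-trainable parameters."""
    return sum(p.numel() for p in module.parameters())


layer = BottleneckSeparableConv1d(256, 256, kernel_size=3, bottleneck=64)
output = layer(torch.randn(2, 256, 10))
```

With these assumed settings, `output` keeps the input shape `(2, 256, 10)`, and `count_parameters(layer)` includes the biases that `nn.Conv1d` adds by default.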

Differential Revision: D16563566

@facebook-github-bot added the CLA Signed label on Jul 30, 2019
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Summary:
Pull Request resolved: facebookresearch#855

Adds bottleneck layers for separable convolutions. Projecting the channels down to a lower dimension, convolving, and then projecting back up significantly cuts the number of parameters with minimal loss in performance. This diff is a variant of the traditional bottleneck method: instead of projecting to the target dimension directly in the pointwise convolution, we split the pointwise convolution into two pieces. The first convolution downsamples the channel space into a (sufficiently small) low dimension, and the second convolution upsamples it into the target (higher) dimension.

Example: Given an input with 256 channels, the depthwise convolution convolves each channel spatially and outputs 256 channels. Then, instead of a single pointwise convolution that projects the channel space to 256, the pointwise step is split into two pieces: the first pointwise convolution projects the channel space down to 64, and the second projects it back up to the intended 256.
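To make the saving in this example concrete, the weight counts of the pointwise step can be worked out directly. This is a back-of-the-envelope sketch: the helper `pointwise_weights` is a hypothetical name, biases are ignored, and the depthwise convolution is left out because it is the same in both variants.

```python
def pointwise_weights(in_channels, out_channels):
    """Weights in a pointwise (kernel size 1) convolution, ignoring biases."""
    return in_channels * out_channels


channels, bottleneck = 256, 64

# Single pointwise convolution: 256 -> 256.
single = pointwise_weights(channels, channels)

# Split pointwise convolution: 256 -> 64, then 64 -> 256.
split = pointwise_weights(channels, bottleneck) + pointwise_weights(
    bottleneck, channels
)

print(single, split)  # 65536 32768
```

Under these assumptions, a bottleneck of 64 halves the pointwise weights; a smaller bottleneck would cut them further.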

Differential Revision: D16563566

fbshipit-source-id: e742abce052380a5f1174a4180276fdbbacc5b41
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Summary:
Pull Request resolved: facebookresearch#855

Differential Revision: D16563566

fbshipit-source-id: bff549728803062045a8540b069791e96c7944f1
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Summary:
Pull Request resolved: facebookresearch#855

Reviewed By: geof90

Differential Revision: D16563566

fbshipit-source-id: dd44d8eebea00dbb33130febd6fc7bda9d735aa7
Summary:
Pull Request resolved: facebookresearch#855

Reviewed By: geof90

Differential Revision: D16563566

fbshipit-source-id: 1cf310315739802aa5ea6a34efb4c8ab771a3c63
@facebook-github-bot
Contributor

This pull request has been merged in 352b8be.
