
Add sparse initialization #1454

Merged: 7 commits merged into FluxML:master on Jan 12, 2021
Conversation

@atiyo (Contributor) commented Jan 6, 2021

Add sparse initialization, documentation and tests. Trim whitespace in edited files.

This PR is intended to address one of the outstanding points in bringing Flux to parity with PyTorch's features, so it partially addresses #1431 and fully addresses #1450.

The implementation follows the method used in the PyTorch implementation: a normally distributed array is created, then a fixed proportion of randomly chosen row indices is zeroed out for every column. Like the PyTorch version, it is restricted to 2-d arrays.
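A minimal sketch of this approach, assembled from snippets quoted later in this thread (the name sparse_init, the exact signature, and the error message are assumptions rather than the merged code):

using Random: shuffle

# Sketch: draw a rows×cols matrix from N(0, std^2), then zero out a fixed
# number of randomly chosen entries in every column.
function sparse_init(dims...; sparsity, std = 0.01)
    length(dims) == 2 || throw(ArgumentError("Only 2-dimensional outputs are supported"))
    rows, cols = dims
    prop_zero = min(1.0, sparsity)
    num_zeros = ceil(Integer, prop_zero * rows)
    sparse_array = randn(dims...) .* std
    sparse_array[1:num_zeros, :] .= 0                # zero the first num_zeros rows...
    return mapslices(shuffle, sparse_array, dims=1)  # ...then shuffle each column independently
end

julia> W = sparse_init(4, 3; sparsity = 0.5);  # two zeros per column here (num_zeros = ceil(0.5 * 4) = 2)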

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

@atiyo marked this pull request as ready for review on January 6, 2021 22:25

Review thread on src/utils.jl (excerpt from the diff under review):
end
rows, cols = dims
prop_zero = min(1.0, sparsity)
num_zeros = ceil(Integer, prop_zero * rows)
Member commented:

Use \div here

@atiyo (Contributor, PR author) replied:

I assume you mean something like div(rows, 1/prop_zero)? This returns a float since prop_zero is a float, so it would require further casting to an integer. I thought the above was a bit easier to follow, but I'm happy to go with whatever you think is best.

@DhairyaLGandhi (Member) commented Jan 9, 2021:

\div{tab} should return an int

@atiyo (Contributor, PR author) replied:

I might be missing something. I'm finding ÷ to behave the same way as div, i.e. it returns a float for float values of prop_zero.

julia> prop_zero = 0.11; rows = 50;
julia> ÷(rows, 1/prop_zero, RoundUp)
6.0

Using ÷ as an infix operator still returns a float, and it also doesn't allow specifying a RoundingMode. We need to round up to maintain consistency with PyTorch.
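For comparison, the ceil form in the diff excerpt above returns an integer directly (reusing the same values):

julia> ceil(Integer, prop_zero * rows)
6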

@DhairyaLGandhi (Member) commented:

Thanks for looking into this! I've left a couple of thoughts on the implementation. We would need to use a different name, though, since sparse is already a function in a stdlib.

@CarloLucibello (Member) commented Jan 8, 2021:

The current implementation does sparse_array[1:num_zeros, :] .= 0, but it should randomize the zero positions in each row.

The PyTorch code is:

def sparse_(tensor, sparsity, std=0.01):
    r"""Fills the 2D input `Tensor` as a sparse matrix, where the
    non-zero elements will be drawn from the normal distribution
    :math:`\mathcal{N}(0, 0.01)`, as described in `Deep learning via
    Hessian-free optimization` - Martens, J. (2010).

    Args:
        tensor: an n-dimensional `torch.Tensor`
        sparsity: The fraction of elements in each column to be set to zero
        std: the standard deviation of the normal distribution used to generate
            the non-zero values

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.sparse_(w, sparsity=0.1)
    """
    if tensor.ndimension() != 2:
        raise ValueError("Only tensors with 2 dimensions are supported")

    rows, cols = tensor.shape
    num_zeros = int(math.ceil(sparsity * rows))

    with torch.no_grad():
        tensor.normal_(0, std)
        for col_idx in range(cols):
            row_indices = torch.randperm(rows)
            zero_indices = row_indices[:num_zeros]
            tensor[zero_indices, col_idx] = 0
    return tensor

We should follow them, swapping cols with rows, so sparse_array[row_idx, zero_indices] .= 0

@CarloLucibello (Member) commented Jan 8, 2021:

Sorry, now I see that you randomly permute with mapslices(shuffle, x). So only my comment on swapping rows and cols compared to Python's implementation applies.
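(For illustration, mapslices applied with shuffle along dims=1 permutes the entries of each column independently, which is what randomizes the per-column zero positions; a tiny sketch with made-up values:)

using Random: shuffle

x = [1 10; 2 20; 3 30]
mapslices(shuffle, x, dims=1)  # each column permuted on its own, e.g. [2 30; 3 10; 1 20]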

@atiyo (Contributor, PR author) commented Jan 8, 2021:

> Sorry, now I see that you randomly permute with mapslices(shuffle, x). So only my comment on swapping rows and cols compared to Python's implementation applies.

I'm not clear on why we need to swap rows and cols compared to PyTorch. I understand the batch dimension is different, but as far as I could tell Flux uses similar shapes for weights.

E.g.:

In [1]: from torch import nn
In [2]: nn.Linear(1,2).weight.shape
Out[2]: torch.Size([2, 1])

julia> using Flux
julia> size(Dense(1,2).W)
(2, 1)

@CarloLucibello (Member) commented:

@atiyo you're right, I always thought that PyTorch applies the transform x * W, but now I see that it does x * W^T instead. So this PR looks entirely fine to me. If we want to do some more name bikeshedding, init_sparse is an alternative proposal, maybe more discoverable by tab-completion when looking for initialization methods.

Another consideration is that maybe we could move the initialization functions to a submodule, but that doesn't necessarily have to be discussed here.

@DhairyaLGandhi (Member) commented:

Let's not move it to a submodule; it doesn't seem worthwhile enough as a standalone to me.

init_sparse would be better for discoverability, but somewhat inconsistent. I guess it's alright for now.

@CarloLucibello (Member) commented:

bors r+

@bors bors bot (Contributor) commented Jan 12, 2021:

Build succeeded.

bors bot merged commit b917a32 into FluxML:master on Jan 12, 2021.
@atiyo deleted the sparse_initialisation branch on January 12, 2021 at 19:15.