
Add sparse initialization #1454

Merged: 7 commits merged into FluxML:master on Jan 12, 2021
Conversation

@atiyo (Contributor) commented Jan 6, 2021

Add sparse initialization, documentation and tests. Trim whitespace in edited files.

This PR is intended to address one of the outstanding points in bringing Flux to parity with PyTorch's features, so it partially addresses #1431 and fully addresses #1450.

The implementation follows the method used in the PyTorch implementation: a normally distributed array is created, then a fixed proportion of randomly chosen row indices is zeroed out for every column. Like the PyTorch version, it is restricted to 2-d arrays.
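A minimal sketch of this approach, assembled from snippets quoted later in this thread (the name sparse_init, the exact signature, and the error message are assumptions rather than the merged code):

using Random: shuffle

# Sketch: draw a rows×cols matrix from N(0, std^2), then zero out a fixed
# number of randomly chosen entries in every column.
function sparse_init(dims...; sparsity, std = 0.01)
    length(dims) == 2 || throw(ArgumentError("Only 2-dimensional outputs are supported"))
    rows, cols = dims
    prop_zero = min(1.0, sparsity)
    num_zeros = ceil(Integer, prop_zero * rows)
    sparse_array = randn(dims...) .* std
    sparse_array[1:num_zeros, :] .= 0                # zero the first num_zeros rows...
    return mapslices(shuffle, sparse_array, dims=1)  # ...then shuffle each column independently
end

julia> W = sparse_init(4, 3; sparsity = 0.5);  # two zeros per column here (num_zeros = ceil(0.5 * 4) = 2)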

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

@atiyo marked this pull request as ready for review on January 6, 2021 22:25

Review thread on src/utils.jl (excerpt from the diff under review):
end
rows, cols = dims
prop_zero = min(1.0, sparsity)
num_zeros = ceil(Integer, prop_zero * rows)
Member commented:

Use \div here

@atiyo (Contributor, PR author) replied:

I assume you mean something like div(rows, 1/prop_zero)? This returns a float since prop_zero is a float, so it would require further casting to an integer. I thought the above was a bit easier to follow, but I'm happy to go with whatever you think is best.

@DhairyaLGandhi (Member) commented Jan 9, 2021:

\div{tab} should return an int

@atiyo (Contributor, PR author) replied:

I might be missing something. I'm finding ÷ to behave the same way as div, i.e. it returns a float for float values of prop_zero.

julia> prop_zero = 0.11; rows = 50;
julia> ÷(rows, 1/prop_zero, RoundUp)
6.0

Using ÷ as an infix operator still returns a float, and it also doesn't allow specifying a RoundingMode. We need to round up to maintain consistency with PyTorch.
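For comparison, the ceil form in the diff excerpt above returns an integer directly (reusing the same values):

julia> ceil(Integer, prop_zero * rows)
6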

@DhairyaLGandhi (Member) commented:

Thanks for looking into this! I've left a couple of thoughts on the implementation. We would need to use a different name, though, since sparse is already a function in a stdlib.

@CarloLucibello (Member) commented Jan 8, 2021:

The current implementation does sparse_array[1:num_zeros, :] .= 0, but it should randomize the zero positions in each row.

The PyTorch code is:

def sparse_(tensor, sparsity, std=0.01):
    r"""Fills the 2D input `Tensor` as a sparse matrix, where the
    non-zero elements will be drawn from the normal distribution
    :math:`\mathcal{N}(0, 0.01)`, as described in `Deep learning via
    Hessian-free optimization` - Martens, J. (2010).

    Args:
        tensor: an n-dimensional `torch.Tensor`
        sparsity: The fraction of elements in each column to be set to zero
        std: the standard deviation of the normal distribution used to generate
            the non-zero values

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.sparse_(w, sparsity=0.1)
    """
    if tensor.ndimension() != 2:
        raise ValueError("Only tensors with 2 dimensions are supported")

    rows, cols = tensor.shape
    num_zeros = int(math.ceil(sparsity * rows))

    with torch.no_grad():
        tensor.normal_(0, std)
        for col_idx in range(cols):
            row_indices = torch.randperm(rows)
            zero_indices = row_indices[:num_zeros]
            tensor[zero_indices, col_idx] = 0
    return tensor

We should follow them, swapping cols with rows, so sparse_array[row_idx, zero_indices] .= 0

@CarloLucibello (Member) commented Jan 8, 2021:

Sorry, now I see that you randomly permute with mapslices(shuffle, x). So only my comment on swapping rows and cols compared to Python's implementation applies.
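(For illustration, mapslices applied with shuffle along dims=1 permutes the entries of each column independently, which is what randomizes the per-column zero positions; a tiny sketch with made-up values:)

using Random: shuffle

x = [1 10; 2 20; 3 30]
mapslices(shuffle, x, dims=1)  # each column permuted on its own, e.g. [2 30; 3 10; 1 20]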

@atiyo (Contributor, PR author) commented Jan 8, 2021:

> Sorry, now I see that you randomly permute with mapslices(shuffle, x). So only my comment on swapping rows and cols compared to Python's implementation applies.

I'm not clear on why we need to swap rows and cols compared to PyTorch. I understand the batch dimension is different, but as far as I could tell Flux uses similar shapes for weights.

E.g.:

In [1]: from torch import nn
In [2]: nn.Linear(1,2).weight.shape
Out[2]: torch.Size([2, 1])

julia> using Flux
julia> size(Dense(1,2).W)
(2, 1)

@CarloLucibello (Member) commented:

@atiyo you're right, I always thought that PyTorch applies the transform x * W, but now I see that it does x * W^T instead. So this PR looks entirely fine to me. If we want to do some more name bikeshedding, init_sparse is an alternative proposal, maybe more discoverable by tab-completion when looking for initialization methods.

Another consideration is that maybe we could move the initialization functions to a submodule, but that doesn't necessarily have to be discussed here.

@DhairyaLGandhi (Member) commented:

Let's not move it to a submodule; it doesn't seem worthwhile enough as a standalone to me.

init_sparse would be better for discoverability, but somewhat inconsistent. I guess it's alright for now.

@CarloLucibello (Member) commented:

bors r+

@bors bors bot (Contributor) commented Jan 12, 2021:

Build succeeded.

bors bot merged commit b917a32 into FluxML:master on Jan 12, 2021.
@atiyo deleted the sparse_initialisation branch on January 12, 2021 at 19:15.