use gather; fix outdated docs
Co-authored-by: Manikya <manikyabard@gmail.com>
CarloLucibello and manikyabard committed Jul 11, 2021
1 parent 7175c36 commit 062fc09
Showing 9 changed files with 44 additions and 36 deletions.
2 changes: 1 addition & 1 deletion docs/src/gpu.md
@@ -30,7 +30,7 @@ If you define a structured model, like a `Dense` layer or `Chain`, you just need
```julia
d = Dense(10, 5, σ)
d = fmap(cu, d)
-d.W # CuArray
+d.weight # CuArray
d(cu(rand(10))) # CuArray output

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
2 changes: 1 addition & 1 deletion docs/src/models/advanced.md
@@ -68,7 +68,7 @@ by simply deleting it from `ps`:

```julia
ps = params(m)
-delete!(ps, m[2].b)
+delete!(ps, m[2].bias)
```

## Custom multiple input or output layer
7 changes: 7 additions & 0 deletions docs/src/models/nnlib.md
@@ -67,3 +67,10 @@ NNlib.batched_mul!
NNlib.batched_adjoint
NNlib.batched_transpose
```

+## Gather and Scatter
+
+```@docs
+NNlib.gather
+NNlib.scatter
+```
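
Not part of the diff: for readers unfamiliar with these two NNlib functions, a minimal sketch of what they compute (the matrix and indices below are made up for illustration):

```julia
using NNlib

src = Float32[1 2 3;
              4 5 6]            # 2×3 source matrix

# gather picks slices of `src` along its last dimension, one per index.
NNlib.gather(src, [3, 1, 1])    # 2×3 result holding columns 3, 1 and 1 of `src`

# scatter is the rough inverse: it accumulates columns of `src` into the
# destination slots named by `idx`, combining collisions with the given operator.
idx = [1, 2, 1]                 # columns 1 and 3 of `src` both land in slot 1
NNlib.scatter(+, src, idx)      # 2×2 result: slot 1 = col 1 + col 3, slot 2 = col 2
```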
38 changes: 19 additions & 19 deletions docs/src/models/overview.md
@@ -15,7 +15,7 @@ Here's how you'd use Flux to build and train the most basic of models, step by s

This example will predict the output of the function `4x + 2`. First, import `Flux` and define the function we want to simulate:

-```
+```julia
julia> using Flux

julia> actual(x) = 4x + 2
@@ -28,7 +28,7 @@ This example will build a model to approximate the `actual` function.

Use the `actual` function to build sets of data for training and verification:

-```
+```julia
julia> x_train, x_test = hcat(0:5...), hcat(6:10...)
([0 1 … 4 5], [6 7 … 9 10])

@@ -42,38 +42,38 @@ Normally, your training and test data come from real world observations, but thi

Now, build a model to make predictions with `1` input and `1` output:

-```
+```julia
julia> model = Dense(1, 1)
Dense(1, 1)

-julia> model.W
-1-element Array{Float64,1}:
- -0.99009055
+julia> model.weight
+1×1 Matrix{Float32}:
+ -1.4925033

-julia> model.b
-1-element Array{Float64,1}:
+julia> model.bias
+1-element Vector{Float32}:
0.0
```

-Under the hood, a dense layer is a struct with fields `W` and `b`. `W` represents a weight and `b` represents a bias. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:
+Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weight matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:

-```
+```julia
julia> predict = Dense(1, 1)
```

`Dense(1, 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
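
Not part of the diff: a quick sanity check of that formula (`Dense`'s default activation is the identity, so `σ` drops out here):

```julia
julia> x = rand(Float32, 1, 3);   # three sample inputs

julia> predict(x) ≈ predict.weight * x .+ predict.bias
true
```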

This model will already make predictions, though not accurate ones yet:

-```
+```julia
julia> predict(x_train)
-1×6 Array{Float32,2}:
- -1.98018 -5.94054 -9.90091 -13.8613 -17.8216 -21.782
+1×6 Matrix{Float32}:
+ 0.0 -1.4925 -2.98501 -4.47751 -5.97001 -7.46252
```

In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.

-```
+```julia
julia> loss(x, y) = Flux.Losses.mse(predict(x), y)
loss (generic function with 1 method)

@@ -87,7 +87,7 @@ More accurate predictions will yield a lower loss. You can write your own loss f

Under the hood, the Flux [`train!`](@ref) function uses *a loss function* and *training data* to improve the *parameters* of your model based on a pluggable [`optimiser`](../training/optimisers.md):

-```
+```julia
julia> using Flux: train!

julia> opt = Descent()
@@ -100,12 +100,12 @@ julia> data = [(x_train, y_train)]

Now, we have the optimiser and data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. Remember, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:

-```
-julia> predict.W
+```julia
+julia> predict.weight
1-element Array{Float64,1}:
-0.99009055

-julia> predict.b
+julia> predict.bias
1-element Array{Float64,1}:
0.0
```
@@ -120,7 +120,7 @@ Params([[-0.99009055], [0.0]])
These are the parameters Flux will change, one step at a time, to improve predictions. Each of the parameters comes from the `predict` model:

```
-julia> predict.W in parameters, predict.b in parameters
+julia> predict.weight in parameters, predict.bias in parameters
(true, true)
```
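
Not part of the diff: for context, a sketch of where this tutorial page is heading. The pieces above feed a single `train!` call that is repeated until the parameters settle (exact numbers will vary with the random initialisation):

```julia
using Flux
using Flux: train!

actual(x) = 4x + 2
x_train, y_train = hcat(0:5...), actual.(hcat(0:5...))

predict = Dense(1, 1)
loss(x, y) = Flux.Losses.mse(predict(x), y)

opt = Descent()                 # plain gradient descent
data = [(x_train, y_train)]     # a single batch
parameters = params(predict)    # the weight and bias inspected above

for epoch in 1:200              # repeat the one-batch step many times
    train!(loss, parameters, data, opt)
end

predict.weight, predict.bias    # approach 4 and 2, the coefficients of `actual`
```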
4 changes: 2 additions & 2 deletions docs/src/models/regularisation.md
@@ -13,10 +13,10 @@ m = Dense(10, 5)
loss(x, y) = logitcrossentropy(m(x), y)
```

-We can apply L2 regularisation by taking the squared norm of the parameters , `m.W` and `m.b`.
+We can apply L2 regularisation by taking the squared norm of the parameters, `m.weight` and `m.bias`.

```julia
-penalty() = sum(abs2, m.W) + sum(abs2, m.b)
+penalty() = sum(abs2, m.weight) + sum(abs2, m.bias)
loss(x, y) = logitcrossentropy(m(x), y) + penalty()
```
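
Not part of the diff: the same idea scales past a single layer. A minimal sketch, where the `0.01f0` weighting and the use of `Flux.params` to walk every parameter are illustrative choices rather than anything this page prescribes:

```julia
penalty(m) = sum(sum(abs2, p) for p in Flux.params(m))
loss(x, y) = logitcrossentropy(m(x), y) + 0.01f0 * penalty(m)
```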

3 changes: 2 additions & 1 deletion src/layers/basic.jl
@@ -475,7 +475,8 @@ function Embedding(in::Integer, out::Integer;
end

(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
-(m::Embedding)(x::Union{Int,AbstractVector}) = m.weight[:, x]
+(m::Embedding)(x::Integer) = m([x])
+(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)

function Base.show(io::IO, m::Embedding)
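
Not part of the commit: a rough illustration of the new dispatch, assuming the `Embedding(in, out)` constructor shown above (sizes and indices are made up):

```julia
using Flux

emb = Flux.Embedding(10, 4)   # 10-token vocabulary, 4-dimensional embeddings; weight is 4×10

emb(3)           # 4×1 matrix: the Integer method wraps the index and calls emb([3])
emb([3, 1, 3])   # 4×3 matrix: NNlib.gather picks columns 3, 1 and 3 of emb.weight
emb([3 1; 1 3])  # 4×2×2 array: higher-dimensional indices are vec'd, then reshaped
```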
2 changes: 1 addition & 1 deletion src/utils.jl
@@ -15,7 +15,7 @@ This function is mainly used by weight initializers, e.g., [`kaiming_normal`](@r
```jldoctest
julia> layer = Dense(10, 20);
-julia> Flux.nfan(size(layer.W))
+julia> Flux.nfan(size(layer.weight))
(10, 20)
julia> layer = Conv((3, 3), 2=>10);
2 changes: 1 addition & 1 deletion test/cuda/layers.jl
@@ -140,7 +140,7 @@ end

@test sum(l(ip)) ≈ 0.f0
gs = gradient(() -> sum(l(ip)), Flux.params(l))
-@test l.b ∉ gs.params
+@test l.bias ∉ gs.params
end

@testset "Extended BatchNorm" begin
20 changes: 10 additions & 10 deletions test/utils.jl
@@ -226,19 +226,19 @@ end
m = Chain(Dense(10, 5, relu), Dense(5, 2))
x64 = rand(Float64, 10)
x32 = rand(Float32, 10)
-@test eltype(m[1].W) == Float32
+@test eltype(m[1].weight) == Float32
@test eltype(m(x32)) == Float32
@test eltype(m(x64)) == Float64
@test eltype(f64(m)(x32)) == Float64
@test eltype(f64(m)(x64)) == Float64
-@test eltype(f64(m)[1].W) == Float64
-@test eltype(f32(f64(m))[1].W) == Float32
+@test eltype(f64(m)[1].weight) == Float64
+@test eltype(f32(f64(m))[1].weight) == Float32
end

@testset "Zeros" begin
m = Dense(3,2; bias=false)
-@test f64(m).b === m.b === Zeros()
-@test f32(m).b === m.b === Zeros()
+@test f64(m).bias === m.bias === Zeros()
+@test f32(m).bias === m.bias === Zeros()

@testset "Gradients for broadcasted $op with sizes $s" for op in (+,-,*), s in ((1,), (2,3))
o = ones(s)
@@ -340,19 +340,19 @@

nobias(n) = Zeros()
testdense(m, bt) = @testset "Check layer $i" for (i, (l1, l2)) in enumerate(zip(m, dm(bt)))
-@test l1.W == l2.W
-@test l1.b == l2.b
-@test_skip typeof(l1.b) === typeof(l2.b)
+@test l1.weight == l2.weight
+@test l1.bias == l2.bias
+@test_skip typeof(l1.bias) === typeof(l2.bias)
end

@testset "loadparams!" begin
import Flux: loadparams!
pars(w, b) = [w, b]
import Flux: loadparams!, Zeros
pars(w, b::Zeros) = [w, Flux.zeros(size(w,1))]
-pars(l) = pars(l.W, l.b)
+pars(l) = pars(l.weight, l.bias)
pararray(m) = mapreduce(pars, vcat, m)
-weights(m) = mapreduce(l -> [l.W], vcat, m)
+weights(m) = mapreduce(l -> [l.weight], vcat, m)
@testset "Bias type $bt" for bt in (Flux.zeros, nobias)
m = dm(bt)
loadparams!(m, params(m))
