Some tweaks to the Getting Started docs #2195

Open · wants to merge 1 commit into base: main
25 changes: 14 additions & 11 deletions docs/src/index.md
@@ -31,23 +31,25 @@ Also see [Implementing pullbacks](@ref) on how to implement back-propagation for
We will try a few things with the following functions:

```jldoctest rosenbrock
-julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2
-rosenbrock (generic function with 1 method)
+julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2;
-julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
-rosenbrock_inp (generic function with 1 method)
+julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2;
```

where we note for future reference that the value of this function at `x=1.0`, `y=2.0` is `100.0`, and its derivative
mcabbott (Contributor), Dec 13, 2024:

Consider showing this as code instead of prose?

julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2;

julia> rosenbrock(xy) = (1.0 - xy[1])^2 + 100.0 * (xy[2] - xy[1]^2)^2;

julia> rosenbrock(1.0, 2.0) == rosenbrock([1.0, 2.0]) == 100.0
true

I also think you should not call the input of rosenbrock_inp the same thing, x == [x, y] is weird. The name rosenbrock_inp also seems a bit weird, maybe it can just be another method, or if that's too confusing, add a suffix more informative than "inp"? (I'm not sure what INP means, maybe input, but why?)

Member:

I think it was originally in-place (@michel2323, were you the one to originally author this doc, just by virtue of it being rosenbrock?)

But either way, sure!

Contributor:

Ah ok. But this function isn't in-place, it's just going to be used somewhere below in a demonstration that Enzyme likes to handle functions which accept Vector by mutating something else. The reader doesn't know that yet.

Member:

very true, maybe rosenbrock_array or something? or even just rosenbrock2

with respect to `x` at that point is `-400.0`, and its derivative with respect to `y` at that point is `200.0`.
Comment on lines +36 to +40
mcabbott (Contributor), Dec 13, 2024:

Suggested change
-julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2;
-```
-where we note for future reference that the value of this function at `x=1.0`, `y=2.0` is `100.0`, and its derivative
-with respect to `x` at that point is `-400.0`, and its derivative with respect to `y` at that point is `200.0`.
+julia> rosenbrock(xy::Vector) = (1 - xy[1])^2 + 100 * (xy[2] - xy[1]^2)^2;
+julia> z = rosenbrock(1.0, 2.0)
+100.0
+julia> z == rosenbrock([1.0, 2.0]) # Vector method
+true
+```
+We note for future reference that the value of this function at `x=1.0`, `y=2.0` is `z=100.0`. Its derivative with respect to `x` at that point is `-400.0`, and its derivative with respect to `y` is `200.0`.

I've also changed 1.0 and 100.0 to integer literals in the definition, as IMO this is idiomatic Julia: rosenbrock can then take and return Float32 without promoting.
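For reference, the numbers quoted in this thread (`100.0`, `-400.0`, `200.0`) follow from differentiating the two-argument `rosenbrock` by hand and evaluating at `x = 1`, `y = 2`:

```math
\begin{aligned}
f(x, y) &= (1 - x)^2 + 100\,(y - x^2)^2 \\
\frac{\partial f}{\partial x} &= -2\,(1 - x) - 400\,x\,(y - x^2) \\
\frac{\partial f}{\partial y} &= 200\,(y - x^2) \\[4pt]
f(1, 2) &= 0^2 + 100 \cdot (2 - 1)^2 = 100 \\
\frac{\partial f}{\partial x}(1, 2) &= -2 \cdot 0 - 400 \cdot 1 \cdot (2 - 1) = -400 \\
\frac{\partial f}{\partial y}(1, 2) &= 200 \cdot (2 - 1) = 200
\end{aligned}
```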


## Reverse mode

The return value of reverse mode [`autodiff`](@ref) is a tuple that contains as a first value
Contributor:

Suggested change
-The return value of reverse mode [`autodiff`](@ref) is a tuple that contains as a first value
+The return value of reverse mode [`autodiff`](@ref) is a tuple that contains as a first element

-the derivative value of the active inputs and optionally the primal return value.
+the derivative value of the active inputs and optionally the _primal_ return value (i.e. the
+value of the undifferentiated function).
Comment on lines 44 to +46
Contributor:

Consider not using "value" to mean so many things here?

is a tuple that contains as a first element the derivatives of ..., and optionally the primal value (i.e. what the function returns).

Contributor:

Maybe "optionally" also seems a bit odd, since it describes the output, not the input. It's not that you may omit this; it's that ReverseWithPrimal tells it to include it.

Member:

yeah we definitely don't need to say "derivative value" and can just say "the derivative of"

Member:

and "the value of the undifferentiated function" -> "the result of the original function without differentiation"

mcabbott (Contributor), Dec 13, 2024:

Also consider putting the ReverseWithPrimal case first, as without it, ((-400.0, 200.0),) seems like a puzzle to count the brackets & guess why.

Perhaps also write it with destructuring syntax, like:

derivs, y = autodiff(ReverseWithPrimal, rosenbrock, Active(1.0), Active(2.0))

Member:

oh yeah totally fair, if you want to put that in this PR that would be fine with me!

Contributor:

I can't make suggestions across deleted lines :/ so this is going to be messy...

Comment on lines +45 to +46
Contributor:

Suggested change
-the derivative value of the active inputs and optionally the _primal_ return value (i.e. the
-value of the undifferentiated function).
+the derivatives with respect to the inputs. The tuple's second element is the _primal_ value (i.e. the result of the original function without differentiation),
+but this is omitted if you use `Reverse` instead of `ReverseWithPrimal`:


```jldoctest rosenbrock
-julia> autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))
+julia> autodiff(Reverse, rosenbrock, Active(1.0), Active(2.0))
Contributor:

Suggested change
-julia> autodiff(Reverse, rosenbrock, Active(1.0), Active(2.0))
+julia> derivs, z = autodiff(ReverseWithPrimal, rosenbrock, Active(1.0), Active(2.0))
+((-400.0, 200.0), 100.0)
+julia> autodiff(Reverse, rosenbrock, Active(1.0), Active(2.0))

((-400.0, 200.0),)
-julia> autodiff(ReverseWithPrimal, rosenbrock, Active, Active(1.0), Active(2.0))
+julia> autodiff(ReverseWithPrimal, rosenbrock, Active(1.0), Active(2.0))
((-400.0, 200.0), 100.0)
Comment on lines +52 to 53
Contributor:

Suggested change
-julia> autodiff(ReverseWithPrimal, rosenbrock, Active(1.0), Active(2.0))
-((-400.0, 200.0), 100.0)

```
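As a sanity check that does not depend on Enzyme, the layout of the derivative tuple (one entry per argument, in argument order, so `(-400.0, 200.0)` is `(∂f/∂x, ∂f/∂y)`) can be compared against a central finite difference. A minimal sketch in plain Julia; `rosenbrock_grad` and `fd_grad` are hypothetical helpers written only for this illustration, not part of Enzyme:

```julia
# Two-argument Rosenbrock function, as in the docs (integer literals, per the review)
rosenbrock(x, y) = (1 - x)^2 + 100 * (y - x^2)^2

# Hand-derived gradient (hypothetical helper): returns (∂f/∂x, ∂f/∂y)
rosenbrock_grad(x, y) = (-2 * (1 - x) - 400 * x * (y - x^2), 200 * (y - x^2))

# Central finite differences, one partial derivative per argument (hypothetical helper)
function fd_grad(f, x, y; h = 1e-6)
    dfdx = (f(x + h, y) - f(x - h, y)) / (2h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2h)
    return (dfdx, dfdy)
end

g = rosenbrock_grad(1.0, 2.0)         # exact partials at (1, 2)
g_fd = fd_grad(rosenbrock, 1.0, 2.0)  # numerical approximation of the same tuple

# Same ordering as the first element of autodiff's result, ((-400.0, 200.0),)
println("analytic: ", g, "  finite difference: ", g_fd)
```

The finite-difference values agree with the analytic ones to within roughly `1e-8` here, which is a quick way to check that an `autodiff` call differentiates the arguments you expect.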

@@ -62,7 +64,7 @@ julia> dx = [0.0, 0.0]
0.0
0.0
-julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx))
+julia> autodiff(Reverse, rosenbrock_inp, Duplicated(x, dx))
Contributor:

Suggested change
-julia> autodiff(Reverse, rosenbrock_inp, Duplicated(x, dx))
+julia> autodiff(Reverse, rosenbrock, Duplicated(x, dx))

((nothing,),)
julia> dx
@@ -71,8 +73,9 @@ julia> dx
200.0
```

-Both the inplace and "normal" variant return the gradient. The difference is that with
-[`Active`](@ref) the gradient is returned and with [`Duplicated`](@ref) the gradient is accumulated in place.
+Both the inplace and "normal" variant return the gradient. The difference is that with [`Active`](@ref)
mcabbott (Contributor), Dec 13, 2024:

This wording seems weird. The inplace version returns ((nothing,),), it's written right there. That's what return means. And inplace / "normal" are new terms here, which aren't the terms you need to learn to understand Enzyme.

The version with Active arguments (for immutable inputs like x::Float64) returns the gradient. The version with Duplicated (for mutable inputs like x::Vector{Float64}) instead writes the gradient into the Duplicated object, and returns nothing in the corresponding slot of the returned derivs. In fact it accumulates the gradient, i.e. if you run it again it will double dx. (See make_zero! perhaps.) In general a function may accept any mix of Active, Duplicated, and Const arguments.

IDK how much of the end goes here, but the reader should not get the impression that all arguments must have the same type.

Member:

perhaps we should say both compute the gradient.

And we can use whatever function names are here for clarity

Contributor:

Suggested change
-Both the inplace and "normal" variant return the gradient. The difference is that with [`Active`](@ref)
+Both versions calculate the same derivatives. The difference is that with [`Active`](@ref) arguments

+the gradient is returned and with [`Duplicated`](@ref) the gradient is accumulated in-place into `dx`,
+and a value of `nothing` is placed in the corresponding slot of the returned `Tuple`.
Comment on lines +77 to +78
mcabbott (Contributor), Dec 13, 2024:

Suggested change
-the gradient is returned and with [`Duplicated`](@ref) the gradient is accumulated in-place into `dx`,
-and a value of `nothing` is placed in the corresponding slot of the returned `Tuple`.
+(for immutable inputs like `x::Float64`) it returns them as `derivs`, while the version with `Duplicated` (for mutable inputs like `x::Vector{Float64}`) instead writes the gradient into the `Duplicated` object, and returns `nothing` in the corresponding slot of the returned `derivs`.
+In fact it accumulates the gradient, i.e. if you run `autodiff` again it will double `dx`.
+In general, `autodiff` accepts any mix of `Active` and `Duplicated` function arguments, as well as `Const` and various other `Annotation` types.
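A minimal sketch of the accumulation and mixing behavior described in this suggestion, assuming Enzyme's `autodiff` accepts a call without an explicit return activity (as in the PR's own examples). `scaled` is a hypothetical function introduced only for this illustration, and Base's `fill!` is used to reset the shadow between calls:

```julia
using Enzyme

# Vector method, as proposed in the review suggestion
rosenbrock(xy::Vector) = (1 - xy[1])^2 + 100 * (xy[2] - xy[1]^2)^2

x = [1.0, 2.0]
dx = zeros(2)  # shadow for x: Enzyme adds the gradient into this buffer

# Duplicated: the gradient goes into dx, and `nothing` fills the result slot
derivs = autodiff(Reverse, rosenbrock, Duplicated(x, dx))
# dx is now [-400.0, 200.0]

# A second call accumulates into dx instead of overwriting it
autodiff(Reverse, rosenbrock, Duplicated(x, dx))
# dx is now [-800.0, 400.0]

fill!(dx, 0.0)  # reset the shadow before reusing it

# Mixing annotations (hypothetical helper): `a` is Active, `xy` is Duplicated
scaled(a, xy) = a * rosenbrock(xy)
mixed = autodiff(Reverse, scaled, Active(2.0), Duplicated(x, dx))
# mixed[1][1] is ∂scaled/∂a = rosenbrock(x) = 100.0; dx is 2 .* [-400.0, 200.0]

# Const: no derivative is computed for `a`, so its slot is `nothing` too
fill!(dx, 0.0)
consts = autodiff(Reverse, scaled, Const(2.0), Duplicated(x, dx))
```

Because the shadow accumulates, the buffer has to be zeroed whenever it is reused; the review also mentions `make_zero!` as an alternative to `fill!`.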


## Forward mode

@@ -121,7 +124,7 @@ julia> dx = [1.0, 1.0]
1.0
1.0
-julia> autodiff(ForwardWithPrimal, rosenbrock_inp, Duplicated, Duplicated(x, dx))
+julia> autodiff(ForwardWithPrimal, rosenbrock_inp, Duplicated(x, dx))
Contributor:

Suggested change
-julia> autodiff(ForwardWithPrimal, rosenbrock_inp, Duplicated(x, dx))
+julia> autodiff(ForwardWithPrimal, rosenbrock, Duplicated(x, dx))

(-400.0, 400.0)
```
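For comparison with reverse mode: with a `Duplicated(x, dx)` input, forward mode pushes a single tangent through the function, so the derivative it returns is the directional derivative of `rosenbrock` along `dx` (the gradient dotted with `dx`), not the full gradient. Writing $x_1, x_2$ for `xy[1]`, `xy[2]` and $\dot{\mathbf{x}}$ for `dx` (the `x` used in the example above is defined outside this hunk, so it is not reproduced here):

```math
\dot z = \nabla f(\mathbf{x}) \cdot \dot{\mathbf{x}}
       = \frac{\partial f}{\partial x_1}\,\dot x_1 + \frac{\partial f}{\partial x_2}\,\dot x_2,
\qquad
\dot{\mathbf{x}} = (1, 1) \implies \dot z = \frac{\partial f}{\partial x_1} + \frac{\partial f}{\partial x_2}
```

This is why one forward pass with `dx = [1.0, 1.0]` yields a single number; recovering both partials separately takes one forward pass per basis direction, or one reverse pass. `ForwardWithPrimal` additionally returns the primal value alongside it.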