getting a minimal Lux example working (with CUDA) #1392
Can you attach the whole error log?
On Thu, Apr 18, 2024 at 6:57 PM, ExpandingMan wrote:
This issue is to provide a minimal example of neural network training with
Lux, to hopefully make it easier for developers to work toward making it
viable. It probably isn't news to anyone here that this example fails, but
it was indicated to me on Slack that it would still be helpful to have this
issue for reference.
In this example, we train a neural network with a single hidden layer to
approximate the polynomial $x^2 - 2x$. It is trivial to generalize this
to deeper neural networks, but that's probably not useful for this demonstration.
This roughly follows the Lux tutorial here
<https://lux.csail.mit.edu/dev/tutorials/beginner/2_PolynomialFitting>,
but I have stripped out the opaque Lux training stuff so that it's
clearer what's going on. I expect this example should be simpler to diagnose
than the equivalent with Flux, as the explicit parameterization of Lux
makes it easier to reason about, but I also expect that if this example
were working, the analogous Flux example surely would too. Indeed, I think this
example is a good proxy for a huge number of common use cases.
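For clarity: the data below is generated with evalpoly, whose coefficient tuple is ordered from the constant term upward, so (0, -2, 1) corresponds to 0 - 2x + x^2, i.e. the target polynomial; for instance evalpoly(2f0, (0, -2, 1)) returns 0f0.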
using LinearAlgebra, Random, Statistics, Optimisers
using CUDA
using Lux, LuxCUDA
import Zygote, Enzyme

# confirmed this works for enzyme on gpu
const dev = gpu_device()

function makedata(rng::AbstractRNG)
    X = reshape(collect(range(-2.0f0, 2.0f0, 128)), (1, 128))
    y = evalpoly.(X, ((0, -2, 1),)) .+ randn(rng, Float32, (1, 128)) .* 0.1f0
    (X, y)
end

function loss(model, θ, ψ, (X, y))
    (ŷ, ψ) = Lux.apply(model, X, θ, ψ)
    mean(abs2, ŷ .- y)
end

function gradloss_zygote(model, θ, ψ, (X, y))
    (∇ℓ,) = Zygote.gradient(θ) do ϑ
        loss(model, ϑ, ψ, (X, y))
    end
    ∇ℓ
end

function gradloss_enzyme(model, θ, ψ, (X, y))
    ℓ = ϑ -> begin
        loss(model, ϑ, ψ, (X, y))
    end
    Enzyme.gradient(Enzyme.Reverse, ℓ, θ)
end

function main(rng=Random.Xoshiro(999),
              model=Chain(Dense(1=>16, gelu), Dense(16=>1)),
              (X, y)=makedata(rng) |> dev;
              nepochs=300,
             )
    (θ, ψ) = Lux.setup(rng, model) |> dev
    opts = Optimisers.setup(Adam(0.01f0), θ)
    for j ∈ 1:nepochs
        ∇ℓ = gradloss_enzyme(model, θ, ψ, (X, y))
        (opts, θ) = Optimisers.update!(opts, θ, ∇ℓ)
    end
    (ŷ, _) = Lux.apply(model, X, θ, ψ)
    (y, ŷ)
end
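For reference, running it looks like this (a sketch; the MSE line is just for inspection, and as noted below the Enzyme-on-GPU path is exactly where it fails):

    (y, ŷ) = main()                          # trains for nepochs with gradloss_enzyme on dev
    mean(abs2, Array(ŷ) .- Array(y))         # training MSE, moved back to the CPU to inspect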
Note that
- This *works* with both Zygote and Enzyme if dev = cpu_device() (i.e. no GPU is involved at all).
- This *works* using gradloss_zygote with *either* cpu_device() or gpu_device().
- This fails rather spectacularly using gradloss_enzyme and gpu_device().
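For concreteness, the three cases above can be run from one file by passing the gradient function and device in explicitly; this is only my own sketch (main2 and its keyword names are made up here), using the definitions from the listing:

    function main2(gradloss; rng=Random.Xoshiro(999), nepochs=300,
                   model=Chain(Dense(1=>16, gelu), Dense(16=>1)), device=gpu_device())
        (X, y) = makedata(rng) |> device
        (θ, ψ) = Lux.setup(rng, model) |> device
        opts = Optimisers.setup(Adam(0.01f0), θ)
        for _ ∈ 1:nepochs
            ∇ℓ = gradloss(model, θ, ψ, (X, y))
            (opts, θ) = Optimisers.update!(opts, θ, ∇ℓ)
        end
        first(Lux.apply(model, X, θ, ψ))     # return ŷ
    end

    main2(gradloss_zygote)                           # works (GPU)
    main2(gradloss_zygote; device=cpu_device())      # works (CPU)
    main2(gradloss_enzyme; device=cpu_device())      # works (CPU)
    main2(gradloss_enzyme)                           # fails with the stack trace below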
The error output is so verbose that I won't try to reproduce it all here
(it goes nuts and starts dumping LLVM IR). I expect others to be able to
reproduce the same or a similar error, but the stack trace is:
Stacktrace:
[1] julia_error(cstr::Cstring, val::Ptr{…}, errtype::Enzyme.API.ErrorType, data::Ptr{…}, data2::Ptr{…}, B::Ptr{…})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:1684
[2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{…}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…}, augmented::Ptr{…}, atomicAdd::Bool)
@ Enzyme.API ~/.julia/packages/Enzyme/MIIMf/src/api.jl:154
[3] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:3109
[4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:4964
[5] codegen
@ ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:4391 [inlined]
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:5646
[7] _thunk
@ ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:5646 [inlined]
[8] cached_compilation
@ ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:5680 [inlined]
[9] (::Enzyme.Compiler.var"#532#533"{…})(ctx::LLVM.Context)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:5746
[10] JuliaContext(f::Enzyme.Compiler.var"#532#533"{…}; ***@***.***{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[11] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[12] #s1926#531
@ ~/.julia/packages/Enzyme/MIIMf/src/compiler.jl:5698 [inlined]
[13]
@ Enzyme.Compiler ./none:0
[14] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[15] autodiff
@ ~/.julia/packages/Enzyme/MIIMf/src/Enzyme.jl:270 [inlined]
[16] autodiff
@ ~/.julia/packages/Enzyme/MIIMf/src/Enzyme.jl:287 [inlined]
[17] gradient
@ ~/.julia/packages/Enzyme/MIIMf/src/Enzyme.jl:938 [inlined]
[18] gradloss_enzyme(model::Chain{…}, ***@***.***{…}, ***@***.***{…}, ::Tuple{…})
@ Main ~/src/autodiff/zygote_enzyme_minimal.jl:31
[19] main(rng::Xoshiro, ***@***.***{…}, Nothing}, ::Tuple{CuArray{…}, CuArray{…}}; nepochs::Int64)
@ Main ~/src/autodiff/zygote_enzyme_minimal.jl:44
[20] main(rng::Xoshiro, ***@***.***{…}, Nothing}, ::Tuple{CuArray{…}, CuArray{…}})
@ Main ~/src/autodiff/zygote_enzyme_minimal.jl:34
[21] top-level scope
@ REPL[2]:1
[22] top-level scope
@ ~/.julia/packages/CUDA/fGE8R/src/initialization.jl:206
Full log attached.
@ExpandingMan looks like CUDA.jl needs to have a rule added, specifically like below. What happens if you add this to your file before any AD?
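Something along these lines (a sketch only; CUDA.CUBLAS.handle is a guess at the relevant stream/handle setup function, per the follow-up below):

    # guess: mark fetching the cuBLAS handle (stream setup) as inactive,
    # so Enzyme does not try to differentiate through it
    Enzyme.EnzymeRules.inactive(::typeof(CUDA.CUBLAS.handle), args...) = nothing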
cc @vchuravy
Result looks the same to me, at least superficially. Log attached: crash.log <https://github.com/EnzymeAD/Enzyme.jl/files/15049133/crash.log>
Out of curiosity, if Enzyme is merely inactive for CuBLAS, but CuBLAS is being used in the function it's trying to differentiate (as I think would be the case here), wouldn't it, at best, return an incorrect result? I would have thought that CuBLAS would act as a frightful barrier to ever getting this working.
That is a different error, so that is progress!
And no, we're not marking cuBLAS as inactive with that; we're telling Enzyme
it doesn't need to differentiate the cuBLAS parallel stream setup.
In any case, we probably need to open a PR adding this to CUDA.jl.
This now hits an error equivalent to LuxDL/LuxLib.jl#148, so moving the issue there.