
Models with dropout affect GLOBAL_RNG differently when run on GPU #1372

Closed
ablaom opened this issue Oct 26, 2020 · 3 comments
Comments

ablaom commented Oct 26, 2020

I have been comparing some Flux models on the CPU and GPU and was surprised to find they were not giving similar predictions. It took me quite some time to nail this down to the dropout problem described in the issue title and demonstrated below. Maybe I'm missing something, but this seems to me a serious impediment to reproducibility with Flux.

using Flux
import Random.seed!
seed!(123);

data = [(rand(Float32, 5, 1), rand(Float32, 5)), ]

model = Flux.Chain(Flux.Dense(5, 2, identity),
                   Flux.Dropout(0.5),
                   Flux.Dense(2, 1))

data_gpu = gpu(data);
model_gpu = gpu(model)

loss(x, y)     = Flux.mse(model(x),     y)
loss_gpu(x, y) = Flux.mse(model_gpu(x), y)

optimiser = Flux.ADAM()

# cpu training:
seed!(123);
Flux.train!(loss, Flux.params(model), data, optimiser)
rand() # 0.6739586945680673


# gpu training: 
seed!(123);
Flux.train!(loss_gpu, Flux.params(model_gpu), data_gpu, optimiser)
rand() # 0.13672511011651545  <----------------------- should be the same as the `rand()` above

If one removes Dropout from the chain, the rand() calls return the same value, as expected.

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel Xeon Processor (Skylake, IBRS)
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)

(jl_4sVBIY) pkg> st
Status `/tmp/jl_4sVBIY/Project.toml`
[587475ba] Flux v0.11.1
DhairyaLGandhi (Member)

A similar concern was brought up earlier.
cc @maleadt, who gave the explanation for it then, IIRC.


maleadt commented Oct 28, 2020

I don't recall an issue with the CPU's global RNG behaving differently depending on whether the model is on the GPU or not.
@ablaom you could try redefining some inner method of e.g. MersenneTwister to have it trace where randomness is requested.
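As a lighter-weight alternative to redefining Random internals, you can detect whether a given call consumed the CPU's global RNG by comparing copies of its state before and after. This is only a sketch under my own assumptions, and `consumes_global_rng` is a hypothetical helper, not part of any library:

```julia
import Random

# Hypothetical helper: returns true if running `f` advanced the state
# of the CPU's default (global) RNG.
function consumes_global_rng(f)
    before = copy(Random.default_rng())  # snapshot the RNG state
    f()
    return copy(Random.default_rng()) != before
end
```

Running this with the GPU forward pass, e.g. `consumes_global_rng(() -> model_gpu(data_gpu[1][1]))`, would show whether the GPU dropout path is drawing from the CPU global RNG; wrapping individual layers narrows it down further.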

CarloLucibello (Member)

You can use CUDA.seed! to get reproducible sequences on the GPU.
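A minimal sketch of the suggestion above, assuming CUDA.jl is installed alongside Flux: seed both the CPU and GPU RNGs before each run.

```julia
using Flux, CUDA
import Random

Random.seed!(123)  # seeds the CPU global RNG
CUDA.seed!(123)    # seeds the GPU-side RNG used by CUDA kernels (e.g. dropout)
```

Note that this makes each device's run independently reproducible; it does not make the CPU and GPU draw identical random sequences, since the two devices use different RNGs.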
