CUDA throws OOM error when initializing API on multiple devices #398

Closed
andevellicus opened this issue Aug 27, 2020 · 7 comments

andevellicus commented Aug 27, 2020

I have a K80, which has two GPUs, and usually I end up training different models simultaneously and independently, one on each GPU. Since I deal with 3D images, I usually run pretty close to the memory limit on each one. Recently, while a model has been running on one GPU, I'll get an OOM error when trying to select the second device with CUDA.device!(1), even though that device has plenty of free memory. I eventually get my model training, but it usually takes restarting the code a few times. Stacktrace below:

ERROR: LoadError: LoadError: InitError: CUDA error (code 2, CUDA_ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] throw_api_error(::CUDA.cudaError_enum) at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/error.jl:103
 [2] macro expansion at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/error.jl:110 [inlined]
 [3] cuDevicePrimaryCtxRetain(::Base.RefValue{Ptr{Nothing}}, ::CUDA.CuDevice) at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
 [4] CuContext at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/context/primary.jl:31 [inlined]
 [5] context at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:249 [inlined]
 [6] device!(::CUDA.CuDevice, ::Nothing) at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:286
 [7] device! at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:265 [inlined]
 [8] mem at /home/andevellicus/.julia/packages/Knet/Mfd6L/src/Knet.jl:18 [inlined]
 [9] #1 at ./none:0 [inlined]
 [10] iterate at ./generator.jl:47 [inlined]
 [11] _all(::Base.var"#256#258", ::Base.Generator{CUDA.DeviceSet,Knet.var"#1#3"{Knet.var"#mem#2"}}, ::Colon) at ./reduce.jl:827
 [12] all at ./reduce.jl:823 [inlined]
 [13] Dict(::Base.Generator{CUDA.DeviceSet,Knet.var"#1#3"{Knet.var"#mem#2"}}) at ./dict.jl:130
 [14] __init__() at /home/andevellicus/.julia/packages/Knet/Mfd6L/src/Knet.jl:19
 [15] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
 [16] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:782
 [17] _require(::Base.PkgId) at ./loading.jl:1007
 [18] require(::Base.PkgId) at ./loading.jl:928
 [19] require(::Module, ::Symbol) at ./loading.jl:923
 [20] include(::String) at ./client.jl:457
 [21] top-level scope at /home/andevellicus/Programming/ML/julia/knet-sdh-seg/train.jl:1
 [22] include(::Function, ::Module, ::String) at ./Base.jl:380
 [23] include(::Module, ::String) at ./Base.jl:368
 [24] exec_options(::Base.JLOptions) at ./client.jl:296
 [25] _start() at ./client.jl:506
during initialization of module Knet
in expression starting at /home/andevellicus/Programming/ML/julia/knet-sdh-seg/SDHSeg.jl:4
in expression starting at /home/andevellicus/Programming/ML/julia/knet-sdh-seg/train.jl:1
caused by [exception 1]
CUDA error (code 2, CUDA_ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] throw_api_error(::CUDA.cudaError_enum) at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/error.jl:103
 [2] macro expansion at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/error.jl:110 [inlined]
 [3] cuDevicePrimaryCtxRetain(::Base.RefValue{Ptr{Nothing}}, ::CUDA.CuDevice) at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
 [4] CuContext at /home/andevellicus/.julia/packages/CUDA/dZvbp/lib/cudadrv/context/primary.jl:31 [inlined]
 [5] context at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:249 [inlined]
 [6] device!(::CUDA.CuDevice, ::Nothing) at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:286
 [7] device! at /home/andevellicus/.julia/packages/CUDA/dZvbp/src/state.jl:265 [inlined]
 [8] mem at /home/andevellicus/.julia/packages/Knet/Mfd6L/src/Knet.jl:18 [inlined]
 [9] #1 at ./none:0 [inlined]
 [10] iterate at ./generator.jl:47 [inlined]
 [11] Dict{CUDA.CuDevice,Int64}(::Base.Generator{CUDA.DeviceSet,Knet.var"#1#3"{Knet.var"#mem#2"}}) at ./dict.jl:102
 [12] dict_with_eltype at ./abstractdict.jl:531 [inlined]
 [13] dict_with_eltype at ./abstractdict.jl:538 [inlined]
 [14] Dict(::Base.Generator{CUDA.DeviceSet,Knet.var"#1#3"{Knet.var"#mem#2"}}) at ./dict.jl:128
 [15] __init__() at /home/andevellicus/.julia/packages/Knet/Mfd6L/src/Knet.jl:19
 [16] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
 [17] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:782
 [18] _require(::Base.PkgId) at ./loading.jl:1007
 [19] require(::Base.PkgId) at ./loading.jl:928
 [20] require(::Module, ::Symbol) at ./loading.jl:923
 [21] include(::String) at ./client.jl:457
 [22] top-level scope at /home/andevellicus/Programming/ML/julia/knet-sdh-seg/train.jl:1
 [23] include(::Function, ::Module, ::String) at ./Base.jl:380
 [24] include(::Module, ::String) at ./Base.jl:368
 [25] exec_options(::Base.JLOptions) at ./client.jl:296
 [26] _start() at ./client.jl:506

Usually device 0 (CUDA.device!(0)) is the one that is full, and device 1 (CUDA.device!(1)) is ready to go.
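For reference, the usage pattern is roughly the following minimal sketch (the model and training code themselves are omitted):

```julia
using CUDA, Knet   # loading Knet is where the error above is thrown

CUDA.device!(1)    # select the second (idle) K80 GPU
# ... build the model and train it on that device ...
```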

My system is:

  • Julia 1.5
  • Knet 1.4
  • CUDA.jl 1.3.3
andevellicus added the bug label Aug 27, 2020
andevellicus changed the title from "CUDA throws OOM error when initializing API on multi-core devices" to "CUDA throws OOM error when initializing API on multiple devices" Aug 28, 2020
maleadt (Member) commented Aug 28, 2020

You're not using the other device exclusively; the Knet code in your stack trace initializes a context on every device. This is bad behavior on Knet's side: it allocates a context and initializes memory on every device at package load time, which (as observed here) easily consumes 100-200 MB of device memory per device. cc @denizyuret

maleadt closed this as completed Aug 28, 2020
maleadt (Member) commented Aug 28, 2020

Specifically, @denizyuret, don't do this: https://github.com/denizyuret/Knet.jl/blob/2754cd6d4f61d5e509810461d8dcafb5efa2f79e/src/Knet.jl#L18-L19

Doing a device! during __init__ is bad because of the memory it uses (i.e. this issue). Resetting the device there is even worse: it kills any existing allocations, making it impossible to import Knet later in the program, or to use it alongside other CUDA software. The available memory recorded in mem is also not going to be correct because, well, you reset the device afterwards.
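For illustration, here is a minimal sketch of the kind of deferral meant here (the names devmem and device_memory are hypothetical; this is not a proposed patch): keep __init__ free of CUDA calls and only touch a device when the user actually asks about it.

```julia
using CUDA

# Hypothetical lazy cache: device => free memory at the time it was first queried.
const devmem = Dict{CUDA.CuDevice,Int}()

function __init__()
    # nothing CUDA-related here: no contexts are created, no devices are reset
end

"Free memory on `dev`, creating a context on that device only when first asked."
function device_memory(dev::CUDA.CuDevice)
    get!(devmem, dev) do
        CUDA.device!(dev)           # the first (and only) point where `dev` is touched
        CUDA.available_memory()
    end
end
```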

maleadt removed the bug label Aug 28, 2020
denizyuret (Contributor) commented Aug 28, 2020 via email

maleadt (Member) commented Aug 28, 2020

> @maleadt What is the workaround?

There is no workaround. Don't initialize the device if you don't need it.

NVML is not available on all platforms, and many users don't have it installed; since it comes with the driver, we can't provision it using artifacts either. So I'd try NVML (CUDA.jl has wrappers for it), but if it isn't available, just trust what CUDA selected as the primary device, which is already heuristic-driven.
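For example, free memory can be inspected without creating any contexts along these lines (a hedged sketch using the NVML wrappers; it only does anything where NVML is available):

```julia
using CUDA

if CUDA.has_nvml()
    for dev in CUDA.NVML.devices()
        free_mb = CUDA.NVML.memory_info(dev).free ÷ 2^20
        println(CUDA.NVML.name(dev), ": ", free_mb, " MiB free")
    end
else
    # no NVML: don't probe the devices at all, just use whatever CUDA picked as primary
end
```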

maleadt (Member) commented Aug 28, 2020

e.g. our tests do the following (which is a more complicated heuristic because it also looks at compute capability):

CUDA.jl/test/runtests.jl

Lines 119 to 159 in 892e649

```julia
# find suitable devices
@info "System information:\n" * sprint(io->CUDA.versioninfo(io))
candidates, driver_version, cuda_driver_version = if has_nvml()
    [(index=i,
      uuid=NVML.uuid(dev),
      name=NVML.name(dev),
      cap=NVML.compute_capability(dev),
      mem=NVML.memory_info(dev).free)
     for (i,dev) in enumerate(NVML.devices())],
    NVML.driver_version(),
    NVML.cuda_driver_version()
else
    # using CUDA to query this information requires initializing a context,
    # which might fail if the device is heavily loaded.
    [(device!(dev);
      (index=i,
       uuid=uuid(dev),
       name=CUDA.name(dev),
       cap=capability(dev),
       mem=CUDA.available_memory()))
     for (i,dev) in enumerate(devices())],
    "(unknown)",
    CUDA.version()
end

## only consider devices that are fully supported by our CUDA toolkit, or tools can fail.
## NOTE: we don't reuse target_support which is also bounded by LLVM support,
##       and is used to pick a codegen target regardless of the actual device.
cuda_support = CUDA.cuda_compat()
filter!(x->x.cap in cuda_support.cap, candidates)

## only consider recent devices if we want testing to be thorough
thorough = parse(Bool, get(ENV, "CI_THOROUGH", "false"))
if thorough
    filter!(x->x.cap >= v"7.0", candidates)
end
isempty(candidates) && error("Could not find any suitable device for this configuration")

## order by available memory, but also by capability if testing needs to be thorough
sort!(candidates, by=x->x.mem)

## apply
picks = reverse(candidates[end-gpus+1:end]) # best GPU first
ENV["CUDA_VISIBLE_DEVICES"] = join(map(pick->"GPU-$(pick.uuid)", picks), ",")
@info "Testing using $(length(picks)) device(s): " * join(map(pick->"$(pick.index). $(pick.name) (UUID $(pick.uuid))", picks), ", ")
```

But I'd leave out the !has_nvml() fallback to avoid this issue here.
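Distilled down for the two-GPU case in this issue, and heavily hedged (this mirrors the test-suite logic above rather than any Knet API, and freest_device is just an illustrative name), NVML-based device selection could look roughly like this:

```julia
using CUDA

# Pick the CUDA device with the most free memory, without initializing any contexts.
# If NVML is unavailable, fall back to whatever CUDA selects as the primary device.
function freest_device()
    CUDA.has_nvml() || return CUDA.device()          # fallback: trust the primary device
    nvml_devs = collect(CUDA.NVML.devices())
    free = [CUDA.NVML.memory_info(dev).free for dev in nvml_devs]
    best_uuid = CUDA.NVML.uuid(nvml_devs[argmax(free)])
    for dev in CUDA.devices()
        CUDA.uuid(dev) == best_uuid && return dev    # match NVML device to CuDevice by UUID
    end
    return CUDA.device()
end

CUDA.device!(freest_device())
```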

andevellicus (Author) commented
Thanks for looking at it; sorry to have misplaced it.

maleadt (Member) commented Aug 28, 2020

No problem. Come to think of it, there is a workaround (though it isn't usable by Knet): define CUDA_VISIBLE_DEVICES=... in your environment before launching Julia. That way, Knet won't be able to initialize the other devices and trigger the OOM.
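Concretely, for the setup in this issue (device 0 busy, device 1 free) that would look something like the sketch below; the in-Julia variant is an untested assumption and only works if it runs before anything initializes CUDA:

```julia
# From the shell, before launching Julia:
#   CUDA_VISIBLE_DEVICES=1 julia train.jl
#
# Or at the very top of the script, before `using Knet` / `using CUDA`:
ENV["CUDA_VISIBLE_DEVICES"] = "1"   # only GPU 1 is visible, so Knet can't touch GPU 0
using Knet
```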
