Commit

Adding device objects for selecting GPU backends (and defaulting to CPU if none exists). (#2297)

* Adding structs for cpu and gpu devices.

* Adding implementation of `Flux.get_device()`, which returns the most
appropriate GPU backend (or CPU, if nothing is available).

* Adding docstrings for the new device types, and the `get_device` function.

* Adding `CPU` to the list of supported backends. Made corresponding
changes in `gpu(x)`. Adding more details in docstring of `get_device`.

* Using `julia-repl` instead of `jldoctest`, and `@info` instead of `@warn`.

* Adding `DataLoader` functionality to device objects.

* Removing pkgids and defining new functions to check whether backend is
available and functional.

* Correcting typographical errors, and removing useless imports.

* Adding `deviceID` to each device struct, and moving struct definitions
to package extensions.

* Adding tutorial for using device objects in manual.

* Adding docstring for `get_device` in manual, and renaming internal
functions.

* Minor change in docs.

* Removing structs from package extensions as it is bad practice.

* Adding more docstrings in manual.

* Removing redundant log messages.

* Adding kwarg to `get_device` for verbose output.

* Setting `deviceID` to `nothing` if GPU is not functional.

* Adding basic tests for device objects.

* Fixing minor errors in package extensions and tests.

* Minor fix in tests + docs.

* Moving device tests to extensions, and adding a basic data transfer
test.

* Moving all device tests in single file per extension.
codetalker7 authored Aug 4, 2023
1 parent c565052 commit c2bd39d
Showing 10 changed files with 413 additions and 5 deletions.
89 changes: 89 additions & 0 deletions docs/src/gpu.md
@@ -231,3 +231,92 @@ $ export CUDA_VISIBLE_DEVICES='0,1'

More information for conditional use of GPUs in CUDA.jl can be found in its [documentation](https://cuda.juliagpu.org/stable/installation/conditional/#Conditional-use), and information about the specific use of the variable is described in the [Nvidia CUDA blog post](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).

## Using device objects

As a more convenient syntax, Flux offers `device` objects, which can be used to easily transfer models to GPUs (falling back to the CPU if no GPU backend is available). This syntax has a few advantages, including automatic selection of the GPU backend and type stability of data movement. To obtain a device object, use the [`Flux.get_device`](@ref) function.

`Flux.get_device` first checks for a GPU backend preference, and if possible returns a device for the preferred backend. For instance, consider the following example, where we load the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) package to use an NVIDIA GPU (`"CUDA"` is the default preference):

```julia-repl
julia> using Flux, CUDA;

julia> device = Flux.get_device(; verbose=true) # returns handle to an NVIDIA GPU
[ Info: Using backend set in preferences: CUDA.
(::Flux.FluxCUDADevice) (generic function with 1 method)

julia> device.deviceID # check the id of the GPU
CuDevice(0): NVIDIA GeForce GTX 1650

julia> model = Dense(2 => 3);

julia> model.weight # the model initially lives in CPU memory
3×2 Matrix{Float32}:
 -0.984794  -0.904345
  0.720379  -0.486398
  0.851011  -0.586942

julia> model = model |> device # transfer model to the GPU
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 -0.984794  -0.904345
  0.720379  -0.486398
  0.851011  -0.586942
```

The device preference can also be set via the [`Flux.gpu_backend!`](@ref) function. For instance, below we first set our device preference to `"CPU"`:

```julia-repl
julia> using Flux; Flux.gpu_backend!("CPU")
┌ Info: New GPU backend set: CPU.
└ Restart your Julia session for this change to take effect!
```

Then, after restarting the Julia session, `Flux.get_device` returns a handle to the CPU:

```julia-repl
julia> using Flux, CUDA; # even if CUDA is loaded, we'll still get a CPU device

julia> device = Flux.get_device(; verbose=true) # get a CPU device
[ Info: Using backend set in preferences: CPU.
(::Flux.FluxCPUDevice) (generic function with 1 method)

julia> model = Dense(2 => 3);

julia> model = model |> device
Dense(2 => 3)       # 9 parameters

julia> model.weight # no change; model still lives on CPU
3×2 Matrix{Float32}:
 -0.942968   0.856258
  0.440009   0.714106
 -0.419192  -0.471838
```

This means that the same code will work for any GPU backend as well as the CPU.
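
For instance, a script written once against a device object runs on whichever backend is selected at run time. A minimal sketch (assuming Flux is installed; the selected backend depends on your system):

```julia
using Flux

device = Flux.get_device()         # CUDA, AMD, Metal, or CPU
model = Dense(2 => 3) |> device    # parameters move to the selected device
x = rand(Float32, 2, 16) |> device # data moves with the same object
y = model(x)                       # runs on the selected backend
```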

If the preferred backend isn't available or isn't functional, then [`Flux.get_device`](@ref) looks for a CUDA, AMD, or Metal backend, and returns the corresponding device (if it is available and functional). Otherwise, a CPU device is returned. In the example below, the GPU preference is `"CUDA"`:

```julia-repl
julia> using Flux; # preference is CUDA, but CUDA.jl is not loaded

julia> device = Flux.get_device(; verbose=true) # this will resort to automatic device selection
[ Info: Using backend set in preferences: CUDA.
┌ Warning: Trying to use backend: CUDA but its trigger package is not loaded.
│ Please load the package and call this function again to respect the backend preference.
└ @ Flux ~/fluxml/Flux.jl/src/functor.jl:637
[ Info: Using backend: CPU.
(::Flux.FluxCPUDevice) (generic function with 1 method)
```
For detailed information about how the backend is selected, check the documentation for [`Flux.get_device`](@ref).
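
Device objects also work on data loaders: applying a `device` to an `MLUtils.DataLoader` returns a loader that moves each batch to the device lazily, as it is iterated. A minimal sketch (outputs elided; which backend is selected depends on your system):

```julia-repl
julia> using Flux;

julia> device = Flux.get_device();

julia> X = rand(Float32, 2, 100);

julia> loader = Flux.DataLoader(X; batchsize=10);

julia> device_loader = device(loader); # batches land on the selected device when iterated
```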

```@docs
Flux.AbstractDevice
Flux.FluxCPUDevice
Flux.FluxCUDADevice
Flux.FluxAMDDevice
Flux.FluxMetalDevice
Flux.supported_devices
Flux.get_device
```
4 changes: 4 additions & 0 deletions ext/FluxAMDGPUExt/FluxAMDGPUExt.jl
@@ -17,6 +17,9 @@ const MIOPENFloat = AMDGPU.MIOpen.MIOPENFloat
# Set to boolean on the first call to check_use_amdgpu
const USE_AMDGPU = Ref{Union{Nothing, Bool}}(nothing)

Flux._isavailable(::Flux.FluxAMDDevice) = true
Flux._isfunctional(::Flux.FluxAMDDevice) = AMDGPU.functional()

function check_use_amdgpu()
  if !isnothing(USE_AMDGPU[])
    return
@@ -44,6 +47,7 @@ include("conv.jl")

function __init__()
  Flux.AMDGPU_LOADED[] = true
  Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["AMD"]] = AMDGPU.functional() ? Flux.FluxAMDDevice(AMDGPU.device()) : Flux.FluxAMDDevice(nothing)
end

# TODO
6 changes: 6 additions & 0 deletions ext/FluxCUDAExt/FluxCUDAExt.jl
@@ -14,6 +14,9 @@ import Adapt: adapt_storage

const USE_CUDA = Ref{Union{Nothing, Bool}}(nothing)

Flux._isavailable(::Flux.FluxCUDADevice) = true
Flux._isfunctional(::Flux.FluxCUDADevice) = CUDA.functional()

function check_use_cuda()
  if !isnothing(USE_CUDA[])
    return
@@ -36,6 +39,9 @@ include("functor.jl")
function __init__()
  Flux.CUDA_LOADED[] = true

  ## add device to available devices
  Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["CUDA"]] = CUDA.functional() ? Flux.FluxCUDADevice(CUDA.device()) : Flux.FluxCUDADevice(nothing)

  try
    Base.require(Main, :cuDNN)
  catch
4 changes: 4 additions & 0 deletions ext/FluxMetalExt/FluxMetalExt.jl
@@ -12,6 +12,9 @@ using Zygote

const USE_METAL = Ref{Union{Nothing, Bool}}(nothing)

Flux._isavailable(::Flux.FluxMetalDevice) = true
Flux._isfunctional(::Flux.FluxMetalDevice) = Metal.functional()

function check_use_metal()
  isnothing(USE_METAL[]) || return

@@ -30,6 +33,7 @@ include("functor.jl")

function __init__()
  Flux.METAL_LOADED[] = true
  Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Metal"]] = Metal.functional() ? Flux.FluxMetalDevice(Metal.current_device()) : Flux.FluxMetalDevice(nothing)
end

end
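
All three extensions implement the same small interface: mark the device as available, report whether the backend is functional, and register a device in `Flux.DEVICES` at module initialization. For illustration, a hypothetical extension for a new backend `"Foo"` might look as follows (all `Foo` names are hypothetical, and the `FluxFooDevice` struct plus a `GPU_BACKEND_ORDER` entry would also have to be added to Flux itself, as this commit does for CUDA, AMD, and Metal):

```julia
module FluxFooExt # hypothetical extension, for illustration only

import Flux
import Foo # hypothetical trigger package for the "Foo" backend

# The trigger package is loaded, so the backend counts as available.
Flux._isavailable(::Flux.FluxFooDevice) = true
# The backend is functional only if the runtime can actually reach a GPU.
Flux._isfunctional(::Flux.FluxFooDevice) = Foo.functional()

function __init__()
  # Register the device in Flux's global device list at its preference rank.
  Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Foo"]] =
    Foo.functional() ? Flux.FluxFooDevice(Foo.device()) : Flux.FluxFooDevice(nothing)
end

end
```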
218 changes: 217 additions & 1 deletion src/functor.jl
@@ -187,7 +187,16 @@ _isbitsarray(x) = false
_isleaf(::AbstractRNG) = true
_isleaf(x) = _isbitsarray(x) || Functors.isleaf(x)

const GPU_BACKENDS = ("CUDA", "AMD", "Metal")
const GPU_BACKEND_ORDER = sort(
  Dict(
    "CUDA" => 1,
    "AMD" => 2,
    "Metal" => 3,
    "CPU" => 4,
  ),
  byvalue = true
)
const GPU_BACKENDS = tuple(collect(keys(GPU_BACKEND_ORDER))...)
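# With this ordering, GPU_BACKENDS == ("CUDA", "AMD", "Metal", "CPU").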
const GPU_BACKEND = @load_preference("gpu_backend", "CUDA")

function gpu_backend!(backend::String)
@@ -249,6 +258,8 @@ function gpu(x)
    gpu(FluxAMDAdaptor(), x)
  elseif GPU_BACKEND == "Metal"
    gpu(FluxMetalAdaptor(), x)
  elseif GPU_BACKEND == "CPU"
    cpu(x)
  else
    error("""
      Unsupported GPU backend: $GPU_BACKEND.
@@ -444,3 +455,208 @@ function gpu(d::MLUtils.DataLoader)
    d.rng,
  )
end

# Defining device interfaces.
"""
    Flux.AbstractDevice <: Function

An abstract type representing `device` objects for different GPU backends. The currently supported backends are `"CUDA"`, `"AMD"`, `"Metal"`, and `"CPU"`; the `"CPU"` backend is the fallback case when no GPU is available. GPU extensions of Flux define subtypes of this type.
"""
abstract type AbstractDevice <: Function end

function (device::AbstractDevice)(d::MLUtils.DataLoader)
  MLUtils.DataLoader(MLUtils.mapobs(device, d.data),
    d.batchsize,
    d.buffer,
    d.partial,
    d.shuffle,
    d.parallel,
    d.collate,
    d.rng,
  )
end
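
# Note: the returned loader applies `device` lazily via `mapobs`, so data is moved
# to the device only when batches are actually fetched.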

function _get_device_name(::T)::String where {T <: AbstractDevice} end

## check device availability; more definitions in corresponding extensions
_isavailable(::Nothing) = false
_isfunctional(::Nothing) = false

_isavailable(::AbstractDevice) = false
_isfunctional(::AbstractDevice) = false

"""
    Flux.FluxCPUDevice <: Flux.AbstractDevice

A type representing `device` objects for the `"CPU"` backend for Flux. This is the fallback case when no GPU is available to Flux.
"""
Base.@kwdef struct FluxCPUDevice <: AbstractDevice end

(::FluxCPUDevice)(x) = cpu(x)
_isavailable(::FluxCPUDevice) = true
_isfunctional(::FluxCPUDevice) = true
_get_device_name(::FluxCPUDevice) = "CPU"

"""
    FluxCUDADevice <: AbstractDevice

A type representing `device` objects for the `"CUDA"` backend for Flux.
"""
Base.@kwdef struct FluxCUDADevice <: AbstractDevice
  deviceID
end

(::FluxCUDADevice)(x) = gpu(FluxCUDAAdaptor(), x)
_get_device_name(::FluxCUDADevice) = "CUDA"

"""
    FluxAMDDevice <: AbstractDevice

A type representing `device` objects for the `"AMD"` backend for Flux.
"""
Base.@kwdef struct FluxAMDDevice <: AbstractDevice
  deviceID
end

(::FluxAMDDevice)(x) = gpu(FluxAMDAdaptor(), x)
_get_device_name(::FluxAMDDevice) = "AMD"

"""
    FluxMetalDevice <: AbstractDevice

A type representing `device` objects for the `"Metal"` backend for Flux.
"""
Base.@kwdef struct FluxMetalDevice <: AbstractDevice
  deviceID
end

(::FluxMetalDevice)(x) = gpu(FluxMetalAdaptor(), x)
_get_device_name(::FluxMetalDevice) = "Metal"

## device list. order is important
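## GPU devices are registered by their package extensions' `__init__`; only the CPU
## device is registered eagerly below.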
const DEVICES = Ref{Vector{Union{Nothing, AbstractDevice}}}(Vector{Union{Nothing, AbstractDevice}}(nothing, length(GPU_BACKENDS)))
DEVICES[][GPU_BACKEND_ORDER["CPU"]] = FluxCPUDevice()

## get device

"""
    Flux.supported_devices()

Get all supported backends for Flux, in order of preference.

# Example

```jldoctest
julia> using Flux;

julia> Flux.supported_devices()
("CUDA", "AMD", "Metal", "CPU")
```
"""
supported_devices() = GPU_BACKENDS

"""
    Flux.get_device(; verbose=false)::AbstractDevice

Returns a `device` object for the most appropriate backend for the current Julia session.

First, the function checks whether a backend preference has been set via the [`Flux.gpu_backend!`](@ref) function. If so, an attempt is made to load this backend. If the corresponding trigger package has been loaded and the backend is functional, a `device` corresponding to the given backend is returned. Otherwise, the backend is chosen automatically. To update the backend preference, use [`Flux.gpu_backend!`](@ref).

If there is no preference, then for each of the `"CUDA"`, `"AMD"`, `"Metal"`, and `"CPU"` backends in the given order, this function checks whether the backend has been loaded via the corresponding trigger package, and whether it is functional. If so, the `device` corresponding to the backend is returned. If no GPU backend is available, a `Flux.FluxCPUDevice` is returned.

If `verbose` is set to `true`, then the function prints informative log messages.

# Examples

For the example given below, the backend preference was set to `"AMD"` via the [`gpu_backend!`](@ref) function.

```julia-repl
julia> using Flux;

julia> model = Dense(2 => 3)
Dense(2 => 3)       # 9 parameters

julia> device = Flux.get_device(; verbose=true) # this will just load the CPU device
[ Info: Using backend set in preferences: AMD.
┌ Warning: Trying to use backend: AMD but its trigger package is not loaded.
│ Please load the package and call this function again to respect the backend preference.
└ @ Flux ~/fluxml/Flux.jl/src/functor.jl:638
[ Info: Using backend: CPU.
(::Flux.FluxCPUDevice) (generic function with 1 method)

julia> model = model |> device
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 Matrix{Float32}:
 -0.304362  -0.700477
 -0.861201   0.67825
 -0.176017   0.234188
```
Here is the same example, but using `"CUDA"`:
```julia-repl
julia> using Flux, CUDA;

julia> model = Dense(2 => 3)
Dense(2 => 3)       # 9 parameters

julia> device = Flux.get_device(; verbose=true)
[ Info: Using backend set in preferences: AMD.
┌ Warning: Trying to use backend: AMD but its trigger package is not loaded.
│ Please load the package and call this function again to respect the backend preference.
└ @ Flux ~/fluxml/Flux.jl/src/functor.jl:637
[ Info: Using backend: CUDA.
(::Flux.FluxCUDADevice) (generic function with 1 method)

julia> model = model |> device
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
  0.820013   0.527131
 -0.915589   0.549048
  0.290744  -0.0592499
```
"""
function get_device(; verbose=false)::AbstractDevice
  backend = @load_preference("gpu_backend", nothing)

  if backend !== nothing
    allowed_backends = supported_devices()
    idx = findfirst(isequal(backend), allowed_backends)
    if backend ∉ allowed_backends
      @warn """
        `gpu_backend` preference is set to $backend, which is not allowed.
        Defaulting to automatic device selection.
      """ maxlog=1
    else
      verbose && @info "Using backend set in preferences: $backend."
      device = DEVICES[][idx]

      if !_isavailable(device)
        @warn """
          Trying to use backend: $backend but its trigger package is not loaded.
          Please load the package and call this function again to respect the backend preference.
        """
      else
        if _isfunctional(device)
          return device
        else
          @warn "Backend: $backend from the set preferences is not functional. Defaulting to automatic device selection."
        end
      end
    end
  end
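
  # No usable preference was found above, so fall back to the first backend that is
  # both available and functional. The CPU device always is, so a device is always returned.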

  for backend in GPU_BACKENDS
    device = DEVICES[][GPU_BACKEND_ORDER[backend]]
    if _isavailable(device)
      if _isfunctional(device)
        verbose && @info "Using backend: $backend."
        return device
      end
    end
  end
end