Adding device objects for selecting GPU backends (and defaulting to CPU if none exists). #2297
Merged
22 commits, all by codetalker7:

- 5d4f2a2 Adding structs for cpu and gpu devices.
- 70044fb Adding implementation of `Flux.get_device()`, which returns the most…
- 0dc5629 Adding docstrings for the new device types, and the `get_device` func…
- a3f9257 Adding `CPU` to the list of supported backends. Made corresponding…
- 18938de Using `julia-repl` instead of `jldoctest`, and `@info` instead of `@w`…
- bf134ad Adding `DataLoader` functionality to device objects.
- f8fc22c Removing pkgids and defining new functions to check whether backend is…
- 3cd1d89 Correcting typographical errors, and removing useless imports.
- f7f21e1 Adding `deviceID` to each device struct, and moving struct definitions…
- d22aaf5 Adding tutorial for using device objects in manual.
- 03faa96 Adding docstring for `get_device` in manual, and renaming internal…
- e1ad3e7 Minor change in docs.
- 179bbea Removing structs from package extensions as it is bad practice.
- bb67ad6 Adding more docstrings in manual.
- 7be1700 Removing redundant log messages.
- 7558d29 Adding kwarg to `get_device` for verbose output.
- 95e3bc3 Setting `deviceID` to `nothing` if GPU is not functional.
- 40b1fe2 Adding basic tests for device objects.
- df70154 Fixing minor errors in package extensions and tests.
- 650a273 Minor fix in tests + docs.
- 1495e04 Moving device tests to extensions, and adding a basic data transfer…
- b07985b Moving all device tests in single file per extension.
Conversations
Can we use dispatch to get these fixed names and use the fields to instead store info about the actual device? e.g. ordinal number or wrapping the actual device type(s) from each GPU package.
Yes, I'll try to add this to the structs.
Hi @ToucheSir. I've added a `deviceID` to each device struct, whose type is the device type from the corresponding GPU package. Since neither `KernelAbstractions` nor `GPUArrays` has any type hierarchy for device objects, I've moved the struct definitions to the package extensions. The device types are `CUDA.CuDevice`, `AMDGPU.HIPDevice` and `Metal.MTLDevice` respectively.

One disadvantage of this approach: from what I understand, Flux leaves the work of managing devices to the GPU packages. So, if the user switches devices using functions from the GPU package, then our `device` object will also have to be updated (which currently isn't the case). But if users of Flux don't care about which device is allocated to them, I think this works fine.

What do you think about this?
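As a rough, simplified illustration of that shape (the actual definitions live in the PR's package extensions; the details below are assumptions for illustration, not the PR's exact code):

```julia
# Simplified sketch, assuming the CUDA package extension is loaded and functional.
using CUDA

struct FluxCUDADevice
    deviceID::CUDA.CuDevice   # the concrete device type comes from the GPU package
end

# e.g. wrap whatever device CUDA.jl currently has active:
dev = FluxCUDADevice(CUDA.device())
```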
In my mind, the whole point of calling this a device instead of a backend is that we'd allow users to choose which device they want their model to be transferred onto. If that's not feasible because of limitations in the way GPU packages must be used, I'd rather just call these backends instead. Others might have differing opinions on this, however, cc @CarloLucibello from earlier.
Yes, I agree. Also, if a user wants finer control over which device to use, isn't it better for them to just rely on CUDA.jl (for example) directly?

If not, I think it won't be hard to add a device-selection capability within Flux as well. But ultimately we will be calling functions from the GPU packages, which the user could just call themselves.
Sure, I'm fine with either. Also, if we are to implement an interface for handling multiple devices, wouldn't it be a good idea to first discuss the overall API we want, and the specific implementation details we need? (I'm asking because I'm not completely aware of everything I'll have to implement to handle multiple devices.)

For instance, when we talk about "multiple devices", do we mean letting the user work with just one device, but with the ability to choose which one? Or do we mean using multiple devices simultaneously to train models? For the latter, I was going through DaggerFlux.jl, and it seems decidedly non-trivial. The first idea seems easier to implement.
Somewhere in the middle, I think. Training on multiple GPUs is out of scope for this PR (we have other efforts looking into that), but allowing users to transfer models to any active GPU without calling `device!` beforehand every time would be great for ergonomics.
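For context, a minimal example of the workflow this refers to, assuming CUDA.jl (whose device ordinals are 0-based): the device has to be selected globally before each transfer.

```julia
using CUDA, Flux

CUDA.device!(0)              # first select the GPU globally...
m = Dense(2 => 3) |> gpu     # ...then transfer; gpu targets whichever device is active
```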
Sure, I think this shouldn't be too hard to implement. I have one idea for this.
Device methods
We will have methods for constructing device objects from a backend and an ordinal. With these functions, users can then specify the backend + ordinal of the GPU device they want to work with; a rough sketch follows.
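A minimal hypothetical sketch of the shape such methods could take, assuming the `backend::Type{<:KA.GPU}` and `ordinal::UInt` fields described in the next section (every name here is illustrative, not the PR's final API):

```julia
# Illustrative only: struct and function names are assumptions, not the PR's API.
using KernelAbstractions
const KA = KernelAbstractions

abstract type AbstractDevice end

struct FluxGPUDevice <: AbstractDevice
    backend::Type{<:KA.GPU}   # e.g. CUDABackend from CUDA.jl
    ordinal::UInt             # which physical device of that backend
end

# A user could then ask for a specific device, with a CPU fallback when no
# functional GPU backend is available, e.g.
#   device = get_device(CUDABackend, UInt(0))
get_device(backend::Type{<:KA.GPU}, ordinal::UInt) = FluxGPUDevice(backend, ordinal)
```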
Model transfer between devices
Next, suppose we have a model `m` which is bound to an `AbstractDevice`, say `device1`, which has a `backend1::Type{<:KA.GPU}` and an `ordinal1::UInt`. Suppose `device2` is another device object with a `backend2::Type{<:KA.GPU}` and an `ordinal2::UInt`.

Then, a call to `device2(m)` will do the following: if `backend1 == backend2` and `ordinal1 == ordinal2`, then nothing happens and `m` is returned. Otherwise, `device1` is "freed" of `m` (we'll have to do some device memory management here) and `m` is bound to `device2`.
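Continuing the hypothetical structs from the sketch above, that control flow might look roughly like this; every underscore-prefixed helper is a placeholder for logic that still needs to be designed:

```julia
# Hypothetical skeleton; each helper below is a stub standing in for missing logic.
_backend_and_ordinal(m) = error("placeholder: detect which backend/ordinal holds m's parameters")
_move_to(device, m)     = error("placeholder: copy m's parameters onto `device`")
_free!(m)               = error("placeholder: release m's buffers on its old device")

function (device2::FluxGPUDevice)(m)
    backend1, ordinal1 = _backend_and_ordinal(m)
    if backend1 == device2.backend && ordinal1 == device2.ordinal
        return m                     # nothing to do: m already lives on this device
    end
    m2 = _move_to(device2, m)        # allocate m's parameters on device2
    _free!(m)                        # free the old copies on device1
    return m2
end
```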
In the above, the tricky part is how to identify the GPU backend + ordinal which `m` is bound to, and how to free the memory taken by `m` on the device. For simple models like `Dense`, I can do something like the sketch below.
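For example, a rough sketch of that check (and of eagerly freeing the buffers) for a `Dense` layer whose parameters are `CuArray`s; this assumes a functional CUDA.jl setup and uses `KernelAbstractions.get_backend`:

```julia
# Sketch for a Dense layer with CuArray parameters (assumes a working CUDA GPU).
using CUDA, Flux, KernelAbstractions

m = Dense(2 => 3) |> gpu

backend = KernelAbstractions.get_backend(m.weight)  # CUDABackend() for a CuArray
dev     = CUDA.device(m.weight)                     # the CuDevice holding the buffer

# "Freeing" m from that device could then release its buffers eagerly:
CUDA.unsafe_free!(m.weight)
CUDA.unsafe_free!(m.bias)
```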
Now clearly, I can't do something similar if `m` is a complex model. So we'll probably have to add some property to models which stores the device backend + ordinal to which they are bound.

Regarding freeing the GPU device memory: for CUDA, for example, we can probably use the `CUDA.unsafe_free!` method. But it might be unsafe for a reason.

How does this idea sound, @ToucheSir @CarloLucibello? Any pointers/suggestions on how to track which device a model is bound to and how to do the memory management?
If the actual data movement adaptors (e.g. `FluxCUDAAdaptor`) receive the device ID as an argument, then you only need to apply your detect + free logic at the level of individual parameters; `fmap` will take care of mapping that logic over a complex model.

In the simple case we are talking about, every parameter in the model should be bound to the same device. In general, model parallelism means that a model could be spread across multiple devices.
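A hedged sketch of that suggestion (not Flux's actual implementation; the adaptor name and the `to_device` helper are made up for illustration): give the adaptor a device ordinal, extend `Adapt.adapt_storage` for the per-array logic, and let `fmap` walk the model.

```julia
# Illustrative only; assumes CUDA.jl and a functional GPU. Not Flux's real adaptor.
using Adapt, Functors, CUDA

struct CUDAAdaptorWithID        # hypothetical analogue of FluxCUDAAdaptor
    ordinal::Int                # 0-based CUDA device ordinal to transfer onto
end

# Per-array logic: allocate the copy on the requested device.
function Adapt.adapt_storage(to::CUDAAdaptorWithID, x::AbstractArray)
    CUDA.device!(to.ordinal) do
        CuArray(x)
    end
end

# fmap maps the per-parameter rule over an arbitrarily nested model.
to_device(m, ordinal::Integer) = fmap(x -> Adapt.adapt(CUDAAdaptorWithID(ordinal), x), m)
```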
Yes, the only thing we need to worry about is "can I move this array to the device the user asked for?", which sounds simple but might be tricky in practice if the GPU packages don't provide a way to do that directly. I hope there's a relatively straightforward way for most of them, but if not we can save that for future work and/or bug upstream to add it in for us :)