How to use more local parameters with a new CUDA device? #433
Replies: 4 comments
-
Regarding the CPU version, however, make sure that you can get/set the additional parameter with the get_device_parameters function. Otherwise checkpointing will not work properly.
-
Dear @maljoras,
Upon implementing the CUDA version, I discovered a new function that is not defined in the CPU code:
As you suggested before in #379 (comment), I am using the
So, how does the
Thanks a lot!
Zhenming
-
I see that
But what function does this serve? Isn't the update calculated by converting
Why is there a shortcut that does not consider the device dynamics? Is this for the FloatingPointDevice? But why is it in
Thanks for your time.
Zhenming
-
Hi @ZhenmingYu:
-
Description and motivation
@maljoras Thanks for answering all my questions.
As mentioned in previous issues, I am currently trying to develop a new device model for aihwkit.
The CPU version seems to be working now, and I am moving on to the CUDA version.
My device model requires 4 local (device-specific) variables to enable device-to-device (dtod) variations.
In the CPU version this is rather straightforward: I simply need to declare more matrices to store them.
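A minimal sketch of what "more matrices" could look like on the CPU side. The struct `LocalParams`, the member names `p0`..`p3`, and the helper `populate` are hypothetical placeholders, not the actual aihwkit members; the dtod draw is assumed to be a relative Gaussian spread around a mean.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container: four row-major d_size x x_size matrices,
// one per device-specific parameter (names p0..p3 are placeholders).
struct LocalParams {
  int d_size;
  int x_size;
  std::vector<float> p0, p1, p2, p3;

  LocalParams(int d, int x) : d_size(d), x_size(x) {
    assert(d > 0 && x > 0);
    const std::size_t n = static_cast<std::size_t>(d) * static_cast<std::size_t>(x);
    p0.assign(n, 0.0f);
    p1.assign(n, 0.0f);
    p2.assign(n, 0.0f);
    p3.assign(n, 0.0f);
  }

  // Flat index of the cross-point in row i, column j.
  std::size_t index(int i, int j) const {
    assert(i >= 0 && i < d_size && j >= 0 && j < x_size);
    return static_cast<std::size_t>(i) * static_cast<std::size_t>(x_size) +
           static_cast<std::size_t>(j);
  }
};

// Fill one matrix with mean * (1 + dtod * N(0, 1)), drawn per device.
inline void populate(std::vector<float> &m, float mean, float dtod, std::mt19937 &rng) {
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  for (float &v : m) {
    v = mean * (1.0f + dtod * gauss(rng));
  }
}
```

Each of the four matrices would be populated once at construction time and then read per cross-point inside the update loop.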
However, in the CUDA version this is not possible, as the device update function only supports 2 device-specific parameters, wrapped in `par_2`.
Proposed solution
I tried to trace this through the code and found that I need to override [this macro](https://github.com/IBM/aihwkit/blob/master/src/rpucuda/cuda/pwu_kernel.h#L236-L248) to cast params_2 differently.
Is this even possible without a significant change to the code structure?
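To make "casting the parameters differently" concrete, here is a rough host-side C++ sketch of the idea, not the aihwkit macro itself. `Float2`/`Float4` are plain stand-ins for CUDA's `float2`/`float4`, and `TwoParamUpdate`, `FourParamUpdate`, and `apply_update` are hypothetical names; the update formulas are illustrative only. The point is that each functor declares the parameter type it expects, and the caller casts the raw buffer accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ stand-ins for CUDA's float2 / float4 vector types.
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Functor that needs two per-device values: up and down step size.
struct TwoParamUpdate {
  using Param = Float2;
  void operator()(float &w, int sign, const Param &p) const {
    w += (sign > 0) ? p.x : -p.y;
  }
};

// Hypothetical functor that needs four per-device values:
// step sizes (x, y) plus a weight-dependent slope per direction (z, w).
struct FourParamUpdate {
  using Param = Float4;
  void operator()(float &w, int sign, const Param &p) const {
    w += (sign > 0) ? p.x * (1.0f - p.z * w) : -p.y * (1.0f + p.w * w);
  }
};

// The functor's Param alias selects how the raw buffer is cast.
template <typename Functor>
void apply_update(std::vector<float> &weights, const std::vector<int> &signs,
                  const void *raw_params, const Functor &f) {
  assert(weights.size() == signs.size());
  const auto *params = static_cast<const typename Functor::Param *>(raw_params);
  for (std::size_t i = 0; i < weights.size(); ++i) {
    f(weights[i], signs[i], params[i]);
  }
}
```

In this arrangement only the functor and the buffer allocation know about the wider type; whether the real kernel macro can be parameterized the same way is exactly the open question.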
If I do change `par_2` to hold 4 float values, are there any other functions that I need to re-implement other than the `operator()` function inside each device?
Alternatives and other information
I could also try to swap out some part of `par_4` to hold my parameters.
For this, I only need to overwrite the values defined in PulsedRPUDeviceCudaBase::populateFrom(), which is easy to do in the HOST_COPY_BODY part of BUILD_PULSED_DEVICE_CONSTRUCTORS_CUDA.
The `wmax` and `wmin` inside `par_4` were easy to understand.
But I wonder what the function of `scale_up` and `scale_down` inside `par_4` is.
Do they get passed in from here, and do they only control the step size for the different directions?
If so, I should be able to reuse that space for my own parameters instead.
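A minimal sketch of that reading, assuming the two scale values are nothing more than direction-dependent step sizes and the other two values are the weight bounds. The struct `Par4`, its field order, the function `pulse`, and the sign convention are assumptions for illustration, not taken from the aihwkit sources.

```cpp
#include <algorithm>
#include <cassert>

// Assumed layout of the four shared per-device values; the field order
// is a guess for illustration, not taken from the aihwkit sources.
struct Par4 {
  float w_min;
  float scale_down;
  float w_max;
  float scale_up;
};

// One pulse: pick the step size by direction, then clip to the bounds.
// Assumed convention: sign > 0 means "up", anything else means "down".
inline float pulse(float w, int sign, const Par4 &p) {
  assert(p.w_min <= p.w_max);
  const float dw = (sign > 0) ? p.scale_up : -p.scale_down;
  return std::min(p.w_max, std::max(p.w_min, w + dw));
}
```

If that reading holds, a device that stores its own values in the two scale slots would also have to make sure its `operator()` no longer interprets them as step sizes.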
Thanks a lot for all the help in this!