Add experimental support of cuQuantum #1400
I added the "On hold" label until the 0.10 release is out.
I think introducing a new namespace is necessary for AER::QV to identify whether code is chunk-based or not. For example, GateFuncBase sounds too general and it should be in AER::QV::CHUNK or something.
@doichanj a release note is necessary.
This probably should also wait for Aer 0.11: it's a big new feature, and patch releases are usually for bugfixes.
I have now found that OpenMP does not work well on any device. Let me investigate whether this is specific to my configuration or a general problem.
if(cuStateVec_enable_){
  enable_batch_multi_shots_ = false; //cuStateVec does not support batch execution of multi-shots
  parallel_shots_ = 1; //cuStateVec is currently not thread safe
  return;
If cuStateVec_enable=True is configured in AerSimulator.run(), parallel_state_update_ is not set. This will cause a performance regression if an application accidentally sets cuStateVec_enable with device='CPU'.
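One possible way to address the concern above, sketched in Python for brevity rather than in Aer's C++: apply the cuStateVec restrictions only when the device is actually GPU, and always fall through to the normal state-update setup instead of returning early. `ParallelConfig`, `configure_parallelization`, the default values, and the thread-splitting rule are hypothetical stand-ins for illustration, not Aer's actual code.

```python
from dataclasses import dataclass


@dataclass
class ParallelConfig:
    """Hypothetical stand-in for the controller's parallelization settings."""
    enable_batch_multi_shots: bool = True
    parallel_shots: int = 4
    parallel_state_update: int = 1


def configure_parallelization(device: str, custatevec_enable: bool,
                              max_threads: int = 8) -> ParallelConfig:
    cfg = ParallelConfig()
    # Honor the cuStateVec flag only on GPU; on CPU it is ignored.
    use_custatevec = custatevec_enable and device == "GPU"
    if use_custatevec:
        cfg.enable_batch_multi_shots = False  # no batched multi-shot support
        cfg.parallel_shots = 1                # cuStateVec is not thread safe yet
    # Always set the state-update parallelism (no early return).
    # Assumed rule: split the thread budget evenly across parallel shots.
    cfg.parallel_state_update = max(1, max_threads // cfg.parallel_shots)
    return cfg
```

With this shape, `configure_parallelization("CPU", True)` keeps the CPU defaults, so an accidental `cuStateVec_enable` on CPU would not change the parallelization settings.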
Question: when enable_batch_multi_shots_=true, would you create nShots copies of the statevector for parallelization? If so & IIUC, I think a proper "workaround" is to create multiple cuStateVec handles (or just retain and reuse a pool of handles at init time to reduce overhead) and use them in parallel.
IMHO though it's beyond a "workaround": even after we fix the thread safety issue, generally speaking it is still challenging for library handles to be shared by multiple host threads. For example, although cuBLAS supports this usage pattern, they explicitly recommend against it. Thus the handle pool approach is commonly seen in ML/DL frameworks.
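The handle pool idea can be sketched as follows. This is a minimal illustration in Python, not Aer or cuQuantum code: `FakeHandle` is a hypothetical stand-in for a cuStateVec handle, and `HandlePool`, `run_shot`, and `run_shots` are names invented for this sketch. Handles are created once up front, and each worker thread borrows one exclusively for the duration of a shot.

```python
import itertools
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeHandle:
    """Hypothetical stand-in for a cuStateVec handle (not a real API)."""
    _next_id = itertools.count()

    def __init__(self):
        self.id = next(FakeHandle._next_id)
        self.busy = False  # used only to detect accidental sharing


class HandlePool:
    """Fixed-size pool of handles, created once and reused."""

    def __init__(self, create_handle, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(create_handle())

    @contextmanager
    def borrow(self):
        handle = self._free.get()   # blocks until a handle is free
        try:
            yield handle
        finally:
            self._free.put(handle)  # returned even if the work raised


def run_shot(pool, shot):
    with pool.borrow() as handle:
        if handle.busy:
            raise RuntimeError("handle shared between threads")
        handle.busy = True
        try:
            return shot, handle.id  # a real shot would simulate here
        finally:
            handle.busy = False


def run_shots(n_shots, n_handles, n_threads):
    pool = HandlePool(FakeHandle, n_handles)
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        return list(ex.map(lambda s: run_shot(pool, s), range(n_shots)))
```

Because a handle is never visible to two threads at once, this pattern does not rely on the library handle itself being safe to share.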
enable_batch_multi_shots_=true is not applicable to cuStateVec currently: in that mode multiple state vectors are calculated in a single CUDA kernel and each state vector refers to classical registers to handle branch operations, which is not implemented in cuStateVec.
Multiple cuStateVec handles are required when enable_batch_multi_shots_=false and shot-level parallelization is required. In this case, state vectors are calculated independently using OpenMP threads. (Currently cuStateVec is not thread safe, so we disable OpenMP parallelization.)
Thanks for the explanation @doichanj. I understand better now. So once we fix thread safety we can unblock you for the shot-level parallelization.
# 'GPU_cuStateVec' is used only inside tests and is not available in Aer;
# it is converted to "device='GPU'" and the option "cuStateVec_enable = True" is added
if cuStateVec:
    data_args.append((method, 'GPU_cuStateVec'))
@chriseclectic could you review this change? This is a hack to minimize changes to the tests when testing the cuStateVec option. cuStateVec is an option only for device=GPU. The current annotator supported_methods() requires tests to take two arguments, method and device. Adding a new option cuStateVec_enable to all the tests is not productive, I believe.
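The conversion described in this comment could look roughly like the sketch below. The label 'GPU_cuStateVec' and the option name cuStateVec_enable come from this PR; the helper names `resolve_device` and `backend_options` are hypothetical and are not the actual test utilities.

```python
def resolve_device(device):
    """Map a test-only device label to (real device, extra backend options)."""
    if device == "GPU_cuStateVec":
        # Test-only pseudo device: run on GPU with cuStateVec switched on.
        return "GPU", {"cuStateVec_enable": True}
    return device, {}


def backend_options(method, device, **kwargs):
    """Build the option dict a test would hand to the simulator backend."""
    real_device, extra = resolve_device(device)
    return {"method": method, "device": real_device, **extra, **kwargs}
```

Tests keep receiving only `method` and `device`, and the extra option is injected in one place instead of being added to every test signature.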
I confirmed that this PR introduces no regressions.
Summary
This is the experimental support for NVIDIA's cuQuantum Beta 2 (ver 0.1.0).
Details and comments
We can use cuStateVec APIs instead of Aer's GPU implementations by setting options at runtime (see CONTRIBUTING.md for details). cuStateVec is enabled when building with CUSTATEVEC_ROOT set to the path to cuQuantum.
By using cuStateVec, we get roughly a 2x speedup for large circuits (more than 22 qubits), but Aer's implementation is still faster for smaller circuits.
Since cuQuantum is a beta version, there are some limitations: