
Add experimental support of cuQuantum #1400

Merged: 33 commits into Qiskit:main on Mar 1, 2022

Conversation

@doichanj (Collaborator) commented on Dec 13, 2021

Summary

This PR adds experimental support for NVIDIA's cuQuantum Beta 2 (ver. 0.1.0).

Details and comments

The cuStateVec APIs can be used instead of Aer's GPU implementations by setting options at runtime (see CONTRIBUTING.md for details). cuStateVec support is enabled by building with CUSTATEVEC_ROOT set to the path of cuQuantum.
Using cuStateVec gives roughly a 2x speedup for large circuits (more than 22 qubits), but Aer's own implementation is still faster for smaller qubit counts.
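A build sketch, under the assumption that Aer's CUDA build accepts CUSTATEVEC_ROOT as a CMake variable; the install path and exact flags here are illustrative, and CONTRIBUTING.md has the authoritative steps:

```shell
# Illustrative only: point CUSTATEVEC_ROOT at the cuQuantum install,
# then build Aer with the CUDA (Thrust) backend. Paths are hypothetical.
export CUSTATEVEC_ROOT=/opt/nvidia/cuquantum
python setup.py bdist_wheel -- -DAER_THRUST_BACKEND=CUDA \
    -DCUSTATEVEC_ROOT=$CUSTATEVEC_ROOT
```

At runtime the feature is then switched on per run via the cuStateVec_enable option mentioned later in this thread (together with device='GPU').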

Since cuQuantum is a beta version, there are some limitations:

  • cuStateVec is not thread safe, so multi-chunk parallelization (cache blocking) is done by a single thread (slow)
  • Multi-shot parallelization is disabled (single thread, slow)
  • Multi-shot batched optimization is not supported for cuStateVec

@chriseclectic chriseclectic added the on hold Can not fix yet label Dec 13, 2021
@chriseclectic (Member) commented:

I added on-hold until the 0.10 release is out.

@chriseclectic chriseclectic removed the on hold Can not fix yet label Dec 14, 2021
@hhorii (Collaborator) left a comment:

I think introducing a new namespace is necessary for AER::QV to identify whether code is chunk-based or not. For example, GateFuncBase sounds too general; it should be in AER::QV::CHUNK or something similar.

Review threads were opened on src/controllers/aer_controller.hpp, src/simulators/state.hpp, and src/simulators/statevector/chunk/chunk_container.hpp (all resolved).
@hhorii (Collaborator) commented on Feb 1, 2022

@doichanj a release note is necessary.

@hhorii hhorii added this to the Aer 0.10.3 milestone Feb 2, 2022
@jakelishman (Member) commented:

This should probably also wait for Aer 0.11: it's a big new feature, and patch releases are usually for bugfixes.

@hhorii hhorii modified the milestones: Aer 0.10.3, Aer 0.11.0 Feb 2, 2022
@hhorii (Collaborator) left a comment:

Now I found that OpenMP does not work well on any device. Let me investigate whether this phenomenon comes only from my configuration or is common.

if (cuStateVec_enable_) {
  // cuStateVec does not support batched execution of multi-shots
  enable_batch_multi_shots_ = false;
  // cuStateVec is currently not thread safe
  parallel_shots_ = 1;
  return;
}
A collaborator commented:

If cuStateVec_enable=True is configured in AerSimulator.run(), parallel_state_update_ is not set. This will cause a performance regression if an application accidentally sets cuStateVec_enable with device='CPU'.
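The concern above can be sketched as follows. The guard on device_ is a hypothetical fix; the field and method names mirror the PR snippet, but this struct is illustrative, not Aer code:

```cpp
#include <string>

// Illustrative model of the parallelization setup discussed above.
// Field names mirror the PR snippet; the struct itself is not Aer code.
struct Controller {
  bool cuStateVec_enable_ = false;
  std::string device_ = "CPU";
  bool enable_batch_multi_shots_ = true;
  int parallel_shots_ = 4;
  int parallel_state_update_ = 0;

  void set_parallelization() {
    // Take the cuStateVec early-return path only on GPU, so that
    // accidentally setting cuStateVec_enable with device='CPU'
    // cannot skip the parallel_state_update_ setup.
    if (cuStateVec_enable_ && device_ == "GPU") {
      enable_batch_multi_shots_ = false;  // no batched multi-shot support
      parallel_shots_ = 1;                // cuStateVec is not yet thread safe
      return;
    }
    parallel_state_update_ = 8;  // placeholder for the usual OpenMP setup
  }
};
```

With this guard, the CPU path always reaches the parallel_state_update_ assignment regardless of the stray option.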

A reviewer commented:

Question: when enable_batch_multi_shots_=true, would you create nShots copies of the statevector for parallelization? If so, and if I understand correctly, I think a proper "workaround" is to create multiple cuStateVec handles (or just retain and reuse a pool of handles created at init time to reduce overhead) and use them in parallel.

IMHO, though, it's beyond a "workaround": even after the thread-safety issue is fixed, it is generally still challenging for library handles to be shared by multiple host threads. For example, although cuBLAS supports this usage pattern, they explicitly recommend not doing so. Thus the handle-pool approach is commonly seen in ML/DL frameworks.
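The handle-pool idea can be sketched like this: a mutex-guarded pool with a dummy handle type standing in for custatevecHandle_t. None of these names are real cuStateVec or Aer APIs:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Sketch of the handle-pool pattern: create all handles once at init
// time, then lease them out so no handle is ever shared by two threads.
struct DummyHandle { int id; };  // stand-in for custatevecHandle_t

class HandlePool {
 public:
  explicit HandlePool(int n) {
    for (int i = 0; i < n; ++i)
      free_.push_back(DummyHandle{i});  // stand-in for handle creation
  }
  DummyHandle acquire() {
    std::lock_guard<std::mutex> lk(mutex_);
    assert(!free_.empty());  // size the pool >= max concurrent threads
    DummyHandle h = free_.back();
    free_.pop_back();
    return h;
  }
  void release(DummyHandle h) {
    std::lock_guard<std::mutex> lk(mutex_);
    free_.push_back(h);
  }
 private:
  std::mutex mutex_;
  std::vector<DummyHandle> free_;
};
```

Each worker acquires a handle for the duration of its work and returns it afterwards, so handle creation cost is paid once at init time.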

@doichanj (Collaborator, Author) replied:

enable_batch_multi_shots_=true is not currently applicable to cuStateVec, because multiple state vectors are calculated in a single CUDA kernel and each state vector refers to classical registers to handle branch operations; this is not implemented in cuStateVec.
Multiple cuStateVec handles are required when enable_batch_multi_shots_=false and shot-level parallelization is requested. In that case, state vectors are calculated independently on OpenMP threads. (Currently cuStateVec is not thread safe, so we disable OpenMP parallelization.)
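The shot-level parallelization described above could look roughly like this, with one handle per worker. std::thread stands in for OpenMP here, and PerThreadHandle / simulate_shot are illustrative placeholders, not Aer or cuStateVec APIs:

```cpp
#include <thread>
#include <vector>

// Each worker thread owns its own handle, so nothing is shared across
// threads; this is the shape shot-level parallelism could take once
// per-thread library handles are available.
struct PerThreadHandle { int id; };  // placeholder for a library handle

static double simulate_shot(PerThreadHandle&, int shot) {
  return static_cast<double>(shot);  // placeholder for one state-vector run
}

std::vector<double> run_shots(int nshots) {
  std::vector<double> results(nshots);
  std::vector<std::thread> workers;
  for (int s = 0; s < nshots; ++s) {
    workers.emplace_back([s, &results] {
      PerThreadHandle h{s};               // handle created per thread
      results[s] = simulate_shot(h, s);   // independent shot simulation
    });
  }
  for (auto &w : workers) w.join();
  return results;
}
```

Each shot writes to its own slot of the results vector, so no synchronization beyond the final join is needed.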

The reviewer replied:

Thanks for the explanation, @doichanj. I understand better now. So once the thread-safety issue is fixed, we can unblock shot-level parallelization for you.

# 'GPU_cuStateVec' is used only inside tests and is not available in Aer;
# it is converted to device='GPU' with the option cuStateVec_enable=True added.
if cuStateVec:
    data_args.append((method, 'GPU_cuStateVec'))
A collaborator commented:

@chriseclectic could you review this change? This is a hack to minimize changes to the tests for the cuStateVec option. cuStateVec is an option only for device=GPU, and the current annotator supported_methods() requires tests to take two arguments, method and device. Adding the new option cuStateVec_enable to all the tests is not productive, I believe.

@chriseclectic chriseclectic self-assigned this Feb 15, 2022
@hhorii (Collaborator) commented on Feb 28, 2022

I confirmed that this PR causes no regressions.

@hhorii hhorii merged commit db91e7d into Qiskit:main Mar 1, 2022
@hhorii hhorii mentioned this pull request Mar 29, 2022
1 task