
CBlas ABI changes #3

Merged
shadeMe merged 3 commits into shadeMe:cblas-abi on Jun 15, 2022

Conversation

shadeMe (Owner) commented Jun 15, 2022

This PR removes the use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for devices other than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices they can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, by default we want to
use the `cpu` device when `NumpyOps` is used and `cuda:N` when `CupyOps`
is used. To that end, this PR also adds a new function,
`get_torch_default_device`, that returns the correct device for the
currently active `Ops`. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` fall
back on this default by calling the function when `None` is given as the
device, mimicking the behavior from before this PR.
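For example, a wrapped model can now be pinned to a device at
construction time. A minimal usage sketch, assuming the `device` keyword
described above (the exact constructor signature may differ in the
released API):

```python
import torch
from thinc.api import PyTorchWrapper

# Wrap a Torch module; the shim keeps the module on the given device and
# the wrapper moves input tensors to it via xp2torch.
model = PyTorchWrapper(
    torch.nn.Linear(16, 8),
    device=torch.device("cpu"),
)
```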

- Add some typing fixes

- Remove spurious cupy import

- Small fixes

  - Use `torch.cuda.current_device()` to get the current PyTorch CUDA
    device.
  - Do not use `torch_set_default_tensor_type` in `set_active_gpu`.

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

- Add support for PyTorch Metal Performance Shaders

Nightly PyTorch versions add support for Metal Performance Shaders
(MPS). Metal is a low-level graphics API for Apple platforms that also
supports compute kernels (shaders). MPS is a framework of highly
optimized compute and graphics kernels, including kernels for neural
networks. MPS is supported both on Apple Silicon, such as the M1 family
of SoCs, and on a range of AMD GPUs used in Macs.

Since devices are handled in Thinc through a specific `Ops`
implementation (e.g. `CupyOps` == CUDA GPUs), this change introduces the
`MPSOps` class. This class is a subclass of `NumpyOps` or `AppleOps`
(when available). `MPSOps` does not override any methods, but is used to
signal to relevant code paths (e.g. `xp2torch`) that Torch tensors
should be placed on the MPS device.
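The shape of that class is roughly as follows; this is a sketch rather
than the verbatim Thinc source, and the `thinc_apple_ops` import path is
an assumption:

```python
# Prefer AppleOps (from the thinc-apple-ops package) as the base class
# when it is installed; only an ImportError triggers the NumpyOps
# fallback (see the back-off commit below).
try:
    from thinc_apple_ops import AppleOps as _BaseOps  # assumed import path
except ImportError:
    from thinc.api import NumpyOps as _BaseOps

class MPSOps(_BaseOps):
    """CPU-backed Ops subclass whose type signals that Torch tensors
    should be placed on the MPS device. No methods are overridden."""
```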

The mapping in the previously introduced `get_torch_default_device`
function is updated to:

- `NumpyOps` -> `cpu`
- `CupyOps` -> `cuda:N`, where N is the selected CUDA device.
- `MPSOps` -> `mps`

to ensure placement of Torch tensors on the `mps` device when `MPSOps`
is active.
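A minimal sketch of this mapping, assuming the structure rather than
quoting the actual `thinc.util` implementation:

```python
import torch
from thinc.api import CupyOps, get_current_ops

def get_torch_default_device() -> "torch.device":
    ops = get_current_ops()
    if isinstance(ops, CupyOps):
        # cuda:N, mirroring the currently selected CUDA device.
        return torch.device("cuda", torch.cuda.current_device())
    if type(ops).__name__ == "MPSOps":
        # Matched by name to avoid assuming MPSOps' import path here.
        return torch.device("mps")
    # NumpyOps (and anything else) defaults to CPU.
    return torch.device("cpu")
```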

Finally, the following booleans have been added to or changed in
`compat`:

- `has_torch_mps` (new): PyTorch has MPS support.
- `has_torch_mps_gpu` (new): PyTorch has MPS support and an
  MPS-capable GPU is available.
- `has_torch_cuda_gpu` (new): PyTorch has CUDA support and a
  CUDA-capable GPU is available.
- `has_torch_gpu` (changed): PyTorch has a GPU available (CUDA
  or MPS).
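These flags can be computed with standard PyTorch queries. A sketch
under the assumption of a PyTorch build new enough to expose the MPS
backend (not the verbatim `thinc.compat` source):

```python
import torch

has_torch_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_built()
has_torch_mps_gpu = (
    hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
)
has_torch_cuda_gpu = torch.cuda.device_count() != 0
# As introduced by this commit; a later commit in this PR sets
# has_torch_gpu back to has_torch_cuda_gpu only.
has_torch_gpu = has_torch_cuda_gpu or has_torch_mps_gpu
```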

- Only back off to NumpyOps on import error

  We do not want to hide other issues while importing `thinc_apple_ops`.

- Remove unneeded `has_torch_mps` bool

- Add `has_gpu` bool and use it in `util`

- Replace another expression by `has_gpu`

- Set `has_torch_gpu` to `has_torch_cuda_gpu`

  We need to decide whether we want to make the potentially breaking
  change from `has_torch_cuda_gpu` to `has_torch_cuda_gpu or
  has_torch_mps_gpu`. But since the latter is not needed for this PR,
  remove the change.

- Update thinc/util.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: shademe <shadeMe@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

shadeMe and others added 3 commits June 14, 2022 15:09
* Remove use of `torch.set_default_tensor_type` (explosion#674)
* Add `test_slow_gpu` explosion-bot command
* Auto-format code with black (explosion#682)
* Azure: pin protobuf to fix Tensorflow
* Extend typing_extensions to <4.2.0 (explosion#689)
* Add support for PyTorch Metal Performance Shaders (explosion#685)
* Test PyTorch wrapper with all xp ops
* Fix type checking error
* Only back off to NumpyOps on import error
* Remove unneeded `has_torch_mps` bool
* Add `has_gpu` bool and use it in `util`
* Replace another expression by `has_gpu`
* Set `has_torch_gpu` to `has_torch_cuda_gpu`
* Update thinc/util.py
shadeMe merged commit 19f406e into shadeMe:cblas-abi on Jun 15, 2022