Custom kernel improvements (#107)
- Wrapped the "pytorch" version with torch.compile.
- Allowed "cupy" to run without "numba" installed.
- Added a "sparse" version using torch.sparse.
- Provided a registration mechanism for message-passing implementations, simplifying the process of adding new versions (see the illustrative sketch below).
- Refactored to require only a single test script.
- Added a speed test script.
- Moved old implementations to a hidden location, but kept them accessible if needed.
- Improved documentation.

Other changes, not related to custom kernels:
- Improved documentation.
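
To illustrate the idea behind the registration mechanism, here is a hypothetical sketch of such a dispatch registry (the names and structure are illustrative only, not hippynn's actual interface):

    # Hypothetical registry pattern for message-passing kernel implementations.
    # None of these names are hippynn's real API; they only illustrate the idea.
    _IMPLEMENTATIONS = {}

    def register_implementation(name):
        """Record a factory for a named kernel implementation."""
        def decorator(factory):
            _IMPLEMENTATIONS[name] = factory
            return factory
        return decorator

    @register_implementation("pytorch")
    def make_pytorch_kernels():
        # Build and return the (envsum, sensesum, featsum) operations from pure torch ops.
        ...

    def get_implementation(name):
        try:
            return _IMPLEMENTATIONS[name]()
        except KeyError as exc:
            raise ValueError(f"Unknown custom kernel implementation: {name!r}") from exc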

Commit messages:
* Add first implementation of sparse kernels.

Note: sparse kernel sensesum cannot be used when more than one entry connects the same pair;
however, this is caught as an error.

Fixed bug with compare_against='triton'.

* Add improved and simplified dispatch for custom kernel implementation.

* remove individual test files that are no longer necessary

* improve pytorch-only kernels and add compile and jit options

* fix python error on gpu part of test

* wrong print message

* put speed tests into a single function

* make comparison function also be wrapped

* add script for benchmarking implementation speeds

* formatting

* add features to speed tester

* add legacy atomic based kernels, improved test script

* update docs

* fix arg passing

* tweak documentation tables

* adjust error message
lubbersnick authored Sep 27, 2024
1 parent c1c084c commit 36b7350
Showing 30 changed files with 1,201 additions and 352 deletions.
5 changes: 3 additions & 2 deletions conda_requirements.txt
@@ -1,5 +1,5 @@
numpy
pytorch >= 1.9
pytorch >= 2.0
torchtriton
matplotlib
numba
@@ -8,4 +8,5 @@ ase
h5py
tqdm
python-graphviz
lightning
lightning
opt_einsum
5 changes: 3 additions & 2 deletions docs/source/examples/controller.rst
@@ -2,11 +2,12 @@ Controller
==========

How to define a controller for more customized control of the training process.
We assume that there is a set of ``training_modules`` assembled and a ``database`` object has been constructed.
We assume that there is a set of :class:`~hippynn.experiment.assembly.TrainingModules` assembled, called ``training_modules``,
and a :class:`~hippynn.databases.Database`-like object called ``database`` that has been constructed.

The following snippet shows how to set up a controller using a custom scheduler or optimizer::

from hippynn.experiment.controllers import RaiseBatchSizeOnPlateau,PatienceController
from hippynn.experiment.controllers import RaiseBatchSizeOnPlateau, PatienceController

optimizer = torch.optim.Adam(training_modules.model.parameters(),lr=1e-3)
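
# Illustrative continuation: a typical scheduler and controller setup.
# Argument names follow the hippynn examples; the numeric values are placeholders.
scheduler = RaiseBatchSizeOnPlateau(
    optimizer=optimizer,
    max_batch_size=128,
    patience=2,
)

controller = PatienceController(
    optimizer=optimizer,
    scheduler=scheduler,
    batch_size=32,
    eval_batch_size=128,
    max_epochs=500,
    termination_patience=10,
    stopping_key="Loss",
)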

3 changes: 2 additions & 1 deletion docs/source/examples/ensembles.rst
@@ -21,5 +21,6 @@ The ``ensemble_info`` object provides the counts for the inputs and targets of t
and the counts of those corresponding quantities across the ensemble members.

A typical use case would be to then build a Predictor or ASE Calculator from the ensemble.
See :file:`~examples/ensembling_models.py` for a detailed example.
See `/examples/ensembling_models.py`_ for a detailed example.

.. _/examples/ensembling_models.py: https://github.com/lanl/hippynn/blob/development/examples/ensembling_models.py
12 changes: 7 additions & 5 deletions docs/source/examples/plotting.rst
@@ -2,12 +2,13 @@ Plotting
========


How to make a plotmaker.
:mod:`hippynn.plotting` is only available if matplotlib is installed.

Let's assume you have a ``molecule_energy`` node that you are training to.
By default, hippynn will plot loss metrics over time when training ends.
On top of this, hippynn can make diagnostic plots during its evaluation phase.
For example, let's assume you have a ``molecule_energy`` node that you are training to.
A simple plot maker would look like this::


from hippynn import plotting

plot_maker = hippynn.plotting.PlotMaker(
@@ -19,7 +20,8 @@ A simple plot maker would look like this::

training_modules,db_info = assemble_for_training(train_loss, validation_losses, plot_maker=plot_maker)

The plot maker is thus passed to `assemble_for_training` and attached to the model evaluator.
The plot maker is thus passed to :func:`~hippynn.experiment.assemble_for_training` and attached to the model evaluator.
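
For reference, a complete construction could look like the following sketch (the ``Hist2D`` plotter and the ``plot_every`` value here are illustrative)::

    from hippynn import plotting

    plot_maker = plotting.PlotMaker(
        plotting.Hist2D.compare(molecule_energy, saved=True),
        plot_every=10,
    )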



Note that :mod:`hippynn.plotting` is only available if matplotlib is installed.

4 changes: 2 additions & 2 deletions docs/source/examples/predictor.rst
@@ -1,10 +1,10 @@
Predictor
=========

The predictor is a simple API for making predictions on an entire database.
The :class:`~hippynn.graphs.Predictor` is a class for making predictions on an entire database.

Often you'll want to make predictions based on the model. For this,
use :meth:`Predictor.from_graph`. Let's assume you have a ``GraphModule`` called ``model``::
use the :meth:`~hippynn.graphs.Predictor.from_graph` method. Let's assume you have a :class:`~hippynn.GraphModule` called ``model``::

predictor = hippynn.graphs.Predictor.from_graph(model)
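
# Illustrative usage sketch: inputs are passed by the db_name of the model's input
# nodes, so "Z" and "R" below are placeholders for the names in your own database.
outputs = predictor(Z=species_array, R=coordinates_array)
energies = outputs["T"]  # "T" is likewise a placeholder output name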

2 changes: 1 addition & 1 deletion docs/source/examples/restarting.rst
@@ -117,7 +117,7 @@ Advanced Details
- Here is a list of objects and their final device after loading.

.. list-table::
:widths: 40 30
:widths: 30 70
:header-rows: 1

* - Objects
5 changes: 3 additions & 2 deletions docs/source/installation.rst
@@ -7,7 +7,7 @@ Requirements

Requirements:
* Python_ >= 3.9
* pytorch_ >= 1.9
* pytorch_ >= 2.0
* numpy_

Optional Dependencies:
@@ -20,6 +20,7 @@ Optional Dependencies:
* graphviz_ (for visualizing model graphs)
* h5py_ (for loading ani-h5 datasets)
* pytorch-lightning_ (for distributed training)
* opt_einsum_ (backend for accelerating some pytorch expressions)

Interfacing codes:
* ASE_
@@ -41,7 +42,7 @@ Interfacing codes:
.. _PYSEQM: https://github.com/lanl/PYSEQM
.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning
.. _hippynn: https://github.com/lanl/hippynn/

.. _opt_einsum: https://github.com/dgasmith/opt_einsum

Installation Instructions
^^^^^^^^^^^^^^^^^^^^^^^^^
102 changes: 97 additions & 5 deletions docs/source/user_guide/ckernels.rst
@@ -4,14 +4,106 @@ Custom Kernels
Bottom line up front
--------------------

We use custom kernels in `hippynn` to accelerate the HIP-NN neural network message passing.
On the GPU, the best implementation to select is ``triton``, followed by ``cupy``,
followed by ``numba``. On the CPU, only ``numba`` is available. In general, these
If possible, install ``triton`` and ``numba``, as they will accelerate HIP-NN networks
and reduce memory cost on GPU and CPU, respectively.


Brief Description
-----------------

We use custom kernels in hippynn to accelerate the HIP-NN neural network message passing and
to significantly reduce the amount of memory required in passing messages.
On the GPU, the best implementation to select is ``"triton"``, followed by ``"cupy"``,
followed by ``"numba"``. On the CPU, only ``"numba"`` is available. In general, these
custom kernels are very useful, and the only reasons to turn them off are if the packages
are not available for installation in your environment, or to diagnose whether
a bug could be related to potential misconfiguration of these additional packages.
``triton`` comes with recent versions of ``pytorch``, so optimistically you may already be
configured to use the custom kernels.
``"triton"`` comes with recent versions of ``"pytorch"``, so optimistically you may already be
configured to use the custom kernels. Finally, there is the ``"sparse"`` implementation, which
uses torch.sparse functions. This saves memory much like the kernels from external packages;
however, it does not currently achieve a significant speedup over pytorch.


Comparison Table
----------------


.. list-table:: Hippynn Custom Kernels Options Summary
:widths: 4 30 3 3 3 3 10 30
:header-rows: 1

* - Name
- Description
- Low memory
- Speedup
- CPU
- GPU
- Required Packages
- Notes
* - pytorch
- Dense operations and index add operations
- No
- No
- Yes
- Yes
- None
- Lowest overhead; guaranteed to run, but poorest performance for large data.
* - triton
- CSR-dense with OpenAI's triton compiler
using autotuning.
- Yes
- Excellent
- No
- Yes
- triton
- Best option for GPU. Does incur some start-up lag due to autotuning.
* - numba
- CSR-dense hybrid with numba
- Yes
- Good
- Yes
- Yes
- numba
- Best option for CPU; non-CPU implementations fall back to this on CPU when available.
* - cupy
- CSR-dense hybrid with cupy/C code.
- Yes
- Great
- No
- Yes
- cupy
- Direct translation of numba algorithm, but has improved performance.
* - sparse
- CSR-dense using torch.sparse operations.
- Yes
- None
- Yes
- Yes
- pytorch>=2.4
- Cannot handle all systems, but raises an error on failure.

.. note::
Kernels which do not support the CPU fall back to numba if it is available, and
to pytorch if it is not.

.. note::
Custom Kernels do come with some launch overheads compared to the pytorch implementation.
If your workload is small (small batch sizes, networks, and/or small systems)
and you're using a GPU, then you may find best performance with kernels set to ``"pytorch"``.

.. note::
The sparse implementation is slow for very small workload sizes. At large workload
sizes, it is about as fast as pytorch (while using less memory), but still slower
than numba.

.. note::
The sparse implementation does not handle message passing in which the same pair of atoms
appears in more than one pair entry, as can happen in small systems with periodic boundary conditions.


For information on how to set the custom kernels, see :doc:`settings`.
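
For example, a minimal sketch of selecting an implementation (assuming the ``HIPPYNN_USE_CUSTOM_KERNELS`` environment variable and the ``set_custom_kernels`` function described there; ``"triton"`` is one of the allowed values)::

    # Either set an environment variable before importing hippynn:
    #   export HIPPYNN_USE_CUSTOM_KERNELS=triton
    # or switch implementations from python:
    import hippynn
    hippynn.custom_kernels.set_custom_kernels("triton")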


Detailed Explanation
--------------------
5 changes: 2 additions & 3 deletions hippynn/_settings_setup.py
@@ -88,13 +88,12 @@ def kernel_handler(kernel_string):
kernel = {
"0": False,
"false": False,
"pytorch": False,
"1": True,
"true": True,
}.get(kernel_string, kernel_string)
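# Illustrative behavior of this lookup:
#   "false"  -> False      (disable custom kernels)
#   "true"   -> True       (enable custom kernels)
#   "triton" -> "triton"   (request a specific implementation)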

if kernel not in [True, False, "auto", "triton", "cupy", "numba"]:
warnings.warn(f"Unexpected custom kernel setting: {kernel_string}.", stacklevel=3)
# This function used to warn about unexpected kernel settings.
# Now this is an error which is raised in the custom_kernels module.

return kernel
