Update Heat Tutorials, fix typos. #1238

Merged
4 changes: 2 additions & 2 deletions doc/source/getting_started.rst
@@ -31,12 +31,12 @@ If you do not have a recent installation on you system, you may want to upgrade

sudo dnf update python3

- If you have new administrator privileges on your system, because you are working on a cluster for example, make sure to check its *user guide*, the module system (``module spider python``) or get in touch with the administrators.
+ If you have no administrator privileges on your system, because you are working on a cluster for example, make sure to check its *user guide*, the module system (``module spider python``) or get in touch with the administrators.

Optional Dependencies
^^^^^^^^^^^^^^^^^^^^^

- You can accelerate computations with Heat in different ways. For GPU acceleration ensure that you have a `CUDA <https://developer.nvidia.com/cuda-zone>`_ installation on your system. Distributed computations require an MPI stack on you computer. We recommend `MVAPICH <https://mvapich.cse.ohio-state.edu/>`_ or `OpenMPI <https://www.open-mpi.org/>`_. Finally, for parallel data I/O, Heat offers interface to `HDF5 <https://www.hdfgroup.org/solutions/hdf5/>`_ and `NetCDF <https://www.unidata.ucar.edu/software/netcdf/>`_. You can obtain these packages using your operating system's package manager.
+ You can accelerate computations with Heat in different ways. For GPU acceleration ensure that you have a `CUDA <https://developer.nvidia.com/cuda-zone>`_ installation on your system. Distributed computations require an MPI stack on your computer. We recommend `MVAPICH <https://mvapich.cse.ohio-state.edu/>`_ or `OpenMPI <https://www.open-mpi.org/>`_. Finally, for parallel data I/O, Heat offers interface to `HDF5 <https://www.hdfgroup.org/solutions/hdf5/>`_ and `NetCDF <https://www.unidata.ucar.edu/software/netcdf/>`_. You can obtain these packages using your operating system's package manager.
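
A quick way to verify a GPU setup is to place a small array on the accelerator. The following is a minimal sketch, assuming your Heat installation is backed by a CUDA-enabled PyTorch build:

.. code:: python

    import heat as ht

    # Allocate directly on the GPU; this fails if no CUDA device is available
    x = ht.zeros((100, 100), device="gpu")
    print(x.device)  # e.g. gpu:0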

Installation
------------
13 changes: 7 additions & 6 deletions doc/source/tutorial_clustering.rst
@@ -68,8 +68,8 @@ initial centroids.
c1.balance_()
c2.balance_()

print(f"Number of points assigned to c1: {c1.shape[0]} "
f"Number of points assigned to c2: {c2.shape[0]} "
print(f"Number of points assigned to c1: {c1.shape[0]} \n"
f"Number of points assigned to c2: {c2.shape[0]} \n"
f"Centroids = {centroids}")

.. code:: text
@@ -95,7 +95,7 @@ Let's plot the assigned clusters and the respective centroids:

.. image:: ../images/clustering.png

- We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'
+ We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'.

.. code:: python

@@ -110,8 +110,9 @@ We can also cluster the data with kmedians. The respective advanced initial cent
c1.balance_()
c2.balance_()

print(f"Number of points assigned to c1: {c1.shape[0]}"
f"Number of points assigned to c2: {c2.shape[0]}")
print(f"Number of points assigned to c1: {c1.shape[0]} \n"
f"Number of points assigned to c2: {c2.shape[0]} \n"
f"Centroids = {centroids}")

Plotting the assigned clusters and the respective centroids:

@@ -132,7 +133,7 @@ The Iris Dataset
------------------------------
The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples from
three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in Heat,
- located under 'heat/heat/datasets/iris.h5', and can be loaded in a distributed manner with Heat's parallel
+ located under 'heat/heat/datasets', and can be loaded in a distributed manner with Heat's parallel
dataloader

.. code:: python
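    # A minimal sketch, not the collapsed original code: the HDF5 dataset
    # name "data" is an assumption here
    import heat as ht

    # Load the bundled iris samples, partitioned along the sample axis
    iris = ht.load_hdf5("heat/heat/datasets/iris.h5", dataset="data", split=0)
    print(iris.shape)  # (150, 4), the global shape on every process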
12 changes: 6 additions & 6 deletions doc/source/tutorial_parallel_computation.rst
@@ -70,7 +70,7 @@ Distributed Computing
---------------------

.. warning::
- For the following code examples, make sure to you have `MPI <https://computing.llnl.gov/tutorials/mpi/>`_ installed.
+ For the following code examples, make sure you have `MPI <https://computing.llnl.gov/tutorials/mpi/>`_ installed.

With Heat you can even compute in distributed memory environments with multiple computation nodes, like modern high-performance cluster systems. For this, Heat makes use of the fact that operations performed on multi-dimensional arrays tend to be identical for all data items. Hence, they can be processed in data-parallel manner. Heat partitions the total number of data items equally among all processing nodes. A ``DNDarray`` assumes the role of a virtual overlay over these node-local data portions and manages them for you while offering the same interface. Consequently, operations can now be executed in parallel. Each processing node applies them locally to their own data chunk. If necessary, partial results are communicated and automatically combined behind the scenes for correct global results.
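
As a small illustration, the sketch below distributes an array along its first axis; when launched with, say, ``mpirun -n 3 python demo.py`` (the script name is just a placeholder), every process owns roughly one third of the rows:

.. code:: python

    import heat as ht

    # split=0 partitions the global 3x4 array row-wise across all MPI processes
    a = ht.ones((3, 4), split=0)

    print(a.shape)   # the global shape, (3, 4), reported on every process
    print(a.lshape)  # the process-local chunk, e.g. (1, 4) when run on 3 processes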

@@ -174,7 +174,7 @@ Output:

.. code:: text

- DNDarray([12.], dtype=ht.float32, device=cpu:0, split=None)
+ DNDarray(12., dtype=ht.float32, device=cpu:0, split=None)

The previously ``split=0`` matrix is ``split=None`` after the reduction operation. Obviously, we can also perform operations between (differently) split ``DNDarrays``.
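
Both effects can be seen in a short sketch, assuming a run with several MPI processes:

.. code:: python

    import heat as ht

    a = ht.ones((3, 4), split=0)  # distributed along the rows
    s = a.sum()                   # global reduction; the result is no longer distributed
    print(s.split)                # None

    b = ht.ones((3, 4), split=1)  # distributed along the columns
    c = a + b                     # differently split operands are combined automatically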

@@ -191,7 +191,7 @@ Output:

DNDarray([[1., 2., 3., 4.],
[1., 2., 3., 4.],
- [1., 2., 3., 4.]], dtype=ht.float32, device=cpu:0, split=0)
+ [1., 2., 3., 4.]], dtype=ht.float32, device=cpu:0, split=1)

[0/3] DNDarray([1., 2., 3., 4.], dtype=ht.int32, device=cpu:0, split=None)
[1/3] DNDarray([1., 2., 3., 4.], dtype=ht.int32, device=cpu:0, split=None)
@@ -200,7 +200,7 @@
Technical Details
^^^^^^^^^^^^^^^^^

- On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) <https://en.wikipedia.org/wiki/Bulk_synchronous_parallel>`_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronizations only occurs for collective MPI calls as well as at the program start and termination.
+ On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) <https://en.wikipedia.org/wiki/Bulk_synchronous_parallel>`_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronization only occurs for collective MPI calls as well as at the program start and termination.

.. image:: ../images/bsp.svg
:align: center
@@ -223,13 +223,13 @@ You can start the distributed interactive interpreter by invoking the following

.. note::

- The interactive interpreter does only support a subset of all controls commands.
+ The interactive interpreter does only support a subset of all control commands.


Parallel Performance
--------------------

- When working with parallel and distributed computation in Heat there are some best practices for you may to know about. The following list covers the major ones.
+ When working with parallel and distributed computation in Heat there are some best practices for you to know about. The following list covers the major ones.

Dos
^^^