
Added sphinx documentation
- Added sphinx documentation files.
  (index, getting-started, basic-usage,
   api-reference, misc proj information).
- Improved prose in psMNIST notebook
  example.
- Added pre-trained weights to psMNIST
  notebook example.
- Added changelog entry.
- Updated setup.py url to the ABR website.
samir-alazzam authored and xchoo committed Aug 20, 2020
1 parent 9d0b973 commit 74bcbe6
Showing 23 changed files with 1,266 additions and 463 deletions.
18 changes: 16 additions & 2 deletions .nengobones.yml
@@ -14,14 +14,19 @@ contributors_rst: {}
manifest_in: {}

setup_py:
install_req:
url: https://appliedbrainresearch.com/lmu
install_req:
- nengolib>=0.5.1
- tensorflow>=2.0.0
docs_req:
- matplotlib>=3.0.2
- IPython>=7.2.0
- notebook>=5.7.4
- seaborn>=0.9.0
- sphinx>=1.8
- nbsphinx
- nengo_sphinx_theme>=1.2.0
- numpydoc>=0.6
optional_req:
- scipy
classifiers:
@@ -39,18 +44,27 @@ setup_py:

setup_cfg: {}

docs_conf_py:
intersphinx_mapping:
scipy: "https://docs.scipy.org/doc/scipy/reference"
analytics_id: UA-41658423-2
html_redirects:
getting_started.html: getting-started.html

travis_yml:
python: 3.6
jobs:
- script: static
- script: docs
pypi_user: arvoelke
slack_notifications: "vv0lGEj/xNMFSZDbFsdoMJyEcr9BO8p43KUefmgHfjIQtodZqXdesLl+XJcXW0jxCJlNyy3H8LHYP/mEGJpIsK+tQ7dxlWscLjSGWfcoNiZGy4a1Jp4fF+MZyYvIGlTRJqxrazIrj73tGFptVr2XDE74eO0Z9YaVSJVQw4twEDrWFEAq4foWxV30SkcXfdCkhBwX+43CJyuGE3YFDD/+03me/mdccjNRqCfJ0lURRk7H5tcztryrZy2gpwHV+W73raGTybxlP1xEa1hyLYJO40eH/JfeqBqIDxa5m61Aw+BH/HJ5ZLNlTEUyUB6p7kcIYO9lyko5TY3QSqlX9pK+tK+2DojDlzI97QwgQVbx4WvTJ1JEidfgRqNcTlJOG16RvlyxQjW1u3/QV67bmINus470qQqzIBbdLfM70v+E5Ga/bk+Gk1Z29btB7DxXt4z9dH9z3NXTOLhDpH5WZzpcatrbfSrgMzKtxC+z6oLfDzzio9Fx20RiuHv3P8GtXyyR9WkelMH9GVi7xUBHVCveRVVhNKL555u7NbP5TI6Jc9NZqf7OtrNsRKIY4MfGc9KKjYa+Ks+3PT+yQZ8u/ZMMddMTv73nzLH0pU715/CBl1hQGkKkopukGtKbCpdc666PnRrFy9l21hBqSNqLo/FGPF/Yqr+yTXhuhBhvNZnvFQU="
deploy_dists:
- sdist
- bdist_wheel

ci_scripts:
- template: static
- template: docs
- template: deploy

codecov_yml: {}
8 changes: 8 additions & 0 deletions .travis.yml
@@ -22,12 +22,20 @@ env:
- SCRIPT="test"
- TEST_ARGS=""
- BRANCH_NAME="${TRAVIS_PULL_REQUEST_BRANCH:-$TRAVIS_BRANCH}"
- PIP_USE_FEATURE="2020-resolver"

jobs:
include:
-
env:
SCRIPT="static"
-
env:
SCRIPT="docs"
addons:
apt:
packages:
- pandoc
- stage: deploy
if: branch =~ ^release-candidate-* OR tag =~ ^v[0-9]*
env: SCRIPT="deploy"
7 changes: 5 additions & 2 deletions CHANGES.rst
@@ -22,14 +22,17 @@ Release history
0.1.1 (unreleased)
==================

**Added**

- Added documentation for package description, installation, usage, API, examples,
and project information. (`#20 <https://github.com/abr/lmu/pull/20>`__)
- Added LMU FFT cell variant and auto-switching LMU class
(`#21 <https://github.com/abr/lmu/pull/21>`__)


0.1.0 (June 22, 2020)
=====================

Initial release of LMU 0.1.0! Supports Python 3.5+.
Initial release of NengoLMU 0.1.0! Supports Python 3.5+.

The API is considered unstable; parts are likely to change in the future.

14 changes: 9 additions & 5 deletions README.rst
@@ -3,23 +3,27 @@ Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networ

`Paper <https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf>`_

We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history – doing so by solving d coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree d − 1 (example d=12, shown below).
NengoLMU is a Python software library containing various implementations of the Legendre Memory Unit (LMU). The LMU is a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. It has been shown to perform as well as standard LSTM and other RNN-based models on a variety of tasks, generally with fewer internal parameters (see `this paper <https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf>`_ for more details). On the Permuted Sequential MNIST (psMNIST) task in particular, it has been demonstrated to outperform the current state of the art. See the note below for instructions on how to get access to this model.

The LMU is mathematically derived to orthogonalize its continuous-time history – doing so by solving *d* coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree *d* − 1 (the example for *d* = 12 is shown below).

.. image:: https://i.imgur.com/Uvl6tj5.png
:target: https://i.imgur.com/Uvl6tj5.png
:alt: Legendre polynomials

A single ``LMUCell`` expresses the following computational graph in Keras as an RNN layer, which couples the optimal linear memory (``m``) with a nonlinear hidden state (``h``):
A single LMU cell expresses the following computational graph, which takes in an input signal, **x**, and couples an optimal linear memory, **m**, with a nonlinear hidden state, **h**. By default, this coupling is trained via backpropagation, while the dynamics of the memory remain fixed.

.. image:: https://i.imgur.com/IJGUVg6.png
:target: https://i.imgur.com/IJGUVg6.png
:alt: Computational graph

The discretized ``(A, B)`` matrices are initialized according to the LMU's mathematical derivation with respect to some chosen window length, θ. Backpropagation can be used to learn this time-scale, or fine-tune ``(A, B)``, if necessary. By default the coupling between the hidden state (``h``) and the memory vector (``m``) is trained via backpropagation, while the dynamics of the memory remain fixed (`see paper for details <https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf>`_).
The discretized **A** and **B** matrices are initialized according to the LMU's mathematical derivation with respect to some chosen window length, **θ**. Backpropagation can be used to learn this time-scale, or to fine-tune **A** and **B**, if necessary.

Both the kernels, **W**, and the encoders, **e**, are learned. Intuitively, the kernels learn to compute nonlinear functions across the memory, while the encoders learn to project the relevant information into the memory (see `paper <https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf>`_ for details).
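
As a quick illustration, a minimal sketch of dropping the ``lmu.LMU`` layer (described in the documentation) into a Keras model is shown below; the layer sizes are placeholders, not values tuned for any particular task:

.. code-block:: python

    import lmu
    from tensorflow.keras import Input, Model
    from tensorflow.keras.layers import Dense

    # Placeholder sizes: `units` is the hidden state dimension,
    # `order` is the memory dimension d, and `theta` is the window length
    inputs = Input((None, 1))
    hidden = lmu.LMU(units=10, order=256, theta=784)(inputs)
    outputs = Dense(10)(hidden)

    model = Model(inputs=inputs, outputs=outputs)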

The ``docs`` includes an example for how to use the ``LMUCell``.
.. note::

The ``paper`` branch in the ``lmu`` GitHub repository includes a pre-trained Keras/TensorFlow model, located at ``models/psMNIST-standard.hdf5``, which obtains the current best-known psMNIST result (using an RNN) of **97.15%**. Note, the network is using fewer internal state-variables and neurons than there are pixels in the input sequence. To reproduce the results from the paper, run the notebooks in the ``experiments`` directory within the ``paper`` branch.
The ``paper`` branch in the ``lmu`` GitHub repository includes a pre-trained Keras/TensorFlow model, located at ``models/psMNIST-standard.hdf5``, which obtains the current best-known psMNIST result (using an RNN) of **97.15%**. Note that the network is using fewer internal state-variables and neurons than there are pixels in the input sequence. To reproduce the results from `this paper <https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf>`_, run the notebooks in the ``experiments`` directory within the ``paper`` branch.
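
As a rough sketch of loading that pre-trained model (the ``custom_objects`` mapping here is an assumption; the exact custom classes required may differ on the ``paper`` branch):

.. code-block:: python

    import lmu
    from tensorflow.keras.models import load_model

    # Assumed mapping of custom layer names to classes; adjust as needed
    model = load_model(
        "models/psMNIST-standard.hdf5",
        custom_objects={"LMUCell": lmu.LMUCell},
    )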

Nengo Examples
--------------
Binary file added docs/_static/favicon.ico
Binary file not shown.
43 changes: 43 additions & 0 deletions docs/api-reference.rst
@@ -0,0 +1,43 @@
.. _api-reference:

*************
API reference
*************

.. _api-reference-lc:

LMU Layers
==========

.. autosummary::
    :nosignatures:

    lmu.LMU

.. autoclass:: lmu.LMU


LMU Cells
=========

.. autosummary::
    :nosignatures:

    lmu.LMUCell
    lmu.LMUCellFFT

.. autoclass:: lmu.LMUCell

.. autoclass:: lmu.LMUCellFFT

.. _api-reference-li:

Legendre Initializer
====================

.. autosummary::
    :nosignatures:

    lmu.Legendre

.. autoclass:: lmu.Legendre
154 changes: 154 additions & 0 deletions docs/basic-usage.rst
@@ -0,0 +1,154 @@
.. _basic-usage:

***********
Basic usage
***********

The standard Legendre Memory Unit (LMU) layer
implementation in NengoLMU is defined in the
``lmu.LMU`` class. The following code creates
a new LMU layer:

.. testcode::

    import lmu

    lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784
    )

Note that the values used above for ``units``, ``order``,
and ``theta`` are arbitrary; the appropriate values will depend on your
specific application. ``units`` represents the dimensionality of
the output vector, ``order`` represents the dimensionality of
the memory cell, and ``theta`` represents the length of
the sliding window (in timesteps). To learn more about these parameters, check out
the :ref:`LMU class API reference <api-reference-lc>`.

Creating NengoLMU layers
------------------------

The ``LMU`` class functions as a standard
TensorFlow layer and is meant to be used within a TensorFlow model.
The code below illustrates how to do this using a TensorFlow functional model with
a 10-dimensional input and a 20-dimensional output.

.. testcode::

    from tensorflow.keras import Input, Model
    from tensorflow.keras.layers import Dense

    inputs = Input((None, 10))
    lmus = lmu_layer(inputs)
    outputs = Dense(20)(lmus)

    model = Model(inputs=inputs, outputs=outputs)
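
From here the model can be compiled and trained like any
other Keras model. The snippet below is a minimal sketch;
the optimizer and loss are placeholders and should be chosen
to suit your task.

.. testcode::

    # Placeholder optimizer and loss, for illustration only
    model.compile(optimizer="adam", loss="mse")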


Customizing parameters
----------------------

The ``LMU`` class is designed to be easy to use and
to integrate seamlessly into your TensorFlow
models. However, for users looking to optimize
their use of the ``LMU`` layer, there are additional
parameters that can be modified.

.. testcode::

    custom_lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784,
        method="zoh",
        hidden_activation="tanh",
    )

The ``method`` parameter specifies the
discretization method that will be used to compute
the ``A`` and ``B`` matrices. By default, this parameter is
set to ``"zoh"`` (zero-order-hold). This is generally the best
option for input signals that are held constant
between time steps, which is a common use case for
sequential data (e.g. feeding in a sequence of
pixels like in the psMNIST task).

The ``hidden_activation`` parameter specifies the
final non-linearity that gets applied to the
output. By default, this parameter is set to
``tanh``. This is mainly done so that outputs
are symmetric about zero and saturate to
``[-1, 1]``, which improves stability, though
other non-linearities such as ``ReLU`` can work well too.

Tuning these parameters can improve
performance. For example, using ``"euler"`` as the
discretization method results in an LMU configuration
that is easier to implement on physical hardware and
that produces a more stable system in a model where
``theta`` is trained or controlled on the fly.
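
For instance, switching to the Euler method only requires
changing the ``method`` argument (a minimal sketch; the
other parameter values are arbitrary, as before):

.. testcode::

    euler_lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784,
        method="euler",
    )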

Customizing trainability
------------------------

The ``LMU`` class allows users to choose which
encoders and kernels they want to be trained.
By default, every encoder and kernel is
trainable, while the ``A`` and ``B`` matrices
are not.

.. testcode::

    custom_lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784,
        trainable_input_encoders=True,
        trainable_hidden_encoders=True,
        trainable_memory_encoders=True,
        trainable_input_kernel=True,
        trainable_hidden_kernel=True,
        trainable_memory_kernel=True,
        trainable_A=False,
        trainable_B=False,
    )

These trainability flags may be configured however
you would like. The need for specific weights
to be trained will vary with the task being modelled
and the design of the network.

Customizing initializers
------------------------

The ``LMU`` class allows users to customize
the various initializers for the encoders
and kernels. These define the distributions
from which the initial values for the encoder
or kernel weights will be drawn.

.. testcode::

    from tensorflow.keras.initializers import Constant

    custom_lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784,
        input_encoders_initializer=Constant(1),
        hidden_encoders_initializer=Constant(0),
        memory_encoders_initializer=Constant(0),
        input_kernel_initializer=Constant(0),
        hidden_kernel_initializer=Constant(0),
        memory_kernel_initializer="glorot_normal",
    )

These initializers may be configured using
a variety of distributions. Accepted initializers
are listed in the `TensorFlow documentation <https://www.tensorflow.org/api_docs/python/tf/keras/initializers>`_.
Generally, we recommend the ``glorot_uniform``
distribution for feed-forward weights, and an
``orthogonal`` distribution for recurrent weights.
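
For example, a configuration following these recommendations
might look like the sketch below. Treating the hidden kernel
as the recurrent weight here is an assumption made for
illustration; see the :ref:`API reference <api-reference-lc>`
for the role of each weight.

.. testcode::

    custom_lmu_layer = lmu.LMU(
        units=10,
        order=256,
        theta=784,
        # Glorot uniform for the feed-forward (input-side) weights
        input_encoders_initializer="glorot_uniform",
        input_kernel_initializer="glorot_uniform",
        # Assumption: treat the hidden kernel as the recurrent weight
        hidden_kernel_initializer="orthogonal",
    )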