New features related to the revision of the accompanying manuscript (#6)

* Added new features and examples (cross-validation, synapse parameter correlation)
* Changed afferent section types in accordance with MorphIO (1: soma, 2: axon, 3: basal dendrite, 4: apical dendrite)
* Improved readme and documentation
BlueBrain · Nov 5, 2024 · 32c0e55
1 parent c34b2a0
Showing 57 changed files with 2,501 additions and 348 deletions.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -1,6 +1,14 @@
 Changelog
 =========
 
+Version 1.0.2
+-------------
+
+- Added new features and examples (cross-validation, synapse parameter correlation)
+- Changed afferent section types in accordance with MorphIO (1: soma, 2: axon, 3: basal dendrite, 4: apical dendrite)
+- Improved readme and documentation
+
+
 Version 1.0.1
 -------------
 
@@ -17,8 +25,6 @@ Version 1.0.0
 Version 0.0.11.dev1
 -------------------
 
-New Features
-~~~~~~~~~~~~
 - New synapse position re-use mode "reuse_strict" with re-use restricted to source selection
 - Minor fixes for empty data splits, node selection, and data logs
 - Additional examples
@@ -27,8 +33,6 @@ New Features
 Version 0.0.11.dev0
 -------------------
 
-New Features
-~~~~~~~~~~~~
 - Added offset operation to synapse properties alteration
 - Minor fixes
 
@@ -42,8 +46,6 @@ Version 0.0.10
 Version 0.0.10.dev4
 -------------------
 
-New Features
-~~~~~~~~~~~~
 - Added license & copyright for open-sourcing
 - Added example notebook
 - Added version info to log files
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -7,8 +7,7 @@ As a contributor, here are the guidelines we would like you to follow:
  - [Issues and Bugs](#found-a-bug)
  - [Feature Requests](#missing-a-feature)
  - [Submissions](#submission-guidelines)
- - [Development Guidelines](#development)
- - [Release Procedure](#release)
+ - [How to extend](#how-to-extend)
 
 # Got a Question?
 
@@ -19,7 +18,7 @@ Please do not hesitate to raise an issue on [github project page][github].
 If you find a bug in the source code, you can help us by [submitting an issue](#issues)
 to our [GitHub Repository][github]. Even better, you can [submit a Pull Request](#pull-requests) with a fix.
 
-#  Missing a Feature?
+# Missing a Feature?
 
 You can *request* a new feature by [submitting an issue](#issues) to our GitHub Repository.
 If you would like to *implement* a new feature, please submit an issue with a proposal for your 
@@ -107,33 +106,37 @@ the main (upstream) repository:
 
 [github]: https://github.com/BlueBrain/connectome-manipulator
 
-# Development Environment
+# How to extend
 
-Please make sure to install the project requirements,
-see the [dependencies](./README.md#dependencies) section in top README.
+The connectome manipulation framework has been developed using reusable primitives, such as Python classes and specific file structures for individual code modules that allow easy extension of the framework in order to add new functionality. Specifically, new types of (stochastic) models, tools for fitting them, new manipulation operations, and additional structural validation methods can be added to the code repository as outlined below:
 
-This section applies to both Python versions 2 and 3.
+## Models
 
-## Setup
+All models are implemented under [`/model_building/model_types.py`](connectome_manipulator/model_building/model_types.py) and are derived from an abstract base class `AbstractModel` which provides general functionality for loading/saving models and evaluating them, i.e., returning the model output given its input. Specific functionality must be implemented in a respective derived class which must define the model parameter names (`param_names`; i.e., variables storing the internal representation of the model), default parameter values (`param_defaults`), names of data frames (`data_names`; for large data elements, if any, that would be stored as an associated HDF5 file), and input names (`input_names`; i.e., input variables the model output depends on). Moreover, the derived class must provide implementations of `get_model_output()` for returning the model output given its input variables, and `__str__()` for returning a string representation describing the model. When initializing a concrete model instance, values for all specified model parameters and data frames must be provided. Values for model parameters can be omitted if default values have been defined for them.
 
-It is recommended to use `virtualenv` to develop in a sandbox environment:
+Another useful (abstract) base class `PathwayModel` exists which can be used in the same way as outlined above, but which already includes pathway-specific model parameterization. Specifically, it allows storing different parameter values depending on pairs of pre-synaptic (`src_type`) and post-synaptic (`tgt_type`) m-types, together with default values in case no pathway is specified.
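
The derived-class pattern described above can be sketched roughly as follows. This is a minimal illustration only: `AbstractModelSketch` is a simplified stand-in written for this example, not the real `AbstractModel` (which also handles loading/saving and data frames). The constructor behavior, the `apply()` helper, the keyword-based signature of `get_model_output()`, and the `LinearDistanceModel` class with its `slope`, `offset`, and `distance` names are all assumptions.

```python
from abc import ABC, abstractmethod


class AbstractModelSketch(ABC):
    """Simplified stand-in for AbstractModel (assumption, not the real class)."""

    param_names = ()     # names of internal model parameters
    param_defaults = {}  # optional default values for parameters
    data_names = ()      # names of large data frames (none in this sketch)
    input_names = ()     # names of the input variables

    def __init__(self, **kwargs):
        # Every parameter must be given explicitly or have a default value.
        for name in self.param_names:
            if name in kwargs:
                value = kwargs[name]
            elif name in self.param_defaults:
                value = self.param_defaults[name]
            else:
                raise ValueError(f"Missing model parameter: {name}")
            setattr(self, name, value)

    def apply(self, **inputs):
        # Evaluate the model after checking that all inputs are present.
        missing = [n for n in self.input_names if n not in inputs]
        if missing:
            raise ValueError(f"Missing model inputs: {missing}")
        return self.get_model_output(**inputs)

    @abstractmethod
    def get_model_output(self, **inputs):
        """Return the model output for the given inputs."""

    @abstractmethod
    def __str__(self):
        """Return a string representation describing the model."""


class LinearDistanceModel(AbstractModelSketch):
    """Hypothetical model: output = slope * distance + offset."""

    param_names = ("slope", "offset")
    param_defaults = {"offset": 0.0}
    input_names = ("distance",)

    def get_model_output(self, **inputs):
        return self.slope * inputs["distance"] + self.offset

    def __str__(self):
        return f"Linear model: {self.slope} * distance + {self.offset}"


model = LinearDistanceModel(slope=2.0)  # offset falls back to its default
result = model.apply(distance=3.0)
```

Declaring the names as class attributes keeps the generic logic (parameter checks, defaults) in the base class, so a derived class only states what is specific to it.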
 
-```
-virtualenv venv
-. venv/bin/activate
-pip install -r tests/requirement_tests.txt
-```
+## Model fitting functions
 
-## Build
+All model fitting functions are implemented as separate code modules (.py files) under [`/model_building`](connectome_manipulator/model_building) and must always contain the following functions for implementing the three steps of model building:
 
-Run the following command to build incrementally the project: `pip install -e .`
+  - `extract()` for extracting relevant data (e.g., connection probabilities at binned distances) from a given connectome which will be stored automatically in a .pickle file by the framework
+  - `build()` for fitting model parameters against the data extracted during the previous step and initializing a model instance which will then be stored automatically as a .json file, optionally together with an associated HDF5 file
+  - `plot()` for generating visualizations of the extracted data versus the model output, and storing them in the output folder
 
-## Test
+Importantly, arbitrary parameters (optionally, including default values) can be added as keyword arguments to any of the three functions, values of which must be provided through a configuration file (see *Configuration file structure* in the [Documentation](https://connectome-manipulator.readthedocs.io/en/netneuro-24-0092-rev1/config_file_structure.html)) when launching model building.
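
As an illustration of the three steps above, here is a self-contained sketch of what such a module could look like. It is not taken from the repository: the function signatures, the input format (a list of `(distance, connected)` pairs instead of a real connectome), the keyword arguments `bin_size` and `max_range`, and the simple exponential fit are all assumptions made for this example. In the framework itself, storing the results as .pickle/.json files is handled automatically and is therefore omitted here.

```python
import math
import os


def extract(pairs, bin_size=100.0, max_range=300.0):
    """Bin (distance, connected) pairs; return connection probability per bin."""
    num_bins = int(math.ceil(max_range / bin_size))
    counts = [0] * num_bins
    connected = [0] * num_bins
    for distance, is_connected in pairs:
        idx = int(distance // bin_size)
        if 0 <= idx < num_bins:
            counts[idx] += 1
            connected[idx] += int(is_connected)
    centers = [(i + 0.5) * bin_size for i in range(num_bins)]
    probs = [c / n if n > 0 else float("nan") for c, n in zip(connected, counts)]
    return {"bin_centers": centers, "p_conn": probs}


def build(bin_centers, p_conn):
    """Fit p(d) = scale * exp(-exponent * d) by least squares on log(p)."""
    # Empty bins (NaN) and zero probabilities cannot be log-transformed.
    points = [(d, math.log(p)) for d, p in zip(bin_centers, p_conn) if p > 0]
    if len(points) < 2:
        raise ValueError("At least two non-empty bins are required")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"scale": math.exp(intercept), "exponent": -slope}


def plot(out_dir, bin_centers, p_conn, scale, exponent):
    """Plot extracted data versus model output and save the figure in out_dir."""
    import matplotlib  # imported lazily; only needed for plotting
    matplotlib.use("Agg")  # non-interactive backend, works without a display
    import matplotlib.pyplot as plt

    model_out = [scale * math.exp(-exponent * d) for d in bin_centers]
    plt.figure()
    plt.plot(bin_centers, p_conn, "o", label="Extracted data")
    plt.plot(bin_centers, model_out, "-", label="Model fit")
    plt.xlabel("Distance")
    plt.ylabel("Connection probability")
    plt.legend()
    plt.savefig(os.path.join(out_dir, "data_vs_model.png"))
    plt.close()


# Toy data: connection probability halves from one distance bin to the next.
pairs = [
    (10.0, True), (20.0, True),
    (120.0, True), (130.0, False),
    (250.0, False), (260.0, False), (270.0, True), (280.0, False),
]
data = extract(pairs, bin_size=100.0, max_range=300.0)
model_params = build(**data)
```

With this toy data, a call such as `plot(".", **data, **model_params)` would write the comparison figure to the current folder (requires matplotlib).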
 
-Run the following command to run the Python unit-tests: `pytest tests`
+## Manipulations
 
-## Coding conventions
+All manipulations are derived from an abstract base class `Manipulation` which is implemented in [`/connectome_manipulation/manipulation/base.py`](connectome_manipulator/connectome_manipulation/manipulation/base.py). The base class provides access to the neurons of a network model (through `self.nodes`) as well as to the input (i.e., before a manipulation) and output (i.e., after a manipulation) synapse tables (through `self.writer`). An alternative (abstract) base class, `MorphologyCachingManipulation`, exists which additionally provides efficient access to morphologies (through `self._get_tgt_morphs`) including a caching mechanism, i.e., without reloading them from the file system in case of repeated invocations.
 
-The code coverage of the Python unit-tests may not decrease over time.
-It means that every change must go with their corresponding Python unit-tests to
-validate the library behavior as well as to demonstrate the API usage. 
+A concrete manipulation must be implemented in a derived class and stored in a separate code module (.py file) under [`/connectome_manipulation/manipulation`](connectome_manipulator/connectome_manipulation/manipulation). It must contain an implementation for the `apply()` method which must return a new synapse table (through `self.writer`) as a result of the manipulation. Importantly, arbitrary parameters (optionally, including default values) can be added as keyword arguments to the `apply()` method, values of which must be provided through a configuration file (see *Configuration file structure* in the [Documentation](https://connectome-manipulator.readthedocs.io/en/netneuro-24-0092-rev1/config_file_structure.html)) when launching a manipulation.
+
+## Structural comparison functions
+
+All structural comparison functions are implemented as separate code modules (.py files) under [`/connectome_comparison`](connectome_manipulator/connectome_comparison) and must always contain functions for implementing the two following steps:
+
+  - `compute()` for computing specific structural features from a given connectome (e.g., connection probability by layer), which will be evaluated for both connectomes to compare and results of which will be automatically stored as .pickle files by the framework
+  - `plot()` for plotting a graphical representation of individual feature instances (e.g., 2D matrix plot of connection probabilities by layer) or the difference between two such instances, which will be automatically stored in a compound output figure when comparing two connectomes
+
+Importantly, arbitrary parameters (optionally, including default values) can be added as keyword arguments to the two functions, values of which must be provided through a configuration file (see *Configuration file structure* in the [Documentation](https://connectome-manipulator.readthedocs.io/en/netneuro-24-0092-rev1/config_file_structure.html)) when launching a structural comparison.
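
The keyword-argument mechanism mentioned for model building, manipulations, and structural comparison can be pictured with the following generic Python sketch. It does not reproduce the framework's actual dispatch code: `call_with_config`, the example `compute()` signature, and the parameter names `group_by` and `max_distance` are hypothetical and serve only to show how values from a config dictionary map onto keyword arguments with defaults.

```python
import inspect


def compute(connectome, group_by="layer", max_distance=None):
    """Hypothetical feature function; only its signature matters here."""
    return {
        "group_by": group_by,
        "max_distance": max_distance,
        "size": len(connectome),
    }


def call_with_config(func, config_params, *args):
    """Call func with keyword arguments taken from a config dictionary.

    Parameters with default values may be omitted from the config.
    Unknown keys or missing required parameters raise a TypeError
    before the function body is executed.
    """
    signature = inspect.signature(func)
    bound = signature.bind(*args, **config_params)  # TypeError on mismatch
    bound.apply_defaults()  # fill in defaults for omitted parameters
    return func(*bound.args, **bound.kwargs)


# Values as they might appear in a configuration file (hypothetical keys).
config_params = {"group_by": "mtype"}
result = call_with_config(compute, config_params, [0, 1, 2])
```

Binding the arguments against the signature first means a misspelled config key fails early with a clear error instead of being silently ignored.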
diff --git a/README.rst b/README.rst
@@ -24,14 +24,15 @@ Table of contents
    -  `Structural comparison`_
 
 5. `Examples`_
-6. `How to contribute`_
-7. `Citation`_
-8. `Publications that use or mention Connectome-Manipulator`_
+6. `Documentation`_
+7. `How to contribute`_
+8. `Citation`_
+9. `Publications that use or mention Connectome-Manipulator`_
 
    -  `Scientific papers that use Connectome-Manipulator`_
    -  `Posters that use Connectome-Manipulator`_
 
-9. `Funding & Acknowledgment`_
+10. `Funding & Acknowledgment`_
 
 Introduction
 ------------
@@ -62,6 +63,11 @@ All dependencies declared in ``setup.py`` and are available from PyPI, including
 
 Recommended Python version: v3.10.8
 
+❗ Compatibility notes
+~~~~~~~~~~~~~~~~~~~~~~
+
+The software framework is intended to be used on Linux/MacOS-based systems! Specifically, some dependencies, like ``libsonata``, are currently not compatible with Microsoft Windows OS.
+
 Framework overview
 ------------------
 
@@ -73,11 +79,11 @@ consists of the following main components:
 
 -  | **Connectome manipulator**
    | As specified in the config, applies one or a sequence of manipulations to a given SONATA connectome, and writes the manipulated connectome to a new SONATA edges file. All manipulations are separately implemented in sub-modules and can be easily extended.
-   | Details can be found in the corresponding README file in the repository: `connectome_manipulation/README.md <https://github.com/BlueBrain/connectome-manipulator/blob/main/connectome_manipulator/connectome_manipulation/README.md>`_
+   | Details can be found in the corresponding README file in the repository: `connectome_manipulation/README.md <connectome_manipulator/connectome_manipulation/README.md>`_
 
 -  | **Model building**
    | As specified in the config, builds a model from a given connectome and writes the model to a file to be loaded and used by specific manipulations requiring a model (e.g., model-based rewiring based on connection probability model). All models are separately implemented in sub-modules and can be easily extended.
-   | Details can be found in the corresponding README file in the repository: `model_building/README.md <https://github.com/BlueBrain/connectome-manipulator/blob/main/connectome_manipulator/model_building/README.md>`_
+   | Details can be found in the corresponding README file in the repository: `model_building/README.md <connectome_manipulator/model_building/README.md>`_
 
       Notes:
 
@@ -87,7 +93,9 @@ consists of the following main components:
 
 -  | **Structural comparator**
    | As specified in the config, performs a structural comparison of the original and manipulated connectomes. Different structural parameters to compare (connection probability, synapses per connection, ...) are separately implemented in sub-modules and can be easily extended.
-   | Details can be found in the corresponding README file in the repository: `connectome_comparison/README.md <https://github.com/BlueBrain/connectome-manipulator/blob/main/connectome_manipulator/connectome_comparison/README.md>`_
+   | Details can be found in the corresponding README file in the repository: `connectome_comparison/README.md <connectome_manipulator/connectome_comparison/README.md>`_
+
+The structure of the respective configuration files can be found under `doc/source/config_file_structure.rst <doc/source/config_file_structure.rst>`_
 
 ℹ️ More details can be also found in the accompanying publication (esp.
 *Supplementary tables*), see `Citation`_.
@@ -180,6 +188,24 @@ Please note that this feature will require at least 4 MPI ranks. Dask will use 2
 
 When processing with ``parallel-manipulator``, one may pass the flag ``--target-payload`` to determine how big the individual workload for each process should be. The default value of 20e9 was determined empirically to run on the whole mouse brain with 75 million neurons. We recommend to use this value as a starting point and scale it up or down to achieve the desired runtime characteristics.
 
+Details on the CONFIG file structure can be found under `doc/source/config_file_structure.rst <doc/source/config_file_structure.rst>`_
+
+❗ Notes on error handling
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Errors may occur for many different reasons and are not always easy to track. The most common error is that an allocation gets "killed", either due to a time limit or due to an out-of-memory error. Here we provide a few hints on how to avoid or track errors that may occur:
+
+-  Use the "verbose" mode (``-v`` flag) which will produce a lot of log output.
+-  Look into the log files: there is usually one master log file and individual log files for all data splits, all of which can be found in the ``/logs`` subfolder of the output circuit folder.
+-  Use a small connectome to start with.
+-  Use a simple operation to start with, such as ``null_manipulation`` (see examples).
+-  Run serially to start with, before switching to parallel processing.
+-  Start with a single data split.
+-  But: In case of memory errors, use more than a single data split, even when running serially (!), which will reduce the memory consumption as individual splits will be processed one after the other.
+-  When running in parallel, use ``--tasks-per-node`` in the SLURM configuration to define how many tasks (=splits) will be executed on a single node; reducing this number may reduce the risk of out-of-memory errors.
+-  In general: Increasing memory allocation and/or allocation time may help.
+-  For high performance: Allocate many nodes and use parallel processing together with a relatively large number of data splits depending on the network size (i.e., aim for a few hundred post-synaptic neurons per data split).
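
The last hint can be turned into a quick back-of-the-envelope estimate. The helper below is not part of the framework: `suggest_num_splits` is a hypothetical name, and the target of 300 post-synaptic neurons per split is just one assumed reading of "a few hundred".

```python
import math


def suggest_num_splits(num_tgt_neurons, neurons_per_split=300):
    """Estimate a number of data splits for a given number of target neurons."""
    if num_tgt_neurons < 0 or neurons_per_split <= 0:
        raise ValueError("Neuron counts must be non-negative and the target positive")
    # Round up so that no split exceeds the target size; use at least one split.
    return max(1, math.ceil(num_tgt_neurons / neurons_per_split))


# Example: a network with 30,000 post-synaptic neurons.
num_splits = suggest_num_splits(30_000)
```

The result is only a starting point; memory limits and runtime on the actual system should guide further adjustments.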
+
 Model building
 ~~~~~~~~~~~~~~
 
@@ -190,9 +216,13 @@ Model building
      Extract and build models from existing connectomes.
 
    Options:
-     --force-reextract  Force re-extraction of data, in case already existing.
-     --force-rebuild    Force model re-building, in case already existing.
-     --help             Show this message and exit.
+     --force-reextract   Force re-extraction of data, in case already existing.
+     --force-rebuild     Force model re-building, in case already existing.
+     --cv-folds INTEGER  Optional number of cross-validation folds, overwrites
+                         value in config file
+     --help              Show this message and exit.
+
+Details on the CONFIG file structure can be found under `doc/source/config_file_structure.rst <doc/source/config_file_structure.rst>`_
 
 Structural comparison
 ~~~~~~~~~~~~~~~~~~~~~
@@ -210,15 +240,22 @@ Structural comparison
                            in case already existing.
      --help                Show this message and exit.
 
+Details on the CONFIG file structure can be found under `doc/source/config_file_structure.rst <doc/source/config_file_structure.rst>`_
+
 Examples
 --------
 
-Examples can be found under `examples/ <https://github.com/BlueBrain/connectome-manipulator/tree/main/examples>`_ in the repository.
+Examples can be found under `examples/ </examples>`_ in the repository.
+
+Documentation
+-------------
+
+The full documentation (API reference, CONFIG file structure, ...) can be found on `Read the Docs <https://connectome-manipulator.readthedocs.io/en/netneuro-24-0092-rev1>`_.
 
 How to contribute
 -----------------
 
-Contribution guidelines can be found in `CONTRIBUTING.md <https://github.com/BlueBrain/connectome-manipulator/blob/main/CONTRIBUTING.md>`_ in the repository.
+Contribution guidelines can be found in `CONTRIBUTING.md <CONTRIBUTING.md>`_ in the repository.
 
 Citation
 --------
@@ -244,11 +281,11 @@ Publications that use or mention Connectome-Manipulator
 Scientific papers that use Connectome-Manipulator
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
--  Michael W. Reimann, Sirio Bolaños-Puchet, Jean-Denis Courcol, Daniela Egas Santander, et al. (2022) **Modeling and Simulation of Neocortical Micro- and Mesocircuitry. Part I: Anatomy.** bioRxiv 2022.08.11.503144. DOI: `10.1101/2022.08.11.503144 <https://doi.org/10.1101/2022.08.11.503144>`_
+-  Michael W. Reimann, Sirio Bolaños-Puchet, Jean-Denis Courcol, Daniela Egas Santander, et al. (2024) **Modeling and Simulation of Neocortical Micro- and Mesocircuitry. Part I: Anatomy.** eLife, 13:RP99688. DOI: `10.7554/eLife.99688.1 <https://doi.org/10.7554/eLife.99688.1>`_
 
--  James B. Isbister, András Ecker, Christoph Pokorny, Sirio Bolaños-Puchet, Daniela Egas Santander, et al. (2023) **Modeling and Simulation of Neocortical Micro- and Mesocircuitry.** Part II: Physiology and Experimentation. bioRxiv 2023.05.17.541168. DOI: `10.1101/2023.05.17.541168 <https://doi.org/10.1101/2023.05.17.541168>`_
+-  James B. Isbister, András Ecker, Christoph Pokorny, Sirio Bolaños-Puchet, Daniela Egas Santander, et al. (2023) **Modeling and Simulation of Neocortical Micro- and Mesocircuitry. Part II: Physiology and Experimentation.** bioRxiv, 2023.05.17.541168. DOI: `10.1101/2023.05.17.541168 <https://doi.org/10.1101/2023.05.17.541168>`_
 
--  Daniela Egas Santander, Christoph Pokorny, András Ecker, Jānis Lazovskis, Matteo Santoro, Jason P. Smith, Kathryn Hess, Ran Levi, and Michael W. Reimann. (2024) **Efficiency and reliability in biological neural network architectures.** bioRxiv 2024.03.15.585196. DOI: `10.1101/2024.03.15.585196 <https://doi.org/10.1101/2024.03.15.585196>`_
+-  Daniela Egas Santander, Christoph Pokorny, András Ecker, Jānis Lazovskis, Matteo Santoro, Jason P. Smith, Kathryn Hess, Ran Levi, and Michael W. Reimann. (2024) **Efficiency and reliability in biological neural network architectures.** bioRxiv, 2024.03.15.585196. DOI: `10.1101/2024.03.15.585196 <https://doi.org/10.1101/2024.03.15.585196>`_
 
 Posters that use Connectome-Manipulator
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~