This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Update documentation for submodules #481

Merged: 4 commits, Jun 9, 2021
11 changes: 10 additions & 1 deletion README.md
@@ -100,9 +100,18 @@ Further detailed instructions, including setup in Azure, are here:
1. [Debugging and monitoring models](docs/debugging_and_monitoring.md)
1. [Model diagnostics](docs/model_diagnostics.md)
1. [Move a model to a different workspace](docs/move_model.md)
1. [Working with FastMRI models](docs/fastmri.md)

## Deployment
We offer a companion set of open-source tools that help to integrate trained CT segmentation models with clinical
software systems:
- The [InnerEye-Gateway](https://github.com/microsoft/InnerEye-Gateway) is a Windows service running in a DICOM network
that can route anonymized DICOM images to an inference service.
- The [InnerEye-Inference](https://github.com/microsoft/InnerEye-Inference) component offers a REST API that integrates
with the InnerEye-Gateway, to run inference on InnerEye-DeepLearning models.

Details can be found [here](docs/deploy_on_aml.md).

![Deployment diagram](docs/deployment.png)

## More information
26 changes: 5 additions & 21 deletions docs/environment.md
@@ -4,29 +4,13 @@

In order to work with the solution, your OS environment will need [git](https://git-scm.com/) and [git lfs](https://git-lfs.github.com/) installed. Depending on the OS that you are running the installation instructions may vary. Please refer to respective documentation sections on the tools' websites for detailed instructions.
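For example, on Ubuntu the following commands install both tools (a sketch only; package names and installation steps differ across operating systems):
```shell script
# Install git and git-lfs via the system package manager
sudo apt-get install git git-lfs
# Activate git-lfs for your user account
git lfs install
```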

We recommend using PyCharm or VSCode as the Python editor.

You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that. We recommend that because it is the easiest to set up.
* Or you can create your own project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule. We only recommend that if you are very handy with Python. More details about this option
[are here](innereye_as_submodule.md).

## Windows Subsystem for Linux Setup
When developing on a Windows machine, we recommend using [the Windows Subsystem for Linux, WSL2](https://docs.microsoft.com/en-us/windows/wsl/about).
95 changes: 95 additions & 0 deletions docs/innereye_as_submodule.md
@@ -0,0 +1,95 @@
# Using the InnerEye code as a git submodule of your project

You can use InnerEye as a submodule in your own project.
If you go down that route, here's the list of files you will need in your project (the same files as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code, and model configurations.
* A file like `myrunner.py` that invokes the InnerEye training runner, but that points the code to your environment
and Azure settings; see the [Building models](building_models.md) instructions for details. Please see below for what
`myrunner.py` should look like.
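For orientation, here is one possible layout of your repository once everything described below is in place. The `InnerEyeLocal` folder is introduced in the "Adding new models" section; all names are suggestions only:
```
your-project/
├── environment.yml
├── settings.yml
├── myrunner.py
├── InnerEyeLocal/             # your own code and model configurations
│   └── configs/
└── innereye-deeplearning/     # the InnerEye-DeepLearning git submodule
```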

You then need to add the InnerEye code as a git submodule, in folder `innereye-deeplearning`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-deeplearning
```
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-deeplearning` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure, and marking both your repository root and
`innereye-deeplearning` as "Sources".
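Anyone who clones your repository will also need to fetch the submodule contents. These are standard git submodule commands, nothing InnerEye-specific:
```shell script
# After a plain clone of your repository, fetch the submodule:
git submodule update --init

# Or clone your repository and its submodules in one go:
git clone --recurse-submodules <url_of_your_repository>

# Later, to move the submodule to the latest commit on its default branch:
git submodule update --remote innereye-deeplearning
```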

Example commandline runner that uses the InnerEye runner (called `myrunner.py` above):
```python
import sys
from pathlib import Path


# This file mimics how the InnerEye code would be used as a git submodule.

# Ensure that this path correctly points to the root folder of your repository.
repository_root = Path(__file__).absolute().parent


def add_package_to_sys_path_if_needed() -> None:
    """
    Checks if the Python paths in sys.path already contain the /innereye-deeplearning folder. If not, add it.
    """
    is_package_in_path = False
    innereye_submodule_folder = repository_root / "innereye-deeplearning"
    for path_str in sys.path:
        path = Path(path_str)
        if path == innereye_submodule_folder:
            is_package_in_path = True
            break
    if not is_package_in_path:
        print(f"Adding {innereye_submodule_folder} to sys.path")
        sys.path.append(str(innereye_submodule_folder))


def main() -> None:
    try:
        # If this import fails, the submodule folder is not yet on sys.path.
        from InnerEye import ML  # noqa: F401
    except ImportError:
        add_package_to_sys_path_if_needed()

    from InnerEye.ML import runner
    print(f"Repository root: {repository_root}")
    # Check here that yaml_config_file correctly points to your settings file
    runner.run(project_root=repository_root,
               yaml_config_file=Path("settings.yml"),
               post_cross_validation_hook=None)


if __name__ == '__main__':
    main()
```
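As a quick, optional smoke test of the path wiring, you can ask the runner for its command-line help; assuming the InnerEye runner exposes the usual argparse behavior, this prints the available options without starting any training:
```shell script
python myrunner.py --help
```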

## Adding new models

1. Set up a directory outside of InnerEye to hold your configs. In your repository root, you could have a folder
`InnerEyeLocal`, parallel to the InnerEye submodule, alongside `settings.yml` and `myrunner.py`.

The example below creates a new flavour of the Glaucoma model in `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is change the dataset. We will do this by subclassing `GlaucomaPublic` in a new config
stored in `InnerEyeLocal/configs`.
1. Create folder `InnerEyeLocal/configs`.
1. Create a config file `InnerEyeLocal/configs/MyGlaucomaModel.py` which extends the `GlaucomaPublic` class
like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
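For illustration, the two settings would then read as follows. This is only a sketch of the two values; keep the surrounding structure of your `settings.yml` file as it is:
```yaml
model_configs_namespace: InnerEyeLocal.configs
extra_code_directory: InnerEyeLocal
```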

### Start Training
Run the following to start a job on AzureML:
```
python myrunner.py --azureml=True --model=MyGlaucomaModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.
113 changes: 47 additions & 66 deletions docs/sample_tasks.md
@@ -1,116 +1,97 @@
# Sample Tasks

This document contains two sample tasks for the classification and segmentation pipelines.

The document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
Before trying to train these models, you should have followed the steps to set up an [environment](environment.md) and [AzureML](setting_up_aml.md).

## Sample classification task: Glaucoma Detection on OCT volumes

This example is based on the paper [A feature agnostic approach for glaucoma detection in OCT volumes](https://arxiv.org/pdf/1807.04855v3.pdf).

### Downloading and preparing the dataset
The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.

After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
script on the extracted folder.
```
python create_glaucoma_dataset_csv.py /path/to/extracted/folder
```
This will convert the dataset to csv form and create a file `dataset.csv`.

Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). The dataset should go
into a container called `datasets`, with a folder name of your choice (`name_of_your_dataset_on_azure` in the
description below).
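One way to do the upload is with the [azcopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10) tool. This is a sketch only, with `mystorageaccount` standing in for your storage account name, and it assumes you have already authenticated, for example via `azcopy login`:
```shell script
# Recursively upload the prepared folder into the "datasets" container
azcopy copy "/path/to/extracted/folder" \
    "https://mystorageaccount.blob.core.windows.net/datasets/name_of_your_dataset_on_azure" --recursive
```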

### Creating the model configuration and starting training

Next, you need to create a configuration file `InnerEye/ML/configs/MyGlaucoma.py`
which extends the GlaucomaPublic class like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
The value for `self.azure_dataset_id` should match the dataset upload location, called
`name_of_your_dataset_on_azure` above.

Once that config is in place, you can start training in AzureML via
```
python InnerEye/ML/runner.py --model=MyGlaucomaModel --azureml=True
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.

As an alternative to working with a fork of the repository, you can use InnerEye-DeepLearning via a submodule.
Please check [here](innereye_as_submodule.md) for details.


## Sample segmentation task: Segmentation of Lung CT

This example is based on the [Lung CT Segmentation Challenge 2017](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017) <sup>[[2]](#2)</sup>.

### Downloading and preparing the dataset

The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).

You need to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.

Next, use the
[InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) commandline tools to create a
NIFTI dataset from the downloaded (DICOM) files.
After installing the tool, run
```batch
InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to the 'datasets' folder> --niftiDatasetDirectory=<output folder name for converted dataset> --dicomDatasetDirectory=<name of downloaded folder inside 'datasets'> --geoNorm 1;1;3
```
Now, you should have another folder under `datasets` with the converted NIFTI files.
The `--geoNorm` argument tells the tool to normalize the voxel sizes during conversion.

Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). All files should go
into a folder in the `datasets` container, for example `my_lung_dataset`. This folder name will need to go into the
`azure_dataset_id` field of the model configuration, see below.

### Creating the model configuration and starting training
You can then create a new model configuration, based on the template
[Lung.py](../InnerEye/ML/configs/segmentation/Lung.py). To do this, create a file
`InnerEye/ML/configs/segmentation/MyLungModel.py`, where you create a subclass of the template Lung model, and
add the `azure_dataset_id` field (i.e., the name of the folder that contains the uploaded data from above),
so that it looks like:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung


class MyLungModel(Lung):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "my_lung_dataset"
```
If you are using InnerEye as a submodule, please add this configuration in your private configuration folder,
as described for the Glaucoma model [here](innereye_as_submodule.md).

### Start Training
You can now run the following command to start a job on AzureML:
```
python InnerEye/ML/runner.py --azureml=True --model=MyLungModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.
