Extend documentation for fastMRI (#468)

microsoft · Jun 1, 2021 · 5c5687b · 5c5687b
1 parent 51274c8
commit 5c5687b
Show file tree

Hide file tree

Showing 3 changed files with 62 additions and 24 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,10 +9,24 @@ For each Pull Request, the affected code parts should be briefly described and a
 Once a release is done, the "Upcoming" section becomes the release changelog, and a new empty "Upcoming" should be
 created.
 
+
 ## Upcoming
 
 ### Added
 
+### Changed
+
+### Fixed
+
+### Removed
+
+### Deprecated
+
+
+## 0.3 (2021-06-01)
+
+### Added
+
 - ([#454](https://github.com/microsoft/InnerEye-DeepLearning/pull/454)) Checking that labels are mutually exclusive.
 - ([#447](https://github.com/microsoft/InnerEye-DeepLearning/pull/447/)) Added a sanity check to ensure there are no
   missing channels, nor missing files. If missing channels in the csv file or filenames associated with channels are
@@ -140,8 +154,9 @@ console for easier diagnostics.
 - ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Delete unused `classification_report.ipynb`.
 - ([#455](https://github.com/microsoft/InnerEye-DeepLearning/pull/455)) Removed the AzureRunner conda environment.
   The full InnerEye conda environment is needed to submit a training job to AzureML.
--  ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for 
+- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for 
    RandAugment & Co. The user has now instead complete freedom to specify the set of augmentations to use.
+- ([#468](https://github.com/microsoft/InnerEye-DeepLearning/pull/468)) Removed the `KneeSinglecoil` example model
 
 ### Deprecated
 

diff --git a/InnerEye/ML/configs/other/fastmri_varnet.py b/InnerEye/ML/configs/other/fastmri_varnet.py
@@ -133,25 +133,6 @@ def get_data_module(self) -> LightningDataModule:
                                        test_path="multicoil_test_v2")
 
 
-class KneeSinglecoil(FastMri):
-    """
-    A model configuration to train a VarNet model on the knee_singlecoil dataset, with 4x acceleration.
-    """
-
-    def __init__(self) -> None:
-        super().__init__()
-        self.azure_dataset_id = "knee_singlecoil"
-        # If the Azure nodes run out of disk space when downloading the dataset, re-submit with the
-        # --use_dataset_mount=True flag. The dataset will be mounted to the fixed path given here.
-        self.dataset_mountpoint = "/tmp/knee_singlecoil"
-
-    def get_data_module(self) -> LightningDataModule:
-        return get_fastmri_data_module(azure_dataset_id=self.azure_dataset_id,
-                                       local_dataset=self.local_dataset,
-                                       sample_rate=self.sample_rate,
-                                       test_path="singlecoil_test_v2")
-
-
 class BrainMulticoil(FastMri):
     """
     A model configuration to train a VarNet model on the brain_multicoil dataset, with 4x acceleration.

diff --git a/docs/fastmri.md b/docs/fastmri.md
@@ -45,7 +45,7 @@ extract all the AWS access tokens from the `curl` commands.
 
 Then run the script to download the dataset as follows, providing the path the the file with the curl commands
 and the connection string as commandline arguments, enclosed in quotes:
-`python InnerEye/Scripts/prepare_fastmri.py --curl curl.txt --connection_string "<your_connection_string"` --location westeurope
+`python InnerEye/Scripts/prepare_fastmri.py --curl curl.txt --connection_string "<your_connection_string>"` --location westeurope
 
 This script will
 - Authenticate against Azure either using the Service Principal credentials that you set up in Step 3 of the
@@ -83,8 +83,8 @@ If set up correctly, this is the Azure storage account that holds all datasets u
 Hence, after the downloading completes, you are ready to use the InnerEye toolbox to submit an AzureML job that uses
 the FastMRI data.
 
-There are 3 example models already coded up in the InnerEye toolbox, defined in 
-[fastmri_varnet.py](../InnerEye/ML/configs/other/fastmri_varnet.py): `KneeSinglecoil`, `KneeMulticoil`, and 
+There are 2 example models already coded up in the InnerEye toolbox, defined in 
+[fastmri_varnet.py](../InnerEye/ML/configs/other/fastmri_varnet.py): `KneeMulticoil` and 
 `BrainMulticoil`. As with all InnerEye models, you can start a training run by specifying the name of the class 
 that defines the model, like this:
 ```shell script
@@ -152,4 +152,46 @@ python InnerEye/ML/runner.py --model BrainMulticoil --azureml=True --use_dataset
 This job should pick up the existing cache file, and output a message like "Copying a pre-computed dataset cache 
 file ..."
 
-The same trick can of course be applied to the other models as well (`KneeSinglecoil`, `KneeMulticoil`).
+The same trick can of course be applied to other models as well (`KneeMulticoil`).
+
+
+# Running on a GPU machine
+
+You can of course run the InnerEye fastMRI models on a reasonably large machine with a GPU for development and 
+debugging purposes. Before running, we recommend to download the datasets using a tool 
+like [azcopy](http://aka.ms/azcopy) into a folder, for example the `datasets` folder at the repository root.
+
+To use `azcopy`, you will need the access key to the storage account that holds your data - it's the same storage
+account that was used when creating the Data Factory that downloaded the data.
+- To get that, navigate to the [Azure Portal](https://portal.azure.com), and search for the storage account 
+that you created to hold your datasets (Step 4 in [AzureML setup](setting_up_aml.md)). 
+- On the left hand navigation, there is a section "Access Keys". Select that and copy out one of the two keys (_not_
+the connection strings). The key is a base64 encoded string, it should not contain any special characters apart from 
+`+`, `/`, `.` and `=`
+
+Then run this script in the repository root folder:
+```shell script
+mkdir datasets
+azcopy --source-key <storage_account_key> --source https://<your_storage_acount>.blob.core.windows.net/datasets/brain_multicoil --destination datasets/brain_multicoil --recursive
+```
+Replace `brain_multicoil` with any of the other datasets names if needed.
+
+If you follow these suggested folder structures, there is no further change necessary to the models. You can then
+run, for example, the `BrainMulticoil` model by dropping the `--azureml=True` flag like this:
+```shell script
+python InnerEye/ML/runner.py --model BrainMulticoil
+```
+The code will recognize that an Azure dataset named `brain_multicoil` is already present in the `datasets` folder,
+and skip the download.
+
+If you choose to download the dataset to a different folder, for example `/foo/brain_multicoil`, you will need to
+make a small adjustment to the model in [fastmri_varnet.py](../InnerEye/ML/configs/other/fastmri_varnet.py),
+and add the `local_dataset` argument like this:
+```python
+class BrainMulticoil(FastMri):
+    def __init__(self) -> None:
+        super().__init__()
+        self.azure_dataset_id = "brain_multicoil"
+        self.local_dataset = Path("/foo/brain_multicoil")
+        self.dataset_mountpoint = "/tmp/brain_multicoil"
+```