This document will help you get started with GaNDLF using a few representative examples.

Installation

Follow the installation instructions to install GaNDLF. When the installation is complete, your shell prompt should look like the following, which indicates that the GaNDLF virtual environment has been activated:

(venv_gandlf) $> ### subsequent commands go here
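
Before moving on, it can be useful to confirm that the environment is usable. The sketch below is only an illustration: it assumes the package imports as GANDLF and that recent releases install a gandlf command line entry point; if your installation differs, skip this check.

# continue from previous shell
# a quick sanity check (assumes the package imports as `GANDLF` and that a
# `gandlf` command line entry point is on the PATH, as in recent releases)
(venv_gandlf) $> python -c "import GANDLF; print(GANDLF.__version__)"
(venv_gandlf) $> gandlf --help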

Running GaNDLF with GitHub Codespaces

Alternatively, you can launch a Codespace for GaNDLF by clicking this link:

Open in GitHub Codespaces

A codespace will open in a web-based version of Visual Studio Code. The dev container is fully configured with software needed for this project.

Note: Dev Containers is an open spec which is supported by GitHub Codespaces and other tools.

Sample Data

All examples in this document use the sample data from our extensive automated unit tests. You can download the sample data from this link; an example is shown below:

# continue from previous shell
(venv_gandlf) $> wget https://upenn.box.com/shared/static/y8162xkq1zz5555ye3pwadry2m2e39bs.zip -O ./gandlf_sample_data.zip
(venv_gandlf) $> unzip ./gandlf_sample_data.zip
# this should extract a directory called `data` in the current directory

The data directory content should look like the example below (for brevity, this location will be referred to as ${GANDLF_DATA} in the rest of the document):

# continue from previous shell
(venv_gandlf) $> ls data
2d_histo_segmentation    2d_rad_segmentation    3d_rad_segmentation
# and a bunch of CSVs which can be ignored

Note: When using your own data, it is vital to correctly prepare your data prior to using it for any computational task (such as AI training or inference).

Segmentation

Segmentation using 3D Radiology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Construct the main data file that will be used for the entire computation cycle. For the sample data for this task, the base location is ${GANDLF_DATA}/3d_rad_segmentation, and it will be referred to as ${GANDLF_DATA_3DRAD} in the rest of the document. Furthermore, the CSV should look like the example below (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header):

    SubjectID,Channel_0,Label
    001,${GANDLF_DATA_3DRAD}/001/image.nii.gz,${GANDLF_DATA_3DRAD}/001/mask.nii.gz
    002,${GANDLF_DATA_3DRAD}/002/image.nii.gz,${GANDLF_DATA_3DRAD}/002/mask.nii.gz
    003,${GANDLF_DATA_3DRAD}/003/image.nii.gz,${GANDLF_DATA_3DRAD}/003/mask.nii.gz
    004,${GANDLF_DATA_3DRAD}/004/image.nii.gz,${GANDLF_DATA_3DRAD}/004/mask.nii.gz
    005,${GANDLF_DATA_3DRAD}/005/image.nii.gz,${GANDLF_DATA_3DRAD}/005/mask.nii.gz
    006,${GANDLF_DATA_3DRAD}/006/image.nii.gz,${GANDLF_DATA_3DRAD}/006/mask.nii.gz
    007,${GANDLF_DATA_3DRAD}/007/image.nii.gz,${GANDLF_DATA_3DRAD}/007/mask.nii.gz
    008,${GANDLF_DATA_3DRAD}/008/image.nii.gz,${GANDLF_DATA_3DRAD}/008/mask.nii.gz
    009,${GANDLF_DATA_3DRAD}/009/image.nii.gz,${GANDLF_DATA_3DRAD}/009/mask.nii.gz
    010,${GANDLF_DATA_3DRAD}/010/image.nii.gz,${GANDLF_DATA_3DRAD}/010/mask.nii.gz
    
  3. Construct the configuration file to help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  4. Now you are ready to train your model (an example invocation is sketched after this list).

  5. Once the model is trained, you can use it for inference on unseen data; an example is sketched below. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers.
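
The exact training and inference invocations depend on your GaNDLF version. The sketch below assumes the consolidated gandlf run entry point with commonly used short flags (-c for the configuration, -i for the data CSV, -m for the model directory, -t for training, -d for the device); the file names are placeholders for the CSV and configuration prepared in steps 2 and 3. Confirm the exact options with gandlf run --help or the usage documentation.

# continue from previous shell
# NOTE: a sketch only; flag spellings differ across GaNDLF versions (older
# releases ship a `gandlf_run` script that expects `-t True` / `-t False`),
# so confirm the exact options with `gandlf run --help`
(venv_gandlf) $> gandlf run \
  -c ./config_segmentation.yaml \
  -i ./train_data.csv \
  -m ./model_output/ \
  -t \
  -d cuda

# once training finishes, point the same entry point at a CSV of unseen data
# (without the Label column); dropping the training flag is assumed to switch
# to inference mode -- some versions expect an explicit inference flag instead
(venv_gandlf) $> gandlf run \
  -c ./config_segmentation.yaml \
  -i ./unseen_data.csv \
  -m ./model_output/ \
  -d cuda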

Segmentation using 2D Histology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Extract patches/tiles from the full-size whole slide images for training (an example invocation of the patch extraction step is sketched after this list). A sample configuration to extract patches is shown below:

    num_patches: 3
    patch_size:
    - 1000m
    - 1000m
  3. Assuming the output will be stored in ${GANDLF_DATA}/histo_patches_output, you can refer to this location as ${GANDLF_DATA_HISTO_PATCHES} in the rest of the document.

  4. Construct the main data file that will be used for the entire computation cycle. The data file for this task is generated after the patches have been extracted, and it should look like the following example (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header):

    SubjectID,Channel_0,Label
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_720-3344.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_720-3344_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_816-3488.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_816-3488_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_960-3376.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_960-3376_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_976-3520.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_976-3520_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1024-3216.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1024-3216_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1104-3360.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1104-3360_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1168-3104.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1168-3104_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1248-3248.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1248-3248_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1312-3056.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1312-3056_LM.png
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1392-3200.png,${GANDLF_DATA_HISTO_PATCHES}/1/mask/mask_patch_1392-3200_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_720-3344.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_720-3344_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_816-3488.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_816-3488_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_960-3376.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_960-3376_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_976-3520.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_976-3520_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1024-3216.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1024-3216_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1104-3360.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1104-3360_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1168-3104.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1168-3104_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1248-3248.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1248-3248_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1312-3056.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1312-3056_LM.png
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1392-3200.png,${GANDLF_DATA_HISTO_PATCHES}/2/mask/mask_patch_1392-3200_LM.png
    
  5. Construct the configuration file that will help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  6. Now you are ready to train your model.

  7. Once the model is trained, you can use it for inference on unseen data. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers.
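
A sketch of the patch extraction step (step 2) follows. The sub-command name and flags are assumptions based on the GaNDLF command line tooling (older releases ship this functionality as the gandlf_patchMiner script); patch_config.yaml stands for a file holding the num_patches / patch_size settings from step 2, and wsi_input.csv for a CSV listing the whole slide images and their label masks. Confirm the exact options with gandlf patch-miner --help.

# continue from previous shell
# NOTE: a sketch only; file names are placeholders and the flag spellings may
# differ between GaNDLF versions
(venv_gandlf) $> gandlf patch-miner \
  -c ./patch_config.yaml \
  -i ./wsi_input.csv \
  -o ${GANDLF_DATA}/histo_patches_output

Training and inference then follow the same gandlf run pattern sketched in the 3D radiology segmentation example above.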

Note: Please see the special considerations for histology images during inference.

Classification

Classification using 3D Radiology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Construct the main data file that will be used for the entire computation cycle. For the sample data for this task, the base location is ${GANDLF_DATA}/3d_rad_segmentation, and it will be referred to as ${GANDLF_DATA_3DRAD} in the rest of the document. The CSV should look like the following example (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header); a minimal sketch for generating such a file is shown after this list:

    SubjectID,Channel_0,ValueToPredict
    001,${GANDLF_DATA_3DRAD}/001/image.nii.gz,0
    002,${GANDLF_DATA_3DRAD}/002/image.nii.gz,1
    003,${GANDLF_DATA_3DRAD}/003/image.nii.gz,0
    004,${GANDLF_DATA_3DRAD}/004/image.nii.gz,2
    005,${GANDLF_DATA_3DRAD}/005/image.nii.gz,0
    006,${GANDLF_DATA_3DRAD}/006/image.nii.gz,1
    007,${GANDLF_DATA_3DRAD}/007/image.nii.gz,0
    008,${GANDLF_DATA_3DRAD}/008/image.nii.gz,2
    009,${GANDLF_DATA_3DRAD}/009/image.nii.gz,0
    010,${GANDLF_DATA_3DRAD}/010/image.nii.gz,1
    
  3. Construct the configuration file that will help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  4. Now you are ready to train your model.

  5. Once the model is trained, you can use it for inference on unseen data. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers.
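
If your data follows the per-subject directory layout shown above, the data CSV can be generated with a small shell loop. The sketch below is a minimal illustration: the output file name is arbitrary, and the ValueToPredict column is left as a placeholder because the class labels must come from your own ground truth (for example, joined in from a separate label table).

# continue from previous shell; a minimal sketch for building the data CSV
(venv_gandlf) $> GANDLF_DATA_3DRAD=${GANDLF_DATA}/3d_rad_segmentation
(venv_gandlf) $> {
  echo "SubjectID,Channel_0,ValueToPredict"
  for subject in "${GANDLF_DATA_3DRAD}"/*/ ; do
    # each subject directory contains image.nii.gz; the placeholder must be
    # replaced with the real class value for that subject
    echo "$(basename "${subject}"),${subject%/}/image.nii.gz,FILL_ME_IN"
  done
} > train_classification_3d_rad.csv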

Classification (patch-level) using 2D Histology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Extract patches/tiles from the full-size whole slide images for training. A sample configuration to extract patches is shown below:

    num_patches: 3
    patch_size:
    - 1000m
    - 1000m
  3. Assuming the output will be stored in ${GANDLF_DATA}/histo_patches_output, you can refer to this location as ${GANDLF_DATA_HISTO_PATCHES} in the rest of the document.

  4. Construct the main data file that will be used for the entire computation cycle. The data file for this task is generated after the patches have been extracted, and it should look like the following example (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header):

    SubjectID,Channel_0,ValueToPredict
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_720-3344.png,0
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_816-3488.png,0
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_960-3376.png,0
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_976-3520.png,0
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1024-3216.png,1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1104-3360.png,1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1168-3104.png,1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1248-3248.png,1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1312-3056.png,1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1392-3200.png,1
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_720-3344.png,0
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_816-3488.png,0
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_960-3376.png,0
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_976-3520.png,0
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1024-3216.png,1
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1104-3360.png,1
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1168-3104.png,0
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1248-3248.png,1
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1312-3056.png,1
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1392-3200.png,1
    
  5. Construct the configuration file that will help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  6. Now you are ready to train your model.

  7. Once the model is trained, you can use it for inference on unseen data. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers.

Note: Please see the special considerations for histology images during inference.

Regression

Regression using 3D Radiology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Construct the main data file that will be used for the entire computation cycle. For the sample data for this task, the base location is ${GANDLF_DATA}/3d_rad_segmentation, and it will be referred to as ${GANDLF_DATA_3DRAD} in the rest of the document. The CSV should look like the following example (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header):

    SubjectID,Channel_0,ValueToPredict
    001,${GANDLF_DATA_3DRAD}/001/image.nii.gz,0.4
    002,${GANDLF_DATA_3DRAD}/002/image.nii.gz,1.2
    003,${GANDLF_DATA_3DRAD}/003/image.nii.gz,0.2
    004,${GANDLF_DATA_3DRAD}/004/image.nii.gz,2.3
    005,${GANDLF_DATA_3DRAD}/005/image.nii.gz,0.4
    006,${GANDLF_DATA_3DRAD}/006/image.nii.gz,1.2
    007,${GANDLF_DATA_3DRAD}/007/image.nii.gz,0.3
    008,${GANDLF_DATA_3DRAD}/008/image.nii.gz,2.2
    009,${GANDLF_DATA_3DRAD}/009/image.nii.gz,0.1
    010,${GANDLF_DATA_3DRAD}/010/image.nii.gz,1.5
    
  3. Construct the configuration file that will help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  4. Now you are ready to train your model.

  5. Once the model is trained, you can use it for inference on unseen data. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers; a minimal sketch is shown below.
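
As a reminder of what the inference-time file looks like, the sketch below builds a CSV with the same layout as the training file but without the ValueToPredict column; ./unseen_3d_rad is a placeholder for wherever your new subjects live, assuming the same per-subject directory layout as the sample data.

# continue from previous shell; a minimal sketch of the inference-time CSV
(venv_gandlf) $> {
  echo "SubjectID,Channel_0"
  for subject in ./unseen_3d_rad/*/ ; do
    echo "$(basename "${subject}"),${subject%/}/image.nii.gz"
  done
} > unseen_regression_3d_rad.csv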

Regression (patch-level) using 2D Histology Images

  1. Download and extract the sample data as described in the Sample Data section above. Alternatively, you can use your own data (see the section on constructing the CSV in the usage documentation for an example).

  2. Extract patches/tiles from the full-size whole slide images for training. A sample configuration to extract patches is shown below:

    num_patches: 3
    patch_size:
    - 1000m
    - 1000m
  3. Assuming the output will be stored in ${GANDLF_DATA}/histo_patches_output, you can refer to this location as ${GANDLF_DATA_HISTO_PATCHES} in the rest of the document.

  4. Construct the main data file that will be used for the entire computation cycle. The data file for this task is generated after the patches have been extracted, and it should look like the following example (currently, the Label header is unused and ignored for classification/regression, which use the ValueToPredict header):

    SubjectID,Channel_0,ValueToPredict
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_720-3344.png,0.1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_816-3488.png,0.2
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_960-3376.png,0.3
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_976-3520.png,0.6
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1024-3216.png,1.5
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1104-3360.png,1.3
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1168-3104.png,1.0
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1248-3248.png,1.5
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1312-3056.png,1.1
    1,${GANDLF_DATA_HISTO_PATCHES}/1/image/image_patch_1392-3200.png,1.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_720-3344.png,0.4
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_816-3488.png,0.5
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_960-3376.png,0.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_976-3520.png,0.5
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1024-3216.png,1.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1104-3360.png,1.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1168-3104.png,0.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1248-3248.png,1.3
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1312-3056.png,1.2
    2,${GANDLF_DATA_HISTO_PATCHES}/2/image/image_patch_1392-3200.png,1.1
    
  5. Construct the configuration file that will help design the computation (training and inference) pipeline. An example file for this task can be found here. This configuration has various levels of customization, and those details are presented on this page.

  6. Now you are ready to train your model.

  7. Once the model is trained, you can use it for inference on unseen data. Remember to construct a similar data file for the unseen data, but without the Label or ValueToPredict headers.

Note: Please see the special considerations for histology images during inference.