MRI shape framework

Introduction

This framework derives a multilinear tongue model from a given MRI dataset by performing the steps described in Hewer et al. (2018) in a minimally supervised way. The approach has been updated since the original release; have a look at the changelog to learn about the changes.

In practice, this means that the framework automatically takes care of the dependencies between the different steps and calls the respective tools. As a user, you only have to provide the labeled MRI data and the settings you want to use for the dataset.

Requirements

Please make sure that the following tools are installed and available on your PATH:

Setup

The following sections explain how to add an MRI dataset to the framework and how to configure it. Throughout, ${...} denotes a placeholder variable.

Adding a Dataset

Adding a dataset of MRI scans to the framework is straightforward: the scan files have to be placed in the resources/mri folder, where the following directory hierarchy is expected:

.
└── resources
    └── mri
        └── ${DATASET_NAME}
            └── ${SPEAKER_NAME}
                └── ${SCAN_NAME}
                    └── scan.json

where ${DATASET_NAME}, ${SPEAKER_NAME}, and ${SCAN_NAME} can be freely chosen. The file scan.json is the actual MRI scan in JSON format.
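
For example, a scan could be added from the shell as follows (the dataset, speaker, and scan names here are hypothetical):

mkdir -p resources/mri/mydataset/speaker01/scan001
cp /path/to/scan.json resources/mri/mydataset/speaker01/scan001/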

Configuring the Dataset

In this step, we create the configuration files for the dataset. These are placed in the configuration folder:

.
└── configuration
    └── ${DATASET_NAME}

where ${DATASET_NAME} is the same name chosen above.

First, we create the file database.json, which contains meta information about the added dataset: a JSON list of objects, each with the following structure:

{
    "prompt": "${LABEL}",
    "speaker": "${SPEAKER_NAME}",
    "missing": true|false,
    "id": "${SCAN_NAME}"
}

Here, prompt is a label that identifies the sound that was produced during the scan. The variables ${SCAN_NAME} and ${SPEAKER_NAME} are chosen as above. The flag missing indicates whether the scan for the respective speaker and prompt is missing from the dataset. If it is, ${SCAN_NAME} should be set to a value that also indicates that the scan is missing.
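
As an illustration, a minimal database.json covering one speaker and two prompts might look as follows (all names, including the id used for the missing scan, are hypothetical):

[
    {
        "prompt": "a",
        "speaker": "speaker01",
        "missing": false,
        "id": "scan001"
    },
    {
        "prompt": "i",
        "speaker": "speaker01",
        "missing": true,
        "id": "missing"
    }
]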

Furthermore, we create the file palateDatabase.json that contains information about scans that are used for deriving the hard palate shape of each speaker in the dataset. Again, we have a list of JSON objects:

{
    "prompt": "palate",
    "speaker": "${SPEAKER_NAME}",
    "missing": false,
    "id": "${SCAN_NAME}"
}

Afterwards, we add to the folder the file dataset.groovy, which contains the following settings:

// these are settings for bootstrapping the tongue model
bootstrapTongue{

  // iterations to perform
  iterations = 4
  // iteration that should be used for deriving the final model
  selectedIteration = 4

}

// these are settings for bootstrapping the palate shapes
bootstrapPalate{
  // is the bootstrapping active?
  active = true
  // iterations to perform
  iterations = 1
  // iteration to use for the shapes
  selectedIteration = 1
}


// settings for evaluating the final model
evaluation{
  priorSize = 2
  convergenceFactor = 10000000
  projectedGradientTolerance = 0.00001
  maxFunctionEvals = 1000
  // sample amount for specificity
  samples = 1000000
  // truncated modes for specificity
  truncatedPhoneme = 4
  truncatedSpeaker = 5
  // subsets to use; have a look at the resources/evaluation folder for available subsets.
  // You can also add your own.
  subsets = ["bladetip", "combined", "bladebackdorsum"]
}

// settings for the final tongue model
createFinalTongueModel{
  truncatedSpeaker = 5
  truncatedPhoneme = 4
}

// settings for the final palate model
createFinalPalateModel{
  truncatedSpeaker = 11
}

// settings for performing the Procrustes alignment of the palate shapes
procrustesPalate{
  originIndex = 93
  iter = 40
}

dataset{
  name = "${DATASET_NAME}"
  speakers = ["${SPEAKER_NAME}", ... ]
}

Finally, we create a subfolder with the corresponding name for each considered speaker in the dataset. Such a folder must contain a file named speaker.groovy with the following settings:

speaker {
  // name of the speaker
  name = "${SPEAKER_NAME}"
  // list of scans to consider
  scans = ["${SCAN_NAME}", ...]
  palateScan = "${PALATE_SCAN_NAME}"
}

Here, scans is a list of the names of the scans belonging to that speaker, and palateScan is the name of the scan that is used for the palate estimation.
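
For instance, a filled-in speaker.groovy might look like this (all names are hypothetical):

speaker {
  // name of the speaker
  name = "speaker01"
  // list of scans to consider
  scans = ["scan001", "scan002", "scan003"]
  palateScan = "scan004"
}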

This folder also contains a settings.groovy file that may override the default settings. At the very least, it should provide the region of interest of the speaker's scans:

speaker {
  cropToVocalTract{
    minX = 41
    minY = 152
    minZ = 0
    maxX = 171
    maxY = 258
    maxZ = 43
  }
}

Providing Landmarks

It is necessary to provide landmarks for each considered scan. These should be organized as follows:

.
└── resources
    └── landmarksPalate
        └── ${DATASET_NAME}
            └── ${SPEAKER_NAME}
                └── ${SCAN_NAME}
                    └── landmarks.json
.
└── resources
    └── landmarksTongue
        └── ${DATASET_NAME}
            └── ${SPEAKER_NAME}
                └── ${SCAN_NAME}
                    └── landmarks.json

For the tongue, the following landmarks are available:

  • FrontBaseCenter
  • Tip
  • SurfaceCenter
  • BackCenter
  • Airway

For the palate, we have:

  • FrontTeethCenter
  • SlopeMiddleRight
  • SlopeMiddleCenter
  • SlopeMiddleLeft
  • SlopeEndedRight
  • SlopeEndedCenter
  • SlopeEndedLeft
  • BackLeft
  • BackCenter
  • BackRight

Open the blend files in resources/template with Blender to see where these landmarks are located on the template meshes (they are stored as vertex groups). You can also add your own landmarks by modifying the blend files. We recommend using the landmark-tool of the MSP MRI Shape Tools to place the landmarks on the MRI scans.
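
The landmarks.json files themselves follow the format used by the MSP MRI Shape Tools; refer to their documentation for the exact schema. As a purely hypothetical sketch, assuming a simple list of named 3D positions, such a file could look like:

[
    { "name": "Tip", "x": 101.0, "y": 182.5, "z": 21.0 },
    { "name": "BackCenter", "x": 140.5, "y": 190.0, "z": 21.5 }
]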

Landmarks for palate reconstruction

Additionally, we need landmarks on the scan that is used for the palate estimation. Their names can be arbitrary; their positions, however, should describe a bounding box of a scan region that only undergoes rigid motion.

The files should be organized as follows:

.
└── resources
    └── landmarksAlignment
        └── ${DATASET_NAME}
            └── ${SPEAKER_NAME}
                └── ${PALATE_SCAN_NAME}
                    └── landmarks.json

Example Data

Please have a look at the example data repository for sample MRI data and corresponding configuration and landmark files.

Running the Framework

After the configuration is finished, you can run the following commands from the root directory.

Deriving the Models

    ./gradlew createFinalTongueModel
    ./gradlew createFinalPalateModel

These commands perform the necessary steps to derive the final tongue model (first command) and palate model (second command). The results are afterwards available under

.
└── build
    └── ${DATASET_NAME}
        └── createFinalTongueModel

and

.
└── build
    └── ${DATASET_NAME}
        └── createFinalPalateModel

In both cases, the model is output in YAML and JSON format. Have a look at the documentation of the MSP MRI Shape Tools to learn more about the data format. Furthermore, most results of the intermediate steps are available in the subfolders of

.
└── build
    └── ${DATASET_NAME}

Here, for example, the template matching or segmentation results can be inspected.

Evaluating the Tongue Model

Run the command

./gradlew evaluateTongueModel

to evaluate the specificity, generalization, and compactness of the model. Plots of the results can then be found in

.
└── build
    └── ${DATASET_NAME}
        └── evaluation

Visualizing the Results

Execute

./gradlew createTongueHTML

to create HTML visualizations of the results after the initial matching and at each bootstrap iteration.

The visualizations of the initial results are available in

.
└── build
    └── ${DATASET_NAME}
        └── html

The visualizations of the bootstrap results are present in

.
└── build
    └── ${DATASET_NAME}
        └── bootstrapTongue
            └── ${COUNT}
                └── html

where ${COUNT} is the number of the iteration.

In both cases, you can start an HTTP server in the corresponding subfolder and then inspect the results in your browser.
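
For example, using the static file server that ships with Python (any other HTTP server works just as well):

cd build/${DATASET_NAME}/html
python3 -m http.server 8000

Then open http://localhost:8000 in your browser.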

The command

./gradlew createPalateHTML

produces visualizations of the matched and reconstructed palate meshes.

License

This work is licensed under the MIT license.

If you are using our framework, please cite, for the time being, the following paper:

@article{Hewer2018CSL,
  author = {Hewer, Alexander and Wuhrer, Stefanie and Steiner, Ingmar and Richmond, Korin},
  doi = {10.1016/j.csl.2018.02.001},
  eprint = {1612.05005},
  eprinttype = {arxiv},
  journal = {Computer Speech \& Language},
  month = sep,
  pages = {68-92},
  title = {A Multilinear Tongue Model Derived from Speech Related {MRI} Data of the Human Vocal Tract},
  url = {https://arxiv.org/abs/1612.05005},
  volume = {51},
  year = {2018}
}