Huan He edited this page Feb 2, 2023 · 27 revisions

At present, MedTator has four tabs to cover the core annotation steps, including document annotation, corpus statistics, annotation export and IAA calculation. The functionalities are introduced as follows:

Annotation Tab

This tab lets the user annotate texts according to a pre-defined schema through four coordinated views:

  1. A: the file list view shows the summary of files and the annotation status of each file.
  2. B: the tagging view shows the content of the selected file and the visualized entity tags, relation tags, and annotation hints in the selected file.
  3. C: the concept list view shows all entity and relation concepts in the schema and the count of each concept annotated in the selected file.
  4. D: the tag list view shows the detailed information of the annotated tags, such as spans, text, and attributes.

4 views

Schema and annotation files import

MedTator supports two ways to import the schema file and annotation files.

The first way is to drag and drop files from a file explorer (e.g., Finder on macOS) into the box. You can also drag in a folder that contains XML or TXT files to import the annotation files.

import schema and annotation files by dragging and dropping

The second way is to click the schema box to open the file selection dialog and select the file to import.

import schema and annotation files by dialog

When the schema file is imported, the concept list view will show the concept names. And when the annotation files are imported, the file list view will show the file names and the total number of files.

Annotation file selection

MedTator supports multi-document annotation, and the imported files are listed in the file list view. The number of imported files is displayed at the top of the file list. To help users find the file to annotate, a filter box at the top of the file list supports file name matching.

the current working file and total number of files

Entity annotation

An entity tag can be annotated in the tagging view in three steps:

  1. Highlight the text to be tagged.
  2. Right-click in the tagging view (or tap with two fingers on the trackpad on macOS).
  3. Click the entity name in the popup menu.

annotate tag by highlighting and clicking the popup menu

In addition to annotating entities by clicking, MedTator also supports shortcut keys for quick annotation. In the concept list view and the popup menu, there is a number or a letter to the left of each concept name, which is the shortcut key for that concept. For example, as shown in the following figure, the number key 1 is assigned to the VAX concept, 2 to PYREXIA, 3 to CHILL, 4 to COUGH, etc.

Annotation shortcut keys

With the shortcut keys, the entity annotation could be done in just two steps:

  1. Highlight the text to be tagged.
  2. Press the corresponding shortcut key.

For example, when annotating a headache concept, you could first highlight the token “headache” in the tagging view, then press the shortcut key w.

Discontinuous span annotation

MedTator supports discontinuous span annotation by using an additional key.

For Windows and Linux systems:

  1. Hold down the Ctrl key on the keyboard.
  2. While still holding Ctrl, highlight each token you want to annotate.
  3. Once all discontinuous tokens are highlighted, release Ctrl and right-click anywhere in the editor.

The following video demonstrates how to do that:

Discontinuous span annotation on Windows and Linux

For macOS:

  1. Hold down the Command (cmd) key on the keyboard.
  2. While still holding Command, highlight each token you want to annotate.
  3. Once all discontinuous tokens are highlighted, release Command and right-click anywhere in the editor.

The following video demonstrates how to do that:

Discontinuous span annotation on macOS

Document-level annotation

MedTator supports document-level annotation when the schema file is customized for a document-level annotation task. As introduced in the schema file design, setting the spans attribute for an entity concept allows that concept to be used for document-level annotation.

Adding a document-level annotation is similar to entity annotation but takes only two steps:

  1. Right-click in the tagging view (or tap with two fingers on the trackpad on macOS).
  2. Click the concept name in the popup menu.

document-level annotation

Or, as shown in the above figure, you could also click the “+” button in the concept list to add a document-level tag.

Relation annotation

MedTator provides two methods to annotate a relation tag.

In tagging view

The relation concept could be added by the following steps:

  1. Click an annotated tag. A popup menu listing the available relation concepts will be displayed; select the one you need.
  2. A floating panel for the selected relation concept will be displayed. You can either (1) click the next tag to be added and select the attribute from the popup menu, or (2) use the floating panel to change the other attributes and finish the relation annotation.

add relation tag in tagging view in two ways

For example, as shown in the above figure, we have added two entity tags: a severity tag “mild” and an AE tag “pain”. First, click the “pain” tag; in the popup menu that appears, click the “LK_AE_SVRT – link_AE” option to add the “pain” tag as an attribute of a new LK_AE_SVRT relation tag. Second, click the “mild” tag and select the “LK_AE_SVRT – link_SVRT” attribute to finish the relation annotation.

Alternatively, a floating panel is displayed with all the attributes of the LK_AE_SVRT tag. You can select the link_SVRT attribute from the dropdown menu and click the “Done Linking” button to add the new relation tag.

In concept list

In addition to the previous method, a relation tag can also be added in two steps:

  1. Click the “+” button in the concept list.
  2. Modify the entity links in the tag list.

add relation tag in concept list

For example, as shown in the above figure, two entity tags (an AE concept and a SVRT concept) have been annotated. To add a new relation tag, first click the “+” button of the LK_AE_SVRT concept in the concept list. A new empty LK_AE_SVRT tag will then be created and displayed in the tag list, where you can modify the attributes and select the existing entities from the dropdown lists to complete the details.

Attribute modification

The attributes of each tag can be modified in the tag list.

attribute value modification

For each type of attribute defined in the schema file, MedTator provides the following methods to modify values:

  • The ID type attribute, which is used in relation tags, is displayed as a dropdown box whose values come from the annotated entity tags. The entity tag id, concept name, and extracted text are displayed in each option for reference.
  • The CDATA type attribute is displayed as an input box, in which you could modify the text as needed.
  • The value set type attribute is displayed as a dropdown box, in which you could select pre-defined values.

Hint marks

MedTator can show annotation hints based on the annotated tags by highlighting tokens in dashed boxes. To reduce workload, once some tokens are annotated as a tag, MedTator searches for the same tokens in the files and shows each match in a dashed box as a “Hint”. The letter next to the dashed box is the concept abbreviation, and the color is the entity color.

annotation hints

For example, in the above annotation, the user annotated “Nausea” as a Symptom entity. MedTator then highlights the other occurrences of “Nausea” in the file as hints (i.e., dashed boxes) and shows an “S” in the same blue color. Instead of highlighting and selecting the entity again, you can click the dashed box to annotate it.

The hints are searched in ALL imported files, so even if a token is not annotated in the current file, it can still be highlighted as a hint. For example, as shown in the blue rectangle, “headache” is not annotated in the current file (doc_02), but it is still highlighted in dashed boxes because “headache” is annotated in other imported files (doc_01 and doc_03).
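
The hint search described above can be sketched as follows. This is only an illustration under stated assumptions, not MedTator's actual implementation: the function name `find_hints`, the data layout, and the case-insensitive whole-word matching rule are all hypothetical.

```python
import re

def find_hints(annotated, text, existing_spans):
    """Return (start, end, concept) hints for un-annotated occurrences.

    annotated: dict mapping annotated text -> concept name, collected
               from ALL imported files (hypothetical structure).
    text: content of the file currently shown in the tagging view.
    existing_spans: (start, end) spans already tagged in this file.
    """
    hints = []
    for token, concept in annotated.items():
        # Assumption: case-insensitive match on word boundaries.
        pattern = r"\b" + re.escape(token) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.start(), m.end()
            # Skip occurrences that overlap an existing tag.
            if any(s < end and start < e for s, e in existing_spans):
                continue
            hints.append((start, end, concept))
    return sorted(hints)

text = "Nausea started on day 2. The nausea and headache resolved."
# "headache" was annotated in another file, "Nausea" in this one.
annotated = {"Nausea": "Symptom", "headache": "Symptom"}
hints = find_hints(annotated, text, existing_spans=[(0, 6)])
hinted_text = [text[s:e] for s, e, _ in hints]
```

Here the already tagged “Nausea” at the start is skipped, while the second “nausea” and the “headache” annotated elsewhere are both returned as hints.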

If you want to hide hints, you can turn off this feature by selecting “No Hint” in the menu.

annotation hints

Document display mode

MedTator supports two different display modes for showing the document in the tagging view.

display mode

As shown in the above figure, you can select different displays in the menu:

  1. Document mode: In this mode, the content in the selected file will be displayed in its original format.
  2. Sentence mode: MedTator splits the document into sentences using the open-source JavaScript library “compromise” and creates a mapping from each sentence to its offsets in the original document. Blank lines are removed in this mode.

While in sentence mode, the annotations are still saved with their original spans. MedTator will calculate the offsets automatically when rendering the annotated tags and exporting the annotations.
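
The offset mapping can be illustrated with a small sketch. MedTator relies on the “compromise” library for sentence splitting; the regular expression below is a deliberately naive stand-in, and the names `split_sentences` and `to_document_span` are hypothetical.

```python
import re

def split_sentences(doc):
    """Split doc into sentences, remembering each sentence's start offset."""
    sentences = []
    # Assumption: a sentence ends at '.', '!' or '?', or at a line break.
    # Blank lines produce no match, so they are dropped as in sentence mode.
    for m in re.finditer(r"[^\s][^.!?\n]*[.!?]?", doc):
        sentences.append({"text": m.group(), "start": m.start()})
    return sentences

def to_document_span(sentences, index, start, end):
    """Map a span inside sentence `index` back to document offsets."""
    base = sentences[index]["start"]
    return base + start, base + end

doc = "Mild pain.\n\nNo fever today."
sentences = split_sentences(doc)
# "fever" occupies characters 3..8 of the second sentence.
doc_start, doc_end = to_document_span(sentences, 1, 3, 8)
```

Because every sentence keeps its start offset, a span selected in sentence mode can always be converted back to the original document coordinates before it is saved.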

Visualize text and tags

MedTator supports presenting the text and tags in the document as a visualization, which can be used for publications or other purposes. You can click the "Visualize" button to open a dialog window that shows the visualized tags and text.

This feature supports two modes of visualizing:

  1. Document Visualization: You can click the "Visualize" button or the "Visualize Whole Document" button to show the entire document and all of the annotated tags (both entities and relations) in the current document.
  2. Selection Visualization: You can highlight a sentence or a paragraph, and then click the "Visualize" button or the "Visualize Selection" button to visualize the tags within the highlighted text.

The visualization is generated by brat as an SVG-based figure. You can zoom in within the web browser before taking a screenshot to get a larger, higher-resolution figure.

Save annotations

Using HTML5 techniques, MedTator supports saving files to the local disk with the File System Access API.

two ways of saving annotation

As shown in the above figure, MedTator provides two ways to save an annotation file:

  • A: Save the current working file. Clicking the “Save” button saves the current working file, which is “doc2.txt.xml” in the figure. Clicking the “Save as” button instead asks the user to save the current file as a new copy rather than saving to the current working file.
  • B: Save a specific file. Clicking the yellow disk icon to the left of a file name saves the corresponding file. In the above figure, clicking the yellow disk icon saves “doc1.txt.xml”. The current working file “doc2.txt.xml” will NOT be saved, because the clicked yellow disk icon is linked to “doc1.txt.xml”.

Set file labels

MedTator supports color labels for annotation files. As shown in the following figure, users can set a color label by clicking the button or an item in the dropdown list. The label is then displayed before the file name.

Set file labels

Users can also remove the label by clicking the item in the dropdown list.

Statistics tab

MedTator provides real-time statistics on the annotated tags. Whenever a new annotation is added or an existing annotation is modified, the statistics in this tab are updated.

Basic summary

MedTator provides a basic summary of the annotations made in the Annotation tab. For example, if we import the COVID_VAX_AE sample, the summary may look like the following:

the basic summary

The left panel shows a basic summary, such as the number of documents and tags.

Annotated tag statistics

In addition to the basic summary, the summaries of each file are also displayed in a heatmap to help you understand how many tags are annotated for each file and each concept. You can also download the summary and report as an Excel file.

the exported statistics

Moreover, the statistics tab also shows the detailed annotation tags of each entity concept with their source file locations. For example, as shown in the figure, the “mild” SVRT tag is annotated twice, in two files: once in doc1.txt.xml and once in doc2.txt.xml.

Export tab

MedTator can export the annotations to different formats for downstream tasks. The detailed formats may change in the future. For example, by clicking the “Tag Text” button, MedTator will create a .tsv file which contains the concept name, annotated text, and the count:

export as tag and text
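
As a rough sketch of what such a file contains, the following code counts (concept, text) pairs and writes them as tab-separated rows. The column headers, the row ordering, and the `tag_text_tsv` helper are assumptions made for illustration; the file that MedTator actually produces may differ.

```python
import csv
import io
from collections import Counter

def tag_text_tsv(tags):
    """Build a TSV string with one row per (concept, text) pair and its count."""
    counts = Counter((t["concept"], t["text"]) for t in tags)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumption: header names are illustrative only.
    writer.writerow(["concept", "text", "count"])
    for (concept, text), n in sorted(counts.items()):
        writer.writerow([concept, text, n])
    return buf.getvalue()

tags = [
    {"concept": "SVRT", "text": "mild"},
    {"concept": "AE", "text": "pain"},
    {"concept": "SVRT", "text": "mild"},
]
tsv = tag_text_tsv(tags)
```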

By clicking the “Tag & Sentence” button, MedTator will create a .tsv file which contains more columns on the context information of each annotated tag.

export as tag with context sentence

In addition to the text, MedTator can also export the annotations to a .tsv file in IOB2/BIO format for named-entity recognition, to a MedTagger ruleset package, or to a jsonl file for a spaCy ruleset. More formats will be added in the future to support other downstream tasks.

How to use the exported data

The exported IOB2/BIO format files can be used for named-entity recognition training and evaluation tasks. For example, you can fine-tune a BERT-based model with the exported IOB2/BIO format files. For more technical details, see the HuggingFace documentation (https://huggingface.co/docs/transformers/v4.13.0/en/custom_datasets).
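
To make the IOB2/BIO labelling scheme concrete, the sketch below converts character-span tags into token-level labels. Whitespace tokenization and the `to_iob2` helper are assumptions for illustration; the tokenization and column layout of MedTator's exported file are not specified here and may differ.

```python
import re

def to_iob2(text, tags):
    """Label each whitespace token with B-/I-<concept> or O.

    tags: list of (start, end, concept) character spans, end exclusive.
    """
    rows = []
    for m in re.finditer(r"\S+", text):
        label = "O"
        for start, end, concept in tags:
            if m.start() < end and start < m.end():
                # The first token of a span gets B-, later tokens get I-.
                prefix = "B" if m.start() <= start else "I"
                label = f"{prefix}-{concept}"
                break
        rows.append((m.group(), label))
    return rows

text = "mild arm soreness today"
tags = [(0, 4, "SVRT"), (5, 17, "AE")]
rows = to_iob2(text, tags)
```

Each (token, label) pair corresponds to one line of a typical IOB2 file, which is the shape that token-classification training scripts usually expect.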

The exported MedTagger ruleset package can be used by the MedTagger IE rule engine (https://github.com/OHNLP/MedTagger).

The exported jsonl pattern file can be used by spaCy's rule-based entity recognition module. It contains entity patterns in the format defined by spaCy (https://spacy.io/usage/rule-based-matching#entityruler) and can be loaded to provide patterns for named entity and text classification labelling.
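
For orientation, a spaCy entity pattern file has one JSON object per line, each with a `label` and a `pattern`. The sketch below builds such lines from annotated (concept, text) pairs using only the standard library. The `patterns_to_jsonl` helper and the choice of case-insensitive token patterns are assumptions; whether MedTator emits phrase patterns or token patterns is not stated here.

```python
import json

def patterns_to_jsonl(tags):
    """Turn (concept, text) pairs into spaCy-style entity pattern lines."""
    seen = set()
    lines = []
    for concept, text in tags:
        key = (concept, text.lower())
        if key in seen:
            continue  # keep one pattern per distinct concept/text pair
        seen.add(key)
        # Assumption: token pattern matching each word case-insensitively.
        pattern = [{"LOWER": tok} for tok in text.lower().split()]
        lines.append(json.dumps({"label": concept, "pattern": pattern}))
    return "\n".join(lines) + "\n"

jsonl = patterns_to_jsonl([
    ("AE", "Arm soreness"),
    ("AE", "arm soreness"),
    ("SVRT", "mild"),
])
records = [json.loads(line) for line in jsonl.splitlines()]
```

In spaCy v3, a pattern file of this shape can typically be loaded with `nlp.add_pipe("entity_ruler").from_disk("patterns.jsonl")`; check the spaCy documentation for the version you use.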

IAA tab

MedTator supports IAA calculation and adjudication of two annotators in this tab.

Before starting IAA calculation and adjudication, you need to load the schema file in MedTator. The documents then need to be annotated by two annotators.

export as tag and text

For example, the COVID_VAX_AE sample dataset has three documents: doc1.txt, doc2.txt, and doc3.txt. Two annotators annotated them separately, producing two annotation files for each document: A_doc1.txt.xml, A_doc2.txt.xml, and A_doc3.txt.xml are from annotator A, while B_doc1.txt.xml, B_doc2.txt.xml, and B_doc3.txt.xml are from annotator B. You can then drag and drop A_doc1.txt.xml, A_doc2.txt.xml, and A_doc3.txt.xml into the annotator A box, and B_doc1.txt.xml, B_doc2.txt.xml, and B_doc3.txt.xml into the annotator B box. MedTator will read those files and show the number of tags and files in each box.

IAA calculation

After importing the annotations from the two annotators, you can specify the overlap ratio, which is the threshold used to decide whether both annotators agree on the same text.

the overlap ratio for IAA calculation

By default, the overlap ratio is 50%, which means the two annotators agree on an annotated text in the same concept if both of them annotated it and the overlap of the annotated tags is equal to or greater than 50%. For example, as shown in the above figure, annotator A annotated the AE concept “left arm soreness”, while annotator B annotated “arm soreness at elbow”. These two tags do not match each other exactly, so the overlap must be calculated to decide whether the two annotators agree on the annotation. As we can see in the figure, the overlapped part spans 5-16, which is 12 characters, and the whole annotation spans 0-25, which is 26 characters. So the overlap ratio of these two tags is 12 / 26 = 46.15%. As it is smaller than the defined threshold of 50%, there is no agreement on this annotation. The results of all tags from both annotators are used in the calculation of the file-level, concept-level, and overall IAA scores.
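
The arithmetic in this example can be reproduced with a short sketch. It assumes inclusive character offsets, as in the figure, and the function name `overlap_ratio` is hypothetical rather than part of MedTator.

```python
def overlap_ratio(span_a, span_b):
    """Overlap length divided by the total covered length.

    Spans are (start, end) pairs with INCLUSIVE character offsets.
    """
    a_start, a_end = span_a
    b_start, b_end = span_b
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return 0.0  # the spans do not touch at all
    covered = max(a_end, b_end) - min(a_start, b_start) + 1
    return overlap / covered

# "left arm soreness" covers 0-16, "arm soreness at elbow" covers 5-25.
ratio = overlap_ratio((0, 16), (5, 25))
threshold = 0.5
agree = ratio >= threshold
```

With these spans the ratio is 12 / 26, which is below the 50% threshold, so `agree` is false.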

After setting the overlap ratio, you can click the “Calculate” button to calculate the IAA at different levels.

calculate the IAA F-score

MedTator will then use the given overlap ratio for the calculation. Depending on the annotations, the result will look like the following:

IAA calculation result

By default, MedTator shows the F1 score results. The IAA result contains three panels: (A) the summary, showing the overall F1 and the concept-level F1; (B) the file-level result, showing the detailed results of the selected concept grouped by file; and (C) the document-level result, showing the detailed tags. All panels are linked with each other: when you click a concept or a file item, the other views are updated accordingly. As shown in the above figure, when the “OVERALL” F1 is selected in the summary, the files show the results of all tags. For doc1.txt.xml, annotators A and B achieve an F1 of 0.44. The label “AB: 2” indicates that annotators A and B agree on 2 annotations, “A+: 0” indicates that no tag is annotated only by annotator A, and “B+: 5” indicates that 5 tags are annotated only by annotator B.

The document-level panel then shows the detailed tags, ordered by concept. In this panel, the results are displayed in three columns: the first column holds the annotations from annotator A, the second column holds those from annotator B, and the last column is for adjudication. In each column, the tags are displayed in dotted boxes with their attributes and context text. If a tag is agreed on by both annotators, it is displayed as a green dotted box in both the first and second columns. If a tag is annotated by only one annotator, it is displayed as a red dotted box in one column.
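
The F1 of 0.44 reported for doc1.txt.xml is consistent with the standard pairwise F1 computed from the three counts. The sketch below assumes that formula (with one annotator treated as the reference); the helper name and the zero-denominator convention are illustrative choices, not confirmed MedTator behavior.

```python
def f1_from_counts(ab, a_only, b_only):
    """Pairwise F1 from agreement counts.

    ab: tags both annotators agree on (the "AB" count)
    a_only: tags annotated only by annotator A (the "A+" count)
    b_only: tags annotated only by annotator B (the "B+" count)
    """
    denominator = 2 * ab + a_only + b_only
    if denominator == 0:
        return 0.0  # convention chosen for this sketch when nothing is annotated
    return 2 * ab / denominator

# doc1.txt.xml in the example: AB: 2, A+: 0, B+: 5
f1 = f1_from_counts(ab=2, a_only=0, b_only=5)
```

Here the result is 4 / 9, which rounds to the 0.44 shown in the file-level panel.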

Measures of IAA

MedTator supports different measures of IAA, including F1 score and Cohen's Kappa coefficient. After you click the Calculate button, MedTator automatically calculates all measures. You can specify the measure to be displayed:

IAA results selection

For example, by selecting the Cohen's Kappa Coefficient, the results will look like the following:

IAA result of Cohen's Kappa
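
How MedTator builds the label sequences for Cohen's Kappa is not described on this page, so the sketch below only shows the textbook formula, applied to two hypothetical token-level label lists of equal length. The function name and the example labels are assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long, non-empty label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# Hypothetical per-token labels from annotators A and B.
labels_a = ["AE", "O", "O", "SVRT", "O", "O"]
labels_b = ["AE", "O", "O", "O", "O", "O"]
kappa = cohen_kappa(labels_a, labels_b)
```

Unlike F1, Kappa corrects for chance agreement, which is why the two measures can rank the same pair of annotators differently.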

IAA Report

You can export the IAA result as an Excel xlsx file to check details in other software:

IAA calculation result export

In the downloaded report file, you can then find four sheets, which contain the summary of the overall F1 score, the summary of each file, the details of each tag annotated by the two annotators, and the adjudication sheet.

IAA calculation exported report

The fourth sheet, for adjudication, shows the IAA calculation results for each pair of matched tags. The annotated tags from the two annotators are listed side by side for comparison and rendered in two different colors.

IAA calculation exported report

Adjudication

The adjudication column in the document-level panel generates a default gold standard based on the annotations from both annotators A and B. You can accept or reject a tag by clicking the “Accept” or “Reject” button displayed at the top left of a tag box. When the adjudication of a document is finished, you can set a green check mark on this document. This check mark is just a visual reminder, and it won’t affect the annotations.

IAA calculation and adjudication

You could download the adjudication results of all documents in a zip file by clicking the “Download All” button in the menu:

Adjudication result download

Then you could use the exported file as gold standard for downstream tasks.

Adjudication in Annotation Tab

You can use the Annotation Tab to further adjudicate the annotations.

As shown in the following figure, you can revise the agreement status of each tag, which makes it easier to identify whether a tag has been adjudicated.

Adjudication in Annotation Tab

In addition, you can also use the Set Labels button in the ribbon menu to update the status of an annotation file. For example, setting a green label means the annotation file is adjudicated; yellow means it is with concerns; red means high risk, etc.

Converter Tab

In this tab, MedTator can convert data files in other formats to MedTator's XML format. This function can be used for adjudication, transferring legacy projects, and error analysis. Because of the differences between formats, the converted results may LOSE information. Please double-check the results carefully. You can also contact us to discuss any issues you encounter.

MedTagger

MedTagger's default output results (.ann files) can be converted to MedTator format.

Converter UI

Before converting, please load the annotation schema (the .dtd file) in the Annotation Tab. To convert MedTagger's results, follow these steps:

  1. Load the text files. As the MedTator format uses raw text as part of the annotation file, you need to provide the raw text files (.txt). You can drag and drop multiple files, or a folder that contains the text files (.txt), into the box.
  2. Load the output files. The MedTagger output files (.ann) are used to generate the tags in the MedTator format. You can drag and drop multiple files, or a folder that contains the MedTagger output files (.ann), into the box. The file name of each MedTagger output file MUST be the text file name with ".ann" as a suffix; otherwise MedTator cannot match the files correctly.
  3. Click the Convert Files button. MedTator will read the given files, extract the data for tagging, and generate new files in the MedTator format. During the process, the start, end, norm, and text attributes in the MedTagger .ann files are used for converting, while other attributes are ignored.
  4. Click the Download as zip button. Once the conversion is finished, you can download all the converted XML files in a zip file for further use.
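
Before loading files, it can help to check that every text file has an output file named according to the rule in step 2. The following sketch only illustrates that naming rule; the `pair_files` helper is hypothetical and is not part of MedTator.

```python
def pair_files(txt_names, ann_names):
    """Pair each text file with the MedTagger output named '<text file>.ann'."""
    ann_set = set(ann_names)
    pairs, missing = [], []
    for txt in txt_names:
        expected = txt + ".ann"  # e.g. doc1.txt -> doc1.txt.ann
        if expected in ann_set:
            pairs.append((txt, expected))
        else:
            missing.append(txt)
    return pairs, missing

pairs, missing = pair_files(
    ["doc1.txt", "doc2.txt"],
    ["doc1.txt.ann", "doc2.ann"],  # doc2.ann does not follow the rule
)
```

In this example only doc1.txt is paired; doc2.txt is reported as missing because its output file would have to be named doc2.txt.ann.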