Manual
At present, MedTator has four tabs to cover the core annotation steps, including document annotation, corpus statistics, annotation export and IAA calculation. The functionalities are introduced as follows:
This tab allows the user to annotate texts according to a pre-defined schema through four coordinated views:
- A: the file list view shows the summary of files and the annotation status of each file.
- B: the tagging view shows the content of the selected file and the visualized entity tags, relation tags, and annotation hints in the selected file.
- C: the concept list view shows all entity and relation concepts in the schema and the count of each concept annotated in the selected file.
- D: the tag list view shows the detailed information of the annotated tags, such as spans, text, and attributes.
MedTator supports two ways to import the schema file and annotation files.
The first way: drag and drop files from the file explorer (e.g., Finder on macOS) to the box. You can also drag a folder that contains the XML or TXT files to import the annotation files.
The second way: click on the schema box to open the file selection dialog and upload the file.
When the schema file is imported, the concept list view will show the concept names. And when the annotation files are imported, the file list view will show the file names and the total number of files.
MedTator supports multi-document annotation, and the imported files are listed in the file list view. The number of imported files is displayed at the top of the file list. To help users find the file to annotate more easily, a filter box at the top of the file list supports file name matching.
An entity tag can be annotated in the tagging view in three steps:
- Highlight the text to be tagged.
- Right-click in the tagging view (or tap with two fingers on the trackpad on macOS).
- Click the entity name in the popup menu.
In addition to the entity annotation by clicking, MedTator also supports shortcut keys for quick annotation. In the concept list view and the popup menu, there is a number or a letter on the left of each concept name, which is the shortcut key for that concept. For example, as shown in the following figure, the number key 1 is assigned to the VAX concept, 2 to PYREXIA, 3 to CHILL, 4 to COUGH, etc.
With the shortcut keys, entity annotation can be done in just two steps:
- Highlight the text to be tagged.
- Press the corresponding shortcut key.
For example, when annotating a headache concept, you could first highlight the token “headache” in the tagging view, then press the shortcut key w.
MedTator supports discontinuous span annotation by using an additional key.
For Windows and Linux systems:
- Hold down the Ctrl key on the keyboard.
- While still holding Ctrl, highlight each token you want to annotate.
- Once all discontinuous tokens are highlighted, release Ctrl and right-click at any position in the editor.
The following video demonstrates how to do that:
For macOS:
- Hold down the command/cmd key on the keyboard.
- While still holding command/cmd, highlight each token you want to annotate.
- Once all discontinuous tokens are highlighted, release command/cmd and right-click at any position in the editor.
The following video demonstrates how to do that:
MedTator supports document-level annotation through a customized schema file. As introduced in the schema file design, setting the spans attribute for an entity concept makes that concept usable for document-level annotation.
Adding a document-level annotation is similar to entity annotation, but takes only two steps:
- Right-click in the tagging view (or tap with two fingers on the trackpad on macOS).
- Click the concept name in the popup menu.
Or, as shown in the above figure, you could also click the “+” button in the concept list to add a document-level tag.
MedTator provides two methods to annotate a relation tag.
The relation concept could be added by the following steps:
- Click on an annotated tag. A popup menu will be displayed that contains the available relation concepts; select the one you need.
- A floating panel will then be displayed for the relation concept chosen in the previous step. You could (1) click on the tag to be added and select the attribute from the popup menu, or (2) use this floating panel to change other attributes and finish the relation annotation.
For example, as shown in the above figure, we have added two entity tags, i.e., a severity tag “mild” and an AE tag “pain”. First, click on the “pain” tag; a popup menu will be displayed, and you could click the “LK_AE_SVRT – link_AE” option in this menu to add the “pain” tag as an attribute in a new LK_AE_SVRT relation tag. Second, click on the “mild” tag and select the “LK_AE_SVRT – link_SVRT” attribute to finish the relation annotation.
Alternatively, a floating panel is displayed with all the attributes in the LK_AE_SVRT tag. You could select the link_SVRT attribute from the dropdown menu and click the “Done Linking” button to add a new relation tag.
In addition to the previous method, a relation concept can also be added in two steps:
- Click the “+” button in the concept list.
- Modify the entity link in the tag list.
For example, as shown in the above figure, two entity tags (i.e., an AE concept and a SVRT concept) have been annotated. To add a new relation tag, first click the “+” button of the LK_AE_SVRT concept in the concept list. A new empty LK_AE_SVRT tag will then be created and displayed in the tag list, where you could modify the attributes and select the existing entities in the dropdown selections to complete the details.
The attributes of each tag can be modified in the tag list.
For each type of attribute defined in the schema file, MedTator provides the following methods to modify values:
- The ID type attribute, which is used in the relation tags, is displayed as a dropdown box whose values come from the annotated entity tags. The entity tag id, concept name, and extracted text will be displayed in each option for reference.
- The CDATA type attribute is displayed as an input box, in which you could modify the text as needed.
- The value set type attribute is displayed as a dropdown box, in which you could select pre-defined values.
MedTator can show annotation hints based on the annotated tags by highlighting tokens in dashed boxes. To reduce workload, once some tokens are annotated as a tag, MedTator will search for the same tokens in the files and show the matched tokens in a dashed box as a “Hint”. The letter next to the dashed box represents the concept abbreviation, and the color represents the entity color.
For example, in the above annotation, the user annotated “Nausea” as a Symptom entity. MedTator will then highlight other occurrences of “Nausea” in the file as hints (i.e., dashed boxes) and show an “S” in the same blue color. Instead of highlighting and selecting the entity again, you can click the dashed box to annotate it.
The hints are searched in ALL imported files. So even if a token is not annotated in the current file, it can still be highlighted as a hint. For example, as shown in the blue rectangle, the “headache” is not annotated in the current file (doc_02), but it is still highlighted in dashed boxes as “headache” is annotated in other imported files (doc_01 and doc_03).
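The cross-file hint search can be pictured with a minimal Python sketch. It assumes an exact, case-sensitive, whole-word match, and the names `find_hints`, `annotated`, `docs`, and `existing_spans` are hypothetical; MedTator's actual matching logic is not described here and may differ.

```python
import re

def find_hints(annotated, docs, existing_spans=None):
    """Return hint candidates as (file, start, end, text, concept) tuples.

    annotated: mapping of annotated text -> concept name, collected from ALL files.
    docs: mapping of file name -> raw document text.
    existing_spans: mapping of file name -> set of (start, end) spans already
        tagged, with end exclusive (an assumption of this sketch).
    """
    existing_spans = existing_spans or {}
    hints = []
    for fname, text in docs.items():
        tagged = existing_spans.get(fname, set())
        for phrase, concept in annotated.items():
            # Whole-word match so "pain" does not fire inside "painting".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for m in re.finditer(pattern, text):
                span = (m.start(), m.end())
                if span not in tagged:  # skip occurrences that are already tags
                    hints.append((fname, span[0], span[1], m.group(), concept))
    return hints

docs = {
    "doc_01": "Patient reports headache and nausea.",
    "doc_02": "No fever, but headache persists.",
}
annotated = {"headache": "SYMPTOM"}   # annotated only in doc_01
existing = {"doc_01": {(16, 24)}}     # span of the existing tag in doc_01
hints = find_hints(annotated, docs, existing)
```

Here “headache” is tagged only in doc_01, yet the sketch still reports its occurrence in doc_02 as a hint, which mirrors the behavior described above.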
If you want to hide hints, you can turn off this feature by selecting “No Hint” in the menu.
MedTator supports two different display modes for showing the document in the tagging view.
As shown in the above figure, you can select different displays in the menu:
- Document mode: In this mode, the content in the selected file will be displayed in its original format.
- Sentence mode: MedTator splits the document into sentences using the open-source JavaScript library “compromise” and creates a mapping of the offsets of each sentence in the original document. Blank lines are removed in this mode.
While in sentence mode, the annotations are still saved with their original spans. MedTator will calculate the offsets automatically when rendering the annotated tags and exporting the annotations.
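The offset mapping used by sentence mode can be illustrated with a short Python sketch. It uses a naive regular-expression splitter as a stand-in for the “compromise” library, and the function names and the end-exclusive span convention are assumptions made for illustration only.

```python
import re

def split_with_offsets(doc):
    """Split a document into sentences, keeping each sentence's offsets
    in the original text. Blank lines are dropped."""
    sentences = []
    # Naive rule: a sentence ends at '.', '!' or '?', or at a line break.
    for m in re.finditer(r"[^.!?\n]+[.!?]?", doc):
        raw = m.group()
        stripped = raw.strip()
        if not stripped:
            continue  # whitespace-only piece
        # Shift the start past any leading whitespace that was stripped.
        start = m.start() + (len(raw) - len(raw.lstrip()))
        sentences.append({"text": stripped, "start": start, "end": start + len(stripped)})
    return sentences

def doc_span_to_sentence(sentences, start, end):
    """Map a document-level span to (sentence index, local start, local end)."""
    for i, s in enumerate(sentences):
        if s["start"] <= start and end <= s["end"]:
            return i, start - s["start"], end - s["start"]
    return None  # the span crosses a sentence boundary

doc = "Mild pain at the site.\n\nFever began later."
sents = split_with_offsets(doc)
loc = doc_span_to_sentence(sents, 24, 29)  # "Fever" in the original document
```

Because each sentence remembers where it starts in the original text, a tag can be stored with its document-level span and shifted to sentence-local coordinates only for rendering.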
MedTator supports presenting the text and tags in the document to provide better visualization, which can be used for publications or other purposes. You can click the "Visualize" button to open a dialog window to show the visualized tags and text.
This feature supports two modes of visualizing:
- Document Visualization: You can click the "Visualize" button or the "Visualize Whole Document" button to show the entire document and all of the annotated tags (both entities and relations) in the current document.
- Selection Visualization: You can highlight a sentence or a paragraph, and then click the "Visualize" button or the "Visualize Selection" button to visualize the tags within the highlighted text.
The visualization is generated by brat and is an SVG-based figure. You can zoom in using the web browser to take a screenshot for a larger, high-resolution figure.
Using HTML5 techniques, MedTator supports saving files to the local disk with the File System Access API.
As shown in the above figure, MedTator provides two ways to save an annotation file:
- A: Save the current working file. By clicking the “Save” button, the current working file, which is “doc2.txt.xml”, will be saved. Moreover, by clicking the “Save as” button, MedTator will ask the user to save the current file as a new copy instead of saving to the current working file.
- B: Save a specific file. By clicking the yellow disk icon to the left of a file name, the corresponding file will be saved. In the above figure, when clicking the yellow disk icon, “doc1.txt.xml” will be saved. The current working file “doc2.txt.xml” will NOT be saved, because the clicked yellow disk button is linked to “doc1.txt.xml”.
MedTator supports color labels for annotation files. As shown in the following figure, users can set color labels by clicking the button or the item in the dropdown list. The label will then be displayed before the filename accordingly.
Users can also remove the label by clicking the item in the dropdown list.
MedTator provides real-time statistics on the annotated tags. Whenever a new annotation is added or an existing annotation is modified, the statistics can be updated in this tab.
MedTator provides a basic summary of the annotations in the Annotation tab. For example, if we import the COVID_VAX_AE sample, the summary may look like the following:
The left panel will show a basic summary, such as the number of documents and tags.
In addition to the basic summary, the summaries of each file are also displayed in a heatmap to help understand how many tags are annotated for each file and each concept. You can also download the summary and report as an Excel file.
Moreover, the statistics tab will also show the detailed annotation tags in each entity concept with the source file location. For example, as shown in the figure, the “mild” SVRT tag is annotated twice in two files. One is in the doc1.txt.xml, the other one is in the doc2.txt.xml file.
MedTator could export the annotations to different formats for downstream tasks. The detailed formats may change in the future. For example, by clicking the “Tag Text” button, MedTator will create a .tsv file which contains the concept name, annotated text, and the count:
By clicking the “Tag & Sentence” button, MedTator will create a .tsv file which contains more columns on the context information of each annotated tag.
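The “Tag Text” export can be pictured as a count over (concept, text) pairs. The following Python sketch is only an approximation: the header names, column order, and sort order are assumptions, and the function name `tag_text_tsv` is hypothetical.

```python
import csv
import io
from collections import Counter

def tag_text_tsv(tags):
    """Build a TSV string with one row per (concept, text) pair and its count.

    tags: iterable of (concept, annotated_text) tuples gathered from all files.
    """
    counts = Counter(tags)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["concept", "text", "count"])  # assumed header names
    # Most frequent first; ties are broken alphabetically for a stable output.
    for (concept, text), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([concept, text, n])
    return buf.getvalue()

tags = [("SVRT", "mild"), ("AE", "pain"), ("SVRT", "mild"), ("AE", "fever")]
tsv = tag_text_tsv(tags)
```

A table like this is a quick way to spot the most frequent surface forms of each concept before building rules or training data.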
In addition to the text, MedTator could also export the annotations to a .tsv file in IOB2/BIO format for named-entity recognition, to a MedTagger ruleset package, or to jsonl format for a spaCy ruleset. More formats will be added in the future to support other downstream tasks.
The exported IOB2/BIO format files can be used for named-entity recognition training and evaluation tasks. For example, you can fine-tune a BERT-based model with the exported IOB2/BIO format files. For more technical details, see the HuggingFace documentation (https://huggingface.co/docs/transformers/v4.13.0/en/custom_datasets).
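To make the IOB2 scheme concrete, here is a minimal Python sketch that converts character-span entities into token-level labels. It assumes whitespace tokenization and end-exclusive spans, and the function name `to_iob2` is hypothetical; the tokenization and file layout of MedTator's actual export are not specified here.

```python
import re

def to_iob2(text, entities):
    """Convert character-span entities to token-level IOB2 labels.

    entities: list of (start, end, concept) with end exclusive.
    Tokens are whitespace-separated chunks (an assumption made for brevity).
    """
    rows = []
    for m in re.finditer(r"\S+", text):
        label = "O"  # default: token is outside any entity
        for start, end, concept in entities:
            if m.start() < end and m.end() > start:  # token overlaps the entity
                # B- marks the first token of an entity, I- the following ones.
                prefix = "B" if m.start() <= start else "I"
                label = f"{prefix}-{concept}"
                break
        rows.append((m.group(), label))
    return rows

text = "Mild arm soreness after vaccine"
entities = [(0, 4, "SVRT"), (5, 17, "AE")]
rows = to_iob2(text, entities)
```

Each (token, label) pair corresponds to one line of a typical IOB2 file, with sentences usually separated by blank lines.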
The exported MedTagger ruleset package could be used by the MedTagger IE rule engine (https://github.com/OHNLP/MedTagger).
The exported jsonl format pattern file can be used by spaCy's rule-based entity recognition module. It contains entity patterns defined by spaCy (https://spacy.io/usage/rule-based-matching#entityruler) and can be read to load patterns for named entity and text classification labelling.
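As an illustration of the jsonl pattern layout, the Python sketch below parses a small hand-written sample in the entity-pattern format documented by spaCy (one JSON object per line with `label` and `pattern` keys). The sample content is hypothetical and is not an actual MedTator export; only the standard library is used so the snippet runs without spaCy installed.

```python
import json

def read_patterns(jsonl_text):
    """Parse jsonl text into a list of pattern dictionaries, one per non-empty line."""
    patterns = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            patterns.append(json.loads(line))
    return patterns

# Hypothetical sample: a phrase pattern and a token pattern for the same label.
sample = (
    '{"label": "AE", "pattern": "headache"}\n'
    '{"label": "AE", "pattern": [{"LOWER": "arm"}, {"LOWER": "soreness"}]}\n'
)
patterns = read_patterns(sample)
labels = sorted({p["label"] for p in patterns})
```

With spaCy installed, a file in this layout is typically loaded into an entity ruler component with its `from_disk` method rather than parsed by hand.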
MedTator supports IAA calculation and adjudication of two annotators in this tab.
Before starting IAA calculation and adjudication, you need to load the schema file in MedTator. Then, the documents need to be annotated by two annotators.
For example, in the COVID_VAX_AE sample dataset, we have three documents, namely doc1.txt, doc2.txt, and doc3.txt. Two annotators annotated them separately, producing two annotations for each document: A_doc1.txt.xml, A_doc2.txt.xml, and A_doc3.txt.xml are from annotator A, while B_doc1.txt.xml, B_doc2.txt.xml, and B_doc3.txt.xml are from annotator B. Then you could drag and drop A_doc1.txt.xml, A_doc2.txt.xml, and A_doc3.txt.xml to the annotator A box, and B_doc1.txt.xml, B_doc2.txt.xml, and B_doc3.txt.xml to the annotator B box. MedTator will read those files and show the number of tags and files in each box.
After importing the annotations from the two annotators, you could specify the overlap ratio, which is the threshold used to decide whether both annotators agree on the same text.
By default, the overlap ratio is 50%, which means both annotators agree on an annotated text in the same concept if both of them annotated it and the overlap of the annotated tags is equal to or greater than 50%. For example, as shown in the above figure, annotator A annotated an AE concept “left arm soreness”, while annotator B annotated “arm soreness at elbow”. These two annotated tags do not exactly match each other, so the overlap must be calculated to decide whether the two annotators agree on the annotation. As we can see in the figure, the overlapped part spans 5-16, which is 12 characters, and the whole annotation covers spans 0-25, which is 26 characters. So the overlap ratio of these two tags is 12 / 26 = 46.15%. As it is smaller than the defined threshold of 50%, there is no agreement on this annotation. The results of all tags from both annotators will be used in the calculation of the file-level, concept-level, and overall IAA scores.
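The arithmetic in this example can be reproduced with a small Python sketch. It assumes spans are given as inclusive character offsets, as the counts in the example imply (5-16 is 12 characters); the function names are hypothetical and MedTator's internal span convention may differ.

```python
def overlap_ratio(span_a, span_b):
    """Overlap ratio of two spans given as inclusive (start, end) character offsets."""
    a_start, a_end = span_a
    b_start, b_end = span_b
    # Characters shared by both spans.
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return 0.0  # the spans do not touch
    # Characters covered by either span (from the leftmost start to the rightmost end).
    covered = max(a_end, b_end) - min(a_start, b_start) + 1
    return overlap / covered

def is_agreement(span_a, span_b, threshold=0.5):
    """Two tags agree when their overlap ratio reaches the threshold."""
    return overlap_ratio(span_a, span_b) >= threshold

# "left arm soreness" (0-16) vs. "arm soreness at elbow" (5-25)
ratio = overlap_ratio((0, 16), (5, 25))
agree = is_agreement((0, 16), (5, 25))
```

With the default threshold of 0.5 the two tags do not count as an agreement, while lowering the threshold to 0.4 would accept them.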
After setting the overlap ratio, you could click the “Calculate” button to calculate the IAA at different levels.
MedTator will then use the given overlap ratio in the calculation. Depending on the annotations, the result would look like the following:
By default, MedTator will show the F1 score results. The IAA result contains three panels: (A) the summary showing the overall F1 and the concept-level F1, (B) the file-level result showing the detailed results of the selected concept grouped by file, and (C) the document-level result showing the detailed tags. All panels are linked with each other; when you click on a concept or a file item, the other views are updated accordingly. As shown in the above figure, when the “OVERALL” F1 is selected in the summary, the files will show the results of all tags. For doc1.txt.xml, annotators A and B achieve an F1 result of 0.44. The label “AB: 2” indicates that both annotator A and B agree on 2 annotations, “A+: 0” indicates that there is no annotation that is only annotated by annotator A, and “B+: 5” indicates that there are 5 tags that are only annotated by annotator B.
Then, the document-level panel shows the detailed tags, ordered by concept. In this panel, the results are displayed in three columns: the first column is the annotations from annotator A, the second column is the annotations from annotator B, and the last column is for adjudication. In each column, the tags are displayed in dotted boxes with the attributes and context text. If a tag is agreed by both annotators, it will be displayed as a green dotted box in both the first and second columns. If a tag is only annotated by one annotator, it will be displayed as a red dotted box in one column.
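The 0.44 for doc1.txt.xml is consistent with the standard F1 formula when agreed tags are treated as true positives and the tags found by only one annotator as false positives or false negatives. The following Python sketch works under that assumption; the function name `iaa_f1` is hypothetical.

```python
def iaa_f1(ab, a_only, b_only):
    """F1 from agreement counts.

    ab: tags both annotators agree on ("AB").
    a_only: tags annotated only by annotator A ("A+").
    b_only: tags annotated only by annotator B ("B+").
    """
    denom = 2 * ab + a_only + b_only
    return 2 * ab / denom if denom else 0.0

# doc1.txt.xml in the example: AB: 2, A+: 0, B+: 5
f1 = iaa_f1(2, 0, 5)
```

For these counts the formula gives 4 / 9, which rounds to the 0.44 shown in the file-level panel.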
MedTator supports different measures of IAA, including F1 score and Cohen's Kappa coefficient.
After clicking the Calculate button, MedTator will automatically calculate all measures.
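For reference, Cohen's Kappa compares the observed agreement with the agreement expected by chance. The Python sketch below implements the textbook formula over two parallel label sequences; how MedTator derives the label pairs from span annotations is not described here, so the input layout and the function name `cohens_kappa` are assumptions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Textbook Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

a = ["AE", "AE", "O", "O"]
b = ["AE", "O", "O", "O"]
kappa = cohens_kappa(a, b)
```

In this toy input the annotators agree on 3 of 4 items, but half of that agreement is expected by chance, so the Kappa value is lower than the raw agreement.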
You can specify the measure to be displayed:
For example, by selecting the Cohen's Kappa Coefficient, the results will look like the following:
You can export the IAA result as an Excel xlsx file to check details in other software:
Then, in the downloaded report file, you could find four sheets, which contain the summary of the overall F1 score, the summary of each file, the details of each tag annotated by the two annotators, and the adjudication sheet.
In the fourth sheet, for adjudication, the IAA calculation results of each matched tag are displayed. The annotated tags from the two annotators are listed side by side for comparison and rendered in two different colors.
The adjudication column in the document level will generate a default gold standard based on the annotations from both annotator A and B. You could accept or reject a tag by clicking the “Accept” or “Reject” button displayed on the top left of a tag box. When the adjudication on one document is finished, you could set a green check mark on this document. This check mark is just a visual reminder, and it won’t affect the annotations.
You could download the adjudication results of all documents in a zip file by clicking the “Download All” button in the menu:
Then you could use the exported file as gold standard for downstream tasks.
You can use the Annotation Tab to further adjudicate the annotations.
As shown in the following figure, you can revise the agreement status of each tag to make it easier to identify whether a tag has been adjudicated.
In addition, you can also use the Set Labels button in the ribbon menu to update the status of an annotation file.
For example, setting a green label means the annotation file is adjudicated; yellow means it is with concerns; red means high risk, etc.
In this tab, MedTator can convert data files of other formats to MedTator's XML format. This function can be used for adjudication, transferring legacy projects, and error analysis. Because of the differences between formats, the converted results may LOSE information due to various factors. Please double-check the results carefully. You can also contact us to discuss any issue you encounter.
MedTagger's default output results (.ann files) can be converted to MedTator format.
Before converting, please load the annotation schema (the .dtd file) in the Annotation Tab. To convert MedTagger's results, you need to follow these steps:
- Load the text files. As MedTator format uses raw text as part of the annotation file, you need to provide the raw text files (.txt). You can drag and drop many files to the box, or a folder that contains the text files (.txt).
- Load the output files. The MedTagger output files (.ann) will be used to generate the tags in the MedTator format. You can drag and drop many files to the box, or a folder that contains the MedTagger output files (.ann). The file names of the MedTagger output files MUST be the text file name with ".ann" as suffix, otherwise MedTator cannot match the files correctly.
- Click the Convert Files button. MedTator will read the given files, extract the data for tagging, and generate new files in the MedTator format. During the process, the start, end, norm, and text attributes in the MedTagger .ann files will be used for converting, while other attributes will be ignored.
- Click the Download as zip button. Once the conversion is finished, you can download all the converted XML files in a zip file for further use.
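The file name rule in the second step can be checked before uploading with a short Python sketch. It assumes the rule means the full text file name followed by ".ann" (for example, doc1.txt pairs with doc1.txt.ann); the function name `match_ann_files` is hypothetical and the real matching logic in MedTator may differ.

```python
def match_ann_files(txt_names, ann_names):
    """Pair each text file with the output file named '<text file name>.ann'.

    Returns (matched, unmatched): a dict of text name -> .ann name, and the
    list of text files for which no correctly named .ann file was found.
    """
    ann_set = set(ann_names)
    matched, unmatched = {}, []
    for txt in txt_names:
        expected = txt + ".ann"  # assumed naming rule, e.g. doc1.txt.ann
        if expected in ann_set:
            matched[txt] = expected
        else:
            unmatched.append(txt)
    return matched, unmatched

matched, unmatched = match_ann_files(
    ["doc1.txt", "doc2.txt"],
    ["doc1.txt.ann", "doc2.ann"],
)
```

In this example doc2.ann would not be paired with doc2.txt under the assumed rule, so renaming it to doc2.txt.ann before conversion would be needed.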
MedTator 2021-2023