This example highlights named entity recognition of drugs and adverse drug events (ADEs). Occurrences of drugs and ADEs are identified and labelled.

## Model Set Up
The `sentence_classification\resources` folder contains the weights of a fine-tuned model, saved as `model-fine-tuned.hdf5` (not tracked by GitHub). This file contains only the weights, not the model architecture. Therefore, the model class is initialized by loading the base model from the Hugging Face model hub. Next, the model's weights are loaded from the `sentence_classification\resources` folder. In the last step, the model is wrapped in a transformers `pipeline`, which conveniently bundles pre-processing, prediction and post-processing in a single object.
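
The three set-up steps could look roughly like the sketch below. It assumes a TensorFlow/Keras model (suggested by the `.hdf5` extension) and a token-classification head; the function name `build_pipeline` is hypothetical, and the base checkpoint name and the number of labels are not stated here, so they are left as parameters.

```python
from pathlib import Path


def build_pipeline(base_model: str, weights_path: Path, num_labels: int):
    """Load the base model, apply fine-tuned weights, wrap it in a pipeline.

    Assumptions: TensorFlow backend, token-classification task.
    """
    if not weights_path.is_file():
        # The weights file is not tracked by GitHub, so fail early and clearly.
        raise FileNotFoundError(f"Fine-tuned weights not found: {weights_path}")

    # Imported lazily so the path check above works without TensorFlow installed.
    from transformers import (
        AutoTokenizer,
        TFAutoModelForTokenClassification,
        pipeline,
    )

    # Step 1: the architecture comes from the base model on the model hub.
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = TFAutoModelForTokenClassification.from_pretrained(
        base_model, num_labels=num_labels
    )

    # Step 2: overwrite the base weights with the fine-tuned ones.
    model.load_weights(str(weights_path))

    # Step 3: wrap tokenizer and model in a single pipeline object.
    return pipeline("token-classification", model=model, tokenizer=tokenizer)
```

The weights path would be built from the resources folder, for example `Path("sentence_classification") / "resources" / "model-fine-tuned.hdf5"`.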
The `pipeline` object pre-processes the model inputs according to the model's requirements.
The `predict` method is a simple wrapper for the Hugging Face `pipeline` object.
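
A minimal sketch of such a wrapper is shown here. The class name `SequenceLabelingModel` and its constructor are hypothetical; the only assumption is that the wrapped object can be called with a string and returns a list of dictionaries, as a Hugging Face token-classification pipeline does.

```python
from typing import Any, Callable, Dict, List

TokenPrediction = Dict[str, Any]


class SequenceLabelingModel:
    """Hypothetical model class that delegates to a pipeline-like callable."""

    def __init__(self, pipe: Callable[[str], List[TokenPrediction]]) -> None:
        self._pipe = pipe

    def predict(self, text: str) -> List[TokenPrediction]:
        # No extra logic: tokenization and decoding happen inside the pipeline.
        return self._pipe(text)
```

Keeping the wrapper this thin makes it easy to swap in a stub callable for tests.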
While the `pipeline` object converts the numerical outputs into a readable format, the annotations are at the token level in the IOB format.
For example, the drug aspirin would be annotated as two separate entities:
annotation_1 = {
    "begin": 0,
    "end": 5,
    "value": "aspir",
    "label": "B-drug",
    "confidence": 0.98,
    "componentId": "ADE_Sentence_SequenceLabeling-service:0.1.0"
}
annotation_2 = {
    "begin": 5,
    "end": 7,
    "value": "in",
    "label": "I-drug",
    "confidence": 0.96,
    "componentId": "ADE_Sentence_SequenceLabeling-service:0.1.0"
}
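
Token-level annotations of this shape could be derived from the raw pipeline output roughly as follows. A Hugging Face token-classification pipeline without aggregation returns one dictionary per token with the keys `entity`, `score`, `index`, `word`, `start` and `end`. The helper name `to_annotations` and the key mapping are assumptions made for illustration.

```python
from typing import Any, Dict, List

COMPONENT_ID = "ADE_Sentence_SequenceLabeling-service:0.1.0"


def to_annotations(text: str, predictions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map raw per-token pipeline output to token-level annotations."""
    annotations = []
    for pred in predictions:
        # Character offsets; these are expected to be set when the
        # tokenizer provides offset mappings.
        begin, end = pred["start"], pred["end"]
        annotations.append(
            {
                "begin": begin,
                "end": end,
                # Slice the original text instead of using pred["word"],
                # which may carry sub-word markers.
                "value": text[begin:end],
                "label": pred["entity"],
                "confidence": float(pred["score"]),
                "componentId": COMPONENT_ID,
            }
        )
    return annotations
```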
The post-processing step iterates over all annotations and merges annotations that belong to the same entity.
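
The merge could be sketched as below. This is an illustration under stated assumptions, not the project's actual code: a continuation token is assumed to carry the same entity type as the token that opens the entity (for example `B-drug` followed by `I-drug`), the merged confidence is assumed to be the mean of the token confidences, and the function name `merge_iob` is hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Optional

Annotation = Dict[str, Any]


def merge_iob(text: str, tokens: List[Annotation]) -> List[Annotation]:
    """Merge token-level IOB annotations into entity-level annotations."""
    groups: List[List[Annotation]] = []
    open_type: Optional[str] = None  # entity type currently being extended

    for tok in sorted(tokens, key=lambda a: a["begin"]):
        prefix, _, entity = tok["label"].partition("-")
        if prefix == "I" and entity == open_type:
            groups[-1].append(tok)  # continuation of the open entity
        elif prefix in ("B", "I"):
            groups.append([tok])  # new entity (a stray "I-" also opens one)
            open_type = entity
        else:
            open_type = None  # "O" or unknown tag closes the entity

    merged = []
    for group in groups:
        begin, end = group[0]["begin"], group[-1]["end"]
        merged.append(
            {
                "begin": begin,
                "end": end,
                "value": text[begin:end],
                "label": group[0]["label"].partition("-")[2],
                # Assumption: average the token confidences.
                "confidence": round(mean(t["confidence"] for t in group), 2),
                "componentId": group[0]["componentId"],
            }
        )
    return merged
```

Slicing the original text by the merged offsets, rather than concatenating token values, keeps any whitespace inside multi-word entities.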
The `Annotator` iterates over each sentence and creates drug and ADE annotations in the following format:
annotation = {
    "begin": 0,
    "end": 7,
    "value": "aspirin",
    "label": "drug",
    "confidence": 0.97,
    "componentId": "ADE_Sentence_SequenceLabeling-service:0.1.0"
}
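
The sentence loop might be sketched as follows. How sentences are supplied is not described here, so the sketch assumes they arrive as `(begin, end)` character spans and that predictions are relative to the sentence start, which means the offsets have to be shifted back to document coordinates. The function name `annotate` and the `predict_sentence` callable are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Annotation = Dict[str, Any]


def annotate(
    text: str,
    sentence_spans: List[Tuple[int, int]],
    predict_sentence: Callable[[str], List[Annotation]],
) -> List[Annotation]:
    """Run the predictor per sentence and return document-level annotations."""
    annotations: List[Annotation] = []
    for sent_begin, sent_end in sentence_spans:
        sentence = text[sent_begin:sent_end]
        for ann in predict_sentence(sentence):
            shifted = dict(ann)  # copy so the predictor's output is not mutated
            # Sentence-relative offsets become document-relative offsets.
            shifted["begin"] = ann["begin"] + sent_begin
            shifted["end"] = ann["end"] + sent_begin
            annotations.append(shifted)
    return annotations
```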