automated annotation for relation extraction

Honghan Wu edited this page Sep 7, 2016 · 14 revisions

annotations with Sapienta and NCBO Annotator

The results look relevant, but they are still some way from meeting the requirement of helping to identify relations. The two annotators do complement each other in a sensible way. The first step forward from the current results is to create a set of heuristic rules that pick out important sentences from the annotations. Specifically,

  • motivation sentences: the goal/aim of the research
  • methodology sentences: the technical detail settings
  • conclusion sentences: the findings/relations

The initial rule set can be constructed manually by following [Christine's guideline](Curation Process). We can then ask Christine to help construct computable rules, use machine learning approaches, or combine the two.
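
As a starting point, a manually built rule set could be as simple as a few regular expressions per sentence type. The sketch below is only an illustration: the cue phrases in `RULES` and the function name `classify_sentence` are assumptions made up for this example, not rules taken from the curation guideline.

```python
import re

# Hypothetical cue phrases per sentence type (assumptions for illustration only;
# the real rules would come from the curation guideline).
RULES = {
    "motivation": [
        r"\bto (clarify|study|investigate|examine)\b",
        r"\b(aim|goal|purpose|objective)\b",
    ],
    "methodology": [
        r"\bwe (used|designed|measured)\b",
        r"\b\d+ (patients|subjects|participants)\b",
        r"\bwas measured\b",
    ],
    "conclusion": [
        r"\bsignificant(ly)?\b",
        r"\b(positively|negatively) (correlated|associated)\b",
        r"\brevealed\b",
    ],
}


def classify_sentence(sentence):
    """Return every sentence type whose cue patterns match the sentence."""
    text = sentence.lower()
    return [
        label
        for label, patterns in RULES.items()
        if any(re.search(p, text) for p in patterns)
    ]


print(classify_sentence("To clarify the neural basis of verbal fluency in AD"))
print(classify_sentence("We used voxel-based morphometry (VBM) to compare grey matter volumes in our groups"))
```

A sentence may match several types or none; how to resolve such cases is left open here and would need to be decided together with the curators.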

Train NLP models for the task

Essentially, the NLP pipeline has two types of task: 1) Named Entity Recognition and 2) Sentence Classification (Sapienta annotation). Training models specifically for this project seems unavoidable, for the following reasons.

  • NER: a unique corpus, customised terms/subsets of existing ontologies
  • Sentence Classification: a unique corpus, a different task

automated highlighting

  • learning based on terms + annotated entities achieves only 54% accuracy
  • adding POS tags to the above two kinds of information on highlighted texts achieves at most 58% accuracy
  • the next step is to subtype the highlights manually, to see whether treating different groups of highlights separately would help
    • tests: letter fluency tests, category fluency tests, Mini-Mental State, To clarify the neural basis of verbal fluency in AD, Gastrointestinal (GI) dysfunction is one of the most common non-motor symptom of Parkinson's disease, Gastrointestinal dysfunction questionnaire, Mattis Dementia Rating Scale, Clinical Dementia Rating scale, Concept formation, abstract link between dissimilar objects or thoughts by extracting their meaningful common characteristics, To study these issues, we designed a new experimental paradigm based on similarities, called the Verbal Concept Formation Task, aimed at assessing verbal concept, 80 pairs of words (items) for, We used voxel-based morphometry (VBM) to compare grey matter volumes in our groups, xical fluency (mental flexibility), The working-memory capacity was measured by the DB subtest of the digit span test in the Chinese Revised Wechsler Adult Intelligence Scale (WAIS-RC), Amnestic Mild Cognitive Impairment (aMCI), (fMRI) consisted of two different sets: theme identification as an experimental task and emotion selection as a control task
    • subtypes of mental illness: The diagnosis of other types of dementia, Parkinson’s disease with dementia, progressive supra-nuclear palsy, normal pressure hydrocephalus
    • experiment settings: 20 patients, the UK Parkinson's Disease Society Brain Bank Clinical diagnostic criteria, 15 elderly community-dwelling individuals of comparable age and educational backgrou, We also correlated the scores obtained with grey matter volume for all participants, 8 subjects with probable behavioural variant FTD, 21 subjects with PSPr, 14 subjects with isolated or predominant hippocampal memory dysfunction, 18 healthy controls, A total of 324 right-handed healthy young adults participated in this study, 8 patients, We present data from a group of aMCI patients, group of 10 healthy volunteers (6 male), age and IQ matched to the patients
    • useful information/conclusion: The most common GI symptoms reported prior to the surgery were: constipation 95% it occurred in all patients, the present study, bilateral STN-DBS significantly improved gastric motility in PD, smaller in the AD patients, Statistical analysis revealed significant differences in the Global Performance Score between patients with behavioural variant FTD and non-frontal participants, and between patients with behavioural variant FTD and either group of controls, Global Performance Score was positively correlated with grey matter volume in the left and right angular gyri, the head of the left caudate nucleus, the right dorsal anterior cingulate, the left middle frontal gyrus, the right frontal lobe, and the right and left superior temporal gyri, Statistical analysis revealed significant differences in the Global Performance Score between frontal patients and non-frontal participants taken as a whole, The FAB performance was positively associated with rCMglc in the right middle frontal gyri, with the first eigenvariate values of the left middle temporal gyri (BA 21) in the more severe AD but not in the less severe AD, the DB scores were negatively correlated with the rsFC between ROI 2 (the right posterior STG) and the left insula, we additionally observed a direct correlation with metabolism in left IFG, pars triangularis, right MTG, bilateral ITG, and right angular gyrus, were linear relationships between deficits of, Deficits in mental flexibility as assessed with the TMT-B are often found in patients with frontal-lobe damage, develop complications in the form of motor fluctuations, motor fluctuations

language patterns of cardinal nouns, named entities and subject/predicate

  • cardinal nouns: nouns preceded by a component tagged as CD (cardinal number), e.g., in "10 elderly patients" the noun "patients" is picked up.
  • named entities: the NLTK named entity model is used to identify NEs, because many of the psychometric tests are not identified by the NCBO Annotator.
  • subject/predicate pattern: the Stanford Parser is used to get the parse tree, and the subject and predicate are extracted using some heuristic rules.
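
The cardinal noun pattern can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `cardinal_nouns` is hypothetical, the input is assumed to be a list of (token, Penn Treebank tag) pairs such as those returned by `nltk.pos_tag`, and the choice to skip adjectives and keep the last noun of the phrase is an assumption.

```python
def cardinal_nouns(tagged_tokens):
    """Return the head nouns of phrases that start with a CD-tagged token.

    tagged_tokens: list of (token, tag) pairs using Penn Treebank tags.
    """
    found = []
    i = 0
    n = len(tagged_tokens)
    while i < n:
        if tagged_tokens[i][1] == "CD":
            j = i + 1
            # Skip adjectives between the number and the noun (assumption).
            while j < n and tagged_tokens[j][1].startswith("JJ"):
                j += 1
            # Consume the run of nouns; the last one is taken as the head.
            head = None
            while j < n and tagged_tokens[j][1].startswith("NN"):
                head = tagged_tokens[j][0].lower()
                j += 1
            if head is not None:
                found.append(head)
            i = j
        else:
            i += 1
    return found


# Tags written by hand here; in practice they would come from a POS tagger.
example = [("10", "CD"), ("elderly", "JJ"), ("patients", "NNS")]
print(cardinal_nouns(example))  # ['patients']
```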

The above three types of patterns are extracted from highlighted sentences only, and the results are saved as (pattern, frequency) pairs, which will then be used to score the sentences under consideration.
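
One way the (pattern, frequency) pairs could be used for scoring is sketched below. The scoring function is an assumption: it simply sums the frequencies of the distinct patterns found in a sentence, and the names `build_pattern_counts` and `score_sentence` are hypothetical. The page does not specify the actual scoring formula.

```python
from collections import Counter


def build_pattern_counts(highlighted_patterns):
    """Count how often each pattern occurs across the highlighted sentences.

    highlighted_patterns: iterable of pattern lists, one list per highlighted sentence.
    """
    counts = Counter()
    for patterns in highlighted_patterns:
        counts.update(patterns)
    return counts


def score_sentence(patterns, counts):
    """Sum the frequencies of the distinct patterns found in a sentence (assumed scoring)."""
    return sum(counts[p] for p in set(patterns))


# Toy data: patterns extracted from three highlighted sentences.
counts = build_pattern_counts([["patients", "subjects"], ["patients"], ["controls"]])
print(score_sentence(["patients", "controls", "volunteers"], counts))  # 3
```

Patterns never seen in a highlighted sentence contribute nothing, so a sentence with no known patterns scores 0. Whether raw counts should be normalised, for example by sentence length, is left open.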