
Language Patterns

Honghan Wu edited this page Oct 5, 2016 · 1 revision

The following observations come from Anika's manual check of the general sentences:

  • we don't pick up goal and method descriptions, e.g. "to determine sth, we calculated ..."
  • goals expressed with "to study", "To study the ... ", "To investigate ... ", "To determine", "We were interested in ... ", "The presented study concerned/examined ... "
  • noun phrases ending in "test" or "score" or "performance" or "examination" --> should be named entities
  • findings indicated with "were observed", "significantly decreased/increased/different", "differed significantly", "this indicates", "indicating", "correlated with", "(patients) showed", "patients experienced", "was/were seen" in conjunction with brain regions, "patients/subjects (...) reported", "was/were, as expected, ... ", "was/were abnormally", "correlation between", "were greater in ... than in ... ", "our results show", "compared with", "significant (...) correlations ...", "our main prediction", "was associated with", "clusters were identified in", "these findings suggest", "was related to"
  • some sentences seem to have a heading merged into them, which can cause parsing problems and probably pushes them over the 500-character threshold
  • methods indicated by "were instructed to", "were assessed", "were asked", "was/were investigated with"
  • we seem to miss a lot of named entities that are brain regions
  • methods information about patients/study subjects/cohorts is missed: "patients with", "patients were identified with", "??? (n= *)", "control subjects", "were presented", "were required", concatenated named entities e.g. tests, "the task was ...", "we assume that", "that requires patients", "were classified as ... (according to) .... (criteria)"
  • the Stanford parser seems to have a problem assigning "CD" tags when numbers are given as words: in "16 individuals" the number can be recognised as a cardinal number, but in "sixteen individuals" "sixteen" is annotated with "NN" instead of "CD"
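
The cue phrases listed above could be turned into a simple rule-based sentence labeller. The following is only a minimal sketch, not the project's actual implementation: the function names (`label_sentence`, `too_long`, `spelled_out_numbers`), the category labels, and the choice of regular expressions are assumptions made for illustration, and only a subset of the phrases from the notes is included. It also flags sentences over the 500-character threshold and spelled-out numbers that a tagger may label "NN" instead of "CD".

```python
import re

# Subset of the cue phrases from the notes above. The category names
# ("goal", "method", "finding") are labels chosen for this sketch.
CUES = {
    "goal": [
        r"\bto (study|investigate|determine)\b",
        r"\bwe were interested in\b",
        r"\bthe presented study (concerned|examined)\b",
    ],
    "method": [
        r"\bwere instructed to\b",
        r"\bwere assessed\b",
        r"\bwere asked\b",
        r"\b(was|were) investigated with\b",
    ],
    "finding": [
        r"\bwere observed\b",
        r"\bsignificantly (decreased|increased|different)\b",
        r"\bdiffered significantly\b",
        r"\bcorrelated with\b",
        r"\bwas associated with\b",
        r"\bthese findings suggest\b",
        r"\bour results show\b",
    ],
}

# Compile once; matching is case-insensitive so "To study" and "to study" both hit.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in CUES.items()
}

# Threshold mentioned in the notes (sentences with merged headings may exceed it).
MAX_SENTENCE_CHARS = 500

# Illustrative, incomplete list of number words.
NUMBER_WORDS = re.compile(
    r"\b(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|"
    r"thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty|"
    r"thirty|forty|fifty|sixty|seventy|eighty|ninety|hundred)\b",
    re.IGNORECASE,
)


def label_sentence(sentence):
    """Return the sorted categories whose cue phrases occur in the sentence."""
    return sorted(
        label
        for label, patterns in COMPILED.items()
        if any(p.search(sentence) for p in patterns)
    )


def too_long(sentence):
    """True if the sentence exceeds the character threshold."""
    return len(sentence) > MAX_SENTENCE_CHARS


def spelled_out_numbers(sentence):
    """Return number words (lower-cased) that may need a "CD" tag."""
    return [m.lower() for m in NUMBER_WORDS.findall(sentence)]


if __name__ == "__main__":
    for s in [
        "To determine the effect, we calculated the mean.",
        "Patients were asked to name objects.",
        "Sixteen individuals showed activity that was associated with age.",
    ]:
        print(label_sentence(s), too_long(s), spelled_out_numbers(s), "|", s)
```

A sentence can receive more than one label (for example a goal clause followed by a method clause), which matches the mixed goal/method sentences mentioned in the first point. Plain phrase matching like this does not address the missed brain-region entities; that would need a separate entity resource.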