This is definitely on our to-do list for the first half of 2018, though I cannot promise yet exactly when it will be implemented.
PR welcome ;-)
The way NER tags are currently exported in ALTO creates considerable overhead in file size and parsing, because every occurrence of a named entity results in a separate NamedEntityTag and TAGREF.
Consider, for example, the following text in ALTO:
Tagging this line results in two TAGREFS and two NamedEntityTag elements for the named entity "Berlin" being added to the ALTO, like this:
It would be preferable not to repeat NamedEntityTag for identical references and instead write this like:
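To make the request concrete, here is a minimal post-processing sketch in Python that merges identical NamedEntityTag elements and points every TAGREFS at the one that is kept. It is not the exporter's code. The sample document, its IDs, the TYPE value, and the helper name `dedupe_named_entity_tags` are all made up for illustration. It also assumes that two tags are "identical" when their TYPE and LABEL match, which may be too coarse if tags carry other distinguishing attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: IDs, TYPE and text are invented for illustration,
# and the ALTO namespace is omitted to keep the snippet short.
SAMPLE = """<alto>
  <Tags>
    <NamedEntityTag ID="Tag1" TYPE="location" LABEL="Berlin"/>
    <NamedEntityTag ID="Tag2" TYPE="location" LABEL="Berlin"/>
  </Tags>
  <Layout>
    <TextLine>
      <String CONTENT="Berlin" TAGREFS="Tag1"/>
      <String CONTENT="und"/>
      <String CONTENT="Berlin" TAGREFS="Tag2"/>
    </TextLine>
  </Layout>
</alto>"""


def local(el):
    """Return the element name without any '{namespace}' prefix."""
    return el.tag.rsplit("}", 1)[-1]


def dedupe_named_entity_tags(root):
    """Keep one NamedEntityTag per (TYPE, LABEL) and rewrite TAGREFS."""
    canonical = {}  # (TYPE, LABEL) -> ID of the first tag seen
    remap = {}      # every tag ID -> ID of the tag that is kept

    # Collect the <Tags> containers first so the tree is not modified
    # while it is being traversed.
    containers = [el for el in root.iter() if local(el) == "Tags"]
    for tags in containers:
        for tag in list(tags):
            if local(tag) != "NamedEntityTag":
                continue
            key = (tag.get("TYPE"), tag.get("LABEL"))
            keep_id = canonical.setdefault(key, tag.get("ID"))
            remap[tag.get("ID")] = keep_id
            if keep_id != tag.get("ID"):
                tags.remove(tag)  # duplicate definition

    # Point every reference at the surviving tag, dropping repeated IDs.
    for el in root.iter():
        refs = el.get("TAGREFS")
        if not refs:
            continue
        new_refs = []
        for ref in refs.split():
            ref = remap.get(ref, ref)
            if ref not in new_refs:
                new_refs.append(ref)
        el.set("TAGREFS", " ".join(new_refs))
    return root


root = dedupe_named_entity_tags(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

After the call, the sample contains a single NamedEntityTag for "Berlin", and both tagged String elements reference it. Doing this at export time rather than as a post-processing step would avoid writing the duplicates in the first place, which is what this issue asks for.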