Offset problems with "empty" TextMarkup elements #107

Open
kosloot opened this issue Mar 20, 2023 · 0 comments
@kosloot
Collaborator

kosloot commented Mar 20, 2023

Given this (admittedly somewhat odd) FoLiA file:

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="bugxx" generator="libfolia-v1.11" version="2.5">
  <metadata type="native">
    <annotations>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <division-annotation/>
      <paragraph-annotation/>
      <sentence-annotation/>
      <hyphenation-annotation/>
      <string-annotation/>
    </annotations>
  </metadata>
  <text xml:id="bug">
    <div xml:id="bug.div">
      <p xml:id="bug.div.p">
        <s xml:id="bug.div.p.s.2">
          <t>appel<t-hbr>-</t-hbr>taart</t>
          <str xml:id="bug.div.p.s.2.str.1">
            <t offset="0">appel</t>
          </str>
          <str xml:id="bug.div.p.s.2.str.2">
            <t offset="5"><t-hbr>-</t-hbr></t>
          </str>
          <str xml:id="bug.div.p.s.2.str.3">
            <t offset="5">taart</t>
          </str>
        </s>
      </p>
    </div>
  </text>
</FoLiA>

This file is accepted by folialint (latest Git version), but rejected by foliavalidator.
The latter reports:

TEXT VALIDATION ERROR: Text for String, ID bug.div.p.s.2.str.2, textclass current, has incorrect offset 5 or invalid reference: Reference (ID bug.div.p.s.2, class=current) found but no text match at specified offset (5)! Expected '', got 't', full text: 'appeltaart"
(also checked against older rules prior to FoLiA v2.4.1)
VALIDATION ERROR on full parse by library (stage 2/3), in tests/bug52-3.xml
UnresolvableTextContent: Reference (ID bug.div.p.s.2, class=current) found but no text match at specified offset (5)! Expected '', got 't', full text: 'appeltaart"

The problem is the offset of the <t> in the second <str>, which contains only a <t-hbr> element.
IMHO this offset should be 5, as folialint accepts. Since the element has a size of 0, the next <str> ALSO has that same offset, 5.
This is a bug.
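A minimal sketch of the offset arithmetic argued for above, in Python. It assumes, as the full text 'appeltaart' in the error message suggests, that the <t-hbr>-only <str> contributes an empty string to the sentence text. The segment list and `expected_offsets` are hypothetical illustrations, not part of any FoLiA library:

```python
# Hypothetical model of the sentence text from the example: each segment is
# (str ID, rendered text). Assumption: the <str> holding only <t-hbr>-</t-hbr>
# renders as "" (the full text reported by foliavalidator is 'appeltaart').
SEGMENTS = [
    ("bug.div.p.s.2.str.1", "appel"),
    ("bug.div.p.s.2.str.2", ""),       # only <t-hbr>: zero width
    ("bug.div.p.s.2.str.3", "taart"),
]


def expected_offsets(segments):
    """Return {id: offset}, where each offset is the number of characters
    of the full text preceding the segment; zero-width segments add 0."""
    offsets = {}
    position = 0
    for seg_id, text in segments:
        offsets[seg_id] = position
        position += len(text)
    return offsets


full_text = "".join(text for _, text in SEGMENTS)
print(full_text)  # appeltaart
for seg_id, offset in expected_offsets(SEGMENTS).items():
    print(seg_id, offset)
```

Because each offset only counts the characters before a segment, a zero-width segment necessarily shares its offset with whatever follows it: here both str.2 and str.3 come out at 5.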

Neither program handles this case very well, though, as can be shown by replacing the offset with an out-of-band value
such as -1, 10 or 2894234.
In that case both programs will validate the FoLiA file.
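Purely as an illustration of how such values can slip through (this is not a claim about how folialint or foliavalidator are actually implemented): in Python, comparing a slice of the full text against an empty string succeeds for any integer offset, because an empty slice is always "". The helper names below are hypothetical:

```python
def naive_offset_ok(full_text: str, text: str, offset: int) -> bool:
    # Plain slice comparison. For an empty `text` the slice is always "",
    # so ANY integer offset passes, including negative or huge ones.
    return full_text[offset:offset + len(text)] == text


def strict_offset_ok(full_text: str, text: str, offset: int) -> bool:
    # Reject offsets that fall outside the full text before comparing.
    if offset < 0 or offset + len(text) > len(full_text):
        return False
    return full_text[offset:offset + len(text)] == text


for off in (-1, 5, 10, 2894234):
    print(off, naive_offset_ok("appeltaart", "", off),
          strict_offset_ok("appeltaart", "", off))
```

Note that the strict check still accepts 10 for the empty string, since position 10 is the end of 'appeltaart' and therefore a legal position for zero-width text. A range check alone cannot tell that 5 is the right answer there.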

SOLUTION:
I suppose that FoLiA elements with the IMPLICITSPACE property should be defined to add 0 to the offset,
AND: when an offset attribute is given, it should have a meaningful, correct value.
That might prove difficult, as the offset should equal that of the NEXT non-TextMarkup element, and
there is no obligation for that element to have an offset attribute (or even for that element to exist).
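A minimal Python sketch of the proposed rule, under simplifying assumptions: each <str> is reduced to a hypothetical `StrElement` record (ID, rendered text, offset or None, and a flag saying whether its <t> holds only TextMarkup), and `check_markup_only_offsets` is an illustrative name, not an existing folialint or foliapy function. It also shows the difficulty mentioned above: when the next non-TextMarkup element is missing or has no offset, the rule cannot decide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StrElement:
    """Hypothetical, simplified view of a <str> element."""
    xml_id: str
    text: str               # rendered text ("" for TextMarkup-only content)
    offset: Optional[int]   # offset attribute, None if absent
    markup_only: bool       # True if its <t> holds only TextMarkup


def check_markup_only_offsets(strs: List[StrElement]) -> Dict[str, Optional[bool]]:
    """For each TextMarkup-only element carrying an offset, compare it with
    the offset of the NEXT non-TextMarkup element. Result per ID: True or
    False, or None when that next element is missing or has no offset."""
    results: Dict[str, Optional[bool]] = {}
    for i, s in enumerate(strs):
        if not s.markup_only or s.offset is None:
            continue
        nxt = next((o for o in strs[i + 1:] if not o.markup_only), None)
        if nxt is None or nxt.offset is None:
            results[s.xml_id] = None  # undecidable with this rule alone
        else:
            results[s.xml_id] = s.offset == nxt.offset
    return results
```

With the example file, the second <str> (offset 5) matches the third (offset 5), while a value like 10 would be flagged even though it lies inside the full text.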
