Non Linear Hyphens #41

Open
jpmoreux opened this issue Aug 30, 2016 · 1 comment

@jpmoreux
Member

jpmoreux commented Aug 30, 2016

Describing a hyphenated word that runs across two pages, or between the main text flow and the footnotes block, is non-deterministic.

Example:
Left page: one hyphen in the last footnote: "Victor-"
Right page: one hyphen in the main text flow ("Fon-") and the second part of the page 194 hyphen ("Hugo")

In this example, the ALTO markup could lead one to think that the String "Fon-" is the first part of the hyphenated word (HypPart1) and the String "Hugo" is its second part (HypPart2). In such a case, a validation tool that checks hyphen consistency will fail at its job.
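A minimal sketch of that failure, assuming a hypothetical validator that simply pairs each HypPart1 with the next HypPart2 in document order and ignores SUBS_CONTENT. It uses the two String elements from the page 213 excerpt quoted in this issue; the ALTO namespace and layout attributes are dropped for brevity, and `naive_hyphen_pairs` is an illustrative name, not an existing tool.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free excerpt of page 213 (assumption: real ALTO files
# declare a namespace and carry HPOS/VPOS/WIDTH/... attributes).
PAGE_213 = """
<Page>
  <TextBlock>
    <TextLine>
      <String ID="PAG_00000213_ST000193" CONTENT="Fon-"
              SUBS_CONTENT="Fontanes" SUBS_TYPE="HypPart1"/>
      <HYP CONTENT="-"/>
    </TextLine>
  </TextBlock>
  <TextBlock ID="PAG_00000213_TB000010">
    <TextLine>
      <String ID="PAG_00000213_ST000194" CONTENT="Hugo"
              SUBS_CONTENT="Victor-Hugo" SUBS_TYPE="HypPart2"/>
    </TextLine>
  </TextBlock>
</Page>
"""


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def naive_hyphen_pairs(root: ET.Element) -> list[tuple[str, str]]:
    """Hypothetical naive check: pair each HypPart1 with the next HypPart2
    in document order, without looking at SUBS_CONTENT."""
    pairs = []
    pending = None
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue
        kind = el.get("SUBS_TYPE")
        if kind == "HypPart1":
            pending = el
        elif kind == "HypPart2" and pending is not None:
            pairs.append((pending.get("ID"), el.get("ID")))
            pending = None
    return pairs


root = ET.fromstring(PAGE_213.strip())
print(naive_hyphen_pairs(root))
# [('PAG_00000213_ST000193', 'PAG_00000213_ST000194')]
```

The naive check reports a structurally valid pair, "Fon-" + "Hugo", even though the two parts belong to different words.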

These ALTO files were produced during an EPUB+ALTO digitization program. The EPUB format needs footnotes to be identified, so the hyphens exported to the ALTO files are logically correct but "unclear" in the ALTO context.

```xml
...
<String ID="PAG_00000213_ST000193" CONTENT="Fon-" HEIGHT="44" HPOS="1335" STYLEREFS="TXT_14" SUBS_CONTENT="Fontanes" SUBS_TYPE="HypPart1" VPOS="2214" WC="1" WIDTH="100"/>
<HYP CONTENT="-" HPOS="1435" VPOS="2214" WIDTH="26"/>
</TextLine>
</TextBlock>
<TextBlock ID="PAG_00000213_TB000010" HEIGHT="156" HPOS="224" STYLEREFS="TXT_77" VPOS="2394" WIDTH="1236" language="FR">
<TextLine ID="PAG_00000213_TL000025" BASELINE="2431" HEIGHT="48" HPOS="224" VPOS="2394" WIDTH="1235">
<String ID="PAG_00000213_ST000194" CONTENT="Hugo" HEIGHT="45" HPOS="224" STYLEREFS="TXT_7" SUBS_CONTENT="Victor-Hugo" SUBS_TYPE="HypPart2" VPOS="2394" WC="0.983" WIDTH="110"/>
```

[Attached page images: 212, 213]

@artunit
Member

artunit commented Jul 20, 2017

The SUBS_CONTENT attribute does distinguish between "Fontanes" and "Victor-Hugo". Would checking SUBS_CONTENT be an option for a validation tool?
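A minimal sketch of such a check, under stated assumptions: pages are fed in reading order, each HypPart2 is matched with the oldest still-open HypPart1 carrying the same SUBS_CONTENT, and the two CONTENT values are then checked against SUBS_CONTENT. The function names are hypothetical, the markup is simplified (no namespace, no layout attributes), and the page 212 String ID is invented because the issue only quotes its content, "Victor-".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical ID: the issue does not give the ID of the "Victor-" String.
PAGE_212 = """
<Page>
  <TextBlock>
    <TextLine>
      <String ID="HYPOTHETICAL_212_FOOTNOTE" CONTENT="Victor-"
              SUBS_CONTENT="Victor-Hugo" SUBS_TYPE="HypPart1"/>
    </TextLine>
  </TextBlock>
</Page>
"""

PAGE_213 = """
<Page>
  <TextBlock>
    <TextLine>
      <String ID="PAG_00000213_ST000193" CONTENT="Fon-"
              SUBS_CONTENT="Fontanes" SUBS_TYPE="HypPart1"/>
      <HYP CONTENT="-"/>
    </TextLine>
  </TextBlock>
  <TextBlock ID="PAG_00000213_TB000010">
    <TextLine>
      <String ID="PAG_00000213_ST000194" CONTENT="Hugo"
              SUBS_CONTENT="Victor-Hugo" SUBS_TYPE="HypPart2"/>
    </TextLine>
  </TextBlock>
</Page>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parts_match(part1, part2, subs):
    """True if the two CONTENT values rebuild SUBS_CONTENT, keeping the
    trailing hyphen of part 1 ("Victor-Hugo") or dropping it ("Fontanes")."""
    return subs in (part1 + part2, part1.rstrip("-") + part2)


def pair_by_subs_content(pages):
    """Pair each HypPart2 with the oldest pending HypPart1 that has the same
    SUBS_CONTENT, across pages given in reading order."""
    pending = defaultdict(deque)  # SUBS_CONTENT -> waiting HypPart1 elements
    pairs, orphans = [], []
    for root in pages:
        for el in root.iter():
            if local_name(el.tag) != "String":
                continue
            kind, subs = el.get("SUBS_TYPE"), el.get("SUBS_CONTENT", "")
            if kind == "HypPart1":
                pending[subs].append(el)
            elif kind == "HypPart2":
                if pending[subs]:
                    first = pending[subs].popleft()
                    ok = parts_match(first.get("CONTENT", ""), el.get("CONTENT", ""), subs)
                    pairs.append((first.get("ID"), el.get("ID"), ok))
                else:
                    orphans.append(el.get("ID"))  # HypPart2 with no HypPart1
    # HypPart1 elements still open: their second part is not in the input.
    unfinished = [el.get("ID") for queue in pending.values() for el in queue]
    return pairs, orphans, unfinished


pages = [ET.fromstring(p.strip()) for p in (PAGE_212, PAGE_213)]
print(pair_by_subs_content(pages))
# ([('HYPOTHETICAL_212_FOOTNOTE', 'PAG_00000213_ST000194', True)], [], ['PAG_00000213_ST000193'])
```

Here "Hugo" is matched with the left page's "Victor-" rather than with "Fon-", which stays open because its own HypPart2 is not part of the excerpt. Keying the queue on SUBS_CONTENT tolerates hyphens from the main text and the footnotes being interleaved, but it would still mispair two unrelated hyphenations of the same word that are open at the same time.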
