Allow a String to contain alternative Glyph segmentation hypotheses #57

Open
urieli opened this issue Jan 30, 2019 · 25 comments

urieli commented Jan 30, 2019

One of the most inherently difficult OCR tasks is segmenting a String into Glyphs. Because of ink or wearing problems, two glyphs can be merged on the page without any separating white space, or a single glyph can be split by white space.

As a developer of OCR software, I would like to be able to output alternative splits for a single String, with confidence attached to each split.
Alto currently provides no way of outputting these alternatives. The existing ALTERNATIVEType and VariantType are not sufficient, because they only allow alternative content to be expressed, not alternative splits.

One way to attain this would be:

<xsd:complexType name="StringType" mixed="false">
  <xsd:sequence minOccurs="0">
    ...
    <xsd:element name="StringVariant" type="StringType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  ...
</xsd:complexType>

This, however, would make it possible to define a different HPOS, VPOS, HEIGHT and WIDTH for each StringVariant than for the String itself, which is not desired.

Another approach would be:

<xsd:complexType name="StringType" mixed="false">
  <xsd:sequence minOccurs="0">
    ...
    <xsd:element name="StringVariant" type="StringVariantType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  ...
</xsd:complexType>

<xsd:complexType name="StringVariantType" mixed="false">
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>

Yet a third way would be to extend the existing ALTERNATIVEType to include confidence and glyphs:

<xsd:complexType name="ALTERNATIVEType" mixed="false">
  ...
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>

However, this implies redefining ALTERNATIVEType, which is currently described as a variant of the writing produced by new typing / spelling rules.

artunit (Member) commented Feb 5, 2019

Is the intent that the glyphs, or partial glyphs, in this case are still able to be represented in Unicode? Font definitions sometimes use a Unicode replacement number for symbols that can't be represented; could something similar be done for the CONTENT attribute?

urieli (Author) commented Feb 6, 2019

No, the intention is not at all to represent partial glyphs. It is to represent different guesses at how a String should be split into entire glyphs.

For example, if we have a String like the following one:
[image: scan of the printed word "ever"]

The letters e and r are attached by too much ink.
When the OCR program analyses this String, it might have two different guesses on how to split it:
[image: first segmentation guess]
[image: second segmentation guess]

The first guess might be analysed correctly as four glyphs: "ever"
The second guess might be analysed incorrectly as three glyphs: "evw"

The idea is to allow the OCR program to add both guesses to the Alto file, as follows:

<String VPOS="3977" HPOS="2795" HEIGHT="118" WIDTH="157" WC="0.87" CONTENT="ever">
  <Glyph VPOS="3977" HPOS="2795" HEIGHT="118" WIDTH="46" GC="0.94" CONTENT="e"/>
  <Glyph VPOS="3978" HPOS="2841" HEIGHT="118" WIDTH="47" GC="0.85" CONTENT="v"/>
  <Glyph VPOS="3978" HPOS="2888" HEIGHT="118" WIDTH="40" GC="0.92" CONTENT="e"/>
  <Glyph VPOS="3977" HPOS="2928" HEIGHT="118" WIDTH="24" GC="0.78" CONTENT="r"/>
  <StringVariant WC="0.68" CONTENT="evw">
    <Glyph VPOS="3977" HPOS="2795" HEIGHT="118" WIDTH="46" GC="0.94" CONTENT="e"/>
    <Glyph VPOS="3978" HPOS="2841" HEIGHT="118" WIDTH="47" GC="0.85" CONTENT="v"/>
    <Glyph VPOS="3978" HPOS="2888" HEIGHT="118" WIDTH="64" GC="0.24" CONTENT="w"/>
  </StringVariant>
</String>

This allows downstream processing systems to handle both suggestions.
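For illustration, a downstream consumer could read both hypotheses with a few lines of code. This is only a sketch assuming the StringVariant element proposed here; the file name is a placeholder and ALTO namespace handling is omitted:

from lxml import etree

# Sketch: read the primary glyph sequence and any StringVariant alternatives
# from a String element following the proposal above ("page.xml" is a
# hypothetical file; namespace handling is omitted for brevity).
string_el = etree.parse("page.xml").find(".//String")

primary = "".join(g.get("CONTENT") for g in string_el.findall("Glyph"))
hypotheses = [(primary, float(string_el.get("WC", "0")))]

for variant in string_el.findall("StringVariant"):
    text = variant.get("CONTENT") or "".join(
        g.get("CONTENT") for g in variant.findall("Glyph"))
    hypotheses.append((text, float(variant.get("WC", "0"))))

print(hypotheses)  # e.g. [('ever', 0.87), ('evw', 0.68)]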

I'm suggesting either to create a new StringVariantType, or to update the existing ALTERNATIVEType.

My suggested StringVariantType should also have a CONTENT attribute, so it becomes:

<xsd:complexType name="StringVariantType" mixed="false">
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="CONTENT" type="CONTENTType" use="required"/>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>

artunit (Member) commented Feb 7, 2019

Thanks for the clarification. There has been interest in supporting multiple hypotheses in ALTO so this might be a path towards that goal.

urieli changed the title from "Allow a String to contain alternative Glyph segmentations" to "Allow a String to contain alternative Glyph segmentation hypotheses" on Feb 7, 2019
artunit (Member) commented May 30, 2019

One approach that has come up in ALTO Board discussions, and was discussed at length at a recent face-to-face meeting, is to encode multiple hypotheses within a standard and interoperable lattice structure. There seems to be reluctance to add intricate XML to support a lattice implementation, and some feeling that the ALTO file should reference an external source for the lattice structure (see the very useful and relevant OCR-D issue on word segmentation ambiguity).

For simplicity, here is an attempt to work through this example with a single, optional attribute called lattice. There are two functionally equivalent, and somewhat shorthand, formats for lattices that might fit into an attribute: one is the Python Lattice Format (PLF) and the other is the JSON Lattice Format (JLF). This example might look something like this in JLF:

[
  [["e", {"conf": 0.94}, 1]], 
  [["v", {"conf": 0.85}, 1]], 
  [["e", {"conf": 0.92}, 1], ["w", {"conf": 0.24}, 2]], 
  [["r", {"conf": 0.78}, 1]]
]

With apologies to more heavy duty lattice formats (see, for example, Lattices in Kaldi), this syntax can be used with a tool like cicada to analyse and compare lattices, and, in this case, to produce a DOT output file, which can then be used with gravizo to get a graphic rendering:

[image: DOT rendering of the example lattice]

This does not, in any way, eliminate the need for a construct like StringVariantType, which carries the WC value and allows easy access to variant forms of the word through XML proper, which I think is an important provision. But this type of approach might open the door to handing off lattice formats to lattice-friendly software.

<String VPOS='3977' HPOS='2795' HEIGHT='118' WIDTH='157' WC='0.87' CONTENT='ever' LATTICE='[[["e",{"conf": 0.94},1]], [["v",{"conf": 0.85},1]],[["e",{"conf": 0.92},1], ["w",{"conf": 0.24}, 2]],[["r",{"conf": 0.78},1]]]'>
<!-- see above, I reversed the quoting for simplicity (which seems to be valid in XML, see https://stackoverflow.com/questions/6800467/quotes-in-xml-single-or-double)  -->
</String>
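For what it's worth, such an attribute could be expanded back into explicit hypotheses with very little code. A sketch assuming the JLF conventions shown above (each arc being [label, features, distance-to-next-node]), with the attribute value copied from the example:

import json

# Hypothetical LATTICE attribute value, copied verbatim from the example above.
jlf = ('[[["e",{"conf": 0.94},1]], [["v",{"conf": 0.85},1]],'
       '[["e",{"conf": 0.92},1], ["w",{"conf": 0.24}, 2]],[["r",{"conf": 0.78},1]]]')
lattice = json.loads(jlf)

def paths(node=0, prefix="", conf=1.0):
    # Enumerate every hypothesis in the lattice with its combined confidence.
    if node >= len(lattice):          # reached the (virtual) final node
        yield prefix, conf
        return
    for label, feats, dist in lattice[node]:
        yield from paths(node + dist, prefix + label, conf * feats["conf"])

for text, confidence in paths():
    print(text, round(confidence, 3))   # ever 0.573 / evw 0.192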

An attribute might not be the way to go on this, and it would make sense to allow the attribute to reference an external source even if a shorthand syntax were viable. But I like the idea that the same attribute could also conceivably be added to the SP element to support multiple spacing hypotheses, which has come up in the past and fits into the segmentation challenges of OCR.

urieli (Author) commented Jun 3, 2019

Thanks @artunit for this excellent response.

The lattice indeed contains some of the information required to indicate alternative glyph hypotheses, and could be used to express alternatives at other levels as well (spacing).

However, I see several disadvantages to using lattices:

  • If encoded as an attribute in PLF or JLF, there is no way to enforce the lattice content via XSD.
  • We would need a formal description of how to interpret the arc labels. Should arc labels represent the textual content of the node being referenced (glyphs or strings), or are they the ID of the node being referenced, or something else?
  • The arcs could not easily be made to indicate other critical information (HPOS and WIDTH for example), without specifying a complex and artificial arc label naming convention. Referencing IDs would solve this, but for now, we don't have anywhere to store the Glyph alternatives to which the IDs make reference.
  • If we allow both lattices and a type such as StringVariantType, we are opening the door to inconsistency between the two.

Of the advantages, the ones I would highlight are:

  • Less verbose than an XML tree structure, because alternatives are only used when required - if all hypotheses agree on 90% of the glyph segmentation, there is no need to repeat this 90% n times. However, XML is verbose by nature. We could of course encode the lattice structure in XML (just as is done in PLF or JLF), but there seemed to be a reaction against this.
  • Referencing external files: not a criterion for me personally, because I feel strongly the full OCR analysis should always be encodable into a single XML file.
  • Easy conversion to DOT output: however, it is not at all complex to take an XML structure of alternatives such as StringVariantType and write a script that generates lattices or DOT output directly (see the sketch below).
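For instance, a rough sketch along those lines (assuming the StringVariant markup from the example above; each hypothesis simply becomes its own path, so common prefixes are not merged):

from lxml import etree

def string_to_dot(string_el):
    # Emit a DOT digraph with one path per hypothesis: the primary Glyph
    # sequence plus every StringVariant (sketch only, no namespace handling).
    hypotheses = [string_el.findall("Glyph")]
    hypotheses += [v.findall("Glyph") for v in string_el.findall("StringVariant")]
    lines = ["digraph hypotheses {", "  rankdir=LR;"]
    for h, glyphs in enumerate(hypotheses):
        prev = "start"
        for j, g in enumerate(glyphs):
            node = f"h{h}_{j}"
            lines.append(f'  {node} [label="{g.get("CONTENT")} ({g.get("GC")})"];')
            lines.append(f"  {prev} -> {node};")
            prev = node
        lines.append(f"  {prev} -> end;")
    lines.append("}")
    return "\n".join(lines)

print(string_to_dot(etree.parse("page.xml").find(".//String")))  # hypothetical input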

So, I remain far from convinced that lattices are the way to go. If they are added, I would see them as an addition, not a replacement to a more native XML structure for alternatives such as the StringVariantType above.

artunit (Member) commented Jun 3, 2019

And thanks @urieli for the equally excellent and thoughtful "response to the response". I agree totally with the continuing need for a construct like StringVariantType and with supporting complete encoding of OCR within a single XML file as much as possible. I guess the big question is how extensive the lattice structure should be. There's an interesting discussion in this paper about character lattices, but I am vague on how complex the modeling requirements are. From my very limited experience, it is easy to compare multiple JLF lattices in cicada, and I could see a workflow that pulls lattices from ALTO files to look for commonalities, but I am hopeful we can surface what would be needed to accommodate real-world use cases.

bertsky (Contributor) commented Jul 10, 2019

Please forgive my intrusion, but I think I can help with some outsider's perspective. Let me go back a little:

There seems to be reluctance to add intricate XML to support a lattice implementation and some feeling that the ALTO file should reference an external source for the lattice structure

It seems to me the task here is not XML support for a lattice implementation (i.e. a graph processing or FST library, which generally could not accommodate anything beyond strings and confidences/weights – no coordinates, no styles, etc.), but rather the opposite: a lattice extension for ALTO XML.

This can be done with very little extra syntax (basically, representing both nodes and edges as elements, with xs:IDREF attributes for connections), and without introducing extra ambiguity, as I have proposed for PAGE XML in the above mentioned issue. It is always easy to derive any graph or FST representation from that, as @urieli recently pointed out.

I do not see any reason or benefit in referencing an external source for that, or using a binary representation via an xs:string attribute or CDATA element – for the same reasons @urieli gave. It can be done in XML itself, without any need for a (flat) representation like StringVariantType – the question is only how to do it right.

If there is reluctance (as there is for PAGE), then it should be substantiated and discussed here IMHO.

but I like the idea that the same attribute could also conceivably be added to the SP element to support multiple spacing hypotheses

I fail to see how. Each TextLine contains a sequence of Strings each possibly followed by SP. An alternative word segmentation constitutes a different sequence – that's one level higher. And we do not want (N) sequences here, but rather (more efficiently) a lattice. The same goes for a construct like StringVariantType BTW.

So here is my proposal: In the xsd:sequence maxOccurs="unbounded" which now holds String and optionally SP, allow an alternative representation as a new lattice type which includes whitespace:

<!-- still a sequence: -->
<xsd:choice maxOccurs="unbounded">
  <!-- old representation: -->
  <xsd:sequence>
    <xsd:element name="String" type="StringType"/>
    <xsd:element name="SP" type="SPType" minOccurs="0"/>
  </xsd:sequence>
  <!-- new representation: -->
  <xsd:element name="Lattice" type="LatticeType"/>
</xsd:choice>

with

<xsd:complexType name="LatticeType">
  <xsd:sequence>
    <!-- re-use GlyphType for nodes (but allow it to be used for white space as well): -->
    <xsd:element name="Glyph" type="GlyphType" maxOccurs="unbounded"/>
    <!-- introduce a simple edge type: -->
    <xsd:element name="Span" type="SpanType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="ID" type="xs:ID" use="optional"/>
  <!-- for convenience, summarize initial nodes (yes, plural here): -->
  <xsd:attribute name="begin" type="xs:IDREFS" use="required"/>
  <!-- for convenience, summarize terminal nodes (yes, plural here): -->
  <xsd:attribute name="end" type="xs:IDREFS" use="required"/>
</xsd:complexType>
<xsd:complexType name="SpanType">
  <!-- ID of incoming Glyph: -->
  <xsd:attribute name="begin" type="xs:IDREF" use="required"/>
  <!-- ID of outgoing Glyph: -->
  <xsd:attribute name="end" type="xs:IDREF" use="required"/>
</xsd:complexType>

So in contrast to my PAGE proposal, where text (and whitespace) segments are edges and nodes are merely positions, here text (and whitespace) segments are nodes and edges are merely connectors. The reason for the difference is that in PAGE we originally have a strict hierarchy with implicit whitespace.

artunit (Member) commented Jul 10, 2019

Thanks @bertsky, no need to apologize for weighing in on this. I totally agree that concerns need to be substantiated and discussed in an open forum. The use case we were presented with at the last Board meeting (two days ago) was a lattice structure where nodes correspond to locations in the line, and edges correspond to character hypotheses which, in turn, are annotated with an optical character match score and a language model score. I am under no illusion that I have very much lattice expertise, but I am hoping that @acpopat can provide some more details on the use case, especially if I have mangled my description of it, and how this might fit. My only experience with lattices is very simple work with cicada, so I welcome more heavy-duty perspectives.

bertsky (Contributor) commented Jul 10, 2019

That's precisely my use case, too! (I am doing post-correction.) The OCR and LM scores can be added/multiplied with each other (with a given weight) and annotated under WC or GC (or Conf in PAGE). In principle, it does not matter whether nodes or edges are the entities of interest – this is a duality, and there are algorithms to convert either representation into the other.

But maybe I should put in a full example (not just a schema sketch), like you did?

artunit (Member) commented Jul 10, 2019

Thanks @bertsky - a full example would be awesome!

bertsky (Contributor) commented Jul 10, 2019

Sorry @artunit, I am afraid I ran into my own trap here. It's always better to start off with an example! I had to rewrite parts of the above: we do not want to re-use SPType or StringType for nodes, because that would only allow us to capture word segmentation ambiguity, not glyph segmentation ambiguity. But we want both at the same time (and we want to describe GC for whitespace as well), so we must put explicit white space on the same level as glyphs and their variants, and thus cannot reuse SP.

Okay, here's what schema instances could look like. Starting with your example above, this could become the following:

<TextLine ID='...' HPOS='...' VPOS='...' WIDTH='...' HEIGHT='...'>
  <String ID='s1' VPOS='...' HPOS='...' HEIGHT='...' WIDTH='...' WC='0.99' CONTENT='Did'/>
  <SP ID='s2' VPOS='...' HPOS='...' HEIGHT='...' WIDTH='...'/>
  <String ID='s3' VPOS='...' HPOS='...' HEIGHT='...' WIDTH='...' WC='0.95' CONTENT='you'/>
  <SP ID='s4' VPOS='...' HPOS='...' HEIGHT='...' WIDTH='...'/>
  <Lattice ID='s5' begin="g1" end="g4,g5">
    <Glyph ID='g1' VPOS='3977' HPOS='2795' HEIGHT='118' WIDTH='40' GC='0.94' CONTENT='e'/>
    <Glyph ID='g2' VPOS='3977' HPOS='2835' HEIGHT='118' WIDTH='40' GC='0.85' CONTENT='v'/>
    <Glyph ID='g3' VPOS='3977' HPOS='2875' HEIGHT='118' WIDTH='40' GC='0.92' CONTENT='e'/>
    <Glyph ID='g4' VPOS='3977' HPOS='2915' HEIGHT='118' WIDTH='37' GC='0.78' CONTENT='r'/>
    <Glyph ID='g5' VPOS='3977' HPOS='2875' HEIGHT='118' WIDTH='77' GC='0.24' CONTENT='w'/>
    <Span begin='g1' end='g2'/>
    <Span begin='g2' end='g3'/>
    <Span begin='g3' end='g4'/>
    <Span begin='g2' end='g5'/>
  </Lattice>
  ...
</TextLine>

Now, let's do an example which also includes word segmentation ambiguity (the same as I did for PAGE):

<TextLine ID='...' HPOS='...' VPOS='...' WIDTH='...' HEIGHT='...'>
  <Lattice ID='s1' begin="g1,g2" end="g10">
    <Glyph ID='g1' VPOS='3977' HPOS='2795' HEIGHT='118' WIDTH='40' GC='0.9' CONTENT='m'/>
    <Glyph ID='g2' VPOS='3977' HPOS='2795' HEIGHT='118' WIDTH='30' GC='0.75' CONTENT='n'>
      <Variant CONTENT='r' VC='0.65'/>
    </Glyph>
    <Glyph ID='g3' VPOS='3977' HPOS='2825' HEIGHT='118' WIDTH='10' GC='0.9' CONTENT='i'>
      <Variant CONTENT='r' VC='0.6'/>
    </Glyph>
    <Glyph ID='g4' VPOS='3977' HPOS='2835' HEIGHT='118' WIDTH='30' GC='0.9' CONTENT='y'>
      <Variant CONTENT='v' VC='0.8'/>
    </Glyph>
    <!-- whitespace glyph: -->
    <Glyph ID='g5' VPOS='3977' HPOS='2865' HEIGHT='118' WIDTH='15' GC='0.9' CONTENT=' '/>
    <!-- whitespace plus comma glyph: -->
    <Glyph ID='g6' VPOS='3977' HPOS='2865' HEIGHT='118' WIDTH='20' GC='0.8' CONTENT=' ,'/>
    <Glyph ID='g7' VPOS='3977' HPOS='2880' HEIGHT='118' WIDTH='30' GC='0.9' CONTENT='p'/>
    <Glyph ID='g8' VPOS='3977' HPOS='2885' HEIGHT='118' WIDTH='25' GC='0.9' CONTENT='o'/>
    <Glyph ID='g9' VPOS='3977' HPOS='2910' HEIGHT='118' WIDTH='40' GC='0.9' CONTENT='a'>
      <Variant CONTENT='e' VC='0.7'/>
    </Glyph>
    <Glyph ID='g10' VPOS='3977' HPOS='2950' HEIGHT='118' WIDTH='35' GC='0.9' CONTENT='y'/>
    <Span begin='g1' end='g4'/>
    <Span begin='g2' end='g3'/>
    <Span begin='g3' end='g4'/>
    <Span begin='g4' end='g5'/>
    <Span begin='g4' end='g6'/>
    <Span begin='g5' end='g7'/>
    <Span begin='g6' end='g8'/>
    <Span begin='g7' end='g9'/>
    <Span begin='g8' end='g9'/>
    <Span begin='g9' end='g10'/>
  </Lattice>
</TextLine>

[image: DOT graph visualisation of the lattice above]
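For illustration, walking such a Lattice to recover the encoded strings (or to emit DOT) takes only a few lines. A sketch assuming the element and attribute names proposed above, and following the comma-separated begin/end lists used in these examples:

from lxml import etree

def lattice_strings(lattice_el):
    # Enumerate all strings encoded in a proposed Lattice element by walking
    # Span edges from the begin glyphs to the end glyphs (sketch, no namespaces).
    content = {g.get("ID"): g.get("CONTENT") for g in lattice_el.findall("Glyph")}
    edges = {}
    for span in lattice_el.findall("Span"):
        edges.setdefault(span.get("begin"), []).append(span.get("end"))
    finals = set(lattice_el.get("end").replace(",", " ").split())

    def walk(node, prefix):
        prefix += content[node]
        if node in finals:
            yield prefix
        for nxt in edges.get(node, []):
            yield from walk(nxt, prefix)

    for start in lattice_el.get("begin").replace(",", " ").split():
        yield from walk(start, "")

On the first example above this yields "ever" and "evw".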

artunit (Member) commented Jul 11, 2019

Excellent work @bertsky, an example makes all the difference in the world.

bertsky (Contributor) commented Jul 11, 2019

Thx @artunit – I just hope we will get a lively discussion this time...

artunit (Member) commented Aug 7, 2019

Hi @bertsky, this is some feedback from Reeve Ingle at Google, who has done way more heavy lifting with lattice models than I have.

While I'm not very familiar with the ALTO standard, the overall representation seems reasonable to me. One additional consideration could be to support a set of costs per node, in addition to (or in place of) the confidence score. Two useful costs would be an optical model cost and a language model cost, and ideally these costs would be additive (e.g., negative log-probabilities). One of the common uses of the lattice, which was also mentioned in the github thread, could be to re-score an OCR result using a different language model. The rescoring could be done by finding the minimum-cost path through the lattice using a weighted combination of the optical model cost and language model cost. For that, it would be great if the relative contributions of the optical model and the language model were separated, and since a common operation will be to find the minimum-cost path (or minimum-cost paths, in the case of N-best results), it would be great if that cost were represented in an additive domain.
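As a back-of-the-envelope illustration of that rescoring idea (hypothetical numbers and weights, nothing defined by ALTO): with costs stored as negative log-probabilities, a path cost is simply a weighted sum, and rescoring picks the minimum.

import math

# Two hypothetical paths through the "ever"/"evw" lattice; each arc carries
# an optical-model probability and a language-model probability.
arcs = {
    "ever": [(0.94, 0.90), (0.85, 0.80), (0.92, 0.95), (0.78, 0.85)],
    "evw":  [(0.94, 0.90), (0.85, 0.80), (0.24, 0.10)],
}
w_om, w_lm = 1.0, 0.5   # relative weights chosen by the rescoring system

def path_cost(path):
    # Additive domain: sum of weighted negative log-probabilities per arc.
    return sum(w_om * -math.log(p_om) + w_lm * -math.log(p_lm)
               for p_om, p_lm in arcs[path])

best = min(arcs, key=path_cost)   # minimum-cost path, here "ever"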

Sorry that it's been a little quiet in here; the summer doesn't seem to be a busy time for GitHub activity.

bertsky (Contributor) commented Aug 12, 2019

Hi @artunit, thanks for relaying!

One additional consideration could be to support a set of costs per node, in addition to (or in place of) the confidence score. Two useful costs would be an optical model cost and a language model cost, and ideally these costs would be additive (e.g., negative log-probabilities). One of the common uses of the lattice, which was also mentioned in the github thread, could be to re-score an OCR result using a different language model. The rescoring could be done by finding the minimum-cost path through the lattice using a weighted combination of the optical model cost and language model cost. For that, it would be great if the relative contributions of the optical model and the language model were separated, and since a common operation will be to find the minimum-cost path (or minimum-cost paths, in the case of N-best results), it would be great if that cost were represented in an additive domain.

IMHO having only one kind of confidence score in the representation will always be enough, because you cannot separate the weight combination and the LM score calculation anyway: language models in general need a history of more than one lattice element as input (usually a variable-length sequence of previous tokens for n-gram models, or a fixed window of characters/words for RNN models), therefore you cannot represent an LM score at some lattice element independently of its partial path. Moreover, it is generally infeasible to enumerate all possible paths as LM input, so rescoring needs to prune away some partial paths at each node, which again cannot be done separately from weight combination (without losing information/accuracy).

artunit (Member) commented Sep 13, 2019

Hi @bertsky - sorry for the radio silence. The summer is over, and there is a Board meeting at the end of the month; I am hoping we can get this thread moving again.

bertsky (Contributor) commented Sep 14, 2019

Okay – let me know if you need any more input from my side.

artunit (Member) commented Sep 28, 2019

@urieli, @bertsky - At the 2019-09-27 ALTO Board meeting, there was general agreement that encoding OCR uncertainty and alternative hypotheses via a lattice, or similar model, would be a good topic for the ALTO Fall F2F gathering. The meeting will be held right before the 2019 IIIF Working meeting. The IIIF event runs November 4-7 at the University of Michigan campus in Ann Arbor, Michigan, and the F2F meeting will be held Sunday, Nov. 3 from 3 to 6 pm at the aadlfreespace room of the downtown branch of the Ann Arbor District Public Library. This location is about a 10 minute walk from the U. of Michigan campus. You are welcome to attend if this might be viable for you, or I can try to set up a virtual option. Please let me know if you are interested and we can figure out the next steps.

bertsky (Contributor) commented Sep 29, 2019

@artunit Thanks for the offer! Interested, yes – that is, if there will be a virtual option. But I am not sure whether I can make it at that time: living in Germany (UTC+1), which is 6h ahead of Michigan (EST / UTC-5), this would be from 9pm to midnight on a Sunday for me. Perhaps there could be a slot dedicated to the lattice extension in the first hour?

urieli (Author) commented Sep 30, 2019

Hi @artunit, I'm interested as well in the virtual option, but with the same constraints, as I'm in France (CET). I might be able to make it for 9pm, but certainly not much later. I haven't yet had a chance to read through @bertsky's replies fully; I will try to find time in the next few days.

artunit (Member) commented Sep 30, 2019

@urieli, @bertsky - I have set up a Zoom meeting; if you use this link on Nov. 3, I will have a boom microphone set up, and hopefully the technology will fall into place. I tried this for a meeting in Brussels, and the virtual pieces had a few glitches, but I can take better equipment to Michigan than was possible for Belgium. Thank you both for considering this; those time zone differences can play havoc with ALTO events.

artunit (Member) commented Nov 1, 2019

@urieli, @bertsky - Just a reminder about the upcoming meeting on Sunday, Nov. 3 from 3 to 6 pm EST - available via Zoom with this link.

artunit (Member) commented Nov 5, 2019

As per the 2019-11-03 meeting, the lattice discussion will be moved to issue 63 - ALTO support for encoding OCR uncertainty. StringVariantType will be brought forward to the Board for consideration. Thanks to @urieli and @bertsky for all of the work on these important issues for ALTO's evolution.

artunit (Member) commented Feb 9, 2020

Circling back to the original proposal from @urieli, now that the lattice proposal is part of a separate issue, it is worth restating that this is to represent different guesses at how a String should be split into entire glyphs. The StringVariantType should also have a CONTENT attribute, so it becomes:

<xsd:complexType name="StringVariantType" mixed="false">
  <xsd:sequence minOccurs="0">
    <xsd:element name="Glyph" type="GlyphType" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="CONTENT" type="CONTENTType" use="required"/>
  <xsd:attribute name="WC" type="WCType" use="optional"/>
</xsd:complexType>

See the example in the comment above for more detail.

urieli (Author) commented Feb 10, 2020

@artunit Thanks for the info, and sorry I couldn't manage to attend any of the virtual meetings. I would love to see a solution allowing the encoding of multiple hypotheses with their respective confidences in a near-future version of Alto. The solution outlined here suits my immediate needs, but an XML-encoded lattice with confidences would do so as well.
