- Welcome [All]
- Find and tell a non-offensive, maybe self-deprecating joke before the meeting begins and/or after it ends. [All]
- Review recent schema issues:
- candidates for closing [All]:
- issue updates:
- See the full list of open schema issues on GitHub.
- Fall Face-to-Face Meeting:
- Confirm event. The original idea for the location was the 2019 DLF Forum, but at the 2019-07-08 meeting preference shifted to the 2019 IIIF Working Meeting, to be held November 4-7 in Ann Arbor, Michigan. The most likely timeframe would be the afternoon of Sunday, Nov. 3; the IIIF Showcase is scheduled for Monday, Nov. 4.
- possible topics:
- Confidence value calculation (CC, WC, PC) and the annotation extension, using Ashok's summary document to try to decide on the best approach.
- Examining the lattice model proposed in the issue "Allow a String to contain alternative Glyph segmentation hypotheses" (an XML-based syntax sample is included in the issue).
- others?
- Upcoming Board Changes/Expirations:
- CCS: Welcome Ciprian Dinu, Managing Director (CCS Romania)
- December 31, 2019: Clemens, Raju, and Stefan
- December 31, 2020: Art, Ashok, Evelien, Jukka, and Ralph
- December 31, 2021: Frederick, Nate, Matthew, and Ahmed Samir
- Spring 2020 F2F Meeting Location candidates:
- RLUK 2020 (Research Libraries UK), London, UK, March 16-18, 2020
- "Digitised newspapers - a new Eldorado for historians?", Lausanne, Switzerland, April 23-24, 2020
- The IFLA News Media section will sponsor a midyear meeting in Mexico City at UNAM and the National Library of Mexico (no URL or specific date yet; most likely to be held between March and May).
- suggestions
- Other business. [All]
Attending members
- Ahmed Samir
- Ashok Popat
- Art Rhyno
- Cally Law
- Ciprian Dinu
- Frederick Zarndt
- Jukka Kervinen
Minutes
wrt agenda item 1. The Board welcomed Ciprian (Cip) Dinu from CCS. Cip described his extensive background in ALTO, going back to the EU-funded METAe project in 2000 and his work on the first software ever to produce ALTO files.
wrt agenda item 3. Review recent schema issues:
- Change BASELINE to accommodate a list of points in addition to a single point: the syntax generally agreed to at the March 18, 2018 meeting will be updated to change the single line to a polyline. Art will update the issue and seek a vote on GitHub so that this change can be part of the next ALTO revision.
- The issues "Length of main glyph and variants" and "TextBlocks and paragraphs" will be closed with a note acknowledging their informational nature and that they have no schema implications.
- Ashok noted that Christian Clausner has recently done work on "ALTO - PAGE XML: Object mapping and possible transformation generation". In response to a question from Frederick, Ashok highlighted a couple of developments at the recent ICDAR conference that might have a connection to ALTO:
- There is increasing emphasis on making datasets available for public use. This is useful for competitions and testing, page segmentation analysis, text finding, and other emerging research areas. [This ties in with the recent announcements of public ground-truth datasets made at DH2019.]
- These datasets currently tend to have minimal formatting because of their ad hoc nature, but this is a promising trend for standards like ALTO and PAGE, and for advancing text analysis technologies in general.
- The XML syntax used by Transkribus for handwriting was highlighted as a production example for the "ALTO for Handwriting" issue.
- The GitHub issues "Expand schema documentation for PointsType" and "Allow a String to contain alternative Glyph segmentation hypotheses" have seen substantial activity since the last Board meeting. The lattice discussion has largely been folded into the Glyph issue. Ashok noted that the related thread in that issue on the optical model score and language score could use further clarification from his group, and that he could bring concrete examples to an in-person meeting.
wrt agenda item 4. Fall Face-to-Face Meeting:
There was general agreement that encoding OCR uncertainty and alternative hypotheses via a lattice, or a similar model, would be a good topic for the Fall F2F gathering. The meeting will be held right before the 2019 IIIF Working Meeting, which runs November 4-7 at the University of Michigan campus in Ann Arbor, Michigan. The F2F meeting will be held on Sunday, Nov. 3, from 3 to 6 pm in the aadlfreespace room of the downtown branch of the Ann Arbor District Public Library, about a 10-minute walk from the U. of Michigan campus. Art will make sure that Stefan and Christian are aware of the meeting, and will extend an invitation to the Text Granularity Technical Specification Group, as well as to Robert Sachunsky, who has proposed an XML syntax for lattices in ALTO based on work done for PAGE.
wrt agenda item 5. Upcoming Board Changes/Expirations:
Art will check with Clemens, Raju, and Stefan about their plans in light of the upcoming expiration of their terms on the Board. Ashok may pass on his membership to a member of his team as he takes on broader duties at Google. Art will be assuming a Chair position on another Board in January 2021 and will be looking to pass the torch at the end of next year.
wrt agenda item 6. Spring 2020 F2F Meeting Location candidates:
The "Digitised newspapers - a new Eldorado for historians?" conference was seen as a good opportunity to present a paper on ALTO, so it would make sense to hold the Spring F2F meeting at the same event.
wrt agenda item 7. Other business.
The next meeting will be scheduled after the Fall F2F meeting.