HIPE-2022 data v2.0
Release notes
This release contains:
- ajmc: full train and dev sets for fr, en, de.
- ajmc: mappings [OCR-gold transcript] for ajmc entities (see README-ajmc)
- newseye: correction of the document_id number in the metadata line
    # hipe2022:document_id =
  and removal of unannotated documents from the DE train set (see README-newseye)
- sonar: thorough revision of NER and NEL annotations, and removal of unrevised materials from the dev set (see README-sonar.md)
- updated corpus statistics in the dedicated notebook