Well, after a quick investigation, it was clear that the problem is NOT in FoLiA-correct. In fact the produced FoLiA is correct!
But libfolia, and thus folialint, has problems extracting text from documents where the last tag in a Sentence is a correction.
I will create a new issue in libfolia to address this problem.
Hi,
I have a batch of 51 valid FoLiA texts. After running them through FoLiA-correct (OCR post-correction), 15 no longer pass folialint.
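A batch check like the one described could be sketched roughly as follows. This is not the command line from the README. It assumes that `folialint` is on the PATH, accepts a file name as its argument, and exits with a non-zero status when a document fails; the helper name `failing_files` is hypothetical.

```python
import subprocess

def failing_files(paths, validator=("folialint",)):
    """Return the paths for which the validator exits with a non-zero status.

    Assumption: the validator command takes the file name as its last
    argument and signals failure through its exit status.
    """
    failed = []
    for path in paths:
        # Run the validator on one file and capture its output quietly.
        result = subprocess.run(
            [*validator, str(path)], capture_output=True, text=True
        )
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling `failing_files(sorted(Path("corrected").glob("*.folia.xml")))`, with `Path` imported from `pathlib` and `corrected` as a placeholder directory name, would then list the documents that fail after correction.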
I provide a test kit with the necessary input files for FoLiA-correct and a single test text here:
https://ticclops.uvt.nl/TESTcorrect.20221005.tar.gz
The README also contains the command-lines I used on this test set.
The kit contains the 'original' text, which is one continuous blob of text with no newlines demarcating paragraphs. It also contains a new version with (rough) paragraphs: a newline inserted after each period ('.').
After FoLiA-correct, the new one validates, the original one fails.
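The rough paragraph splitting mentioned above could look something like this minimal sketch. It is not the script actually used to build the kit, and the function name `split_after_periods` is hypothetical.

```python
import re

def split_after_periods(text):
    """Insert a newline after every period, dropping whitespace that followed it."""
    # Each '.' plus any trailing whitespace becomes '.' followed by a newline.
    split = re.sub(r"\.\s*", ".\n", text)
    # Remove the newline left over after the final period.
    return split.rstrip("\n")
```

This is deliberately crude: it also breaks after abbreviations and inside decimal numbers, which matches the "rough" paragraphs described.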
I hope this can be resolved.
Thank you!