thaodod/VieBookRead

This is the official (partial) code for the paper "Reference-Based Post-OCR Processing with LLM for Diacritic Languages", which was accepted at AAAI 2025.

The pipeline was built step by step by multiple authors, and some of us did not keep the source for certain processing steps, so we publish only part of the full pipeline.

Compared with the paper, we later made small changes to improve the quality of the final dataset, such as using Sonnet 3.5 for Section 4.7.

To obtain the VieBookRead dataset for OCR, go to Hugging Face at https://huggingface.co/datasets/thaodd11/VieBookRead, sign the form, and wait for approval to download.

Thank you
