Text cleanup.
Bookbinder.py
is regex; Antiquarian.py
is AI cleanup. openai_batch_api/batch.ipynb
is AI cleanup in batches and preparing finetune files.
Because every project has different requirements, this needs to be manual every time.
- Edit config files
- Edit python scripts
- Run scripts
- fix cost estimation to estimate just the file that was generated
- fix batch notebook to work when the batch job uses several files.
- refactor
- Add segmentation.
- Add deferral to save on compute