Curator is an open-source tool for curating large-scale datasets for post-training LLMs.
Curator was used to curate Bespoke-Stratos-17k, a reasoning dataset used to train the fully open reasoning model Bespoke-Stratos.
Key features:
- Calling the DeepSeek API for scalable synthetic data curation
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Saving $$$ using batch mode
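
To make the "caching and automatic recovery" idea concrete, here is a minimal sketch in plain Python. It is not Curator's implementation or API. The names `run_with_cache`, `cache_key`, and `fake_generate`, and the JSONL cache layout, are all hypothetical and chosen only for illustration. The sketch assumes each response can be keyed by a hash of its prompt, so a rerun after an interruption only pays for prompts that have no saved response yet.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def cache_key(prompt: str) -> str:
    """Derive a stable key for a prompt (hypothetical scheme: SHA-256 of the text)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def run_with_cache(
    prompts: List[str],
    generate: Callable[[str], str],
    cache_file: Path,
) -> List[str]:
    """Return one response per prompt, calling `generate` only on cache misses."""
    # Load responses saved by earlier (possibly interrupted) runs.
    cache: Dict[str, str] = {}
    if cache_file.exists():
        with cache_file.open() as f:
            for line in f:
                record = json.loads(line)
                cache[record["key"]] = record["response"]

    results = []
    with cache_file.open("a") as f:
        for prompt in prompts:
            key = cache_key(prompt)
            if key not in cache:
                response = generate(prompt)
                cache[key] = response
                # Persist each response immediately so a crash loses no finished work.
                f.write(json.dumps({"key": key, "response": response}) + "\n")
                f.flush()
            results.append(cache[key])
    return results


# Stand-in for a real LLM call; it records which prompts were actually "sent".
calls: List[str] = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "responses.jsonl"
    first = run_with_cache(["a", "b"], fake_generate, cache_path)
    # The second run reuses the saved responses and only generates for "c".
    second = run_with_cache(["a", "b", "c"], fake_generate, cache_path)
```

In this sketch the second run calls `fake_generate` only for the new prompt, which is the behavior that makes reruns cheap after a failure or after extending a dataset.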