[FEATURE] ArgillaTrainer - allow passing initialized model & tokenizer #3631

tomaarsen · 2023-08-25T10:19:50Z

Is your feature request related to a problem? Please describe.
Related: #3467 (comment)

Describe the solution you'd like
I want users to be able to provide an initialized model and tokenizer:

model = AutoModel...
tokenizer = AutoTokenizer...

trainer = ArgillaTrainer(
    dataset=...,
    model=model,
    tokenizer=tokenizer,
    ...
)
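
One way a trainer could support both forms is to branch on whether `model` is a string. The sketch below is dependency-free and is not Argilla's implementation: `resolve_model_and_tokenizer`, `load_model` and `load_tokenizer` are hypothetical names, and requiring a tokenizer alongside an initialized model is an assumption made for illustration.

```python
from typing import Any, Callable, Optional, Tuple


def resolve_model_and_tokenizer(
    model: Any,
    tokenizer: Optional[Any] = None,
    # Hypothetical loaders standing in for whatever the framework provides.
    load_model: Callable[[str], Any] = lambda name: {"loaded_model": name},
    load_tokenizer: Callable[[str], Any] = lambda name: {"loaded_tokenizer": name},
) -> Tuple[Any, Any]:
    """Hypothetical helper: accept a model name or an initialized model."""
    if isinstance(model, str):
        # String input: load the model, and the tokenizer if none was passed.
        resolved_model = load_model(model)
        resolved_tokenizer = (
            tokenizer if tokenizer is not None else load_tokenizer(model)
        )
        return resolved_model, resolved_tokenizer
    # Initialized model: there is no name to infer a tokenizer from,
    # so this sketch requires the caller to pass one explicitly (assumption).
    if tokenizer is None:
        raise ValueError("Pass a tokenizer together with an initialized model.")
    return model, tokenizer
```

With this shape, existing string-based calls keep working, while users who need a specially configured tokenizer can build it themselves and hand it over unchanged.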

Describe alternatives you've considered

Providing a model string and then hacking together the desired model and tokenizer via trainer.update_config.
Extending the ArgillaTrainer somehow to implement this feature.
Training separately from the ArgillaTrainer, using the framework directly.

Additional context
Should be implemented after #3467

Tom Aarsen


…3751)

Hello!

# Description

Closes #3631.

This is important to give users freedom to very specifically set up their tokenizer. This is required e.g. for SFT with TRL.

**Type of change**

- [x] New feature (non-breaking change which adds functionality)
- [ ] Refactor (change restructuring the codebase without changing functionality)
- [ ] Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

Updated the relevant tests (TRL, Transformers) to also train with the passed model & tokenizer.

**Checklist**

- [ ] I added relevant documentation
- [x] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---

**TODO**:

- [x] CHANGELOG
- [x] Documentation
- [x] Double-check docstrings

---

- Tom Aarsen

---------

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

tomaarsen added the "type: enhancement" (Indicates new feature requests) and "area: trainer" (Indicates that an issue or pull request is related to the Argilla Trainer) labels Aug 25, 2023

tomaarsen self-assigned this Sep 12, 2023

tomaarsen mentioned this issue Sep 12, 2023

feat: Allow passing model and tokenizer to ArgillaTrainer directly #3751

Merged

14 tasks

tomaarsen closed this as completed in #3751 Sep 15, 2023
