Releases · axolotl-ai-cloud/axolotl
v0.5.2
What's Changed
- move deprecated kwargs from trainer to trainingargs by @winglian in #2028
- add axolotlai docker hub org to publish list by @winglian in #2031
- update actions version for node16 deprecation by @winglian in #2037
- replace references to personal docker hub to org docker hub by @winglian in #2036
- feat: add metharme chat_template by @NanoCode012 in #2033
- change deprecated Stub to App by @winglian in #2038
- fix: handle sharegpt dataset missing by @NanoCode012 in #2035
- add P2P env when multi-gpu but not the full node by @winglian in #2041
- invert the string in string check for p2p device check by @winglian in #2044
- feat: print out dataset length even when not preprocessing by @NanoCode012 in #2034
- Add example YAML file for training Mistral using DPO by @olivermolenschot in #2029
- fix: inference not using chat_template by @NanoCode012 in #2019
- feat: cancel ongoing tests if new CI is triggered by @NanoCode012 in #2046
- feat: upgrade to liger 0.4.1 by @NanoCode012 in #2045
- run pypi release action on tag create w version by @winglian in #2047
- make sure to tag images in docker for tagged releases by @winglian in #2051
- retry flaky test_packing_stream_dataset test that times out on read by @winglian in #2052
- install default torch version if not already, new xformers wheels for torch 2.5.x by @winglian in #2049
- fix push to main and tag semver build for docker ci by @winglian in #2054
- Update unsloth for torch.cuda.amp deprecation by @bursteratom in #2042
- don't cancel the tests on main automatically for concurrency by @winglian in #2055
- ADOPT optimizer integration by @bursteratom in #2032
- Grokfast support by @winglian in #1917
- upgrade to flash-attn 2.7.0 by @winglian in #2048
- make sure to add tags for versioned tag on cloud docker images by @winglian in #2060
- fix duplicate base build by @winglian in #2061
- fix env var extraction by @winglian in #2043
- gradient accumulation tests, embeddings w pad_token fix, smaller models by @winglian in #2059
- upgrade datasets==3.1.0 and add upstream check by @winglian in #2067
- update to-be-deprecated evaluation_strategy by @winglian in #1682
- remove the bos token from dpo outputs by @winglian in #1733
- support passing trust_remote_code to dataset loading by @winglian in #2050
- support for schedule free and e2e ci smoke test by @winglian in #2066
- Fsdp grad accum monkeypatch by @winglian in #2064
- fix: loading locally downloaded dataset by @NanoCode012 in #2056
- Update `get_unpad_data` patching for multipack by @chiragjn in #2013
- increase worker count to 8 for basic pytests by @winglian in #2075
- upgrade autoawq==0.2.7.post2 for transformers fix by @winglian in #2070
- optim e2e tests to run a bit faster by @winglian in #2069
- don't build bdist by @winglian in #2076
- static assets, readme, and badges update v1 by @winglian in #2077
- Readme updates v2 by @winglian in #2078
- bump transformers for fsdp-grad-accum fix, remove patch by @winglian in #2079
- Feat: Drop long samples and shuffle rl samples by @NanoCode012 in #2040
- add optimizer step to prevent warning in tests by @winglian in #1502
- fix brackets on docker ci builds, add option to skip e2e builds by @winglian in #2080
- remove deprecated extra metadata kwarg from pydantic Field by @winglian in #2081
- release version 0.5.1 by @winglian in #2082
- make sure action has permission to create release by @winglian in #2083
- set manifest and fix for source dist by @winglian in #2084
- add missing dunder-init for monkeypatches and add tests for install from sdist by @winglian in #2085
New Contributors
- @olivermolenschot made their first contribution in #2029
Full Changelog: v0.5.0...v0.5.2
v0.5.0
What's Changed
- fix(log): improve warning to clarify that lora_modules_to_save expects a list by @NanoCode012 in #1197
- Add: colab example by @JohanWork in #1196
- Feat/chatml add system message by @mhenrichsen in #1117
- fix learning rate scheduler's warnings by @RicardoDominguez in #1135
- precompute dpo logprobs setting and fixes by @winglian in #1199
- Update deps 202401 by @winglian in #1204
- make sure to register the base chatml template even if no system message is provided by @winglian in #1207
- workaround for transformers bug requiring do_sample for saving pretrained by @winglian in #1206
- more checks and fixes for deepspeed and fsdp by @winglian in #1208
- drop py39 docker images, add py311, upgrade pytorch to 2.1.2 by @winglian in #1205
- Update qlora.yml - DeprecationWarning: `max_packed_sequence_len` is n… by @7flash in #1210
- Respect sliding_window=None by @DreamGenX in #1214
- ensure the tests use the same version of torch as the latest base docker images by @winglian in #1215
- ADD: warning if hub_model_id is set but no save strategy is by @JohanWork in #1202
- run PR e2e docker CI tests in Modal by @winglian in #1217
- Revert "run PR e2e docker CI tests in Modal" by @winglian in #1220
- FEAT: add tagging support to axolotl for DPOTrainer by @filippo82 in #1209
- Peft lotfq by @winglian in #1222
- Fix typos (pretained -> pretrained) by @xhedit in #1231
- Fix and document test_datasets by @DreamGenX in #1228
- set torch version to what is installed during axolotl install by @winglian in #1234
- Cloud motd by @winglian in #1235
- [Nit] Fix callout by @hamelsmu in #1237
- Support for additional_special_tokens by @DreamGenX in #1221
- Peft deepspeed resume by @winglian in #1227
- support for true batches with multipack by @winglian in #1230
- add contact info for dedicated support for axolotl by @winglian in #1243
- fix(model): apply gate fp32 only for mixtral by @NanoCode012 in #1241
- relora: magnitude pruning of the optimizer by @winglian in #1245
- Pretrain transforms by @winglian in #1261
- Fix typo `bloat16` -> `bfloat16` by @chiragjn in #1257
- Add more save strategies for DPO training. by @PhilipMay in #1255
- BUG FIX: lock pytorch version in colab example by @JohanWork in #1247
- Fix typo preventing `model_kwargs` being injected by @zacbrannelly in #1262
- contributor avatars by @winglian in #1269
- simplify handling for newer multipack patches so they can be added in a single place by @winglian in #1270
- Add link to axolotl cloud image on latitude by @winglian in #1275
- copy edits by @winglian in #1276
- allow remote data paths by @hamelsmu in #1278
- add support for https remote yamls by @hamelsmu in #1277
- run the docker image builds and push on gh action gpu runners by @winglian in #1218
- Update README.md by @hamelsmu in #1281
- don't use load and push together by @winglian in #1284
- Add MPS support by @maximegmd in #1264
- allow the optimizer prune ratio for relora to be configurable by @winglian in #1287
- Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? by @jinwonkim93 in #1273
- Add seq2seq eval benchmark callback by @LeonardoEmili in #1274
- Validation always happens on first step by @LeonardoEmili in #1300
- fix(examples): remove is_*_derived as it's parsed automatically by @NanoCode012 in #1297
- Allow load_best_model_at_end to be configured for early stopping on custom evaluation datasets by @dameikle in #1291
- Add instructions for playing with qlora model to colab example by @jaredpalmer in #1290
- fix(readme): update inference md link by @NanoCode012 in #1311
- Adding Google's gemma Model by @monk1337 in #1312
- multipack for gemma by @winglian in #1313
- deprecate: pytorch 2.0.1 image by @NanoCode012 in #1315
- fix(readme): Clarify doc for tokenizer_config by @NanoCode012 in #1323
- [bug-report template] Use yaml codeblock for config.yaml field by @kallewoof in #1303
- make mlflow optional by @winglian in #1317
- Pydantic 2.x cfg by @winglian in #1239
- chore: update readme to be more clear by @NanoCode012 in #1326
- ADD: push checkpoints to mlflow artifact registry by @JohanWork in #1295
- hotfix for capabilities loading by @winglian in #1331
- hotfix for lora rank by @winglian in #1332
- hotfix for missing outputs params by @winglian in #1333
- hotfix to exclude_unset from pydantic config when converting back to a dict by @winglian in #1334
- Add StableLM 2 Example Scripts by @ncoop57 in #1327
- add lion-pytorch optimizer by @maximegmd in #1299
- Support user-defined prompt processing strategies for dpo by @nopperl in #1248
- more pydantic fixes by @winglian in #1338
- Mps mistral lora by @maximegmd in #1292
- fix: checkpoint saving with deepspeed by @NanoCode012 in #1321
- Update debugging.md by @hamelsmu in #1339
- fix steps check for anneal on first cycle by @winglian in #1316
- Update fastchat_conversation_turns.py by @eltociear in #1294
- add gemma instruct chat template by @winglian in #1341
- more fixes 20240228 by @winglian in #1342
- deprecate py 3.9 support, set min pytorch version by @winglian in #1343
- Fix `use_mlflow` to be bool instead of str by @chiragjn in #1344
- fix for protected model_ namespace w pydantic by @winglian in #1345
- run tests again on Modal by @winglian in #1289
- chore: enable sample_packing for Gemma [skip ci] by @NanoCode012 in #1351
- Fix validation for early stopping by @chiragjn in #1358
- plain input/output prompt strategy w/o chat templates by @winglian in #1346
- lora+ support by @winglian in #1352
- allow the sharegpt handler to also better handle datasets destined for openai finetuning by @winglian in #1361
- Update tinyllama lora.yml to fix eval packing issue by @rasbt in #1362
- add starcoder2 by @ehartford in htt...
v0.4.0
New Features (highlights)
- Streaming multipack for continued pre-training
- Mistral & Mixtral support
- Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
- DPO/IPO/KTO-pairs RL-training support via trl
- Improve BatchSampler for multipack support; allows resuming from checkpoints and shuffling data each epoch
- bf16: auto support
- add MLFlow support
- save YAML configs to WandB
- save predictions during evals to WandB
- more tests! more smoke tests for smol model training
- NEFTune support (see the config sketch after this list)
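Since several of these highlights map directly to config options, here is a minimal sketch of how they might appear in an axolotl YAML config. Only `bf16: auto` is taken verbatim from the notes above; the remaining key names (`base_model`, `sequence_len`, `sample_packing`, `neftune_noise_alpha`, `wandb_project`) and the example model id are assumed from the project's example configs and should be verified against the documentation for your installed version.

```yaml
# Hedged sketch -- apart from "bf16: auto", key names are assumed from
# axolotl's example configs; verify against the docs for your version.
base_model: mistralai/Mistral-7B-v0.1   # example model; Mistral support is one of the highlights

sequence_len: 4096
sample_packing: true          # multipack: pack several short samples into each sequence

bf16: auto                    # the "bf16: auto" support listed above

neftune_noise_alpha: 5        # NEFTune noisy-embedding fine-tuning

wandb_project: axolotl-demo   # WandB logging (YAML configs and eval predictions saved to WandB)
```

Configs like this are typically launched via the CLI modules introduced in #550 (for example `accelerate launch -m axolotl.cli.train config.yml`), though the exact invocation may differ by version.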
What's Changed
- document that packaging needs to be installed before flash-attn by @winglian in #559
- Fix pretraining with iterable/streaming Dataset by @jphme in #556
- Add training callback to send predictions to WandB table by @Glavin001 in #521
- fix wandb so mypy doesn't complain by @winglian in #562
- check for the existence of the default accelerate config that can create headaches by @winglian in #561
- add optimization for group-by-len by @winglian in #563
- gracefully handle length feature used for group by by @winglian in #565
- improve how we setup eval/save strategies and steps by @winglian in #547
- let hf trainer handle torch compile by @winglian in #516
- Model parallel by @winglian in #538
- fix save_steps so it doesn't get duplicated by @winglian in #567
- set auto for other params that hf trainer sets for ds. include zero1 json by @winglian in #570
- remove columns after tokenizing for pretraining by @winglian in #571
- mypy wandb ignore by @winglian in #572
- Phi examples by @winglian in #569
- e2e testing by @winglian in #574
- E2e device cuda by @winglian in #575
- E2e passing tests by @winglian in #576
- refactor scripts/finetune.py into new cli modules by @winglian in #550
- update support matrix with btlm and phi by @winglian in #579
- prevent cli functions from getting fired on import by @winglian in #581
- Fix Codellama examples by @Kimiko-AI in #582
- support custom field for completion from yml by @winglian in #580
- Feat(doc): Add features to doc by @NanoCode012 in #583
- Support Sample packing for phi arch by @winglian in #586
- don't resize embeddings if it's already large enough by @winglian in #577
- Enable full (non-sharded) model saving with SHARDED_STATE_DICT by @jphme in #584
- make phi training work with Loras by @winglian in #588
- optionally configure sample packing for evals by @winglian in #589
- don't add position_ids for evals when not using eval sample packing by @winglian in #591
- gather/broadcast the max value of the packing efficiency automatically by @winglian in #463
- Feat(data): Allow loading local csv and text by @NanoCode012 in #594
- add bf16 check by @winglian in #587
- btlm and falcon monkey patches for flash attn by @winglian in #566
- minor tweaks to simplify by @winglian in #597
- Fix for check with cfg and merge_lora by @winglian in #600
- improve handling for empty text on the tokenization step by @winglian in #502
- more sane defaults for openllama 3b used for quickstarts by @winglian in #602
- update dockerfile to not build evoformer since it fails the build by @winglian in #607
- Delete duplicate lines in models.py by @bofenghuang in #606
- support to disable exllama for gptq by @winglian in #604
- Update requirements.txt - Duplicated package by @Psancs05 in #610
- Only run tests when a change to python files is made by @maximegmd in #614
- Create multi-node.md by @maximegmd in #613
- fix distributed devices by @maximegmd in #612
- ignore wandb to resolve isort headaches by @winglian in #619
- skip the gpu memory checks if the device is set to 'auto' by @winglian in #609
- let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners by @winglian in #427
- run eval on the first step to get a baseline by @winglian in #617
- split completion text to sequence_len by @winglian in #616
- misc fixes to add gptq tests by @winglian in #621
- chore(callback): Remove old peft saving code by @NanoCode012 in #510
- update README w deepspeed info by @winglian in #605
- create a model card with axolotl badge by @winglian in #624
- better handling and logging of empty sharegpt turns by @winglian in #603
- tweak: improve base builder for smaller layers by @maximegmd in #500
- Feat(doc): Add eval_sample_packing to doc by @NanoCode012 in #625
- Fix: Fail bf16 check when running on cpu during merge by @NanoCode012 in #631
- default model changed by @mhenrichsen in #629
- Added quotes to the pip install -e command in the documentation to fix an incompatibility … by @Nan-Do in #632
- Feat: Add support for upstream FA2 by @NanoCode012 in #626
- eval_table isn't quite stable enough to be in default llama configs by @winglian in #637
- attention_mask not needed for training by @winglian in #642
- update for recent transformers updates by @winglian in #636
- use fastchat conversations template by @winglian in #578
- skip some flash attn patches unless explicitly enabled by @winglian in #643
- Correct typos in datasets.py by @felixonmars in #639
- Fix bug in dataset loading by @ethanhs in #284
- Warn users to login to HuggingFace by @Napuh in #645
- Mistral flash attn packing by @winglian in #646
- Fix(cfg): Add validation for save_strategy and eval_strategy by @NanoCode012 in #633
- Feat: Add example for Mistral by @NanoCode012 in #644
- Add mistral/README.md by @adarshxs in #647
- fix for flash attn w mistral w/o sample packing by @winglian in #648
- don't strip the prompt for check since we don't strip to tokenize anymore by @winglian in #650
- add support for defined train split by @winglian in #654
- Fix bug when using pretokenized datasets by @ein-ich in https...
v0.3.0
What's Changed
- Fix sharegpt type in doc by @NanoCode012 in #202
- add support for optimum bettertransformers by @winglian in #92
- Use AutoTokenizer for redpajama example by @sroecker in #209
- issue #205 bugfix by @MaciejKarasek in #206
- Fix tokenizing labels by @winglian in #214
- add float16 docs and tweak typehints by @winglian in #212
- support adamw and grad norm hyperparams by @winglian in #215
- Fixing Data Readme by @msinha251 in #235
- don't fail fast by @winglian in #218
- better py3 support w pre-commit by @winglian in #241
- optionally define whether to use_fast tokenizer by @winglian in #240
- skip the system prompt by @winglian in #243
- push intermediate model checkpoints to hub by @winglian in #244
- System prompt data by @winglian in #224
- Add cfg.push_to_hub_model_id to readme by @NanoCode012 in #252
- Fix typing list in prompt tokenizer by @NanoCode012 in #249
- add option for instruct w sys prompts by @winglian in #246
- open orca support by @winglian in #255
- update pip install command for apex by @winglian in #247
- Fix future deprecation push_to_hub_model_id by @NanoCode012 in #258
- [WIP] Support loading data files from a local directory by @utensil in #221
- Fix(readme): local path loading and custom strategy type by @NanoCode012 in #264
- don't use llama if trust_remote_code is set since that needs to use AutoModel path by @winglian in #266
- params are adam_, not adamw_ by @winglian in #268
- Quadratic warmup by @winglian in #271
- support for loading a model by git revision by @winglian in #272
- Feat(docs): Add model_revision arg by @NanoCode012 in #273
- Feat: Add save_safetensors by @NanoCode012 in #275
- Feat: Set push to hub as private by default by @NanoCode012 in #274
- Allow non-default dataset configurations by @cg123 in #277
- Feat(readme): improve docs on multi-gpu by @NanoCode012 in #279
- Update requirements.txt by @teknium1 in #280
- Logging update: added PID and formatting by @theobjectivedad in #276
- git fetch fix for docker by @winglian in #283
- misc fixes by @winglian in #286
- fix axolotl training args dataclass annotation by @winglian in #287
- fix(readme): remove accelerate config by @NanoCode012 in #288
- add hf_transfer to requirements for faster hf upload by @winglian in #289
- Fix(tokenizing): Use multi-core by @NanoCode012 in #293
- Pytorch 2.0.1 by @winglian in #300
- Fix(readme): Improve wording for push model by @NanoCode012 in #304
- add apache 2.0 license by @winglian in #308
- Flash attention 2 by @winglian in #299
- don't resize embeddings to multiples of 32x by default by @winglian in #313
- Add XGen info to README and example config by @ethanhs in #306
- better handling since xgen tokenizer breaks with convert_tokens_to_ids by @winglian in #307
- add runpod envs to .bashrc, fix bnb env by @winglian in #316
- update prompts for open orca to match the paper by @winglian in #317
- latest HEAD of accelerate causes 0 loss immediately w FSDP by @winglian in #321
- Prune cuda117 by @winglian in #327
- update README for updated docker images by @winglian in #328
- fix FSDP save of final model by @winglian in #329
- pin accelerate so it works with llama2 by @winglian in #330
- add peft install back since it doesn't get installed by setup.py by @winglian in #331
- lora/qlora w flash attention fixes by @winglian in #333
- feat/llama-2 examples by @mhenrichsen in #319
- update README by @tmm1 in #337
- Fix flash-attn + qlora not working with llama models by @tmm1 in #336
- optimize the iteration when tokenizing large datasets by @winglian in #332
- Added Orca Mini prompt strategy by @jphme in #263
- Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) by @ssmi153 in #339
- add a basic ds zero3 config by @winglian in #347
- experimental llama 2 chat support by @jphme in #296
- ensure enable_input_require_grads is called on model before getting the peft model by @winglian in #345
- set `group_by_length` to false in all examples by @tmm1 in #350
- GPU memory usage logging by @tmm1 in #354
- simplify `load_model` signature by @tmm1 in #356
- Clarify pre-tokenize before multigpu by @NanoCode012 in #359
- Update README.md on pretraining_dataset by @NanoCode012 in #360
- bump to latest bitsandbytes release with major bug fixes by @tmm1 in #355
- feat(merge): save tokenizer on merge by @NanoCode012 in #362
- Feat: Add rope scaling by @NanoCode012 in #343
- Fix(message): Improve error message for bad format by @NanoCode012 in #365
- fix(model loading): warn when model revision is passed to gptq by @NanoCode012 in #364
- Add wandb_entity to wandb options, update example configs, update README by @morganmcg1 in #361
- fix(save): save as safetensors by @NanoCode012 in #363
- Attention mask and position id fixes for packing by @winglian in #285
- attempt to run non-base docker builds on regular cpu hosts by @winglian in #369
- revert previous change and build ax images w docker on gpu by @winglian in #371
- extract module for working with cfg by @tmm1 in #372
- quiet noise from llama tokenizer by setting pad token earlier by @tmm1 in #374
- improve GPU logging to break out pytorch cache and system mem by @tmm1 in #376
- simplify `load_tokenizer` by @tmm1 in #375
- fix check for flash attn branching by @w...
v0.2.1
What's Changed
- docker fixes: py310, fix cuda arg in deepspeed by @winglian in #115
- add support for gradient accumulation steps by @winglian in #123
- split up llama model loading so config can be loaded from base config and models can be loaded from a path by @winglian in #120
- copy xformers attn from ooba since we removed dep on alpaca_lora_4bit by @winglian in #124
- Fix(readme): Fix torch missing from readme by @NanoCode012 in #118
- Add accelerate dep by @winglian in #114
- Feat(inference): Swap to GenerationConfig by @NanoCode012 in #119
- add py310 support from base image by @winglian in #127
- add badge info to readme by @winglian in #129
- fix packing so that concatenated sequences reset the attention by @winglian in #131
- swap batch size for gradient accumulation steps to decouple from num gpu by @winglian in #130
- fix batch size calculation by @winglian in #134
- Fix: Update doc for grad_accu and add validation tests for batch size by @NanoCode012 in #135
- Feat: Add lambdalabs instruction by @NanoCode012 in #141
- Feat: Add custom prompt readme and add missing prompt strategies to Readme by @NanoCode012 in #142
- added docker-compose file by @FarisHijazi in #146
- Update README.md for correct image tags by @winglian in #147
- fix device map by @winglian in #148
- clone in docker by @winglian in #149
- new prompters, misc fixes for output dir missing using fsdp, and changing max seq len by @winglian in #155
- fix camel ai, add guanaco/oasst mapping for sharegpt by @winglian in #158
- Fix: Update peft and gptq instruction by @NanoCode012 in #161
- Fix: Move custom prompts out of hidden by @NanoCode012 in #162
- Fix future deprecate prepare_model_for_int8_training by @NanoCode012 in #143
- Feat: Set matmul tf32=True when tf32 passed by @NanoCode012 in #163
- Fix: Validate falcon with fsdp by @NanoCode012 in #164
- Axolotl supports falcon + qlora by @utensil in #132
- Fix: Set to use cfg.seed or 42 for seed by @NanoCode012 in #166
- Fix: Refactor out unmodified save_steps and eval_steps by @NanoCode012 in #167
- Disable Wandb if no wandb project is specified by @bratao in #168
- Feat: Improve lambda labs instruction by @NanoCode012 in #170
- Fix falcon support lora by @NanoCode012 in #171
- Feat: Add landmark attention by @NanoCode012 in #169
- Fix backward compat for peft by @NanoCode012 in #176
- Update README.md to reflect current gradient checkpointing support by @PocketDocLabs in #178
- fix for max sequence len across different model types by @winglian in #179
- Add streaming inference & fix stopping at EOS by @Glavin001 in #180
- add support to extend context with xpos rope by @winglian in #181
- fix for local variable 'LlamaForCausalLM' referenced before assignment by @winglian in #182
- pass a prompt in from stdin for inference by @winglian in #183
- Update FAQS.md by @akj2018 in #186
- various fixes by @winglian in #189
- more config pruning and migrating by @winglian in #190
- Add save_steps and eval_steps to Readme by @NanoCode012 in #191
- Fix config path after config moved by @NanoCode012 in #194
- Fix training over existing lora by @AngainorDev in #159
- config fixes by @winglian in #193
- misc fixes by @winglian in #192
- Fix landmark attention patch by @NanoCode012 in #177
- peft no longer needs device_map by @winglian in #187
- chore: Fix inference README. by @mhenrichsen in #197
- Update README.md to include a community showcase by @PocketDocLabs in #200
- chore: Refactor inf_kwargs out by @NanoCode012 in #199
- tweak config to work by @winglian in #196
New Contributors
- @FarisHijazi made their first contribution in #146
- @utensil made their first contribution in #132
- @bratao made their first contribution in #168
- @PocketDocLabs made their first contribution in #178
- @Glavin001 made their first contribution in #180
- @akj2018 made their first contribution in #186
- @AngainorDev made their first contribution in #159
- @mhenrichsen made their first contribution in #197
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- Add pre-commit: black+flake8+pylint+mypy+isort+bandit by @NanoCode012 in #98
- Qlora openllama 3b example by @fearnworks in #106
- Viktoriussuwandi patch by @viktoriussuwandi in #105
- default to qlora support, make gptq specific image by @winglian in #108
New Contributors
- @fearnworks made their first contribution in #106
- @viktoriussuwandi made their first contribution in #105
Full Changelog: v0.1.0...v0.2.0
current "Stable"
v0.1.0 Merge pull request #111 from OpenAccess-AI-Collective/sharegpt-token-…