v0.5.0
What's Changed
- fix(log): improve warning to clarify that lora_modules_to_save expects a list by @NanoCode012 in #1197
- Add: colab example by @JohanWork in #1196
- Feat/chatml add system message by @mhenrichsen in #1117
- fix learning rate scheduler's warnings by @RicardoDominguez in #1135
- precompute dpo logprobs setting and fixes by @winglian in #1199
- Update deps 202401 by @winglian in #1204
- make sure to register the base chatml template even if no system message is provided by @winglian in #1207
- workaround for transformers bug requiring do_sample for saving pretrained by @winglian in #1206
- more checks and fixes for deepspeed and fsdp by @winglian in #1208
- drop py39 docker images, add py311, upgrade pytorch to 2.1.2 by @winglian in #1205
- Update qlora.yml - DeprecationWarning: `max_packed_sequence_len` is n… by @7flash in #1210
- Respect sliding_window=None by @DreamGenX in #1214
- ensure the tests use the same version of torch as the latest base docker images by @winglian in #1215
- ADD: warning if hub_model_id is set without any save strategy by @JohanWork in #1202
- run PR e2e docker CI tests in Modal by @winglian in #1217
- Revert "run PR e2e docker CI tests in Modal" by @winglian in #1220
- FEAT: add tagging support to axolotl for DPOTrainer by @filippo82 in #1209
- Peft LoftQ by @winglian in #1222
- Fix typos (pretained -> pretrained) by @xhedit in #1231
- Fix and document test_datasets by @DreamGenX in #1228
- set torch version to what is installed during axolotl install by @winglian in #1234
- Cloud motd by @winglian in #1235
- [Nit] Fix callout by @hamelsmu in #1237
- Support for additional_special_tokens by @DreamGenX in #1221
- Peft deepspeed resume by @winglian in #1227
- support for true batches with multipack by @winglian in #1230
- add contact info for dedicated support for axolotl by @winglian in #1243
- fix(model): apply gate fp32 only for mixtral by @NanoCode012 in #1241
- relora: magnitude pruning of the optimizer by @winglian in #1245
- Pretrain transforms by @winglian in #1261
- Fix typo `bloat16` -> `bfloat16` by @chiragjn in #1257
- Add more save strategies for DPO training by @PhilipMay in #1255
- BUG FIX: lock pytorch version in colab example by @JohanWork in #1247
- Fix typo preventing `model_kwargs` from being injected by @zacbrannelly in #1262
- contributor avatars by @winglian in #1269
- simplify handling for newer multipack patches so they can be added in a single place by @winglian in #1270
- Add link to axolotl cloud image on latitude by @winglian in #1275
- copy edits by @winglian in #1276
- allow remote data paths by @hamelsmu in #1278
- add support for https remote yamls by @hamelsmu in #1277
- run the docker image builds and push on gh action gpu runners by @winglian in #1218
- Update README.md by @hamelsmu in #1281
- don't use load and push together by @winglian in #1284
- Add MPS support by @maximegmd in #1264
- allow the optimizer prune ratio for relora to be configurable by @winglian in #1287
- Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? by @jinwonkim93 in #1273
- Add seq2seq eval benchmark callback by @LeonardoEmili in #1274
- Validation always happens on first step by @LeonardoEmili in #1300
- fix(examples): remove is_*_derived as it's parsed automatically by @NanoCode012 in #1297
- Allow load_best_model_at_end to be configured for early stopping on custom evaluation datasets by @dameikle in #1291
- Add instructions for playing with qlora model to colab example by @jaredpalmer in #1290
- fix(readme): update inference md link by @NanoCode012 in #1311
- Adding Google's gemma Model by @monk1337 in #1312
- multipack for gemma by @winglian in #1313
- deprecate: pytorch 2.0.1 image by @NanoCode012 in #1315
- fix(readme): Clarify doc for tokenizer_config by @NanoCode012 in #1323
- [bug-report template] Use yaml codeblock for config.yaml field by @kallewoof in #1303
- make mlflow optional by @winglian in #1317
- Pydantic 2.x cfg by @winglian in #1239
- chore: update readme to be more clear by @NanoCode012 in #1326
- ADD: push checkpoints to mlflow artifact registry by @JohanWork in #1295
- hotfix for capabilities loading by @winglian in #1331
- hotfix for lora rank by @winglian in #1332
- hotfix for missing outputs params by @winglian in #1333
- hotfix to exclude_unset from pydantic config when converting back to a dict by @winglian in #1334
- Add StableLM 2 Example Scripts by @ncoop57 in #1327
- add lion-pytorch optimizer by @maximegmd in #1299
- Support user-defined prompt processing strategies for dpo by @nopperl in #1248
- more pydantic fixes by @winglian in #1338
- Mps mistral lora by @maximegmd in #1292
- fix: checkpoint saving with deepspeed by @NanoCode012 in #1321
- Update debugging.md by @hamelsmu in #1339
- fix steps check for anneal on first cycle by @winglian in #1316
- Update fastchat_conversation_turns.py by @eltociear in #1294
- add gemma instruct chat template by @winglian in #1341
- more fixes 20240228 by @winglian in #1342
- deprecate py 3.9 support, set min pytorch version by @winglian in #1343
- Fix `use_mlflow` to be bool instead of str by @chiragjn in #1344
- fix for protected `model_` namespace w/ pydantic by @winglian in #1345
- run tests again on Modal by @winglian in #1289
- chore: enable sample_packing for Gemma [skip ci] by @NanoCode012 in #1351
- Fix validation for early stopping by @chiragjn in #1358
- plain input/output prompt strategy w/o chat templates by @winglian in #1346
- lora+ support by @winglian in #1352
- allow the sharegpt handler to also better handle datasets destined for openai finetuning by @winglian in #1361
- Update tinyllama lora.yml to fix eval packing issue by @rasbt in #1362
- add starcoder2 by @ehartford in #1349
- Fix supported python versions in README, as python 3.9 was recently deprecated by @nirogu in #1364
- support for DoRA w/ PEFT by @winglian in #1363
- add docs for `input_output` format by @hamelsmu in #1367
- update flash attention for gemma support by @winglian in #1368
- JarvisLabs by @winglian in #1372
- FSDP + QLoRA by @winglian in #1378
- validation for fsdp and deepspeed by @winglian in #1388
- support for rslora by @winglian in #1387
- Fix pydantic configuration for the max_memory input by @dandm1 in #1385
- Set `gradient_clipping` to `auto` in DeepSpeed configs by @seungduk-yanolja in #1382
- Add Glaive conversation format support by @brianfitzgerald in #1365
- chore: lint by @winglian in #1389
- add handling for argilla dpo-mix by @winglian in #1397
- Update ChatTemplate enum to include alpaca and gemma by @chiragjn in #1396
- Add QLoRA + FSDP Docs by @hamelsmu in #1403
- Don't disable existing loggers when configuring axolotl logging by @chiragjn in #1395
- Train parameters exclusively in specific ranges by @seungduk-yanolja in #1390
- Fix Gemma 7b qlora.yml by @rasbt in #1405
- beta support for multipack with gemmoe by @winglian in #1402
- Feat(readme): Add instructions for Google GPU VM instances by @NanoCode012 in #1410
- Fix(readme): Improve README QuickStart info by @NanoCode012 in #1408
- chore(script): remove redundant setting by @NanoCode012 in #1411
- Add Phorm AI Badge (Morph Labs) by @bentleylong in #1418
- ORPO by @winglian in #1419
- fix(config): passing gradient_checkpoint_kwargs by @NanoCode012 in #1412
- Add a config not to shuffle merged dataset by @seungduk-yanolja in #1394
- Feat: Add sharegpt multirole by @NanoCode012 in #1137
- support galore once upstreamed into transformers by @winglian in #1409
- fixes for dpo and orpo template loading by @winglian in #1424
- HF / FEAT: Optimize HF tags by @younesbelkada in #1425
- strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed by @winglian in #1428
- Bootstrap Hosted Axolotl Docs w/Quarto by @hamelsmu in #1429
- Orpo fix wip by @winglian in #1433
- chore(config): refactor old mistral config by @NanoCode012 in #1435
- docs: update link to docs of advanced topics in README.md by @pphuc25 in #1437
- fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path by @NanoCode012 in #1298
- make sure to capture non-null defaults from config validation by @winglian in #1415
- Turn on sample_packing for Gemma training by @satpalsr in #1438
- Fix falcon tokenization step by @pharaouk in #1441
- Remove seq_len arg in rotary_emb by @BMPixel in #1443
- fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support by @winglian in #1413
- support layer replication for peft and fix rslora integration by @winglian in #1445
- fix layer_replication arg to peft by @winglian in #1446
- Jamba by @winglian in #1451
- Support loading datasets saved via save_to_disk by @fozziethebeat in #1432
- fix some of the edge cases for Jamba by @winglian in #1452
- configure nightly docker builds by @winglian in #1454
- fix how nightly tag is generated by @winglian in #1456
- fix yaml parsing for workflow by @winglian in #1457
- Nightlies fix v4 by @winglian in #1458
- qwen2_moe support w multipack by @winglian in #1455
- make sure to install causal_conv1d in docker by @winglian in #1459
- Lisa by @winglian in #1469
- feat: add deepspeed 3 with cpuoffload by @NanoCode012 in #1466
- reduce verbosity of the special tokens by @winglian in #1472
- Reorganize Docs by @hamelsmu in #1468
- fix pretraining_ on odd datasets by @mapmeld in #1463
- Added pip install ninja to accelerate installation of flash-attn by @melvinebenezer in #1461
- Pretrain multipack v2 by @winglian in #1470
- Feat: update doc by @NanoCode012 in #1475
- refactor utils.data module for line count linter by @winglian in #1476
- don't use deepspeed or fsdp when merging loras by @winglian in #1479
- add support for cohere chat template by @winglian in #1478
- feat: validate sample packing requires flash_attention by @NanoCode012 in #1465
- fix: reduce sample_packing FA error to warning by @NanoCode012 in #1484
- drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) by @winglian in #1490
- Remove `validate_quantized_dora` by @xzuyn in #1485
- ignore issues with calculating # params when printing by @winglian in #1493
- add field to sft dataset pydantic for completion support by @winglian in #1497
- Fix the wrong adapter in qwen2-moe-qlora example by @maziyarpanahi in #1501
- Print versions by @winglian in #1496
- Correctly handle splits for datasets.arrow_dataset.Dataset objects by @scottfleming in #1504
- WIP: Support table logging for mlflow, too by @DavidFarago in #1506
- use locale-agnostic separator to make large numbers easier to read by @winglian in #1503
- Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save by @tcapelle in #1483
- DBRX Model Support by @winglian in #1462
- Unsloth gradient checkpointing offload by @winglian in #1528
- add docs around pre-processing by @winglian in #1529
- Update README.md by @emilytin0206 in #1521
- Update Readme to include support for Mixtral8X22B by @Barbarian7676 in #1518
- Create mixtral_22.yml by @Barbarian7676 in #1514
- feat(doc): Add config example for pad_token by @NanoCode012 in #1535
- llama-3 examples by @winglian in #1537
- Adding Llama-3 qlora by @monk1337 in #1536
- fix broken linting by @winglian in #1541
- fix(packages): lock datasets version by @NanoCode012 in #1545
- fix(yml): update llama-3 config by @NanoCode012 in #1543
- ORPO Trainer replacement by @winglian in #1551
- wrap prepared_ds_path in str() to avoid TypeError in fsspec package by @FrankRuis in #1548
- Add support for Gemma chat template by @Haoxiang-Wang in #1530
- make sure everything stays in the same dtype when using dpo + FSDP by @winglian in #1559
- Add ORPO example and e2e test by @tokestermw in #1572
- Pose context length ext by @winglian in #1567
- chore: clarify microbatch size by @NanoCode012 in #1579
- Add debug option for RL dataset preprocessing by @abhinand5 in #1404
- ADD: warning hub model by @JohanWork in #1301
- FIX: TRL trainer preprocessing step was running in one process by @ali-mosavian in #1583
- Pass weakref to model in the SIGINT handler to free up model post train function by @chiragjn in #1581
- improve save callbacks by @winglian in #1592
- fix for jupyterlab on cloud start by @winglian in #1594
- add torch 2.3.0 to builds by @winglian in #1593
- docs(config.qmd): add loraplus example by @tpoisonooo in #1577
- Gradio configuration parameters by @marijnfs in #1591
- Pass `deepspeed` and `fsdp` as `None` explicitly when merging adapters to allow custom device_map by @chiragjn in #1575
- feat: exclude mamba blocks for jamba when load8bit by @NanoCode012 in #1578
- improve tool handling roles by @winglian in #1587
- make sure to save the lora adapter at the end of RL/dpo training by @winglian in #1573
- ignore the fsdp_config section too by @winglian in #1606
- adding llama3 fastchat conversation monkeypatch by @TJ-Solergibert in #1539
- feat: Add LLaMA-3 instruct prompt strategies for fine-tuning by @0-hero in #1553
- Llama3 dpo by @winglian in #1610
- add dstack section by @deep-diver in #1612
- fix attention mask collation by @winglian in #1603
- make sure to save on the last step by @winglian in #1615
- FIX: max_length and max_prompt_length was not being sent to ORPOTrainer by @ali-mosavian in #1584
- Fix `total_num_steps` by @bofenghuang in #1566
- update torch 2.2.1 -> 2.2.2 by @winglian in #1622
- update outputs path so that we can mount workspace to /workspace/data by @winglian in #1623
- bump versions of deps by @winglian in #1621
- fix symlinks for axolotl outputs by @winglian in #1625
- fix setting the authorized keys when there are more than one in the env var by @winglian in #1626
- install rsync too by @winglian in #1627
- cloud image w/o tmux by @winglian in #1628
- more fixes to work with runpod + skypilot by @winglian in #1629
- fix ray install by @winglian in #1630
- add save_only_model option by @jquesnelle in #1634
- Unsloth optims for Llama by @winglian in #1609
- fixes to save on fractional save_steps by @winglian in #1643
- Add KTO support by @benredmond in #1640
- Fix llama3 chat_template (extra <|eot_id|> on last turn) by @lhl in #1635
- allow report_to for multiple providers by @winglian in #1647
- Enable LoRA+ setting for dpo trainer by @thepowerfuldeez in #1646
- Update tiny-llama qlora.yml addressing eval packing error by @jaydeepthik in #1638
- support for custom messages field in sharegpt by @winglian in #1651
- Switch to parallel FFD bin packing algorithm. by @winglian in #1619
- document how to use `save_strategy="no"` by @charlesfrye in #1653
- update deps by @winglian in #1663
- Fix Google Colab notebook 2024-05 by @maciejgryka in #1662
- Generalizing the chat_template prompt strategy by @fozziethebeat in #1660
- Fix Lora config error for Llama3 by @oaishi in #1659
- fix lint issue that snuck through by @winglian in #1665
- Fix: ensure correct handling of `val_set_size` as `float` or `int` by @davidecaroselli in #1655
- Correct name of MixtralBlockSparseTop2MLP (L -> l) by @seungduk-yanolja in #1667
- Fix README quick start example usage model dirs by @abevoelker in #1668
- make sure the CI fails when pytest script fails by @winglian in #1669
- handle the system role too for chat templates by @winglian in #1671
- revert multipack batch sampler changes by @winglian in #1672
- re-enable phi for tests in modal ci by @winglian in #1373
- use mixins for orpo and kto configs so they work with axolotl customizations by @winglian in #1674
- set chat_template in datasets config automatically by @winglian in #1664
- load explicit splits on datasets by @winglian in #1652
- cleanup the deepspeed proxy model at the end of training by @winglian in #1675
- need to add back drop_last for sampler by @winglian in #1676
- Fix the broken link in README by @saeedesmaili in #1678
- re-enable DPO for tests in modal ci by @winglian in #1374
- add support for rpo_alpha by @winglian in #1681
- Phi-3 conversation format, example training script and perplexity metric by @brianfitzgerald in #1582
- Adding Phi-3 model by @monk1337 in #1580
- ensure explicit eval_sample_packing to avoid mismatch issues by @winglian in #1692
- add qwen2-72b fsdp example by @winglian in #1696
- add back packing efficiency estimate so epochs and multi-gpu works properly by @winglian in #1697
- Sample packing eval fix by @winglian in #1695
- bump deepspeed for fix for grad norm compute putting tensors on different devices by @winglian in #1699
- verbose failure message by @winglian in #1694
- download model weights on preprocess step by @winglian in #1693
- drop length column for issues with eval without packing by @winglian in #1711
- add support for multipack for deepseek_v2 by @winglian in #1712
- Allow "weight: 0" in messages to mask them by @DavidFarago in #1703
- improve Pre-Tokenized Dataset docs by @josharian in #1684
- support for gemma2 w sample packing by @winglian in #1718
- add support for .env files for env vars by @winglian in #1724
- full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading by @winglian in #1726
- sanity check ranges in freeze.py by @josharian in #1686
- bump trl and accelerate for latest releases by @winglian in #1730
- Fixes the urls after org move by @mhenrichsen in #1734
- add tests so CI can catch updates where patches will break with unsloth by @winglian in #1737
- typo by @Klingefjord in #1685
- add torch 2.3.1 base image by @winglian in #1745
- fixes to prevent vram spike when train starts by @winglian in #1742
- update to pytorch 2.3.1 by @winglian in #1746
- bump xformers to 0.0.27 by @akshaylive in #1740
- Changed URL for dataset docs by @dameikle in #1744
- Fix eval_sample_packing in llama-3 lora example by @RodriMora in #1716
- bump flash attention 2.5.8 -> 2.6.1 by @winglian in #1738
- add basic support for the optimi adamw optimizer by @winglian in #1727
- update modal package and don't cache pip install by @winglian in #1757
- torch compile and cuda alloc improvements by @winglian in #1755
- support for llama multipack using updated code/patches by @winglian in #1754
- fix num gpu check by @winglian in #1760
- fixes to accelerator so that iterable pretraining datasets work by @winglian in #1759
- add torch_compile_mode options by @winglian in #1763
- re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments by @winglian in #1765
- set the number of dataset processes on the DPO Config rather than the trainer by @winglian in #1762
- Unsloth rope by @winglian in #1767
- bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers by @winglian in #1769
- Fix untrained tokens by @winglian in #1771
- Add a `chat_template` prompt strategy for DPO by @fozziethebeat in #1725
- swaps to use newer sample packing for mistral by @winglian in #1773
- bump transformers for updated llama 3.1 by @winglian in #1778
- bump flash attention to 2.6.2 by @winglian in #1781
- fix fsdp loading of models, esp 70b by @winglian in #1780
- add support for simpo via cpo trainer by @winglian in #1772
- Bump deepspeed 20240727 by @winglian in #1790
- various batch of fixes by @winglian in #1785
- Add flexible configuration options for `chat_template` dataset training by @Tostino in #1756
- Update README.md by @mhenrichsen in #1792
- move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 by @winglian in #1793
- fix dockerfile and base builder by @winglian in #1795
- use 12.4.1 instead of 12.4 [skip-ci] by @winglian in #1796
- update test and main/nightly builds by @winglian in #1797
- publish axolotl images without extras in the tag name by @winglian in #1798
- qlora-fsdp ram efficient loading with hf trainer by @winglian in #1791
- fix roles to train defaults and make logging less verbose by @winglian in #1801
- Fix colab example notebook by @srib in #1805
- Fix setting correct repo id when pushing dataset to hub by @chrislee973 in #1657
- Update instruct-lora-8b.yml by @monk1337 in #1789
- Update conversation.qmd by @penfever in #1788
- One cycle lr by @winglian in #1803
- remove unnecessary zero-first guard as it's already called in a parent fn by @winglian in #1810
- set z3 leaf for deepseek v2 by @winglian in #1809
- logging improvements by @winglian in #1808
- update peft and transformers by @winglian in #1811
- skip no commit to main on ci by @winglian in #1814
- fix z3 leaf configuration when not using lists by @winglian in #1817
- update tinyllama to use final instead of checkpoints [skip ci] by @winglian in #1820
- Attempt to run multigpu in PR CI for now to ensure it works by @winglian in #1815
- fix the incorrect `max_length` for chat template by @chiwanpark in #1818
- bump hf dependencies by @winglian in #1823
- fix: parse eager_attention from cfg by @NanoCode012 in #1824
- fix: parse model_kwargs by @NanoCode012 in #1825
- update sklearn version, torch compile env vars, don't worry about failure on preprocess load model by @winglian in #1821
- add validation to prevent 8bit lora finetuning on H100s by @winglian in #1827
- optionally save the final FSDP model as a sharded state dict by @winglian in #1828
- fix: dont change quant storage dtype in case of fsdp by @xgal in #1837
- pretrain: fix with sample_packing=false by @tmm1 in #1841
- feat: add jamba chat_template by @xgal in #1843
- examples: fix tiny-llama pretrain yml syntax by @tmm1 in #1840
- rename jamba example by @xgal in #1846
- numpy 2.1.0 was released, but incompatible with numba by @winglian in #1849
- ensure that the bias is also in the correct dtype by @winglian in #1848
- make the `train_on_eos` default to `turn` so all eos tokens are treated the same by @winglian in #1847
- fix: prompt phi by @JohanWork in #1845
- docs: minor syntax highlight fix by @tmm1 in #1839
- ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed by @winglian in #1850
- run nightly ci builds against upstream main by @winglian in #1851
- rename nightly test and add badge by @winglian in #1853
- most model types now support flash attention 2 regardless of multipack support by @winglian in #1854
- add axolotl community license by @winglian in #1862
- don't mess with bnb since it needs compiled wheels by @winglian in #1859
- Liger Kernel integration by @winglian in #1861
- add liger example by @winglian in #1864
- add liger to readme by @winglian in #1865
- change up import to prevent AttributeError by @winglian in #1863
- simplify logic by @winglian in #1856
- better handling of llama-3 tool role by @winglian in #1782
- Spectrum plugin by @winglian in #1866
- update spectrum authors by @winglian in #1869
- Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` by @chiwanpark in #1867
- clear cuda cache to help with memory leak/creep by @winglian in #1858
- Add Liger Kernel support for Qwen2 by @chiwanpark in #1871
- Sample pack trust remote code v2 by @winglian in #1873
- monkey-patch transformers to simplify monkey-patching modeling code by @tmm1 in #1877
- fix liger plugin load issues by @tmm1 in #1876
- deepseekv2 liger support by @tmm1 in #1878
- Add liger kernel to features section by @ByronHsu in #1881
- pin liger-kernel to latest 0.2.1 by @winglian in #1882
- Update supported models for Liger Kernel by @DocShotgun in #1875
- run pytests with varied pytorch versions too by @winglian in #1883
- Fix RMSNorm monkey patch for Gemma models by @chiwanpark in #1886
- add e2e smoke tests for llama liger integration by @winglian in #1884
- support for auto_find_batch_size when packing by @winglian in #1885
- fix optimizer + fsdp combination in example [skip ci] by @winglian in #1893
- Docs for AMD-based HPC systems by @tijmen in #1891
- lint fix and update gha regex by @winglian in #1899
- Fix documentation for pre-tokenized dataset by @alpayariyak in #1894
- fix zero3 integration by @winglian in #1897
- bump accelerate to 0.34.2 by @winglian in #1901
- remove dynamic module loader monkeypatch as this was fixed upstream by @winglian in #1914
- Trigger the original tokenization behavior when no advanced turn settings are provided by @fozziethebeat in #1915
- validation fixes 20240923 by @winglian in #1925
- update upstream deps versions and replace lora+ by @winglian in #1928
- fix for empty lora+ lr embedding by @winglian in #1932
- bump transformers to 4.45.1 by @winglian in #1936
- Multimodal Vision Llama - rudimentary support by @winglian in #1940
- add 2.4.1 to base models by @winglian in #1953
- upgrade pytorch from 2.4.0 => 2.4.1 by @winglian in #1950
- fix(log): update perplexity log to clarify from eval split by @NanoCode012 in #1952
- Fix type annotations in relora.py by @bxptr in #1941
- Comet integration by @Lothiraldan in #1939
- Fixing/Adding Mistral Templates by @pandora-s-git in #1927
- lm_eval harness post train by @winglian in #1926
- Axo logo new by @winglian in #1956
- Add Support for `revision` Dataset Parameter to specify reading from Huggingface Dataset Revision by @thomascleberg in #1912
- Add MLFlow run name option in config by @awhazell in #1961
- add warning that sharegpt will be deprecated by @winglian in #1957
- Handle image input as string paths for MMLMs by @afrizalhasbi in #1958
- update hf deps by @winglian in #1964
- only install torchao for torch versions >= 2.4.0 by @winglian in #1963
- Fixing Validation - Mistral Templates by @pandora-s-git in #1962
- fix(doc): update eval causal lm metrics doc to add perplexity by @NanoCode012 in #1951
- Add support for qwen 2.5 chat template by @amazingvince in #1934
- wip add new proposed message structure by @winglian in #1904
- Reward model by @winglian in #1879
- add ds zero3 to multigpu biweekly tests by @winglian in #1900
- upgrade accelerate to 1.0.1 by @winglian in #1969
- examples: Fix config llama3 by @JohanWork in #1833
- also debug if other debug args are set by @winglian in #1977
- memoize dataset length for eval sample packing by @bursteratom in #1974
- add pytorch 2.5.0 base images by @winglian in #1979
- first pass at pytorch 2.5.0 support by @winglian in #1982
- fix builds so pytorch version isn't clobbered by @winglian in #1986
- use torch 2.4.1 images as latest now that torch 2.5.0 is out by @winglian in #1987
- Log checkpoints as mlflow artifacts by @awhazell in #1976
- revert image tagged as main-latest by @winglian in #1990
- Refactor func load_model to class ModelLoader by @MengqingCao in #1909
- Fix: Gradient Accumulation issue by @NanoCode012 in #1980
- fix zero3 by @winglian in #1994
- add option for resizing embeddings when adding new tokens by @winglian in #2000
- Feat: Add support for tokenizer’s or custom jinja chat_template by @NanoCode012 in #1970
- Hardware requirements by @OliverKunc in #1997
- feat: update yml chat_template to specify dataset field by @NanoCode012 in #2001
- remove skipped test by @winglian in #2002
- feat: add Exaone3 chat_template by @shing100 in #1995
- Fix get_chat_template call for trainer builder by @chiragjn in #2003
- Fix: modelloader handling of model_kwargs load_in*bit by @NanoCode012 in #1999
- Add plugin manager's callback hooks to training flow by @chiragjn in #2006
- add retries for load datasets requests failures by @winglian in #2007
- Base 2 5 1 by @winglian in #2010
- only run the remainder of the gpu test suite if one case passes first by @winglian in #2009
- upgrade liger to 0.4.0 by @winglian in #1973
- janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build by @winglian in #2022
- upgrade pytorch to 2.5.1 by @winglian in #2024
- Add weighted optimisation support for trl DPO trainer integration by @bursteratom in #2016
- remove fastchat and sharegpt by @winglian in #2021
- increment version to 0.5.0 for next release by @winglian in #2025
- make publish to pypi manually dispatchable as a workflow by @winglian in #2026
- remove unused direct dependency on fused dense lib by @winglian in #2027
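
For reference, a minimal config sketch touching a few of the options introduced or changed in this release. Key names are taken from the PR titles above; the exact values are illustrative assumptions, so check the v0.5.0 docs and examples before copying.

```yaml
# Hypothetical axolotl config fragment (illustrative only, not from the docs)
base_model: meta-llama/Meta-Llama-3-8B   # llama-3 examples (#1537)
chat_template: llama3                    # generalized chat_template prompt strategy (#1660, #1970)
sample_packing: true                     # now validated to require flash attention (#1465)
flash_attention: true
save_only_model: true                    # skip optimizer/scheduler state in checkpoints (#1634)
val_set_size: 0.05                       # accepted as float or int (#1655)
```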
New Contributors
- @7flash made their first contribution in #1210
- @DreamGenX made their first contribution in #1214
- @filippo82 made their first contribution in #1209
- @xhedit made their first contribution in #1231
- @chiragjn made their first contribution in #1257
- @PhilipMay made their first contribution in #1255
- @zacbrannelly made their first contribution in #1262
- @LeonardoEmili made their first contribution in #1274
- @dameikle made their first contribution in #1291
- @jaredpalmer made their first contribution in #1290
- @monk1337 made their first contribution in #1312
- @ncoop57 made their first contribution in #1327
- @nopperl made their first contribution in #1248
- @rasbt made their first contribution in #1362
- @nirogu made their first contribution in #1364
- @dandm1 made their first contribution in #1385
- @brianfitzgerald made their first contribution in #1365
- @bentleylong made their first contribution in #1418
- @pphuc25 made their first contribution in #1437
- @satpalsr made their first contribution in #1438
- @pharaouk made their first contribution in #1441
- @BMPixel made their first contribution in #1443
- @fozziethebeat made their first contribution in #1432
- @mapmeld made their first contribution in #1463
- @melvinebenezer made their first contribution in #1461
- @maziyarpanahi made their first contribution in #1501
- @scottfleming made their first contribution in #1504
- @DavidFarago made their first contribution in #1506
- @tcapelle made their first contribution in #1483
- @emilytin0206 made their first contribution in #1521
- @Barbarian7676 made their first contribution in #1518
- @FrankRuis made their first contribution in #1548
- @abhinand5 made their first contribution in #1404
- @ali-mosavian made their first contribution in #1583
- @tpoisonooo made their first contribution in #1577
- @marijnfs made their first contribution in #1591
- @TJ-Solergibert made their first contribution in #1539
- @0-hero made their first contribution in #1553
- @deep-diver made their first contribution in #1612
- @jquesnelle made their first contribution in #1634
- @benredmond made their first contribution in #1640
- @lhl made their first contribution in #1635
- @thepowerfuldeez made their first contribution in #1646
- @jaydeepthik made their first contribution in #1638
- @charlesfrye made their first contribution in #1653
- @maciejgryka made their first contribution in #1662
- @oaishi made their first contribution in #1659
- @davidecaroselli made their first contribution in #1655
- @abevoelker made their first contribution in #1668
- @saeedesmaili made their first contribution in #1678
- @josharian made their first contribution in #1684
- @Klingefjord made their first contribution in #1685
- @akshaylive made their first contribution in #1740
- @RodriMora made their first contribution in #1716
- @Tostino made their first contribution in #1756
- @srib made their first contribution in #1805
- @chrislee973 made their first contribution in #1657
- @penfever made their first contribution in #1788
- @chiwanpark made their first contribution in #1818
- @xgal made their first contribution in #1837
- @ByronHsu made their first contribution in #1881
- @DocShotgun made their first contribution in #1875
- @tijmen made their first contribution in #1891
- @bxptr made their first contribution in #1941
- @Lothiraldan made their first contribution in #1939
- @pandora-s-git made their first contribution in #1927
- @thomascleberg made their first contribution in #1912
- @awhazell made their first contribution in #1961
- @afrizalhasbi made their first contribution in #1958
- @amazingvince made their first contribution in #1934
- @bursteratom made their first contribution in #1974
- @MengqingCao made their first contribution in #1909
- @OliverKunc made their first contribution in #1997
- @shing100 made their first contribution in #1995
Full Changelog: v0.4.0...v0.5.0