v0.5.0
What's Changed
- fix(log): improve warning to clarify that lora_modules_to_save expects a list by @NanoCode012 in #1197
- Add: colab example by @JohanWork in #1196
- Feat/chatml add system message by @mhenrichsen in #1117
- fix learning rate scheduler's warnings by @RicardoDominguez in #1135
- precompute dpo logprobs setting and fixes by @winglian in #1199
- Update deps 202401 by @winglian in #1204
- make sure to register the base chatml template even if no system message is provided by @winglian in #1207
- workaround for transformers bug requiring do_sample for saving pretrained by @winglian in #1206
- more checks and fixes for deepspeed and fsdp by @winglian in #1208
- drop py39 docker images, add py311, upgrade pytorch to 2.1.2 by @winglian in #1205
- Update qlora.yml - DeprecationWarning: `max_packed_sequence_len` is n… by @7flash in #1210
- Respect sliding_window=None by @DreamGenX in #1214
- ensure the tests use the same version of torch as the latest base docker images by @winglian in #1215
- ADD: warning if hub_model_id is set without any save strategy by @JohanWork in #1202
- run PR e2e docker CI tests in Modal by @winglian in #1217
- Revert "run PR e2e docker CI tests in Modal" by @winglian in #1220
- FEAT: add tagging support to axolotl for DPOTrainer by @filippo82 in #1209
- Peft LoftQ by @winglian in #1222
- Fix typos (pretained -> pretrained) by @xhedit in #1231
- Fix and document test_datasets by @DreamGenX in #1228
- set torch version to what is installed during axolotl install by @winglian in #1234
- Cloud motd by @winglian in #1235
- [Nit] Fix callout by @hamelsmu in #1237
- Support for additional_special_tokens by @DreamGenX in #1221
- Peft deepspeed resume by @winglian in #1227
- support for true batches with multipack by @winglian in #1230
- add contact info for dedicated support for axolotl by @winglian in #1243
- fix(model): apply gate fp32 only for mixtral by @NanoCode012 in #1241
- relora: magnitude pruning of the optimizer by @winglian in #1245
- Pretrain transforms by @winglian in #1261
- Fix typo `bloat16` -> `bfloat16` by @chiragjn in #1257
- Add more save strategies for DPO training by @PhilipMay in #1255
- BUG FIX: lock pytorch version in colab example by @JohanWork in #1247
- Fix typo preventing `model_kwargs` from being injected by @zacbrannelly in #1262
- contributor avatars by @winglian in #1269
- simplify handling for newer multipack patches so they can be added in a single place by @winglian in #1270
- Add link to axolotl cloud image on latitude by @winglian in #1275
- copy edits by @winglian in #1276
- allow remote data paths by @hamelsmu in #1278
- add support for https remote yamls by @hamelsmu in #1277
- run the docker image builds and push on gh action gpu runners by @winglian in #1218
- Update README.md by @hamelsmu in #1281
- don't use load and push together by @winglian in #1284
- Add MPS support by @maximegmd in #1264
- allow the optimizer prune ratio for relora to be configurable by @winglian in #1287
- Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? by @jinwonkim93 in #1273
- Add seq2seq eval benchmark callback by @LeonardoEmili in #1274
- Validation always happens on first step by @LeonardoEmili in #1300
- fix(examples): remove is_*_derived as it's parsed automatically by @NanoCode012 in #1297
- Allow load_best_model_at_end to be configured for early stopping on custom evaluation datasets by @dameikle in #1291
- Add instructions for playing with qlora model to colab example by @jaredpalmer in #1290
- fix(readme): update inference md link by @NanoCode012 in #1311
- Adding Google's gemma Model by @monk1337 in #1312
- multipack for gemma by @winglian in #1313
- deprecate: pytorch 2.0.1 image by @NanoCode012 in #1315
- fix(readme): Clarify doc for tokenizer_config by @NanoCode012 in #1323
- [bug-report template] Use yaml codeblock for config.yaml field by @kallewoof in #1303
- make mlflow optional by @winglian in #1317
- Pydantic 2.x cfg by @winglian in #1239
- chore: update readme to be more clear by @NanoCode012 in #1326
- ADD: push checkpoints to mlflow artifact registry by @JohanWork in #1295
- hotfix for capabilities loading by @winglian in #1331
- hotfix for lora rank by @winglian in #1332
- hotfix for missing outputs params by @winglian in #1333
- hotfix to exclude_unset from pydantic config when converting back to a dict by @winglian in #1334
- Add StableLM 2 Example Scripts by @ncoop57 in #1327
- add lion-pytorch optimizer by @maximegmd in #1299
- Support user-defined prompt processing strategies for dpo by @nopperl in #1248
- more pydantic fixes by @winglian in #1338
- Mps mistral lora by @maximegmd in #1292
- fix: checkpoint saving with deepspeed by @NanoCode012 in #1321
- Update debugging.md by @hamelsmu in #1339
- fix steps check for anneal on first cycle by @winglian in #1316
- Update fastchat_conversation_turns.py by @eltociear in #1294
- add gemma instruct chat template by @winglian in #1341
- more fixes 20240228 by @winglian in #1342
- deprecate py 3.9 support, set min pytorch version by @winglian in #1343
- Fix `use_mlflow` to be bool instead of str by @chiragjn in #1344
- fix for protected `model_` namespace w/ pydantic by @winglian in #1345
- run tests again on Modal by @winglian in #1289
- chore: enable sample_packing for Gemma [skip ci] by @NanoCode012 in #1351
- Fix validation for early stopping by @chiragjn in #1358
- plain input/output prompt strategy w/o chat templates by @winglian in #1346
- lora+ support by @winglian in #1352
- allow the sharegpt handler to also better handle datasets destined for openai finetuning by @winglian in #1361
- Update tinyllama lora.yml to fix eval packing issue by @rasbt in #1362
- add starcoder2 by @ehartford in #1349
- Fix supported python versions in README, as python 3.9 was recently deprecated by @nirogu in #1364
- support for DoRA w/ PEFT by @winglian in #1363
- add docs for `input_output` format by @hamelsmu in #1367
- update flash attention for gemma support by @winglian in #1368
- JarvisLabs by @winglian in #1372
- FSDP + QLoRA by @winglian in #1378
- validation for fsdp and deepspeed by @winglian in #1388
- support for rslora by @winglian in #1387
- Fix pydantic configuration for the max_memory input by @dandm1 in #1385
- Set `gradient_clipping` to `auto` in DeepSpeed configs by @seungduk-yanolja in #1382
- Add Glaive conversation format support by @brianfitzgerald in #1365
- chore: lint by @winglian in #1389
- add handling for argilla dpo-mix by @winglian in #1397
- Update ChatTemplate enum to include alpaca and gemma by @chiragjn in #1396
- Add QLoRA + FSDP Docs by @hamelsmu in #1403
- Don't disable existing loggers when configuring axolotl logging by @chiragjn in #1395
- Train parameters exclusively in specific ranges by @seungduk-yanolja in #1390
- Fix Gemma 7b qlora.yml by @rasbt in #1405
- beta support for multipack with gemmoe by @winglian in #1402
- Feat(readme): Add instructions for Google GPU VM instances by @NanoCode012 in #1410
- Fix(readme): Improve README QuickStart info by @NanoCode012 in #1408
- chore(script): remove redundant setting by @NanoCode012 in #1411
- Add Phorm AI Badge (Morph Labs) by @bentleylong in #1418
- ORPO by @winglian in #1419
- fix(config): passing gradient_checkpoint_kwargs by @NanoCode012 in #1412
- Add a config not to shuffle merged dataset by @seungduk-yanolja in #1394
- Feat: Add sharegpt multirole by @NanoCode012 in #1137
- support galore once upstreamed into transformers by @winglian in #1409
- fixes for dpo and orpo template loading by @winglian in #1424
- HF / FEAT: Optimize HF tags by @younesbelkada in #1425
- strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed by @winglian in #1428
- Bootstrap Hosted Axolotl Docs w/Quarto by @hamelsmu in #1429
- Orpo fix wip by @winglian in #1433
- chore(config): refactor old mistral config by @NanoCode012 in #1435
- docs: update link to docs of advanced topics in README.md by @pphuc25 in #1437
- fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path by @NanoCode012 in #1298
- make sure to capture non-null defaults from config validation by @winglian in #1415
- Turn on sample_packing for Gemma training by @satpalsr in #1438
- Fix falcon tokenization step by @pharaouk in #1441
- Remove seq_len arg in rotary_emb by @BMPixel in #1443
- fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support by @winglian in #1413
- support layer replication for peft and fix rslora integration by @winglian in #1445
- fix layer_replication arg to peft by @winglian in #1446
- Jamba by @winglian in #1451
- Support loading datasets saved via save_to_disk by @fozziethebeat in #1432
- fix some of the edge cases for Jamba by @winglian in #1452
- configure nightly docker builds by @winglian in #1454
- fix how nightly tag is generated by @winglian in #1456
- fix yaml parsing for workflow by @winglian in #1457
- Nightlies fix v4 by @winglian in #1458
- qwen2_moe support w multipack by @winglian in #1455
- make sure to install causal_conv1d in docker by @winglian in #1459
- Lisa by @winglian in #1469
- feat: add deepspeed 3 with cpuoffload by @NanoCode012 in #1466
- reduce verbosity of the special tokens by @winglian in #1472
- Reorganize Docs by @hamelsmu in #1468
- fix pretraining_ on odd datasets by @mapmeld in #1463
- Added pip install ninja to accelerate installation of flash-attn by @melvinebenezer in #1461
- Pretrain multipack v2 by @winglian in #1470
- Feat: update doc by @NanoCode012 in #1475
- refactor utils.data module for line count linter by @winglian in #1476
- don't use deepspeed or fsdp when merging loras by @winglian in #1479
- add support for cohere chat template by @winglian in #1478
- feat: validate sample packing requires flash_attention by @NanoCode012 in #1465
- fix: reduce sample_packing FA error to warning by @NanoCode012 in #1484
- drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) by @winglian in #1490
- Remove `validate_quantized_dora` by @xzuyn in #1485
- ignore issues with calculating # params when printing by @winglian in #1493
- add field to sft dataset pydantic for completion support by @winglian in #1497
- Fix the wrong adapter in qwen2-moe-qlora example by @maziyarpanahi in #1501
- Print versions by @winglian in #1496
- Correctly handle splits for datasets.arrow_dataset.Dataset objects by @scottfleming in #1504
- WIP: Support table logging for mlflow, too by @DavidFarago in #1506
- use locale-agnostic separator to make large numbers easier to read by @winglian in #1503
- Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save by @tcapelle in #1483
- DBRX Model Support by @winglian in #1462
- Unsloth gradient checkpointing offload by @winglian in #1528
- add docs around pre-processing by @winglian in #1529
- Update README.md by @emilytin0206 in #1521
- Update Readme to include support for Mixtral8X22B by @Barbarian7676 in #1518
- Create mixtral_22.yml by @Barbarian7676 in #1514
- feat(doc): Add config example for pad_token by @NanoCode012 in #1535
- llama-3 examples by @winglian in #1537
- Adding Llama-3 qlora by @monk1337 in #1536
- fix broken linting by @winglian in #1541
- fix(packages): lock datasets version by @NanoCode012 in #1545
- fix(yml): update llama-3 config by @NanoCode012 in #1543
- ORPO Trainer replacement by @winglian in #1551
- wrap prepared_ds_path in str() to avoid TypeError in fsspec package by @FrankRuis in #1548
- Add support for Gemma chat template by @Haoxiang-Wang in #1530
- make sure everything stays in the same dtype when using dpo + FSDP by @winglian in #1559
- Add ORPO example and e2e test by @tokestermw in #1572
- Pose context length ext by @winglian in #1567
- chore: clarify microbatch size by @NanoCode012 in #1579
- Add debug option for RL dataset preprocessing by @abhinand5 in #1404
- ADD: warning hub model by @JohanWork in #1301
- FIX: TRL trainer preprocessing step was running in one process by @ali-mosavian in #1583
- Pass weakref to model in the SIGINT handler to free up model post train function by @chiragjn in #1581
- improve save callbacks by @winglian in #1592
- fix for jupyterlab on cloud start by @winglian in #1594
- add torch 2.3.0 to builds by @winglian in #1593
- docs(config.qmd): add loraplus example by @tpoisonooo in #1577
- Gradio configuration parameters by @marijnfs in #1591
- Pass `deepspeed` and `fsdp` as `None` explicitly when merging adapters to allow custom device_map by @chiragjn in #1575
- feat: exclude mamba blocks for jamba when load8bit by @NanoCode012 in #1578
- improve tool handling roles by @winglian in #1587
- make sure to save the lora adapter at the end of RL/dpo training by @winglian in #1573
- ignore the fsdp_config section too by @winglian in #1606
- adding llama3 fastchat conversation monkeypatch by @TJ-Solergibert in #1539
- feat: Add LLaMA-3 instruct prompt strategies for fine-tuning by @0-hero in #1553
- Llama3 dpo by @winglian in #1610
- add dstack section by @deep-diver in #1612
- fix attention mask collation by @winglian in #1603
- make sure to save on the last step by @winglian in #1615
- FIX: max_length and max_prompt_length was not being sent to ORPOTrainer by @ali-mosavian in #1584
- Fix `total_num_steps` by @bofenghuang in #1566
- update torch 2.2.1 -> 2.2.2 by @winglian in #1622
- update outputs path so that we can mount workspace to /workspace/data by @winglian in #1623
- bump versions of deps by @winglian in #1621
- fix symlinks for axolotl outputs by @winglian in #1625
- fix setting the authorized keys when there are more than one in the env var by @winglian in #1626
- install rsync too by @winglian in #1627
- cloud image w/o tmux by @winglian in #1628
- more fixes to work with runpod + skypilot by @winglian in #1629
- fix ray install by @winglian in #1630
- add save_only_model option by @jquesnelle in #1634
- Unsloth optims for Llama by @winglian in #1609
- fixes to save on fractional save_steps by @winglian in #1643
- Add KTO support by @benredmond in #1640
- Fix llama3 chat_template (extra <|eot_id|> on last turn) by @lhl in #1635
- allow report_to for multiple providers by @winglian in #1647
- Enable LoRA+ setting for dpo trainer by @thepowerfuldeez in #1646
- Update tiny-llama qlora.yml addressing eval packing error by @jaydeepthik in #1638
- support for custom messages field in sharegpt by @winglian in #1651
- Switch to parallel FFD bin packing algorithm. by @winglian in #1619
- document how to use `save_strategy="no"` by @charlesfrye in #1653
- update deps by @winglian in #1663
- Fix Google Colab notebook 2024-05 by @maciejgryka in #1662
- Generalizing the chat_template prompt strategy by @fozziethebeat in #1660
- Fix Lora config error for Llama3 by @oaishi in #1659
- fix lint issue that snuck through by @winglian in #1665
- Fix: ensure correct handling of `val_set_size` as `float` or `int` by @davidecaroselli in #1655
- Correct name of MixtralBlockSparseTop2MLP (L -> l) by @seungduk-yanolja in #1667
- Fix README quick start example usage model dirs by @abevoelker in #1668
- make sure the CI fails when pytest script fails by @winglian in #1669
- handle the system role too for chat templates by @winglian in #1671
- revert multipack batch sampler changes by @winglian in #1672
- re-enable phi for tests in modal ci by @winglian in #1373
- use mixins for orpo and kto configs so they work with axolotl customizations by @winglian in #1674
- set chat_template in datasets config automatically by @winglian in #1664
- load explicit splits on datasets by @winglian in #1652
- cleanup the deepspeed proxy model at the end of training by @winglian in #1675
- need to add back drop_last for sampler by @winglian in #1676
- Fix the broken link in README by @saeedesmaili in #1678
- re-enable DPO for tests in modal ci by @winglian in #1374
- add support for rpo_alpha by @winglian in #1681
- Phi-3 conversation format, example training script and perplexity metric by @brianfitzgerald in #1582
- Adding Phi-3 model by @monk1337 in #1580
- ensure explicit eval_sample_packing to avoid mismatch issues by @winglian in #1692
- add qwen2-72b fsdp example by @winglian in #1696
- add back packing efficiency estimate so epochs and multi-gpu works properly by @winglian in #1697
- Sample packing eval fix by @winglian in #1695
- bump deepspeed for fix for grad norm compute putting tensors on different devices by @winglian in #1699
- verbose failure message by @winglian in #1694
- download model weights on preprocess step by @winglian in #1693
- drop length column for issues with eval without packing by @winglian in #1711
- add support for multipack for deepseek_v2 by @winglian in #1712
- Allow "weight: 0" in messages to mask them by @DavidFarago in #1703
- improve Pre-Tokenized Dataset docs by @josharian in #1684
- support for gemma2 w sample packing by @winglian in #1718
- add support for .env files for env vars by @winglian in #1724
- full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading by @winglian in #1726
- sanity check ranges in freeze.py by @josharian in #1686
- bump trl and accelerate for latest releases by @winglian in #1730
- Fixes the urls after org move by @mhenrichsen in #1734
- add tests so CI can catch updates where patches will break with unsloth by @winglian in #1737
- typo by @Klingefjord in #1685
- add torch 2.3.1 base image by @winglian in #1745
- fixes to prevent vram spike when train starts by @winglian in #1742
- update to pytorch 2.3.1 by @winglian in #1746
- bump xformers to 0.0.27 by @akshaylive in #1740
- Changed URL for dataset docs by @dameikle in #1744
- Fix eval_sample_packing in llama-3 lora example by @RodriMora in #1716
- bump flash attention 2.5.8 -> 2.6.1 by @winglian in #1738
- add basic support for the optimi adamw optimizer by @winglian in #1727
- update modal package and don't cache pip install by @winglian in #1757
- torch compile and cuda alloc improvements by @winglian in #1755
- support for llama multipack using updated code/patches by @winglian in #1754
- fix num gpu check by @winglian in #1760
- fixes to accelerator so that iterable pretraining datasets work by @winglian in #1759
- add torch_compile_mode options by @winglian in #1763
- re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments by @winglian in #1765
- set the number of dataset processes on the DPO Config rather than the trainer by @winglian in #1762
- Unsloth rope by @winglian in #1767
- bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers by @winglian in #1769
- Fix untrained tokens by @winglian in #1771
- Add a `chat_template` prompt strategy for DPO by @fozziethebeat in #1725
- swaps to use newer sample packing for mistral by @winglian in #1773
- bump transformers for updated llama 3.1 by @winglian in #1778
- bump flash attention to 2.6.2 by @winglian in #1781
- fix fsdp loading of models, esp 70b by @winglian in #1780
- add support for simpo via cpo trainer by @winglian in #1772
- Bump deepspeed 20240727 by @winglian in #1790
- various batch of fixes by @winglian in #1785
- Add flexible configuration options for `chat_template` dataset training by @Tostino in #1756
- Update README.md by @mhenrichsen in #1792
- move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 by @winglian in #1793
- fix dockerfile and base builder by @winglian in #1795
- use 12.4.1 instead of 12.4 [skip-ci] by @winglian in #1796
- update test and main/nightly builds by @winglian in #1797
- publish axolotl images without extras in the tag name by @winglian in #1798
- qlora-fsdp ram efficient loading with hf trainer by @winglian in #1791
- fix roles to train defaults and make logging less verbose by @winglian in #1801
- Fix colab example notebook by @srib in #1805
- Fix setting correct repo id when pushing dataset to hub by @chrislee973 in #1657
- Update instruct-lora-8b.yml by @monk1337 in #1789
- Update conversation.qmd by @penfever in #1788
- One cycle lr by @winglian in #1803
- remove unnecessary zero-first guard as it's already called in a parent fn by @winglian in #1810
- set z3 leaf for deepseek v2 by @winglian in #1809
- logging improvements by @winglian in #1808
- update peft and transformers by @winglian in #1811
- skip no commit to main on ci by @winglian in #1814
- fix z3 leaf configuration when not using lists by @winglian in #1817
- update tinyllama to use final instead of checkpoints [skip ci] by @winglian in #1820
- Attempt to run multigpu in PR CI for now to ensure it works by @winglian in #1815
- fix the incorrect `max_length` for chat template by @chiwanpark in #1818
- bump hf dependencies by @winglian in #1823
- fix: parse eager_attention from cfg by @NanoCode012 in #1824
- fix: parse model_kwargs by @NanoCode012 in #1825
- update sklearn version, torch compile env vars, don't worry about failure on preprocess load model by @winglian in #1821
- add validation to prevent 8bit lora finetuning on H100s by @winglian in #1827
- optionally save the final FSDP model as a sharded state dict by @winglian in #1828
- fix: dont change quant storage dtype in case of fsdp by @xgal in #1837
- pretrain: fix with sample_packing=false by @tmm1 in #1841
- feat: add jamba chat_template by @xgal in #1843
- examples: fix tiny-llama pretrain yml syntax by @tmm1 in #1840
- rename jamba example by @xgal in #1846
- numpy 2.1.0 was released, but incompatible with numba by @winglian in #1849
- ensure that the bias is also in the correct dtype by @winglian in #1848
- make the `train_on_eos` default to `turn` so all eos tokens are treated the same by @winglian in #1847
- fix: prompt phi by @JohanWork in #1845
- docs: minor syntax highlight fix by @tmm1 in #1839
- ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed by @winglian in #1850
- run nightly ci builds against upstream main by @winglian in #1851
- rename nightly test and add badge by @winglian in #1853
- most model types now support flash attention 2 regardless of multipack support by @winglian in #1854
- add axolotl community license by @winglian in #1862
- don't mess with bnb since it needs compiled wheels by @winglian in #1859
- Liger Kernel integration by @winglian in #1861
- add liger example by @winglian in #1864
- add liger to readme by @winglian in #1865
- change up import to prevent AttributeError by @winglian in #1863
- simplify logic by @winglian in #1856
- better handling of llama-3 tool role by @winglian in #1782
- Spectrum plugin by @winglian in #1866
- update spectrum authors by @winglian in #1869
- Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` by @chiwanpark in #1867
- clear cuda cache to help with memory leak/creep by @winglian in #1858
- Add Liger Kernel support for Qwen2 by @chiwanpark in #1871
- Sample pack trust remote code v2 by @winglian in #1873
- monkey-patch transformers to simplify monkey-patching modeling code by @tmm1 in #1877
- fix liger plugin load issues by @tmm1 in #1876
- deepseekv2 liger support by @tmm1 in #1878
- Add liger kernel to features section by @ByronHsu in #1881
- pin liger-kernel to latest 0.2.1 by @winglian in #1882
- Update supported models for Liger Kernel by @DocShotgun in #1875
- run pytests with varied pytorch versions too by @winglian in #1883
- Fix RMSNorm monkey patch for Gemma models by @chiwanpark in #1886
- add e2e smoke tests for llama liger integration by @winglian in #1884
- support for auto_find_batch_size when packing by @winglian in #1885
- fix optimizer + fsdp combination in example [skip ci] by @winglian in #1893
- Docs for AMD-based HPC systems by @tijmen in #1891
- lint fix and update gha regex by @winglian in #1899
- Fix documentation for pre-tokenized dataset by @alpayariyak in #1894
- fix zero3 integration by @winglian in #1897
- bump accelerate to 0.34.2 by @winglian in #1901
- remove dynamic module loader monkeypatch as this was fixed upstream by @winglian in #1914
- Trigger the original tokenization behavior when no advanced turn settings are provided by @fozziethebeat in #1915
- validation fixes 20240923 by @winglian in #1925
- update upstream deps versions and replace lora+ by @winglian in #1928
- fix for empty lora+ lr embedding by @winglian in #1932
- bump transformers to 4.45.1 by @winglian in #1936
- Multimodal Vision Llama - rudimentary support by @winglian in #1940
- add 2.4.1 to base models by @winglian in #1953
- upgrade pytorch from 2.4.0 => 2.4.1 by @winglian in #1950
- fix(log): update perplexity log to clarify from eval split by @NanoCode012 in #1952
- Fix type annotations in relora.py by @bxptr in #1941
- Comet integration by @Lothiraldan in #1939
- Fixing/Adding Mistral Templates by @pandora-s-git in #1927
- lm_eval harness post train by @winglian in #1926
- Axo logo new by @winglian in #1956
- Add Support for `revision` Dataset Parameter to specify reading from Huggingface Dataset Revision by @thomascleberg in #1912
- Add MLFlow run name option in config by @awhazell in #1961
- add warning that sharegpt will be deprecated by @winglian in #1957
- Handle image input as string paths for MMLMs by @afrizalhasbi in #1958
- update hf deps by @winglian in #1964
- only install torchao for torch versions >= 2.4.0 by @winglian in #1963
- Fixing Validation - Mistral Templates by @pandora-s-git in #1962
- fix(doc): update eval causal lm metrics doc to add perplexity by @NanoCode012 in #1951
- Add support for qwen 2.5 chat template by @amazingvince in #1934
- wip add new proposed message structure by @winglian in #1904
- Reward model by @winglian in #1879
- add ds zero3 to multigpu biweekly tests by @winglian in #1900
- upgrade accelerate to 1.0.1 by @winglian in #1969
- examples: Fix config llama3 by @JohanWork in #1833
- also debug if other debug args are set by @winglian in #1977
- memoize dataset length for eval sample packing by @bursteratom in #1974
- add pytorch 2.5.0 base images by @winglian in #1979
- first pass at pytorch 2.5.0 support by @winglian in #1982
- fix builds so pytorch version isn't clobbered by @winglian in #1986
- use torch 2.4.1 images as latest now that torch 2.5.0 is out by @winglian in #1987
- Log checkpoints as mlflow artifacts by @awhazell in #1976
- revert image tagged as main-latest by @winglian in #1990
- Refactor func load_model to class ModelLoader by @MengqingCao in #1909
- Fix: Gradient Accumulation issue by @NanoCode012 in #1980
- fix zero3 by @winglian in #1994
- add option for resizing embeddings when adding new tokens by @winglian in #2000
- Feat: Add support for tokenizer’s or custom jinja chat_template by @NanoCode012 in #1970
- Hardware requirements by @OliverKunc in #1997
- feat: update yml chat_template to specify dataset field by @NanoCode012 in #2001
- remove skipped test by @winglian in #2002
- feat: add Exaone3 chat_template by @shing100 in #1995
- Fix get_chat_template call for trainer builder by @chiragjn in #2003
- Fix: modelloader handling of model_kwargs load_in*bit by @NanoCode012 in #1999
- Add plugin manager's callback hooks to training flow by @chiragjn in #2006
- add retries for load datasets requests failures by @winglian in #2007
- Base 2 5 1 by @winglian in #2010
- only run the remainder of the gpu test suite if one case passes first by @winglian in #2009
- upgrade liger to 0.4.0 by @winglian in #1973
- janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build by @winglian in #2022
- upgrade pytorch to 2.5.1 by @winglian in #2024
- Add weighted optimisation support for trl DPO trainer integration by @bursteratom in #2016
- remove fastchat and sharegpt by @winglian in #2021
- increment version to 0.5.0 for next release by @winglian in #2025
- make publish to pypi manually dispatchable as a workflow by @winglian in #2026
- remove unused direct dependency on fused dense lib by @winglian in #2027
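
For reference, a minimal config sketch touching a few of the options introduced or changed in this release. Key names are taken from the PR titles above; the exact values are illustrative assumptions, so check the v0.5.0 docs and examples before copying.

```yaml
# Hypothetical axolotl config fragment (illustrative only, not from the docs)
base_model: meta-llama/Meta-Llama-3-8B   # llama-3 examples (#1537)
chat_template: llama3                    # generalized chat_template prompt strategy (#1660, #1970)
sample_packing: true                     # now validated to require flash attention (#1465)
flash_attention: true
save_only_model: true                    # skip optimizer/scheduler state in checkpoints (#1634)
val_set_size: 0.05                       # accepted as float or int (#1655)
```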
New Contributors
- @7flash made their first contribution in #1210
- @DreamGenX made their first contribution in #1214
- @filippo82 made their first contribution in #1209
- @xhedit made their first contribution in #1231
- @chiragjn made their first contribution in #1257
- @PhilipMay made their first contribution in #1255
- @zacbrannelly made their first contribution in #1262
- @LeonardoEmili made their first contribution in #1274
- @dameikle made their first contribution in #1291
- @jaredpalmer made their first contribution in #1290
- @monk1337 made their first contribution in #1312
- @ncoop57 made their first contribution in #1327
- @nopperl made their first contribution in #1248
- @rasbt made their first contribution in #1362
- @nirogu made their first contribution in #1364
- @dandm1 made their first contribution in #1385
- @brianfitzgerald made their first contribution in #1365
- @bentleylong made their first contribution in #1418
- @pphuc25 made their first contribution in #1437
- @satpalsr made their first contribution in #1438
- @pharaouk made their first contribution in #1441
- @BMPixel made their first contribution in #1443
- @fozziethebeat made their first contribution in #1432
- @mapmeld made their first contribution in #1463
- @melvinebenezer made their first contribution in #1461
- @maziyarpanahi made their first contribution in #1501
- @scottfleming made their first contribution in #1504
- @DavidFarago made their first contribution in #1506
- @tcapelle made their first contribution in #1483
- @emilytin0206 made their first contribution in #1521
- @Barbarian7676 made their first contribution in #1518
- @FrankRuis made their first contribution in #1548
- @abhinand5 made their first contribution in #1404
- @ali-mosavian made their first contribution in #1583
- @tpoisonooo made their first contribution in #1577
- @marijnfs made their first contribution in #1591
- @TJ-Solergibert made their first contribution in #1539
- @0-hero made their first contribution in #1553
- @deep-diver made their first contribution in #1612
- @jquesnelle made their first contribution in #1634
- @benredmond made their first contribution in #1640
- @lhl made their first contribution in #1635
- @thepowerfuldeez made their first contribution in #1646
- @jaydeepthik made their first contribution in #1638
- @charlesfrye made their first contribution in #1653
- @maciejgryka made their first contribution in #1662
- @oaishi made their first contribution in #1659
- @davidecaroselli made their first contribution in #1655
- @abevoelker made their first contribution in #1668
- @saeedesmaili made their first contribution in #1678
- @josharian made their first contribution in #1684
- @Klingefjord made their first contribution in #1685
- @akshaylive made their first contribution in #1740
- @RodriMora made their first contribution in #1716
- @Tostino made their first contribution in #1756
- @srib made their first contribution in #1805
- @chrislee973 made their first contribution in #1657
- @penfever made their first contribution in #1788
- @chiwanpark made their first contribution in #1818
- @xgal made their first contribution in #1837
- @ByronHsu made their first contribution in #1881
- @DocShotgun made their first contribution in #1875
- @tijmen made their first contribution in #1891
- @bxptr made their first contribution in #1941
- @Lothiraldan made their first contribution in #1939
- @pandora-s-git made their first contribution in #1927
- @thomascleberg made their first contribution in #1912
- @awhazell made their first contribution in #1961
- @afrizalhasbi made their first contribution in #1958
- @amazingvince made their first contribution in #1934
- @bursteratom made their first contribution in #1974
- @MengqingCao made their first contribution in #1909
- @OliverKunc made their first contribution in #1997
- @shing100 made their first contribution in #1995
Full Changelog: v0.4.0...v0.5.0