Releases · drisspg/transformer_nuggets
Woohoooo
What's Changed
- Flash Attention V2 w/ arbitrary attention bias by @drisspg in #1 (see the attention-bias sketch after this list)
- Update torch.cuda.memory API calls for memory profiling by @janeyx99 in #4
- updated by @drisspg in #6
- Simple Fp8 delayed scaling kernel by @drisspg in #7
- Use ufmt on PRs by @drisspg in #8
- Add Llama Training scripts by @drisspg in #10
- Pre-commit by @drisspg in #11
- add_nan_inf_detect_mode by @drisspg in #12
- Enable QLoRA finetuning on a single GPU by @weifengpy in #13
- Added QLoRA + FSDP by @weifengpy in #14
- All the flake8s by @drisspg in #16
- Fix tests by @drisspg in #17
- Make NF4 a Tensor subclass by @drisspg in #18
- Enable per-parameter-sharding FSDP + QLoRA by @weifengpy in #15
- Add op table for torch dispatch by @drisspg in #22 (see the tensor-subclass sketch after this list)
- Fix QLoRA MLP bug and add script for getting memory traces by @drisspg in #23
- Add ShapeLog mode to utilities by @drisspg in #25
- Remove dtype restriction and test by @drisspg in #26
- Block mask by @drisspg in #3
- Dynamic scaling triton kernel by @drisspg in #28
- Allow for score mod and change of base perf trick by @drisspg in #29
- Updates to ruff by @drisspg in #32
- Import nan/inf detection mode to init by @drisspg in #33
- Add docstring to profiler by @drisspg in #34
- Add some utils for working with flex by @drisspg in #35
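
For context on #1, the sketch below shows what an "arbitrary attention bias" means, using plain PyTorch's `scaled_dot_product_attention` rather than this repo's Triton Flash Attention V2 kernel; the shapes and the distance-based bias are illustrative assumptions, not the repo's API.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# An "arbitrary" additive bias: here a distance-based penalty per (q, kv) pair.
pos = torch.arange(seq)
bias = -0.1 * (pos[None, :] - pos[:, None]).abs().float()  # (seq, seq), broadcast over batch/heads

# A float attn_mask is added to q @ k^T / sqrt(d) before the softmax,
# which is the role a fused kernel's bias argument plays.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```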
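
Similarly, #18 and #22 follow a common PyTorch pattern: a `torch.Tensor` subclass whose aten ops are routed through a dispatch table in `__torch_dispatch__`. The sketch below is a generic illustration with made-up names (`WrapperTensor`, `OP_TABLE`), not the repo's NF4 implementation.

```python
import torch

# Generic op table: maps an aten overload to a handler function.
OP_TABLE = {}

def register(op):
    def wrap(fn):
        OP_TABLE[op] = fn
        return fn
    return wrap

class WrapperTensor(torch.Tensor):
    """Toy wrapper subclass; the real data lives on `_data`."""

    @staticmethod
    def __new__(cls, data):
        return torch.Tensor._make_wrapper_subclass(cls, data.shape, dtype=data.dtype)

    def __init__(self, data):
        self._data = data

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in OP_TABLE:
            return OP_TABLE[func](args, kwargs)
        raise NotImplementedError(f"{func} is not in the op table")

# Register a single op: add unwraps the payloads and re-wraps the result.
@register(torch.ops.aten.add.Tensor)
def _add(args, kwargs):
    return WrapperTensor(args[0]._data + args[1]._data)

t = WrapperTensor(torch.randn(4, 4))
print((t + t).shape)  # torch.Size([4, 4]); routed through __torch_dispatch__
```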
New Contributors
- @janeyx99 made their first contribution in #4
- @weifengpy made their first contribution in #13
Full Changelog: https://github.com/drisspg/transformer_nuggets/commits/v0.0.1