Releases · drisspg/transformer_nuggets
Woohoooo
What's Changed
- Flash Attention V2 w/ arbitrary attention bias by @drisspg in #1 (see the attention-bias sketch after this list)
- Update torch.cuda.memory API calls for memory profiling by @janeyx99 in #4
- updated by @drisspg in #6
- Simple Fp8 delayed scaling kernel by @drisspg in #7
- Use ufmt on PRs by @drisspg in #8
- Add Llama Training scripts by @drisspg in #10
- Pre-commit by @drisspg in #11
- add_nan_inf_detect_mode by @drisspg in #12
- Enable QLoRA finetuning on a single GPU by @weifengpy in #13
- Added QLoRA + FSDP by @weifengpy in #14
- All the flake8s by @drisspg in #16
- Fix tests by @drisspg in #17
- Make NF4 a Tensor subclass by @drisspg in #18
- Enable per-parameter-sharding FSDP + QLoRA by @weifengpy in #15
- Add op table for torch dispatch by @drisspg in #22 (see the tensor-subclass sketch after this list)
- Fix QLoRA MLP bug and add script for getting memory traces by @drisspg in #23
- Add ShapeLog mode to utilities by @drisspg in #25
- Remove dtype restriction and test by @drisspg in #26
- Block mask by @drisspg in #3
- Dynamic scaling triton kernel by @drisspg in #28
- Allow for score mod and change of base perf trick by @drisspg in #29
- Updates to ruff by @drisspg in #32
- Import nan/inf detection mode to init by @drisspg in #33
- Add docstring to profiler by @drisspg in #34
- Add some utils for working with flex by @drisspg in #35
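
For context on #1, the sketch below shows what an "arbitrary attention bias" means, using plain PyTorch's `scaled_dot_product_attention` rather than this repo's Triton Flash Attention V2 kernel; the shapes and the distance-based bias are illustrative assumptions, not the repo's API.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# An "arbitrary" additive bias: here a distance-based penalty per (q, kv) pair.
pos = torch.arange(seq)
bias = -0.1 * (pos[None, :] - pos[:, None]).abs().float()  # (seq, seq), broadcast over batch/heads

# A float attn_mask is added to q @ k^T / sqrt(d) before the softmax,
# which is the role a fused kernel's bias argument plays.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```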
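
Similarly, #18 and #22 follow a common PyTorch pattern: a `torch.Tensor` subclass whose aten ops are routed through a dispatch table in `__torch_dispatch__`. The sketch below is a generic illustration with made-up names (`WrapperTensor`, `OP_TABLE`), not the repo's NF4 implementation.

```python
import torch

# Generic op table: maps an aten overload to a handler function.
OP_TABLE = {}

def register(op):
    def wrap(fn):
        OP_TABLE[op] = fn
        return fn
    return wrap

class WrapperTensor(torch.Tensor):
    """Toy wrapper subclass; the real data lives on `_data`."""

    @staticmethod
    def __new__(cls, data):
        return torch.Tensor._make_wrapper_subclass(cls, data.shape, dtype=data.dtype)

    def __init__(self, data):
        self._data = data

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in OP_TABLE:
            return OP_TABLE[func](args, kwargs)
        raise NotImplementedError(f"{func} is not in the op table")

# Register a single op: add unwraps the payloads and re-wraps the result.
@register(torch.ops.aten.add.Tensor)
def _add(args, kwargs):
    return WrapperTensor(args[0]._data + args[1]._data)

t = WrapperTensor(torch.randn(4, 4))
print((t + t).shape)  # torch.Size([4, 4]); routed through __torch_dispatch__
```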
New Contributors
- @janeyx99 made their first contribution in #4
- @weifengpy made their first contribution in #13
Full Changelog: https://github.com/drisspg/transformer_nuggets/commits/v0.0.1