From 0e15e8c2476b34d2b8a1b23f217c5eee282e8aa6 Mon Sep 17 00:00:00 2001
From: achew010 <165894159+achew010@users.noreply.github.com>
Date: Fri, 2 Aug 2024 11:09:10 +0800
Subject: [PATCH] Additional README Changes for PR #57 (#61)

* edits to readme

Signed-off-by: 1000960000 user

* Apply suggestions from code review

Co-authored-by: Yu Chin Fabian Lim
Signed-off-by: 1000960000 user

* more readme changes

Signed-off-by: 1000960000 user

---------

Signed-off-by: 1000960000 user
Co-authored-by: Yu Chin Fabian Lim
---
 README.md                      | 6 ++++--
 plugins/instruct-lab/README.md | 8 ++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index a09ab381..f79026f4 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ The fms-acceleration framework includes accelerators for Full and Parameter Effi
  - Bits-and-Bytes (BNB) quantised LoRA : QLoRA acceleration
  - AutoGPTQ quantised LoRA : GPTQ-LoRA acceleration
  - Full Fine Tuning acceleration (coming soon)
+ - Padding-Free Attention
 
 Our tests show a significant increase in training token throughput using this fms-acceleration framework.
 
@@ -29,9 +30,10 @@ For example:
 
 Plugin | Description | Depends | License | Status
 --|--|--|--|--
-[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Beta
-[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Beta
+[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Alpha
+[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Alpha
 [fused-op-and-kernels](./plugins/fused-ops-and-kernels/README.md) | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth)| Beta
+[instruct-lab](./plugins/instruct-lab/README.md) | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta
 MOE-training-acceleration | [MegaBlocks](https://github.com/databricks/megablocks) inspired triton Kernels and acclerations for Mixture-of-Expert models | | Apache 2.0 | Coming Soon
 
 ## Usage with FMS HF Tuning
diff --git a/plugins/instruct-lab/README.md b/plugins/instruct-lab/README.md
index ca1ea246..d76f327e 100644
--- a/plugins/instruct-lab/README.md
+++ b/plugins/instruct-lab/README.md
@@ -9,12 +9,12 @@ This library contains plugins to accelerate finetuning with the following optimi
 
 Plugin | Description | Depends | Loading | Augmentation | Callbacks
 --|--|--|--|--|--
-[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | ✅ | ✅
+[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | | ✅ | ✅
 
-## Native Transformers Support from V4.44.0
-Transformers natively supports padding-free from v4.44.0. The padding-free plugin will use the transformers library if compatible,
-otherwise if `transformers < V4.44.0` the plugin will use an internal implementation instead.
+## Native Transformers Support from v4.44.0
+Transformers natively supports padding-free from v4.44.0 [see here](https://github.com/huggingface/transformers/pull/31629). The padding-free plugin will use the transformers library if compatible,
+otherwise if `transformers < v4.44.0` the plugin will use an internal implementation instead.
 
 ## Known Issues
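
For reference (not part of the patch itself), the version gate described in the instruct-lab README above could look roughly like the minimal sketch below. It assumes the `DataCollatorWithFlattening` collator that transformers added in v4.44.0 via the PR linked in the diff; the fallback name `InternalPaddingFreeCollator` is a hypothetical placeholder, not the plugin's real API.

```python
# Minimal sketch of the behaviour described in the instruct-lab README:
# prefer the native transformers padding-free collator when available
# (transformers >= 4.44.0), otherwise fall back to an internal one.
from packaging import version

import transformers


def get_padding_free_collator():
    if version.parse(transformers.__version__) >= version.parse("4.44.0"):
        # Added to transformers in v4.44.0 (huggingface/transformers#31629):
        # flattens examples into one sequence so flash-attn can run
        # without padding tokens.
        from transformers import DataCollatorWithFlattening

        return DataCollatorWithFlattening()

    # Older transformers: use the plugin's own implementation instead.
    # `InternalPaddingFreeCollator` is a placeholder name for illustration only.
    from fms_acceleration_ilab import InternalPaddingFreeCollator  # hypothetical

    return InternalPaddingFreeCollator()
```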