From 0e15e8c2476b34d2b8a1b23f217c5eee282e8aa6 Mon Sep 17 00:00:00 2001
From: achew010 <165894159+achew010@users.noreply.github.com>
Date: Fri, 2 Aug 2024 11:09:10 +0800
Subject: [PATCH] Additional README Changes for PR #57 (#61)

* edits to readme

Signed-off-by: 1000960000 user

* Apply suggestions from code review

Co-authored-by: Yu Chin Fabian Lim
Signed-off-by: 1000960000 user

* more readme changes

Signed-off-by: 1000960000 user

---------

Signed-off-by: 1000960000 user
Co-authored-by: Yu Chin Fabian Lim
---
 README.md                      | 6 ++++--
 plugins/instruct-lab/README.md | 8 ++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index a09ab381..f79026f4 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ The fms-acceleration framework includes accelerators for Full and Parameter Effi
  - Bits-and-Bytes (BNB) quantised LoRA : QLoRA acceleration
  - AutoGPTQ quantised LoRA : GPTQ-LoRA acceleration
  - Full Fine Tuning acceleration (coming soon)
+ - Padding-Free Attention
 
 Our tests show a significant increase in training token throughput using this fms-acceleration framework.
 
@@ -29,9 +30,10 @@ For example:
 
 Plugin | Description | Depends | License | Status
 --|--|--|--|--
-[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Beta
-[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Beta
+[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Alpha
+[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Alpha
 [fused-op-and-kernels](./plugins/fused-ops-and-kernels/README.md) | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth)| Beta
+[instruct-lab](./plugins/instruct-lab/README.md) | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta
 MOE-training-acceleration | [MegaBlocks](https://github.com/databricks/megablocks) inspired triton Kernels and acclerations for Mixture-of-Expert models | | Apache 2.0 | Coming Soon
 
 ## Usage with FMS HF Tuning
diff --git a/plugins/instruct-lab/README.md b/plugins/instruct-lab/README.md
index ca1ea246..d76f327e 100644
--- a/plugins/instruct-lab/README.md
+++ b/plugins/instruct-lab/README.md
@@ -9,12 +9,12 @@ This library contains plugins to accelerate finetuning with the following optimi
 
 Plugin | Description | Depends | Loading | Augmentation | Callbacks
 --|--|--|--|--|--
-[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | ✅ | ✅
+[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | | ✅ | ✅
 
-## Native Transformers Support from V4.44.0
-Transformers natively supports padding-free from v4.44.0. The padding-free plugin will use the transformers library if compatible,
-otherwise if `transformers < V4.44.0` the plugin will use an internal implementation instead.
+## Native Transformers Support from v4.44.0
+Transformers natively supports padding-free from v4.44.0 [see here](https://github.com/huggingface/transformers/pull/31629). The padding-free plugin will use the transformers library if compatible,
+otherwise if `transformers < v4.44.0` the plugin will use an internal implementation instead.
 
 ## Known Issues
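
For reference (not part of the patch itself), the version gate described in the instruct-lab README above could look roughly like the minimal sketch below. It assumes the `DataCollatorWithFlattening` collator that transformers added in v4.44.0 via the PR linked in the diff; the fallback name `InternalPaddingFreeCollator` is a hypothetical placeholder, not the plugin's real API.

```python
# Minimal sketch of the behaviour described in the instruct-lab README:
# prefer the native transformers padding-free collator when available
# (transformers >= 4.44.0), otherwise fall back to an internal one.
from packaging import version

import transformers


def get_padding_free_collator():
    if version.parse(transformers.__version__) >= version.parse("4.44.0"):
        # Added to transformers in v4.44.0 (huggingface/transformers#31629):
        # flattens examples into one sequence so flash-attn can run
        # without padding tokens.
        from transformers import DataCollatorWithFlattening

        return DataCollatorWithFlattening()

    # Older transformers: use the plugin's own implementation instead.
    # `InternalPaddingFreeCollator` is a placeholder name for illustration only.
    from fms_acceleration_ilab import InternalPaddingFreeCollator  # hypothetical

    return InternalPaddingFreeCollator()
```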