Fix Mistakes with FA Padding Free #62

Merged: 3 commits merged on Aug 2, 2024

Conversation


@fabianlim (Contributor) commented on Aug 1, 2024

PR #57 had a couple of mistakes that needed to be fixed. This is because of two things:

  1. _flash_attention_forward was moved out into a standalone function earlier
  2. the actual padding-free fix was done later, and is still not yet released (probably 4.44)

The strategy now is simple (a minimal detection sketch follows the list):

  • if we can import DataCollatorWithFlattening successfully, it means the padding-free fix is in place
  • if we can import _flash_attention_forward, it means the function has been separated out
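
The detection amounts to two guarded imports. Below is a minimal sketch of that check; the two imported symbols are real transformers names, but the flag names are illustrative:

```python
# Detect which transformers features are available; flag names are illustrative.
try:
    # only importable once the upstream padding-free fix has landed
    from transformers import DataCollatorWithFlattening  # noqa: F401
    HAS_PADDING_FREE_FIX = True
except ImportError:
    HAS_PADDING_FREE_FIX = False

try:
    # only importable once flash attention was separated into a standalone function
    from transformers.modeling_flash_attention_utils import _flash_attention_forward  # noqa: F401
    HAS_STANDALONE_FA_FORWARD = True
except ImportError:
    HAS_STANDALONE_FA_FORWARD = False
```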

Augmentation

  1. If the padding-free fix is in place, then there is nothing to do; otherwise some patching is required (a sketch follows the list).
  2. Patch _flash_attention_forward, either the standalone function or the method, depending on the version.
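
Roughly, the branching could look like the sketch below, which builds on the flags from the previous sketch; `apply_padding_free_patches` and `patched_fa_forward` are illustrative stand-ins, not the plugin's actual API:

```python
# Illustrative branching over the detected transformers version.
def apply_padding_free_patches(model, patched_fa_forward):
    if HAS_PADDING_FREE_FIX:
        # transformers already threads position_ids through flash attention
        return model
    if HAS_STANDALONE_FA_FORWARD:
        # newer versions: replace the standalone module-level function once
        import transformers.modeling_flash_attention_utils as fa_utils
        fa_utils._flash_attention_forward = patched_fa_forward
    else:
        # older versions: rebind the method on every flash-attention module
        for module in model.modules():
            if hasattr(module, "_flash_attention_forward"):
                module._flash_attention_forward = patched_fa_forward.__get__(module)
    return model
```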

Some redesign was done: since _flash_attention_forward could be either a method or a function, the previous approach of binding _flash_attention_forward by closure no longer holds. So we need to install a method on the backbone to intercept the position ids, then modify _flash_attention_forward to be able to access the position ids and bind them. A rough sketch of this is below.
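
This is a rough illustrative sketch of the intercept idea, not the PR's exact code; the attribute name `_intercepted_position_ids` is made up for the example:

```python
# Wrap the backbone forward to stash position_ids so that a patched
# _flash_attention_forward can read them back instead of receiving them by closure.
def install_position_ids_intercept(backbone):
    original_forward = backbone.forward

    def forward(*args, position_ids=None, **kwargs):
        # stash the position ids where the patched attention path can find them
        backbone._intercepted_position_ids = position_ids
        return original_forward(*args, position_ids=position_ids, **kwargs)

    backbone.forward = forward
    return backbone
```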

The bad news is that once this is done properly, the speed dropped. However, we verified that the speed is consistent when we upgrade transformers to the latest main, which means our implementation is correct.

{'loss': 0.8762, 'grad_norm': 69.0, 'learning_rate': 2e-05, 'epoch': 0.0}
{'loss': 0.9877, 'grad_norm': 29.40625, 'learning_rate': 1.7777777777777777e-05, 'epoch': 0.0}
{'loss': 1.0518, 'grad_norm': 38.90625, 'learning_rate': 1.555555555555556e-05, 'epoch': 0.0}
{'loss': 1.1429, 'grad_norm': 85.625, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.0}
{'loss': 1.0771, 'grad_norm': 22.890625, 'learning_rate': 1.1111111111111113e-05, 'epoch': 0.0}
{'loss': 0.9842, 'grad_norm': 33.5, 'learning_rate': 8.888888888888888e-06, 'epoch': 0.0}
{'loss': 2.4449, 'grad_norm': 19.9375, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.01}
{'loss': 0.9717, 'grad_norm': 35.5625, 'learning_rate': 4.444444444444444e-06, 'epoch': 0.01}
{'loss': 0.8958, 'grad_norm': 25.203125, 'learning_rate': 2.222222222222222e-06, 'epoch': 0.01}
{'loss': 0.9145, 'grad_norm': 18.296875, 'learning_rate': 0.0, 'epoch': 0.01}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:07<00:00,  1.41it/s]

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 67.8947, 'train_samples_per_second': 5.891, 'train_steps_per_second': 1.473, 'train_tokens_per_second': 2029.615, 'train_loss': 1.1346958923339843, 'init_mem_cpu_alloc_delta': -14387679232, 'init_mem_gpu_alloc_delta': 14483611648, 'init_mem_cpu_peaked_delta': 14483382272, 'init_mem_gpu_peaked_delta': 0, 'train_mem_cpu_alloc_delta': 691978240, 'train_mem_gpu_alloc_delta': 28984245248, 'train_mem_cpu_peaked_delta': 0, 'train_mem_gpu_peaked_delta': 28990169600, 'before_init_mem_cpu': 15096680448, 'before_init_mem_gpu': 0, 'epoch': 0.01}

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
@fabianlim fabianlim requested a review from achew010 August 1, 2024 16:10
@fabianlim (Contributor, Author):

Potentially, this can be improved by having the backbone function compute the cumsum once for all layers (a sketch of the idea follows).
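
A hedged sketch of that idea, assuming a padding-free (flattened) batch where position_ids restart at 0 at each packed-sequence boundary; `cu_seqlens_from_position_ids` is a hypothetical helper, not existing plugin code:

```python
import torch
import torch.nn.functional as F

def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    # position_ids: shape (1, total_tokens) for a flattened, padding-free batch
    pos = position_ids.flatten()
    # each 0 marks the start of a packed sequence
    starts = torch.nonzero(pos == 0).flatten()
    lengths = torch.diff(starts, append=torch.tensor([pos.numel()], device=pos.device))
    # prepend 0 and take the running sum: the cumulative sequence lengths (int32)
    # that flash attention's varlen path expects, computed once for all layers
    return F.pad(lengths.cumsum(0), (1, 0)).to(torch.int32)
```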
