
How to use 3090 to train 16k model? #4

Open
aresa7796 opened this issue Jul 1, 2023 · 7 comments
Labels
good first issue Good for newcomers

Comments

@aresa7796

I have 80k supervised examples, but only a 3090 graphics card. How can I use a 3090 to train the 16k model?

@musabgultekin

While it can technically work, it will probably take too much VRAM and be horribly slow.
Check out:
https://huggingface.co/docs/transformers/perf_train_gpu_one
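For reference, here is a minimal sketch of the single-GPU memory savers that guide covers, assuming the training script goes through the Hugging Face Trainer (the values are illustrative, not a recommendation):

```python
# Minimal sketch of single-GPU memory-saving options from the Transformers
# performance guide. Assumes HF Trainer is used; all values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # keep the micro-batch tiny
    gradient_accumulation_steps=32,   # recover a usable effective batch size
    gradient_checkpointing=True,      # trade recompute for activation memory
    fp16=True,                        # half-precision activations/gradients
    optim="adafactor",                # smaller optimizer state than AdamW
)
```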

@DachengLi1
Owner

@aresa7796 The current code assumes 8x A100 40GB. I think a 3090 should be able to run it after applying some systems techniques. If we can support training on 3090 (or other non-A100) GPUs, it will be really amazing. We just haven't gotten to it yet; can you try it and share some of your feedback? Here are the steps I think should work:

(1) Use DeepSpeed ZeRO offloading, as shared by @musabgultekin;
(2) Change the monkey patch from flash attention to xFormers by calling this function. xFormers is a memory-efficient attention implementation that supports non-A100 GPUs. I already have the monkey patch implemented. :P
(3) Change bf16 to fp16 in the training command (and delete the tf32 argument as well). A rough sketch of these three changes is below.
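Here is a rough sketch of what the three changes could look like; the monkey-patch import path and function name are assumptions (check this repo's monkey patch code for the real ones), and the DeepSpeed values are illustrative:

```python
# (2) Swap the flash-attention monkey patch for the xFormers one *before* the
#     model is loaded. The import path and function name below are assumptions,
#     not this repo's confirmed API.
from longchat.train.monkey_patch.llama_xformers_monkey_patch import (
    replace_llama_attn_with_xformers_attn,
)
replace_llama_attn_with_xformers_attn()

# (1) + (3) DeepSpeed ZeRO-3 with CPU offloading and fp16 instead of bf16/tf32.
#     Pass this dict (or an equivalent ds_config.json) to the trainer via --deepspeed.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "fp16": {"enabled": True},  # replaces --bf16 True; drop the --tf32 flag
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```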

Let me know if this works for you!

DachengLi1 added the good first issue (Good for newcomers) label on Jul 1, 2023
@lucasjinreal

I am also wondering about this. For instance, a V100 might not be able to fit a sequence length of 2048 at all. If we use 1024 and apply condensed rotary embeddings with a ratio of 16, will that work? How well?

@DachengLi1
Owner

@lucasjinreal Condensing the rotary embeddings does not reduce memory; it only lets the model keep good quality at 16K.
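For anyone following along, here is a minimal sketch of what "condensing" the rotary embeddings means: positions are divided by a ratio so that long contexts map back into the position range seen during pretraining. The function name is illustrative, not this repo's API:

```python
# Sketch of condensed (interpolated) rotary position embeddings. Positions are
# scaled down by `ratio`, so e.g. token 16384 is rotated like token 1024 was
# during pretraining. This affects long-context quality, not memory use.
import torch

def condensed_rope_angles(seq_len: int, dim: int, ratio: float = 16.0, base: float = 10000.0):
    # Standard RoPE inverse frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Condense: divide positions by the ratio before computing the angles.
    positions = torch.arange(seq_len).float() / ratio
    angles = torch.outer(positions, inv_freq)  # (seq_len, dim // 2)
    return torch.cos(angles), torch.sin(angles)

# Example: ratio=16 squeezes a 16k context into the original ~1k position range.
cos, sin = condensed_rope_angles(seq_len=16384, dim=128, ratio=16.0)
```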

@lucasjinreal

@DachengLi1 What I mean is that a V100 cannot fit even a modest length like 2048 in most cases.

@DachengLi1
Owner

@lucasjinreal I see, thanks! Condensing would be great; I believe it should work from 1024 to, say, 8192. The thing is that you will still need to fine-tune a bit at the longer length after condensing - but I guess you can resort to an A100 for that adaptation part?

@lucasjinreal

@DachengLi1 Hi, I'd like to discuss a bit more: have you tried comparing your method with ALiBi on extrapolation ability?
