
Pass deepspeed and fsdp as None explicitly when merging adapters to allow custom device_map #1575

Merged 1 commit into main on May 7, 2024

Conversation

@chiragjn (Contributor) commented Apr 29, 2024

I wanted to merge my adapters using multiple GPUs in a scenario with limited main memory, so I tried passing device_map=auto.

But if deepspeed or fsdp is set in the config YAML, axolotl sets ACCELERATE_USE_* in the environment:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/5294653a2d353066600cbc66bb06f7c63c87147b/src/axolotl/utils/trainer.py#L428-L432

which further down resets device_map back to None:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/5294653a2d353066600cbc66bb06f7c63c87147b/src/axolotl/utils/config/__init__.py#L46-L48

This is kind of self-sabotaging, as it forces me to merge adapters on CPU.

I am not adding a default device_map, so people can choose what they want; I am just making sure it will be respected.
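
Roughly, the interaction looks like this (an illustrative, runnable sketch of the behavior described above, not the exact axolotl source):

```python
import os
from types import SimpleNamespace

# stand-in for the parsed config YAML (values here are hypothetical)
cfg = SimpleNamespace(deepspeed="ds_config.json", fsdp=None, device_map="auto")

# 1) because deepspeed/fsdp is set in the config, the accelerate flags get exported
if cfg.deepspeed:
    os.environ["ACCELERATE_USE_DEEPSPEED"] = "true"
if cfg.fsdp:
    os.environ["ACCELERATE_USE_FSDP"] = "true"

# 2) further down, config normalization sees those flags and discards the
#    user-supplied device_map, which is what forces the merge onto CPU
if (
    os.environ.get("ACCELERATE_USE_DEEPSPEED") == "true"
    or os.environ.get("ACCELERATE_USE_FSDP") == "true"
):
    cfg.device_map = None

print(cfg.device_map)  # None -> the custom placement was silently dropped
```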

@NanoCode012 (Collaborator)

Hey, this looks okay to me. Were you able to confirm this merges successfully on GPU? Did you merge on a single GPU or multiple GPUs?

@chiragjn (Contributor, Author) commented Apr 30, 2024

Yes, I was able to confirm and merge on both single and multiple GPUs when passing device_map=auto / --device_map auto separately via **kwargs.


One caveat: if you end up running the merge in the same script as training and it is launched with accelerate launch --use_deepspeed ..., then ACCELERATE_USE_DEEPSPEED will be set by the accelerate library itself rather than by axolotl, and the fix in this PR will not apply.
So make sure to run the merge script separately, without accelerate launch, or forcefully unset those environment variables. A sketch of what I mean is below.
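
Concretely, something along these lines (a sketch; the config path and LoRA dir are placeholders, and --device_map is simply forwarded through the CLI's **kwargs):

```python
import os
import subprocess

# run the merge in a fresh process, outside `accelerate launch`,
# with the accelerate flags cleared so the custom device_map is respected
env = os.environ.copy()
env.pop("ACCELERATE_USE_DEEPSPEED", None)
env.pop("ACCELERATE_USE_FSDP", None)

subprocess.run(
    [
        "python", "-m", "axolotl.cli.merge_lora",
        "config.yaml",                  # placeholder config path
        "--lora_model_dir=./lora-out",  # placeholder adapter dir
        "--device_map=auto",
    ],
    check=True,
    env=env,
)
```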

@NanoCode012 (Collaborator)

Was merging on GPUs much faster than in RAM? On that note, how about single vs. multiple GPUs?

I wonder if the README should also be updated to include info on passing device_map.

@chiragjn (Contributor, Author) commented May 1, 2024

Was merging on gpus much faster than RAM?

Yes, much faster, but more importantly for me the constraint was limited main memory.
Imagine a g5.xlarge on AWS: an A10 with 24 GB of GPU memory and 16 GB of main memory. Llama 3 8B at bf16 is roughly 16 GB of weights, so it can't comfortably fit in main memory but it can on the GPU.

On this note, how about single vs multiple?

Both work well, but the main point is that the number of GPUs required differs based on how many GPUs the model needs to fit in half precision.
Now people can pass their own custom placement via device_map, for example as sketched below.
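
Some device_map values one might pass (illustrative only; the module names in the last example are hypothetical and depend on the model):

```python
# spread layers automatically across all visible GPUs (and CPU if needed)
device_map = "auto"

# pin the whole model to a single GPU
device_map = {"": 0}

# or hand-place specific modules across two GPUs
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": 1,
    "lm_head": 1,
}
```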

@chiragjn (Contributor, Author) commented May 7, 2024

@NanoCode012 @winglian bump on this

@NanoCode012 merged commit 9e1480e into axolotl-ai-cloud:main on May 7, 2024
7 checks passed
@NanoCode012 (Collaborator)

Thanks! Sorry I missed this.
