
Enable Ascend NPU support #1758

Merged: 1 commit merged into axolotl-ai-cloud:main on Nov 21, 2024
Conversation

MengqingCao (Contributor)

Description

Enable the Ascend NPU backend for finetuning, inference, and the Gradio web UI.
Main changes:

  • Replace hard-coded CUDA logic with a device abstraction (see the sketch after this list)
  • Add NPU-related configuration constraints
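
A minimal sketch of the kind of device abstraction this refers to (the helper below is illustrative and assumes the torch_npu plugin; the function name is not axolotl's actual API):

import torch

def get_device() -> str:
    """Pick the best available backend instead of hard-coding "cuda"."""
    try:
        import torch_npu  # noqa: F401  # registers the Ascend "npu" backend with torch
        if torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cuda" if torch.cuda.is_available() else "cpu"

# Call sites then move tensors/models to the abstract device, e.g.
# model.to(f"{get_device()}:0") instead of model.to("cuda:0").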

Motivation and Context

There are two benefits:

  1. Abstracting the device makes it easier for more backends to plug in, and Ascend NPU is a good example.
  2. Ascend NPU users can use axolotl for LLM finetuning and inference.

Example

# preprocess datasets - optional but recommended
ASCEND_RT_VISIBLE_DEVICES=0 python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out"

# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out" --gradio
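
Before launching, it can help to confirm that PyTorch actually sees the device selected via ASCEND_RT_VISIBLE_DEVICES. A quick sanity check, assuming the torch_npu package is installed:

# Sanity check that the Ascend backend is usable (assumes torch_npu is installed).
import torch
import torch_npu  # noqa: F401  # adds the torch.npu namespace

print("NPU available:", torch.npu.is_available())
print("NPU count:", torch.npu.device_count())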

Screenshots

NPU-supported CLI inference (screenshot: axolotl_cli_chat)

NPU-supported Gradio web UI inference (screenshot: axolotl_cli_chat_gradio)

Config

lora.yaml

base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
push_dataset_to_hub:
datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
dataset_prepared_path:
val_set_size: 0.02
adapter: lora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./outputs/lora-out
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
float32: true
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank: 0
logging_steps: 1
xformers_attention:
flash_attention: false
gptq_groupsize:
s2_attention:
gptq_model_v1:
warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
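
The NPU-related configuration constraints mentioned in the description are not spelled out here. As an illustration only, such a check might reject options that are typically unavailable on the NPU backend; the option names below are assumptions, not the actual rules added by this PR:

# Hypothetical sketch of an NPU config constraint check; the rejected options
# are illustrative assumptions, not the constraints actually added by this PR.
def validate_npu_config(cfg: dict) -> None:
    unsupported = ("flash_attention", "xformers_attention")
    enabled = [opt for opt in unsupported if cfg.get(opt)]
    if enabled:
        raise ValueError(f"Options not supported on Ascend NPU: {', '.join(enabled)}")

# The lora.yaml above sets flash_attention: false, so it would pass this check.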

@MengqingCao (Contributor Author)

Good day! @winglian I tried to create a ModelKwargs class, but modifying model_kwargs is intertwined with many other operations such as patching and model creation, and their conditional logic does not seem separable.

So I ultimately refactored the whole load_model function into a ModelLoader class. All the operations from the original load_model function are now split across several member functions, following the original logical order.

This brings a lot of changes, but it makes the model loading pipeline clearer, and changes to member variables such as model_kwargs are more visible. However, I am not sure whether the current function naming and pipeline split is completely reasonable.

Please review the latest code and give me some suggestions. Thanks a lot!
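
As an illustration of the shape described here, a rough outline follows; the method names and step split are assumptions about the idea, not the code actually merged (see #1909 for the real refactor):

class ModelLoader:
    """Illustrative outline only; method names are assumptions, not the merged code."""

    def __init__(self, cfg, tokenizer):
        self.cfg = cfg
        self.tokenizer = tokenizer
        self.model_kwargs = {}  # shared state that used to be threaded through load_model

    def load(self):
        # Each step corresponds to one stretch of the original load_model,
        # executed in the original logical order.
        self.apply_patches()
        self.set_device_map()
        self.set_quantization_config()
        return self.build_model()

    def apply_patches(self):
        """Apply monkey-patches (attention, sample packing, ...)."""

    def set_device_map(self):
        """Populate self.model_kwargs with the target device / device_map."""

    def set_quantization_config(self):
        """Populate self.model_kwargs with 4-/8-bit quantization settings."""

    def build_model(self):
        """Instantiate the model with the accumulated self.model_kwargs."""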

@MengqingCao (Contributor Author)

Hi @winglian, could you help review the latest code in this PR? Let me know if the breaking changes introduced by refactoring the original code are not what you want.

Just FYI, I accidentally deleted the original commit; it can still be found in this branch.

@Yikun commented Sep 12, 2024

It looks like this commit includes two parts: the model loader refactor and Ascend NPU support. Maybe we could split it into two PRs: the first would be the model loader refactor, and then we would rebase the Ascend NPU support PR on top of it.

Or do you have any other suggestions? @winglian, please feel free to let us know if you have any further concerns. Thanks!

@MengqingCao (Contributor Author)

The refactoring of ModelLoader has been split into #1909, and Ascend NPU support will be committed after #1909. Hopefully this makes it easier to review and test. cc @winglian

@MengqingCao (Contributor Author)

@winglian Hi, Ascend NPU support is done on the latest branch. Please review it, thanks!

Resolved review threads on:
  • src/axolotl/utils/bench.py
  • src/axolotl/utils/config/__init__.py
  • src/axolotl/utils/config/models/input/v0_4_1/__init__.py
  • src/axolotl/utils/distributed.py
  • src/axolotl/utils/models.py
MengqingCao reopened this on Nov 19, 2024
@MengqingCao (Contributor Author)

@NanoCode012 I just pushed the latest code, but I accidentally closed the PR before that, and now CI is stopped. What should I do now?

@NanoCode012 (Collaborator)

@MengqingCao, no worries. I restarted them. I checked the PR, and all my points have been resolved. Thank you for addressing them.

I plan to let the multi-GPU CI run as well to ensure there are no issues there.

winglian merged commit 838b74d into axolotl-ai-cloud:main on Nov 21, 2024
22 of 24 checks passed