
Enable Ascend NPU support #1758

Merged: 1 commit merged into axolotl-ai-cloud:main on Nov 21, 2024
Conversation

MengqingCao (Contributor)

Description

Enable the Ascend NPU backend for finetuning, inference, and the Gradio web UI.
Main changes:

  • Replace hard-coded CUDA logic with a device abstraction (see the sketch after this list)
  • Add NPU-related configuration constraints
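
A minimal sketch of the kind of device abstraction this refers to (the helper below is illustrative and assumes the torch_npu plugin; the function name is not axolotl's actual API):

import torch

def get_device() -> str:
    """Pick the best available backend instead of hard-coding "cuda"."""
    try:
        import torch_npu  # noqa: F401  # registers the Ascend "npu" backend with torch
        if torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cuda" if torch.cuda.is_available() else "cpu"

# Call sites then move tensors/models to the abstract device, e.g.
# model.to(f"{get_device()}:0") instead of model.to("cuda:0").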

Motivation and Context

There are two benefits:

  1. Abstracting the device makes it easier for more backends to plug in, and Ascend NPU is a good example.
  2. Ascend NPU users can use axolotl for LLM finetuning and inference.

Example

# preprocess datasets - optional but recommended
ASCEND_RT_VISIBLE_DEVICES=0 python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out"

# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out" --gradio
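
Before launching, it can help to confirm that PyTorch actually sees the device selected via ASCEND_RT_VISIBLE_DEVICES. A quick sanity check, assuming the torch_npu package is installed:

# Sanity check that the Ascend backend is usable (assumes torch_npu is installed).
import torch
import torch_npu  # noqa: F401  # adds the torch.npu namespace

print("NPU available:", torch.npu.is_available())
print("NPU count:", torch.npu.device_count())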

Screenshots

NPU-supported CLI inference (screenshot: axolotl_cli_chat)

NPU-supported Gradio web UI inference (screenshot: axolotl_cli_chat_gradio)

Config

lora.yaml

base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
push_dataset_to_hub:
datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
dataset_prepared_path:
val_set_size: 0.02
adapter: lora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./outputs/lora-out
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
float32: true
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank: 0
logging_steps: 1
xformers_attention:
flash_attention: false
gptq_groupsize:
s2_attention:
gptq_model_v1:
warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
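
The NPU-related configuration constraints mentioned in the description are not spelled out here. As an illustration only, such a check might reject options that are typically unavailable on the NPU backend; the option names below are assumptions, not the actual rules added by this PR:

# Hypothetical sketch of an NPU config constraint check; the rejected options
# are illustrative assumptions, not the constraints actually added by this PR.
def validate_npu_config(cfg: dict) -> None:
    unsupported = ("flash_attention", "xformers_attention")
    enabled = [opt for opt in unsupported if cfg.get(opt)]
    if enabled:
        raise ValueError(f"Options not supported on Ascend NPU: {', '.join(enabled)}")

# The lora.yaml above sets flash_attention: false, so it would pass this check.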

@MengqingCao (Contributor Author)

Good day! @winglian I tried to create a ModelKwargs class, but modifying model_kwargs is intertwined with many other operations such as patching and model creation, and their conditional logic does not seem separable.

So I ultimately refactored the whole load_model function into a ModelLoader class. All the operations from the original load_model function are now split across several member functions, following the original logical order.

This brings a lot of changes, but it makes the model loading pipeline clearer, and changes to member variables such as model_kwargs are more visible. However, I am not sure whether the current function naming and pipeline split is completely reasonable.

Please review the latest code and give me some suggestions. Thanks a lot!
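
As an illustration of the shape described here, a rough outline follows; the method names and step split are assumptions about the idea, not the code actually merged (see #1909 for the real refactor):

class ModelLoader:
    """Illustrative outline only; method names are assumptions, not the merged code."""

    def __init__(self, cfg, tokenizer):
        self.cfg = cfg
        self.tokenizer = tokenizer
        self.model_kwargs = {}  # shared state that used to be threaded through load_model

    def load(self):
        # Each step corresponds to one stretch of the original load_model,
        # executed in the original logical order.
        self.apply_patches()
        self.set_device_map()
        self.set_quantization_config()
        return self.build_model()

    def apply_patches(self):
        """Apply monkey-patches (attention, sample packing, ...)."""

    def set_device_map(self):
        """Populate self.model_kwargs with the target device / device_map."""

    def set_quantization_config(self):
        """Populate self.model_kwargs with 4-/8-bit quantization settings."""

    def build_model(self):
        """Instantiate the model with the accumulated self.model_kwargs."""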

@MengqingCao (Contributor Author)

Hi @winglian, could you help review the latest code in this PR? Let me know if the breaking changes introduced by refactoring the original code are not what you want.

Just FYI, I accidentally deleted the original commit; it can still be found in this branch.

@Yikun commented Sep 12, 2024

It looks like this commit includes two parts: the model loader refactor and Ascend NPU support. Maybe we could split it into two PRs: the first would be the model loader refactor, and then we would rebase the Ascend NPU support PR on top of it.

Or do you have any other suggestions? @winglian, please feel free to let us know if you have any further concerns. Thanks!

@MengqingCao (Contributor Author)

The refactoring of ModelLoader has been split into #1909, and Ascend NPU support will be committed after #1909. Hopefully this makes it easier to review and test. cc @winglian

@MengqingCao (Contributor Author)

@winglian Hi, Ascend NPU support is done on the latest branch. Please review it, thanks!

Resolved review threads on:
  • src/axolotl/utils/bench.py
  • src/axolotl/utils/config/__init__.py
  • src/axolotl/utils/config/models/input/v0_4_1/__init__.py
  • src/axolotl/utils/distributed.py
  • src/axolotl/utils/models.py
MengqingCao reopened this on Nov 19, 2024
@MengqingCao (Contributor Author)

@NanoCode012 I just pushed the latest code, but I accidentally closed the PR before that, and now CI is stopped. What should I do now?

@NanoCode012 (Collaborator)

@MengqingCao, no worries. I restarted them. I checked the PR, and all my points have been resolved. Thank you for addressing them.

I plan to let the multi-GPU CI run as well to ensure there are no issues there.

winglian merged commit 838b74d into axolotl-ai-cloud:main on Nov 21, 2024
22 of 24 checks passed