
EXAONE 3.0 Model Support #1258

Merged
merged 13 commits into sgl-project:main on Aug 30, 2024

Conversation

@Deepfocused
Contributor

Motivation

Hello, I am Deepfocused from the AI Engineering Team at LG CNS.
We would like to integrate the recently released EXAONE-3.0-7.8B-Instruct model into sglang.
We have previously served models such as EXAONE v1 and v2 to our customers through an sglang + EXAONE build. Unlike those earlier models, EXAONE 3.0 has been released as an open model, so we are requesting this integration.

Modifications

The EXAONE 3.0 implementation is based on the Llama model.

  1. Added exaone.py to sglang/srt/models.
  2. Created a configs folder in sglang/srt and added an EXAONE 3.0 configuration file there, then modified sglang/srt/hf_transformers_utils.py so the model can run without trust_remote_code (i.e., without passing --trust-remote-code at runtime); a rough sketch follows below.
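
As a hedged sketch of how item 2 can work (this is not the exact code in the PR), registering a local configuration class with Hugging Face transformers lets AutoConfig resolve the "exaone" model type without remote code. The ExaoneConfig fields shown here are placeholders; the real config file added under sglang/srt/configs defines the actual fields.

```python
# Hedged sketch only: one way to make the "exaone" model type resolvable by
# Hugging Face transformers without trust_remote_code. The class below is a
# placeholder, not the actual config shipped in this PR.
from transformers import AutoConfig, PretrainedConfig


class ExaoneConfig(PretrainedConfig):
    model_type = "exaone"

    def __init__(self, layer_norm_epsilon: float = 1e-5, **kwargs):
        # EXAONE exposes the norm epsilon as layer_norm_epsilon
        # (Llama calls the same value rms_norm_eps).
        self.layer_norm_epsilon = layer_norm_epsilon
        super().__init__(**kwargs)


# Registering the config class lets AutoConfig.from_pretrained(...) resolve
# model_type == "exaone" from a local class instead of remote code.
AutoConfig.register("exaone", ExaoneConfig)
```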

🐽🐽 Brief description of sglang/srt/models/exaone.py 🐽🐽
The newly released EXAONE 3.0 model has the same structure as Llama, but the variable names differ. Therefore, to run EXAONE 3.0 on sglang, several variable and weight names had to be changed relative to the Llama implementation.
For example,
down_proj -> c_proj
gate_proj -> c_fc_0
up_proj -> c_fc_1
config.rms_norm_eps -> config.layer_norm_epsilon
There are a few more; refer to the code.

EXAONE 3.0 appears to focus on data and training methods on top of the Llama structure rather than on changing the model architecture itself.
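
For illustration only, the renaming described above can be summarized as a simple mapping. This sketch is not the actual code from exaone.py, which applies the renames directly in the model and weight-loading code:

```python
# Illustrative sketch: Llama-style parameter names vs. the names used by the
# EXAONE 3.0 checkpoint. The real sglang/srt/models/exaone.py renames these
# directly inside the model classes.
LLAMA_TO_EXAONE = {
    "down_proj": "c_proj",
    "gate_proj": "c_fc_0",
    "up_proj": "c_fc_1",
}
# Config attribute difference: Llama reads config.rms_norm_eps, while the
# EXAONE 3.0 config exposes the same value as config.layer_norm_epsilon.


def remap_weight_name(name: str) -> str:
    """Translate a Llama-style weight name to the EXAONE 3.0 naming."""
    for llama_name, exaone_name in LLAMA_TO_EXAONE.items():
        if llama_name in name:
            return name.replace(llama_name, exaone_name)
    return name


print(remap_weight_name("model.layers.0.mlp.down_proj.weight"))
# -> model.layers.0.mlp.c_proj.weight
```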

Checklist

  • [✅] Format your code according to the Contributor Guide.
  • [ ] Add unit tests as outlined in the Contributor Guide.
  • [ ] Update documentation as needed, including docstrings or example tutorials.

@zhyncs
Member

zhyncs commented Aug 30, 2024

python3 scripts/playground/reference_hf.py --model [new model]
python3 -m sglang.bench_latency --model [new model] --correct --output-len 16 --trust-remote-code

Thank you for your contribution! Can you provide the output of these two commands?

@Deepfocused
Contributor Author

@zhyncs

  1. python3 scripts/playground/reference_hf.py --model EXAONE-3.0-7.8B-Instruct
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:17<00:00,  2.44s/it]
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
prefill logits tensor([ 8.1250, 31.7500, 41.4062,  ..., 11.3516, 10.3750, 10.2812],
       device='cuda:0')
The capital of France is Paris.

10. **What is the currency of Japan?**
prefill logits tensor([ 8.4297, 31.8281, 40.5625,  ..., 13.4844, 10.3047, 11.5312],
       device='cuda:0')
The capital of the United Kindom is London.

London is a city in the United Kingdom.

London
prefill logits tensor([10.6875, 31.8125, 45.1562,  ..., 14.3984,  9.7578, 11.7031],
       device='cuda:0')
Today is a sunny day and I like to go for a walk in the park. The birds are chirping and the
  2. python3 -m sglang.bench_latency --model EXAONE-3.0-7.8B-Instruct --correct --output-len 16 --trust-remote-code
Init nccl begin.
Load weight begin. avail mem=78.94 GB
Loading safetensors checkpoint shards:   0% Completed | 0/7 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  14% Completed | 1/7 [00:00<00:05,  1.05it/s]
Loading safetensors checkpoint shards:  29% Completed | 2/7 [00:01<00:04,  1.02it/s]
Loading safetensors checkpoint shards:  43% Completed | 3/7 [00:03<00:04,  1.04s/it]
Loading safetensors checkpoint shards:  57% Completed | 4/7 [00:04<00:03,  1.06s/it]
Loading safetensors checkpoint shards:  71% Completed | 5/7 [00:05<00:02,  1.08s/it]
Loading safetensors checkpoint shards:  86% Completed | 6/7 [00:06<00:01,  1.17s/it]
Loading safetensors checkpoint shards: 100% Completed | 7/7 [00:07<00:00,  1.09it/s]
Loading safetensors checkpoint shards: 100% Completed | 7/7 [00:07<00:00,  1.00s/it]

Load weight end. type=ExaoneForCausalLM, dtype=torch.float16, avail mem=64.30 GB
Memory pool end. avail mem=9.35 GB
Capture cuda graph begin. This can take up to several minutes.
max_total_num_tokens=449152
input_ids=[[1320, 7304, 670, 9776, 772], [1320, 7304, 670, 629, 3573, 28574, 742, 772], [23102, 772, 619, 44212, 2471, 686, 768, 1664]]
prefill logits (first half) tensor([[11.4062, 36.0000, 45.5312,  ..., 15.5234, 13.1875, 13.3203],
        [ 5.3281, 37.0312, 56.6250,  ..., 10.2266,  8.1562,  8.7422],
        [ 7.9414, 34.8750, 49.4688,  ..., 11.4766,  9.8984, 11.6797]],
       device='cuda:0')
prefill logits (final) tensor([[ 8.1328, 31.7500, 41.4062,  ..., 11.3594, 10.3750, 10.2812],
        [ 8.4219, 31.8281, 40.5625,  ..., 13.4766, 10.2969, 11.5234],
        [10.6953, 31.8125, 45.1562,  ..., 14.4141,  9.7656, 11.7188]],
       device='cuda:0')
The capital of France is Paris.

10. **What is the currency of Japan?**

The capital of the United Kindom is London.

London is a city in the United Kingdom.

London is
Today is a sunny day and I like to go for a walk in the park. The birds are chirping and the flowers

thank you!

Member

@zhyncs zhyncs left a comment


overall LGTM

zhyncs and others added 3 commits on August 30, 2024 at 17:25

  • add Copyright 2023-2024 SGLang Team
  • add
    # Copyright 2024 The LG AI Research EXAONE Lab. All rights reserved.
    # Copyright 2024 The LG CNS AI Engineering Team.
@zhyncs zhyncs enabled auto-merge (squash) August 30, 2024 07:58
@zhyncs zhyncs merged commit b7f8341 into sgl-project:main Aug 30, 2024
8 checks passed
@zhyncs
Member

zhyncs commented Aug 30, 2024

@Deepfocused Thanks for your contribution! It has been merged.

@zhyncs
Member

zhyncs commented Aug 30, 2024

@Deepfocused And could you raise another PR to update the README?

https://github.com/sgl-project/sglang/?tab=readme-ov-file#supported-models
https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

@Deepfocused
Contributor Author

Okay, I'll add it.

> @Deepfocused And could you raise another PR to update the README?
>
> https://github.com/sgl-project/sglang/?tab=readme-ov-file#supported-models https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

@Deepfocused
Contributor Author

@zhyncs
I raised another PR (#1267) to update README.md:

  • Add EXAONE 3.0 to Supported Models

@merrymercy
Contributor

merrymercy commented Aug 30, 2024

@Deepfocused Thanks for the contribution. Can you also recommend sglang as one of the model deployment methods in your Hugging Face model card? https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

@Deepfocused
Contributor Author

@merrymercy

It's not something I can do right now, but I'll talk to my boss!!!
