EXAONE 3.0 Model Support #1258

Deepfocused · 2024-08-29T20:48:53Z

Motivation

Hello, I am deepfocused from the AI Engineering Team at LG CNS.
We want to integrate the recently released EXAONE-3.0-7.8B-Instruct model into sglang.
Previously, we served models such as EXAONE v1 and v2 to our customers through an sglang + EXAONE build. Unlike those earlier models, EXAONE 3.0 was released as an open model, so we are requesting integration.

Modifications

EXAONE 3.0 is based on the llama model.

I added exaone.py to sglang/srt/models.
I created a configs folder in sglang/srt and added an EXAONE 3.0 configuration file to it. I then modified sglang/srt/hf_transformers_utils.py so that the model can run without trust_remote_code or --trust-remote-code at runtime.
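
To illustrate the idea of running without trust_remote_code, here is a minimal sketch of a local config registry. It is not the code from this PR: the names LOCAL_CONFIGS and load_config, the "exaone" model_type string, and the default epsilon are all assumptions made for illustration.

```python
from typing import Any, Dict


class ExaoneConfig:
    """Stand-in for a locally bundled config class (hypothetical)."""

    model_type = "exaone"  # assumed model_type string

    def __init__(self, **kwargs: Any) -> None:
        # EXAONE calls the norm epsilon layer_norm_epsilon (see this PR).
        # The 1e-5 default is an assumption for this sketch.
        self.layer_norm_epsilon = kwargs.get("layer_norm_epsilon", 1e-5)
        self.extra = kwargs


# Hypothetical registry: model_type -> config class shipped with the server.
LOCAL_CONFIGS: Dict[str, type] = {ExaoneConfig.model_type: ExaoneConfig}


def load_config(raw: Dict[str, Any], trust_remote_code: bool = False):
    """Prefer a bundled config class; otherwise require explicit opt-in."""
    config_cls = LOCAL_CONFIGS.get(raw.get("model_type"))
    if config_cls is not None:
        # Known model type: no remote code has to be executed.
        return config_cls(**raw)
    if not trust_remote_code:
        raise ValueError("unknown model_type; pass trust_remote_code to load it")
    raise NotImplementedError("remote code loading is outside this sketch")


cfg = load_config({"model_type": "exaone", "layer_norm_epsilon": 1e-5})
print(type(cfg).__name__)
```

The design point is that a model type known to the registry never needs the remote-code path, so the flag becomes optional for that model.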

🐽🐽 Brief description of sglang/srt/models/exaone.py 🐽🐽
The newly released EXAONE v3 model has the same structure as llama, but some variable names differ. To run EXAONE v3 on sglang, several variable names and weight names in the llama implementation had to be changed.
For example:
down_proj -> c_proj
gate_proj -> c_fc_0
up_proj -> c_fc_1
config.rms_norm_eps -> config.layer_norm_epsilon
There are a few more (refer to the code).
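
As a rough sketch of what such a renaming amounts to, the snippet below maps llama-style parameter names to the EXAONE names listed above. The helper to_exaone_name and the example parameter path are hypothetical; the actual exaone.py may handle the names differently.

```python
# Name pairs taken from the list above (llama name -> EXAONE name).
LLAMA_TO_EXAONE = {
    "gate_proj": "c_fc_0",
    "up_proj": "c_fc_1",
    "down_proj": "c_proj",
}


def to_exaone_name(param_name: str) -> str:
    """Rename each dotted component that has an EXAONE equivalent."""
    parts = param_name.split(".")
    return ".".join(LLAMA_TO_EXAONE.get(part, part) for part in parts)


# Hypothetical parameter path, used only to show the substitution.
print(to_exaone_name("layers.0.mlp.gate_proj.weight"))
```

Matching whole dotted components rather than substrings avoids accidentally rewriting a name that merely contains one of the keys.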

EXAONE 3.0 appears to keep the llama structure and to focus on data and training methods rather than on changes to the model architecture.

Checklist

[✅ ] Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

zhyncs · 2024-08-30T06:43:01Z

python3 scripts/playground/reference_hf.py --model [new model]
python3 -m sglang.bench_latency --model [new model] --correct --output-len 16 --trust-remote-code

Thank you for your contribution! Can you provide the output of these two?

Deepfocused · 2024-08-30T07:12:00Z

@zhyncs

python3 scripts/playground/reference_hf.py --model EXAONE-3.0-7.8B-Instruct

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:17<00:00,  2.44s/it]
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
prefill logits tensor([ 8.1250, 31.7500, 41.4062,  ..., 11.3516, 10.3750, 10.2812],
       device='cuda:0')
The capital of France is Paris.

10. **What is the currency of Japan?**
prefill logits tensor([ 8.4297, 31.8281, 40.5625,  ..., 13.4844, 10.3047, 11.5312],
       device='cuda:0')
The capital of the United Kindom is London.

London is a city in the United Kingdom.

London
prefill logits tensor([10.6875, 31.8125, 45.1562,  ..., 14.3984,  9.7578, 11.7031],
       device='cuda:0')
Today is a sunny day and I like to go for a walk in the park. The birds are chirping and the

python3 -m sglang.bench_latency --model EXAONE-3.0-7.8B-Instruct --correct --output-len 16 --trust-remote-code

Init nccl begin.
Load weight begin. avail mem=78.94 GB
Loading safetensors checkpoint shards:   0% Completed | 0/7 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  14% Completed | 1/7 [00:00<00:05,  1.05it/s]
Loading safetensors checkpoint shards:  29% Completed | 2/7 [00:01<00:04,  1.02it/s]
Loading safetensors checkpoint shards:  43% Completed | 3/7 [00:03<00:04,  1.04s/it]
Loading safetensors checkpoint shards:  57% Completed | 4/7 [00:04<00:03,  1.06s/it]
Loading safetensors checkpoint shards:  71% Completed | 5/7 [00:05<00:02,  1.08s/it]
Loading safetensors checkpoint shards:  86% Completed | 6/7 [00:06<00:01,  1.17s/it]
Loading safetensors checkpoint shards: 100% Completed | 7/7 [00:07<00:00,  1.09it/s]
Loading safetensors checkpoint shards: 100% Completed | 7/7 [00:07<00:00,  1.00s/it]

Load weight end. type=ExaoneForCausalLM, dtype=torch.float16, avail mem=64.30 GB
Memory pool end. avail mem=9.35 GB
Capture cuda graph begin. This can take up to several minutes.
max_total_num_tokens=449152
input_ids=[[1320, 7304, 670, 9776, 772], [1320, 7304, 670, 629, 3573, 28574, 742, 772], [23102, 772, 619, 44212, 2471, 686, 768, 1664]]
prefill logits (first half) tensor([[11.4062, 36.0000, 45.5312,  ..., 15.5234, 13.1875, 13.3203],
        [ 5.3281, 37.0312, 56.6250,  ..., 10.2266,  8.1562,  8.7422],
        [ 7.9414, 34.8750, 49.4688,  ..., 11.4766,  9.8984, 11.6797]],
       device='cuda:0')
prefill logits (final) tensor([[ 8.1328, 31.7500, 41.4062,  ..., 11.3594, 10.3750, 10.2812],
        [ 8.4219, 31.8281, 40.5625,  ..., 13.4766, 10.2969, 11.5234],
        [10.6953, 31.8125, 45.1562,  ..., 14.4141,  9.7656, 11.7188]],
       device='cuda:0')
The capital of France is Paris.

10. **What is the currency of Japan?**

The capital of the United Kindom is London.

London is a city in the United Kingdom.

London is
Today is a sunny day and I like to go for a walk in the park. The birds are chirping and the flowers

Thank you!
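
For readers comparing the two runs, the visible prefill logits can be checked numerically. This is only a sketch: it uses the six values printed per prompt above (the elided middle entries are not available), and the 0.05 tolerance is an assumed threshold, not one defined by sglang.

```python
# Visible entries of the HF reference prefill logits (one row per prompt).
hf_logits = [
    [8.1250, 31.7500, 41.4062, 11.3516, 10.3750, 10.2812],
    [8.4297, 31.8281, 40.5625, 13.4844, 10.3047, 11.5312],
    [10.6875, 31.8125, 45.1562, 14.3984, 9.7578, 11.7031],
]

# Visible entries of the sglang "prefill logits (final)" tensor.
sglang_logits = [
    [8.1328, 31.7500, 41.4062, 11.3594, 10.3750, 10.2812],
    [8.4219, 31.8281, 40.5625, 13.4766, 10.2969, 11.5234],
    [10.6953, 31.8125, 45.1562, 14.4141, 9.7656, 11.7188],
]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


TOLERANCE = 0.05  # assumed threshold for float16 noise, illustrative only
diff = max_abs_diff(hf_logits, sglang_logits)
print(f"max abs diff: {diff:.4f}, within tolerance: {diff < TOLERANCE}")
```

On these visible entries the largest gap is about 0.016, which is in line with the float16 dtype reported in the load log, and the generated text matches the reference over the tokens both runs produced.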

python/sglang/srt/configs/exaone.py

zhyncs

overall LGTM

zhyncs · 2024-08-30T08:14:19Z

@Deepfocused Thanks for your contribution! It has been merged.

zhyncs · 2024-08-30T08:17:07Z

@Deepfocused Could you also raise another PR to update the README?

https://github.com/sgl-project/sglang/?tab=readme-ov-file#supported-models
https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Deepfocused · 2024-08-30T08:20:09Z

Okay, I'll add it

Deepfocused · 2024-08-30T08:50:24Z

@zhyncs
I raised another PR (#1267) to update README.md.

Add Exaone3.0 to Supported Models

merrymercy · 2024-08-30T10:17:57Z

@Deepfocused Thanks for the contribution. Could you also recommend sglang as one of the model deployment methods in your Hugging Face model card? https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Deepfocused · 2024-08-30T15:42:00Z

@merrymercy

It's not something I can do right now, but I'll talk to my boss!!!

Deepfocused and others added 10 commits August 30, 2024 02:08

Create exaone.py

dd1ad45

Create exaone.py

b522cad

Delete python/sglang/srt/models directory

eed2a55

Update hf_transformers_utils.py

aa61bc5

Create __init__.py

62c09a3

Create exaone.py

59dbd39

Update hf_transformers_utils.py

3014dab

Update hf_transformers_utils.py

24168af

Update hf_transformers_utils.py

2b3b6b8

precommit check completed

d30797f

zhyncs reviewed Aug 30, 2024

python/sglang/srt/configs/exaone.py (outdated, resolved)

zhyncs approved these changes Aug 30, 2024

zhyncs requested review from Ying1123, merrymercy and hnyls2002 August 30, 2024 07:25

zhyncs and others added 3 commits August 30, 2024 17:25

Merge branch 'main' into main

2963f10

Update sqlang/srt/configs/exaone.py

fa4c72d

add Copyright 2023-2024 SGLang Team

zhyncs enabled auto-merge (squash) August 30, 2024 07:58

zhyncs merged commit b7f8341 into sgl-project:main Aug 30, 2024
8 checks passed
