Move FP8 TE export logic to mcore.export #11409
base: main
Conversation
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

ce002a8 to 71292d4 (Compare)
nemo/export/tensorrt_llm.py (outdated)

@@ -83,6 +83,17 @@ def wrapper(*args, **kwargs):
    use_pytriton = False


def determine_quantization_settings(
Could you add docstrings for all new / edited functions?
(same remark for typing)
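For illustration only, a possible docstring-and-typing sketch for the new helper. The signature is inferred from the diff above; the actual parameters, defaults, and inference logic in the PR may differ:

```python
from typing import Any, Dict, Optional, Tuple


def determine_quantization_settings(
    nemo_model_config: Dict[str, Any],
    fp8_quantized: Optional[bool] = None,
    fp8_kvcache: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Resolve FP8 weight and KV-cache quantization settings for TRT-LLM export.

    Explicit arguments take precedence; when an argument is None, the setting
    is inferred from the NeMo model configuration stored in the checkpoint.

    Args:
        nemo_model_config: model configuration dictionary from the checkpoint.
        fp8_quantized: user override for FP8 weight quantization, or None to infer.
        fp8_kvcache: user override for FP8 KV-cache quantization, or None to infer.

    Returns:
        A (fp8_quantized, fp8_kvcache) tuple of resolved boolean flags.
    """
    is_fp8_model = bool(nemo_model_config.get("fp8", False))
    fp8_quantized = is_fp8_model if fp8_quantized is None else fp8_quantized
    fp8_kvcache = is_fp8_model if fp8_kvcache is None else fp8_kvcache
    return fp8_quantized, fp8_kvcache
```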
nemo/export/tensorrt_llm.py (outdated)

@@ -353,17 +366,38 @@ def export(
    from megatron.core.export.trtllm.trtllm_helper import TRTLLMHelper
    from tensorrt_llm.layers import MoeConfig

    use_embedding_sharing = model_configs.get("share_embeddings_and_output_weights", False)
    fp8_quantized, fp8_kvcache = determine_quantization_settings(
        model_configs, fp8_quantized, fp8_kvcache
It's `model_configs` (plural) that is passed here, but only one `nemo_model_config` is consumed in `determine_quantization_settings` - could we unify the naming?
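A minimal way to unify the naming at the call site (a sketch only; the variable names are assumptions, not the PR's final choice):

```python
# Use the same name as the helper's parameter so the data flow is obvious.
nemo_model_config = model_configs  # or rename `model_configs` where it is first built
fp8_quantized, fp8_kvcache = determine_quantization_settings(
    nemo_model_config, fp8_quantized, fp8_kvcache
)
```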
@@ -524,7 +559,7 @@ def get_transformer_config(self, nemo_model_config):
     ffn_hidden_size=nemo_model_config.get('ffn_hidden_size'),
     layernorm_epsilon=nemo_model_config.get('layernorm_epsilon'),
     add_bias_linear=nemo_model_config.get('bias'),
-    num_moe_experts=nemo_model_config.get('num_moe_experts', None),
+    num_moe_experts=num_moe_experts if num_moe_experts > 0 else None,
     normalization=transformer_config_normalization,
     layernorm_zero_centered_gamma=layernorm_zero_centered_gamma,
In #11614 there is also `gated_linear_unit=nemo_model_config.get('gated_linear_unit'),` added.

- Can we add it here? So that we merge only this MR and offload CI slightly.
- Can we write `nemo_model_config.get('gated_linear_unit', False)` to conform with the typing? There could be surprises otherwise, see `None` vs `0` for `num_moe_experts`. A sketch follows below.
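For concreteness, a small self-contained sketch of the two config fields discussed here; `resolve_moe_and_glu_settings` is a hypothetical helper used only for illustration, not code from this PR or #11614:

```python
from typing import Any, Dict


def resolve_moe_and_glu_settings(nemo_model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve the two TransformerConfig fields discussed above.

    num_moe_experts: TransformerConfig expects None (not 0) when MoE is disabled.
    gated_linear_unit: default to False so the field is always a bool.
    """
    num_moe_experts: int = nemo_model_config.get('num_moe_experts') or 0
    return {
        'num_moe_experts': num_moe_experts if num_moe_experts > 0 else None,
        'gated_linear_unit': nemo_model_config.get('gated_linear_unit', False),
    }


if __name__ == '__main__':
    # 0 experts and a missing gated_linear_unit key collapse to the typed defaults.
    print(resolve_moe_and_glu_settings({'num_moe_experts': 0}))
    # -> {'num_moe_experts': None, 'gated_linear_unit': False}
```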
Please fill in the MR Usage section with details.
nemo/export/tensorrt_llm.py (outdated)

@@ -353,17 +366,38 @@ def export(
    from megatron.core.export.trtllm.trtllm_helper import TRTLLMHelper
    from tensorrt_llm.layers import MoeConfig

    use_embedding_sharing = model_configs.get("share_embeddings_and_output_weights", False)
This overrides the parameter `use_embedding_sharing` passed to `TensorRTLLM.export(...)`. Should it be removed from the signature?
Also: how about keeping the original name `share_embeddings_and_output_weights`?
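One possible way to keep the parameter without silently overriding it, shown as a hypothetical helper (the Optional default and the helper name are assumptions for illustration, not the PR's resolution):

```python
from typing import Any, Dict, Optional


def resolve_embedding_sharing(
    model_configs: Dict[str, Any],
    use_embedding_sharing: Optional[bool] = None,
) -> bool:
    """Prefer an explicit export(...) argument; otherwise read the checkpoint config."""
    if use_embedding_sharing is not None:
        return use_embedding_sharing
    return bool(model_configs.get("share_embeddings_and_output_weights", False))
```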
bytes_list = [state_dict[keyname][0] for keyname in keynames]
return load_scales_from_bytes(bytes_list)

decomposed_sharded_key = key.split('/')
if not len(decomposed_sharded_key):
Always `len(decomposed_sharded_key) >= 1`, so this `if` is inactive?
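A quick plain-Python check of this observation (not code from the PR): `str.split` always returns at least one element, so the guard can never fire:

```python
# Even the empty string splits into a one-element list.
assert ''.split('/') == ['']
assert 'decoder/layers/0/weight'.split('/') == ['decoder', 'layers', '0', 'weight']

decomposed_sharded_key = ''.split('/')
# len(...) is always >= 1, so `if not len(decomposed_sharded_key):` never triggers.
assert len(decomposed_sharded_key) >= 1
```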
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base. Your code was analyzed with PyLint. The following annotations have been identified:

Mitigation guide:

By applying these rules, we reduce the occurrence of this message in the future. Thank you for improving NeMo's documentation!
What does this PR do?
Support FP8 TE TRT-LLM export via the mcore.export path.
Collection: llm/nlp
Changelog
Usage
# Add a code snippet demonstrating how to use this
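A hypothetical usage sketch for this section, based on the `fp8_quantized` / `fp8_kvcache` parameters discussed in the review; the paths, model type, and remaining argument names are assumptions rather than the final API:

```python
from nemo.export.tensorrt_llm import TensorRTLLM

# Export a NeMo checkpoint trained with FP8 (Transformer Engine) to a TRT-LLM engine.
exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")
exporter.export(
    nemo_checkpoint_path="/checkpoints/llama_fp8.nemo",
    model_type="llama",
    tensor_parallelism_size=1,
    fp8_quantized=True,  # export FP8-quantized weights
    fp8_kvcache=True,    # keep the KV cache in FP8 as well
)

# Run a quick sanity-check generation with the freshly built engine.
output = exporter.forward(["Hello, my name is"], max_output_len=32)
print(output)
```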
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information