
Move FP8 TE export logic to mcore.export #11409

Open · wants to merge 16 commits into base: main
Conversation


@Laplasjan107 (Collaborator) commented Nov 26, 2024

What does this PR do?

Support FP8 TE TRT-LLM export with mcore path

Collection: llm/nlp

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
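A possible usage sketch to fill the placeholder above. The import path, class name, and parameters (`nemo_checkpoint_path`, `model_type`, and the new `fp8_quantized` / `fp8_kvcache` flags) are assumptions based on this PR's description, not verified against the NeMo codebase:

```python
# Hypothetical usage sketch -- names are assumed from the PR description.
from nemo.export.tensorrt_llm import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")
exporter.export(
    nemo_checkpoint_path="/path/to/model.nemo",
    model_type="llama",
    fp8_quantized=True,  # export weights quantized to FP8 (added in this PR)
    fp8_kvcache=True,    # store the KV cache in FP8 (added in this PR)
)
```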

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Piotr Kaminski and others added 2 commits November 26, 2024 07:18
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
@Laplasjan107 Laplasjan107 marked this pull request as ready for review December 9, 2024 11:32
@Laplasjan107 Laplasjan107 changed the title [Draft] Move FP8 TE export logic to mcore.export Move FP8 TE export logic to mcore.export Dec 10, 2024
Laplasjan107 and others added 5 commits December 12, 2024 11:25
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
@janekl janekl force-pushed the pikaminski/mcore_fp8_export branch from ce002a8 to 71292d4 Compare December 13, 2024 10:05
Laplasjan107 and others added 5 commits December 16, 2024 05:12
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
@@ -83,6 +83,17 @@ def wrapper(*args, **kwargs):
use_pytriton = False


def determine_quantization_settings(
Collaborator:

Could you add docstrings for all new / edited functions?

Collaborator:

(same remark for typing)
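To illustrate both review requests, here is a hypothetical version of `determine_quantization_settings` with a docstring and type hints. The fallback logic shown (defaulting to the checkpoint's own `fp8` flag when the caller passes `None`) is an assumption for illustration, not the PR's actual implementation:

```python
from typing import Optional, Tuple


def determine_quantization_settings(
    nemo_model_config: dict,
    fp8_quantized: Optional[bool] = None,
    fp8_kvcache: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Resolve the final FP8 export settings.

    Args:
        nemo_model_config: Model configuration loaded from the NeMo checkpoint.
        fp8_quantized: User override for FP8 weight quantization;
            None means "follow the checkpoint".
        fp8_kvcache: User override for FP8 KV-cache storage;
            None means "follow the checkpoint".

    Returns:
        Tuple (fp8_quantized, fp8_kvcache) with both flags resolved to bool.
    """
    is_fp8_checkpoint = bool(nemo_model_config.get('fp8', False))
    if fp8_quantized is None:
        fp8_quantized = is_fp8_checkpoint
    if fp8_kvcache is None:
        fp8_kvcache = is_fp8_checkpoint
    return fp8_quantized, fp8_kvcache
```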

@@ -353,17 +366,38 @@ def export(
from megatron.core.export.trtllm.trtllm_helper import TRTLLMHelper
from tensorrt_llm.layers import MoeConfig

use_embedding_sharing = model_configs.get("share_embeddings_and_output_weights", False)
fp8_quantized, fp8_kvcache = determine_quantization_settings(
model_configs, fp8_quantized, fp8_kvcache
Collaborator:

model_configs (plural) is passed here, but determine_quantization_settings consumes it as a single nemo_model_config - could we unify the naming?

@@ -524,7 +559,7 @@ def get_transformer_config(self, nemo_model_config):
ffn_hidden_size=nemo_model_config.get('ffn_hidden_size'),
layernorm_epsilon=nemo_model_config.get('layernorm_epsilon'),
add_bias_linear=nemo_model_config.get('bias'),
num_moe_experts=nemo_model_config.get('num_moe_experts', None),
num_moe_experts=num_moe_experts if num_moe_experts > 0 else None,
normalization=transformer_config_normalization,
layernorm_zero_centered_gamma=layernorm_zero_centered_gamma,
Collaborator:

In #11614 there is also gated_linear_unit=nemo_model_config.get('gated_linear_unit'), added.

  • Can we add it here? That way we merge only this MR and offload CI slightly.
  • Can we write nemo_model_config.get('gated_linear_unit', False) to conform with the typing? There could be surprises otherwise, see None vs 0 for num_moe_experts.
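The None vs 0 point can be demonstrated with a plain dict (values and helper name are illustrative, not from the PR):

```python
# A dense checkpoint may store num_moe_experts as 0 rather than omitting it.
dense_config = {'num_moe_experts': 0}

# Plain .get() passes the raw value through, so a dense model would hand
# 0 to TransformerConfig, which expects None when MoE is disabled:
raw = dense_config.get('num_moe_experts', None)  # 0, not None


def normalize_moe_experts(config: dict):
    """Map both a missing key and an explicit 0 to None, as in the diff above."""
    num_moe_experts = config.get('num_moe_experts', 0) or 0
    return num_moe_experts if num_moe_experts > 0 else None
```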

@janekl (Collaborator) commented Dec 20, 2024

Please fill in the MR Usage section with details on the fp8_quantized & fp8_kvcache params so that people can immediately see what is added here.

@@ -353,17 +366,38 @@ def export(
from megatron.core.export.trtllm.trtllm_helper import TRTLLMHelper
from tensorrt_llm.layers import MoeConfig

use_embedding_sharing = model_configs.get("share_embeddings_and_output_weights", False)
Collaborator:

This overrides the use_embedding_sharing parameter passed to TensorRTLLM.export(...). Should it be removed from the signature?

Collaborator:

Also: how about keeping the original name share_embeddings_and_output_weights?
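One way to address both comments at once, sketched as a hypothetical helper (the PR itself may resolve this differently): honor the caller's argument when given, and only otherwise fall back to the checkpoint setting, kept under its original name.

```python
def resolve_embedding_sharing(model_configs: dict,
                              share_embeddings_and_output_weights=None) -> bool:
    """Prefer the explicit export(...) argument; else fall back to the checkpoint config."""
    if share_embeddings_and_output_weights is not None:
        return share_embeddings_and_output_weights
    return model_configs.get("share_embeddings_and_output_weights", False)
```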

bytes_list = [state_dict[keyname][0] for keyname in keynames]
return load_scales_from_bytes(bytes_list)
decomposed_sharded_key = key.split('/')
if not len(decomposed_sharded_key):
Collaborator:

len(decomposed_sharded_key) is always >= 1, so this if is inactive?
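Indeed, `str.split` never returns an empty list, which can be checked directly:

```python
# str.split always yields at least one element, even on the empty string,
# so `if not len(decomposed_sharded_key):` can never be taken.
assert ''.split('/') == ['']
assert 'model.decoder'.split('/') == ['model.decoder']
assert 'prefix/layers_0/scale'.split('/') == ['prefix', 'layers_0', 'scale']
```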

Laplasjan107 and others added 3 commits December 23, 2024 10:22
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <piotrus.kaminski@gmail.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Contributor:

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.export.trt_llm.nemo_ckpt_loader.nemo_file
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:53:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:121:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:146:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:176:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:187:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:191:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:216:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:227:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:248:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:343:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:461:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:548:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:552:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:563:0: C0115: Missing class docstring (missing-class-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:575:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:615:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:652:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:665:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:20:0: W0611: Unused import re (unused-import)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:22:0: W0611: Unused BytesIO imported from io (unused-import)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:24:0: W0611: Unused List imported from typing (unused-import)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:24:0: W0611: Unused Tuple imported from typing (unused-import)
nemo/export/trt_llm/nemo_ckpt_loader/nemo_file.py:27:0: W0611: Unused import tensorstore (unused-import)

-----------------------------------
Your code has been rated at 9.77/10

Mitigation guide:

  • Add sensible and useful docstrings to functions and methods
  • For trivial methods like getter/setters, consider adding # pylint: disable=C0116 inside the function itself
  • To disable multiple functions/methods at once, put a # pylint: disable=C0116 before the first and a # pylint: enable=C0116 after the last.
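For example, the block form silences C0116 only across a span of trivial accessors (class and method names here are illustrative only):

```python
class EngineSettings:
    """Holds engine build settings (illustrative example only)."""

    def __init__(self, model_dir: str):
        self._model_dir = model_dir

    # Trivial getter/setter pair: suppress the docstring check locally.
    # pylint: disable=C0116
    def get_model_dir(self):
        return self._model_dir

    def set_model_dir(self, path: str):
        self._model_dir = path
    # pylint: enable=C0116
```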

By applying these rules, we reduce the occurrence of this message in future.

Thank you for improving NeMo's documentation!
