forked from microsoft/onnxruntime
Add migx ep fp8 int4 #78
Merged: TedThemistokleous merged 10 commits into rocm6.3_internal_testing from add_migx_ep_fp8_int4 on Jan 29, 2025
TedThemistokleous force-pushed the add_migx_ep_fp8_int4 branch from 9b44fcc to 7076f74 (December 4, 2024 23:05)
TedThemistokleous force-pushed the add_migx_ep_fp8_int4 branch from 4850dfd to 11ff644 (December 18, 2024 21:26)
streamhsa force-pushed the add_migx_ep_fp8_int4 branch from c19fa76 to d1a2609 (December 25, 2024 03:52)
Map things to int8 right now as we don't explicitly set an int4 input type and pack/unpack int4 operands
Mirror the same calibration code we use for int8 and just change which quantize we call through the MIGraphX API
- Add additional flags for fp8 that are shared with int8
- Add lockout warning message when int8/fp8 are used at the same time
Previous runs using session options failed because we were not pulling in the inputs from the Python interface. Fixing this, plus additional logging, allowed me to track which options were invoked via env and which were added at the start of an inference session
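To illustrate the int4 note above (int4 operands carried in int8 storage and packed/unpacked), here is a minimal sketch in Python. It is not the EP's code: the function names `pack_int4` and `unpack_int4` are hypothetical, and the low-nibble-first layout is an assumption made for the example.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte.

    Assumption for this sketch: the first value of each pair goes in the low nibble.
    """
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("int4 values must be in [-8, 7]")
    nibbles = [v & 0xF for v in values]  # two's-complement nibble
    if len(nibbles) % 2:
        nibbles.append(0)  # pad odd-length input with a zero nibble
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


def unpack_int4(packed, count):
    """Recover `count` signed 4-bit integers from bytes produced by pack_int4."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 represent the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]
```

Round-tripping `unpack_int4(pack_int4(vals), len(vals))` returns the original list; the explicit `count` is needed because an odd-length input is padded with one extra nibble.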
TedThemistokleous force-pushed the add_migx_ep_fp8_int4 branch from 87f1f91 to a314c7f (January 22, 2025 22:13)
Rebasing off the ort_value changes used for the llama_V2 pipe to verify end to end with an int4 model.
Need this so the user knows whether any of the environment variables are active in the background, to ensure proper consistency between runs.
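The idea of reporting only the environment variables that are actually set can be sketched as follows. This is an illustrative Python snippet, not the EP's C++ logging: `set_env_flags` and `format_env_warnings` are hypothetical helpers, and the variable names in the usage note are assumed for the example.

```python
import os


def set_env_flags(names, environ=None):
    """Return only the variables from `names` that are present in the environment."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in names if name in environ}


def format_env_warnings(names, environ=None):
    """Build one warning line per variable that is set; unset ones are skipped."""
    active = set_env_flags(names, environ)
    return [f"WARNING: {name}={value} is set" for name, value in active.items()]
```

For example, calling `format_env_warnings(["ORT_MIGRAPHX_FP8_ENABLE", "ORT_MIGRAPHX_INT8_ENABLE"])` (names assumed here) yields an empty list when neither variable is exported, so a clean environment produces no noise in the log.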
TedThemistokleous force-pushed the add_migx_ep_fp8_int4 branch from a314c7f to 56246ac (January 24, 2025 04:54)
Merging this in so as not to block fp8-related items further. Will work out any additional int4 items as int4 verification occurs.
TedThemistokleous merged commit 2117821 into rocm6.3_internal_testing on Jan 29, 2025
11 of 15 checks passed
TedThemistokleous added a commit that referenced this pull request on Jan 29, 2025
* Add fp8 and int4 types in supported list for Onnxruntime EP
* Add support for int4 inputs
  Map things to int8 right now as we don't explicitly set an int4 input type and pack/unpack int4 operands
* Add flag to allow for fp8 quantization through Onnxruntime API
* Add fp8 quantization to the compile stage of the MIGraphX EP
  Mirror the same calibration code we use for int8 and just change which quantize we call through the MIGraphX API
* cleanup logging
* Cleanup and encapsulate quantization / compile functions
  - Add additional flags for fp8 that are shared with int8
  - Add lockout warning message when int8/fp8 are used at the same time
* Run lintrunner pass
* Fix session options inputs + add better logging.
  Previous runs using session options failed as we were missing pulling in inputs from the python interface. This plus additional logging allowed me to track what options were invoked via env and what were added during the start of an inference session
* Fix naming for save/load path variables to be consistent with enable.
* Print only env variables that are set as warnings
  need this so the user knows whether any of the environment variables are running in the background, to ensure proper consistency between runs.

---------

Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
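The int8/fp8 lockout warning mentioned in the commit message could look roughly like the sketch below. It is only an illustration: `resolve_quantization` is a hypothetical helper, and the policy of falling back to no quantization when both flags are set is an assumption, since the PR text only says a warning is emitted.

```python
import warnings


def resolve_quantization(int8_enable, fp8_enable):
    """Pick a quantization mode from two boolean flags.

    Assumed policy for this sketch: if both flags are set, warn and
    select neither. The real EP may resolve the conflict differently.
    """
    if int8_enable and fp8_enable:
        warnings.warn("int8 and fp8 quantization requested together; using neither")
        return None
    if int8_enable:
        return "int8"
    if fp8_enable:
        return "fp8"
    return None
```

Keeping the conflict check in one function means the same rule applies whether the flags came from environment variables or from session options.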
Description
Datatype support for int4 and fp8 (all formats)
Motivation and Context
Allows operators of these data types to be handled by the Onnxruntime MIGraphX EP.