Add migx ep fp8 int4 #78

Merged: 10 commits into rocm6.3_internal_testing on Jan 29, 2025

Conversation

TedThemistokleous

Description

Datatype support for int4 and fp8 (all formats)

Motivation and Context

Allows operators of these data types to be handled by the ONNX Runtime MIGraphX EP.

@TedThemistokleous added the enhancement label on Dec 4, 2024
@TedThemistokleous self-assigned this on Dec 4, 2024
@streamhsa force-pushed the add_migx_ep_fp8_int4 branch from c19fa76 to d1a2609 on December 25, 2024 03:52
@TedThemistokleous changed the title from "Add migx ep fp8 int4" to "Add migx ep fp8" on Jan 5, 2025
@TedThemistokleous changed the title from "Add migx ep fp8" to "Add migx ep fp8 int4" on Jan 5, 2025
@TedThemistokleous (Author) commented:

Rebasing off the ort_value changes used for the llama_V2 pipeline, to verify end to end with an int4 model.

@TedThemistokleous (Author) commented:

Merging this in to avoid blocking fp8-related items further. Will work out any additional int4 items as int4 verification occurs.

@TedThemistokleous merged commit 2117821 into rocm6.3_internal_testing on Jan 29, 2025
11 of 15 checks passed
TedThemistokleous added a commit that referenced this pull request Jan 29, 2025
* Add fp8 and int4 types in supported list for Onnxruntime EP
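
A rough Python-side illustration of the new coverage (the real change is to the EP's C++ supported-types list; the set below is inferred from "int4 and fp8 (all formats)" and requires onnx >= 1.16 for the INT4/UINT4 enums):

```python
from onnx import TensorProto

# Element types the EP should now accept, per the PR description.
# Illustrative only; the authoritative list lives in the EP source.
NEWLY_SUPPORTED = {
    TensorProto.INT4,
    TensorProto.UINT4,
    TensorProto.FLOAT8E4M3FN,
    TensorProto.FLOAT8E4M3FNUZ,
    TensorProto.FLOAT8E5M2,
    TensorProto.FLOAT8E5M2FNUZ,
}
```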

* Add support for int4 inputs

Map things to int8 for now, since we don't explicitly set an int4 input type, and pack/unpack int4 operands.
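
A minimal numpy sketch of that pack/unpack step, assuming ONNX's INT4 layout (two signed 4-bit values per byte, low nibble first, two's complement):

```python
import numpy as np

def unpack_int4_to_int8(packed: np.ndarray, n: int) -> np.ndarray:
    """Unpack n signed int4 values (two per byte, low nibble first) into int8."""
    low = (packed & 0x0F).astype(np.int8)
    high = ((packed >> 4) & 0x0F).astype(np.int8)
    nibbles = np.empty(packed.size * 2, dtype=np.int8)
    nibbles[0::2] = low
    nibbles[1::2] = high
    # Sign-extend 4-bit two's complement (raw 8..15 map to -8..-1).
    return np.where(nibbles >= 8, nibbles - 16, nibbles)[:n]

def pack_int8_to_int4(values: np.ndarray) -> np.ndarray:
    """Pack int8 values in [-8, 7] back into two-per-byte int4 storage."""
    v = (values.astype(np.int8) & 0x0F).astype(np.uint8)
    if v.size % 2:  # pad odd element counts with a zero nibble
        v = np.append(v, np.uint8(0))
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)
```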

* Add flag to allow for fp8 quantization through Onnxruntime API
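
From the Python side this would presumably surface as a provider option next to the existing fp16/int8 flags; the name migraphx_fp8_enable below is an assumption by analogy with migraphx_int8_enable, not a confirmed option name:

```python
import onnxruntime as ort

# "migraphx_fp8_enable" is assumed by analogy with the documented
# "migraphx_fp16_enable" / "migraphx_int8_enable" provider options.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=["MIGraphXExecutionProvider"],
    provider_options=[{"device_id": "0", "migraphx_fp8_enable": "1"}],
)
```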

* Add fp8 quantization to the compile stage of the MIGraphX EP

Mirror the same calibration code we use for int8 and just change which quantize we call through the MIGraphX API
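
Roughly, in terms of MIGraphX's Python bindings (quantize_int8 is an existing entry point; the quantize_fp8 counterpart is assumed here to mirror it, per the commit text):

```python
import migraphx

def quantize_and_compile(onnx_path, calib_data, use_fp8):
    prog = migraphx.parse_onnx(onnx_path)
    target = migraphx.get_target("gpu")
    # Identical calibration path either way; only the quantize call differs.
    if use_fp8:
        migraphx.quantize_fp8(prog, target, calib_data)  # assumed binding
    else:
        migraphx.quantize_int8(prog, target, calib_data)
    prog.compile(target)
    return prog
```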

* Clean up logging

* Cleanup and encapsulate quantization / compile functions

- Add additional flags for fp8 that are shared with int8

- Add a lockout warning message when int8 and fp8 are used at the same time (see the sketch after this list)
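
A minimal sketch of that lockout, assuming int8 wins when both flags are set (which mode takes precedence is an assumption):

```python
import logging

def resolve_quant_mode(int8_enable: bool, fp8_enable: bool) -> str:
    """int8 and fp8 quantization are mutually exclusive; warn if both are requested."""
    if int8_enable and fp8_enable:
        logging.warning(
            "MIGraphX EP: int8 and fp8 quantization requested together; "
            "they are mutually exclusive. Falling back to int8."
        )
        return "int8"  # assumed fallback choice
    if fp8_enable:
        return "fp8"
    return "int8" if int8_enable else "none"
```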

* Run lintrunner pass

* Fix session options inputs + add better logging.

Previous runs using session options failed because we were missing pulling in inputs from the python interface. This, plus additional logging, allowed me to track which options were invoked via env and which were added during the start of an inference session.
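
One way to check from Python that session-level options actually reach the EP (get_provider_options is part of the public ORT API; the model path is hypothetical):

```python
import onnxruntime as ort

session = ort.InferenceSession(
    "model_int4.onnx",  # hypothetical model path
    providers=[("MIGraphXExecutionProvider", {"device_id": "0"})],
)

# Shows which options the EP actually picked up, making it easy to spot
# python-side inputs that were silently dropped.
print(session.get_provider_options()["MIGraphXExecutionProvider"])
```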

* Fix naming for save/load path variables to be consistent with enable.

* Print only env variables that are set as warnings

Need this so the user knows which environment variables are set in the background, to ensure proper consistency between runs.
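
The idea in Python form (the variable names are modeled on the EP's ORT_MIGRAPHX_* environment variables; the exact set is an assumption):

```python
import os

ENV_VARS = [
    "ORT_MIGRAPHX_FP16_ENABLE",
    "ORT_MIGRAPHX_INT8_ENABLE",
    "ORT_MIGRAPHX_FP8_ENABLE",  # assumed name for the new flag
    "ORT_MIGRAPHX_SAVE_COMPILED_MODEL",
    "ORT_MIGRAPHX_LOAD_COMPILED_MODEL",
]

def warn_set_env_vars():
    # Warn only about variables that are actually set, so the user can see
    # exactly which background overrides are active for this run.
    for name in ENV_VARS:
        value = os.environ.get(name)
        if value is not None:
            print(f"[MIGraphX EP] warning: {name}={value} overrides session options")
```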

---------

Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>