Improve the implementation of Manual Cast and IPEX support #14597

KohakuBlueleaf · 2024-01-09T15:35:48Z

Description

Improved the efficiency of manual cast by adding type checks.
Applied the behavior of precision=full to manual cast.
- This also ensures fp8 can be used with precision=full on devices that support autocast.
  (It uses manual cast, which is as fast as autocast plus an output upcast. I simply chose the simpler of the two.)
Use float32 for the MHA (multi-head attention) layer on IPEX, since some of its computations are not handled well on IPEX devices.
I have checked that this implementation works on NVIDIA CUDA, Apple M-series, and Intel Arc. With the latest IPEX, the Arc A770 can use manual cast with the fp8 features.
Manual cast on the A770 is now faster than autocast('xpu'), even with ipex.optimize.
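The general idea behind "manual cast with type checking" plus a precision=full style output upcast can be sketched as below. This is a minimal illustration, not the webui code: `cast_if_needed`, `ManualCastLinear`, and `target_dtype` are hypothetical names. It assumes weights are stored in a low-precision dtype (bfloat16 here, because it runs on CPU), inputs are cast to the weight dtype only when they differ, and the output is optionally upcast to float32.

```python
import torch
import torch.nn as nn


def cast_if_needed(value, dtype):
    # Hypothetical helper: only floating-point tensors whose dtype differs are cast.
    # Matching tensors and non-tensor args are returned unchanged (same object).
    if isinstance(value, torch.Tensor) and value.is_floating_point() and value.dtype != dtype:
        return value.to(dtype)
    return value


class ManualCastLinear(nn.Linear):
    """Illustrative manual-cast Linear layer (hypothetical, not the webui class).

    Weights stay in their storage dtype (e.g. bfloat16 or float16).
    Inputs are cast to that dtype before the matmul, and when `target_dtype`
    is set (e.g. float32, mimicking precision=full) the output is upcast.
    """

    def __init__(self, in_features, out_features, target_dtype=None, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.target_dtype = target_dtype  # None keeps the output in the weight dtype

    def forward(self, x):
        # Cast the input to the weight dtype only if it doesn't already match.
        x = cast_if_needed(x, self.weight.dtype)
        out = super().forward(x)
        # Optional output upcast, the precision=full style behavior.
        if self.target_dtype is not None:
            out = cast_if_needed(out, self.target_dtype)
        return out


if __name__ == "__main__":
    layer = ManualCastLinear(4, 2, target_dtype=torch.float32, dtype=torch.bfloat16)
    y = layer(torch.randn(3, 4))  # float32 input, bfloat16 weights
    print(y.dtype, tuple(y.shape))
```

The same pattern could in principle keep a layer such as MHA in float32: if that layer's weights are stored in float32, `cast_if_needed` upcasts its inputs instead of downcasting them.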

This reverts commit e003659.

KohakuBlueleaf and others added 7 commits January 9, 2024 22:11

improve efficiency and support more device

209c26a

Fix bugs when arg dtype doesn't match

42e6df7

linting and debugs

c2c05fc

Apply correct inference precision implementation

e003659

Revert "Apply correct inference precision implementation"

1fd6965

This reverts commit e003659.

Apply the correct behavior of precision='full'

58d5b04

rearrange if-statements for cpu

ca671e5

KohakuBlueleaf requested a review from AUTOMATIC1111 as a code owner January 9, 2024 15:35

AUTOMATIC1111 approved these changes Jan 9, 2024


AUTOMATIC1111 merged commit 905b142 into dev Jan 9, 2024
6 checks passed

AUTOMATIC1111 deleted the improved-manual-cast branch January 9, 2024 16:33

w-e-w mentioned this pull request Feb 17, 2024

1.8.0-RC #14948

Closed

pawel665j mentioned this pull request Apr 16, 2024

1.8.0-RC #15537

Closed