High Numerical Errors in Mish Activation with FLOAT16 Precision on Neural Engine #2359
I wrote the alternative Mish implementation here:
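A minimal sketch of such an unfused Mish, assuming the standard decomposition mish(x) = x * tanh(softplus(x)) so the converter emits primitive ops rather than the fused mish operator (the actual KataGo implementation may differ, e.g., with extra clamping for float16 safety):

```python
import torch

class AlternativeMish(torch.nn.Module):
    """Unfused Mish: mish(x) = x * tanh(softplus(x)).

    Built from primitive ops (softplus, tanh, mul) so the Core ML
    converter does not lower it to the fused mish kernel. Sketch only;
    the real implementation may clamp intermediates for float16 safety.
    """
    def forward(self, x):
        return x * torch.tanh(torch.nn.functional.softplus(x))
```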
For security reasons, I am not able to download and run your network. Please create a minimal example to demonstrate the issue. Ideally, some small amount of self-contained code that I can just copy and paste.
I have created a minimal example to demonstrate the issue below. It is a small amount of self-contained code, so you can just copy and paste.

To Reproduce

Two scripts reproduce this issue: one uses the built-in Mish activation, and the other uses the alternative Mish implementation.

Built-in Mish Activation
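A minimal, self-contained sketch of such a script (shapes, input range, and the error printout are illustrative, not the verbatim script):

```python
import numpy as np
import torch
import coremltools as ct

class BuiltinMishModel(torch.nn.Module):
    def forward(self, x):
        # Expected to lower to Core ML's fused mish operator.
        return torch.nn.functional.mish(x)

x = torch.linspace(-30.0, 30.0, 1024).reshape(1, 1024)
traced = torch.jit.trace(BuiltinMishModel().eval(), x)

# FLOAT16 precision on CPU_AND_NE: the configuration where the high
# numerical error is reported. Running predict() requires a Mac whose
# Neural Engine Core ML can actually dispatch to.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=x.shape)],
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS16,
)

reference = BuiltinMishModel()(x).detach().numpy()
prediction = list(mlmodel.predict({"x": x.numpy()}).values())[0]
print("max abs error:", np.abs(prediction - reference).max())
```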
Expected Output
Alternative Mish Implementation
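Again a sketch rather than the verbatim script: the same harness, reusing `x` and `reference` from the snippet above, with the fused operator swapped for the unfused decomposition:

```python
class AlternativeMishModel(torch.nn.Module):
    def forward(self, x):
        # Unfused decomposition: lowers to softplus/tanh/mul instead of
        # the fused mish kernel.
        return x * torch.tanh(torch.nn.functional.softplus(x))

traced_alt = torch.jit.trace(AlternativeMishModel().eval(), x)
mlmodel_alt = ct.convert(
    traced_alt,
    inputs=[ct.TensorType(name="x", shape=x.shape)],
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS16,
)
prediction_alt = list(mlmodel_alt.predict({"x": x.numpy()}).values())[0]
print("max abs error:", np.abs(prediction_alt - reference).max())
```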
Expected Output
🐞Describing the bug
The built-in Mish activation function in `coremltools` introduces significant numerical errors in Core ML models when using 16-bit floating-point precision (FLOAT16) on configurations with `ComputeUnit=CPU_AND_NE`. Specifically, converting models that use the Mish activation results in substantial discrepancies in output predictions compared to the original model, leading to high error rates across various metrics.

Stack Trace
N/A
To Reproduce
Follow the steps below to reproduce the high numerical errors using the built-in Mish activation function:
Clone the KataGo repository:
```bash
git clone --branch v1.15.3-coreml1 https://github.com/ChinChangYang/KataGo.git KataGo-v1.15.3-coreml1
cd KataGo-v1.15.3-coreml1/python
```
Download a KataGo model in RAW checkpoint format:
Install Python Modules:
Evaluate the high error using the built-in Mish implementation:
Expected Output:
Evaluate the lower error using the alternative Mish implementation:
Expected Output:
System environment (please complete the following information):
Additional context
The issue arises specifically when using `ComputeUnit=CPU_AND_NE` with `Precision=FLOAT16`. The built-in Mish activation function in `coremltools` leads to high numerical errors, as evidenced by metrics such as `winrateError`, `leadError`, and others showing discrepancies upwards of 25%. Switching to an alternative Mish implementation drastically reduces these errors to below 1%, albeit with a 32% increase in inference time due to the additional operators introduced.

This problem is isolated to 16-bit floating-point precision on the Neural Engine (NE): experiments with other compute units and precision settings (e.g., FLOAT32) do not exhibit the same high error rates. The significant reduction in error with the alternative Mish implementation suggests that the built-in Mish operator may have implementation issues in this specific configuration.
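As a rough illustration of how such percentages can be computed (the actual `winrateError`/`leadError` definitions live in KataGo's evaluation code and may differ):

```python
import numpy as np

def percent_error(coreml_out, reference_out, eps=1e-8):
    # Mean absolute discrepancy as a percentage of the reference
    # magnitude. Illustrative only; KataGo's metrics may be defined
    # differently.
    return 100.0 * float(np.mean(np.abs(coreml_out - reference_out)
                                 / (np.abs(reference_out) + eps)))
```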
This issue was generated based on a detailed analysis of numerical errors in Core ML models using the Mish activation function with 16-bit precision, as documented in the related blog post. Further investigation and collaboration from the `coremltools` engineering team would be greatly appreciated to resolve this matter.