Tensorflow "Illegal instruction" on some machines #48
Might be solvable by using the Tensorflow Docker base image rather than pipenv installing it from PyPI.
This happens because the version of Tensorflow on PyPI is compiled to use CPU instructions like AVX, AVX2, SSE4.1, SSE4.2 and FMA, which my Scaleway baremetal server and HP ProLiant microserver do not support. I'm assuming the Tensorflow Docker images are compiled in the same way, so using those will be of no use. I'm experimenting with compiling my own wheel package without the need for these CPU extensions. If there are notable performance issues then I'll look at installing different packages depending on the current CPU once our Docker image has loaded.
A Tensorflow build which runs on my HP ProLiant microserver is here: https://github.com/damianmoore/tensorflow-builder/releases . Running benchmarks to determine the impact against the more optimised one on PyPI.
These are some quick benchmarks of the PyPI version of Tensorflow versus my own build from the comment above (no CPU optimisations). As expected, the unoptimised build performs slower, but not by very much. These were measured using the Object Detection model (which uses this pre-trained model) on a Dell XPS 13 2017 (9370 i7-8550U). I ran 3 object detection predictions with each build, and the test code was from this function. There was a common amount of overhead (collecting tests etc.) that can be removed from all results.
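The measurement described above can be sketched roughly like this: time the same prediction call a fixed number of times under each build and subtract the shared fixed overhead. `run_prediction` and the overhead value are hypothetical stand-ins for the real object-detection call and the test-collection cost, not the actual benchmark code:

```python
import time

def benchmark(fn, runs=3, overhead_seconds=0.0):
    """Time `runs` calls of `fn` and subtract a shared fixed overhead.

    `overhead_seconds` stands in for the common test-collection cost
    mentioned above, measured once and removed from all results.
    """
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) - overhead_seconds

# Illustrative comparison (run_prediction would be the real model call):
# pypi_time = benchmark(run_prediction)     # under the PyPI wheel
# custom_time = benchmark(run_prediction)   # under the custom wheel
# slowdown = custom_time / pypi_time - 1
```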
This shows the custom build that works on all the tested machines takes 13.04% longer than the optimised one on PyPI. Alternatively, you could say the PyPI build completes in 88.45% of the time of the custom build. This seems like a small enough difference in speed that it would be acceptable to use the custom (no CPU extensions) build everywhere. When we have time, we can produce different Tensorflow builds that are downloaded depending on the CPU flags that are detected.
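That flag-based selection could look something like the sketch below: prefer the optimised PyPI package when the CPU supports everything it needs, and fall back to the no-extensions wheel otherwise. The fallback URL here is a placeholder, not a real release asset:

```python
# Sketch of picking a TensorFlow package at container start based on
# detected CPU flags. The no-extensions wheel URL is hypothetical.
OPTIMISED_FLAGS = {"avx", "avx2", "sse4_1", "sse4_2", "fma"}

def choose_package(cpu_flags):
    """Return a pip-installable target appropriate for this CPU."""
    if OPTIMISED_FLAGS <= set(cpu_flags):
        return "tensorflow"  # standard optimised PyPI build
    # Custom build compiled without AVX/FMA etc. (placeholder URL)
    return "https://example.com/tensorflow-noext-linux_x86_64.whl"
```

The container entrypoint would then run `pip install` on whatever `choose_package` returns before starting the app.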
Noticed after running the image that fixed locking issues on the live demo site.