Tensorflow "Illegal instruction" on some machines #48

Closed
damianmoore opened this issue Jan 5, 2019 · 4 comments

@damianmoore
Collaborator

Noticed this after deploying the image that fixed the locking issues on the live demo site.

@damianmoore damianmoore added the bug label Jan 5, 2019
@damianmoore damianmoore added this to the 1.0 milestone Jan 5, 2019
@damianmoore
Collaborator Author

Might be solvable by using the Tensorflow Docker base image rather than installing Tensorflow from PyPI with pipenv.

@damianmoore
Collaborator Author

This happens because the Tensorflow package on PyPI is compiled to use CPU instructions such as AVX, AVX2, SSE4.1, SSE4.2 and FMA, which my Scaleway bare-metal server and HP ProLiant microserver do not support. I'm assuming the Tensorflow Docker images are compiled the same way, so switching to them won't help.

I'm experimenting with compiling my own wheel package that doesn't need these CPU extensions. If it has notable performance issues, I'll look at installing different packages depending on the current CPU once our Docker image has loaded.
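A minimal sketch, assuming a Linux host, of checking whether a machine has the extensions listed above by reading the `flags` line of `/proc/cpuinfo` (where SSE4.1 and SSE4.2 appear as `sse4_1` and `sse4_2`). The function names are hypothetical and not part of this project.

```python
from __future__ import annotations

from pathlib import Path

# Extensions the PyPI build relies on, spelled as in Linux /proc/cpuinfo.
REQUIRED_FLAGS = {"avx", "avx2", "sse4_1", "sse4_2", "fma"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str) -> set[str]:
    """Extensions the PyPI build expects but this CPU does not report."""
    return REQUIRED_FLAGS - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        if missing:
            print("PyPI Tensorflow will likely hit 'Illegal instruction'; missing:", sorted(missing))
        else:
            print("All required CPU extensions are present.")
```

On machines like the ProLiant microserver described above, the check should report some of these flags as missing.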

@damianmoore
Collaborator Author

A Tensorflow build that runs on my HP ProLiant microserver is here: https://github.com/damianmoore/tensorflow-builder/releases . I'm running benchmarks to measure its impact compared with the more optimised build on PyPI.

@damianmoore
Collaborator Author

These are some quick benchmarks of the PyPI version of Tensorflow versus my own build from the comment above (no CPU optimisations). As expected, the unoptimised build is slower, but not by very much. These were measured using the Object Detection model (which uses this pre-trained model) on a Dell XPS 13 2017 (9370, i7-8550U).

I ran 3 object detection predictions with each build, using the test code from this function. Every run includes a common overhead (collecting tests etc.), measured separately in the last row, which can be subtracted from all results.

                                    Run 1   Run 2   Run 3   Mean
PyPI build:                         62.74   61.25   61.57   61.85
Custom build (unoptimised):         69.37   70.72   69.66   69.92
Testing overhead (to subtract):     15.96   15.12   15.97   15.68

Comparing the raw means, the custom build that works on all the tested machines takes 13.04% longer than the optimised one on PyPI. Put the other way, the PyPI build completes in 88.45% of the time of the custom build.
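As a quick check of those percentages, the sketch below recomputes them from the table. The 13.04% figure comes from the raw means; subtracting the shared testing overhead first, as suggested above, widens the gap to roughly 17.5% by this arithmetic.

```python
from statistics import mean

# Timings copied from the table above (Run 1, Run 2, Run 3).
pypi = [62.74, 61.25, 61.57]
custom = [69.37, 70.72, 69.66]
overhead = [15.96, 15.12, 15.97]

pypi_mean, custom_mean, overhead_mean = mean(pypi), mean(custom), mean(overhead)

# Slowdown on the raw means, overhead included (the figure quoted above).
raw_slowdown = custom_mean / pypi_mean - 1
# Same comparison after removing the common testing overhead from both builds.
net_slowdown = (custom_mean - overhead_mean) / (pypi_mean - overhead_mean) - 1

print(f"raw: {raw_slowdown:.2%} longer, net of overhead: {net_slowdown:.2%} longer")
```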

This seems like a small enough difference in speed that it would be acceptable to use the custom (no CPU extensions) build everywhere. When we have time, we can produce different Tensorflow builds that are downloaded depending on the CPU flags that are detected.
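One way that per-CPU selection could look is sketched below: an ordered list of builds, each with the CPU flags it needs, where the first build whose requirements are all present wins and the portable build is the fallback. The build names and the middle SSE4-only tier are hypothetical placeholders, not existing releases.

```python
from __future__ import annotations

# Ordered from most to least optimised. Names and tiers are hypothetical.
BUILDS: list[tuple[str, frozenset[str]]] = [
    ("tensorflow-avx2-fma", frozenset({"avx", "avx2", "fma", "sse4_1", "sse4_2"})),
    ("tensorflow-sse4", frozenset({"sse4_1", "sse4_2"})),
    ("tensorflow-portable", frozenset()),  # no CPU extensions, runs anywhere
]


def choose_build(cpu_flags: set[str]) -> str:
    """Return the most optimised build whose required flags are all present."""
    for name, required in BUILDS:
        if required <= cpu_flags:
            return name
    # Unreachable while the portable build requires no flags, kept as a safe default.
    return BUILDS[-1][0]
```

The flag set could come from the `flags` line of `/proc/cpuinfo` when the container starts, e.g. `choose_build({"sse4_1", "sse4_2"})` picks the SSE4-only build.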

@damianmoore damianmoore changed the title "Illegal instruction (core dumped)" when running any classification processors Tensorflow "Illegal instruction" on some machines Jan 17, 2019