Tensorflow "Illegal instruction" on some machines #48
Might be solvable by using the Tensorflow Docker base image rather than pipenv installing it from PyPI.
This happens because the version of Tensorflow on PyPI is compiled to use CPU instructions like AVX, AVX2, SSE4.1, SSE4.2 and FMA, which my Scaleway baremetal server and HP ProLiant microserver do not support. I'm assuming the Tensorflow Docker images are compiled in the same way, so using those will be of no use. I'm experimenting with compiling my own wheel package without the need for these CPU extensions. If there are notable performance issues then I'll look at installing different packages depending on the current CPU once our Docker image has loaded.
A Tensorflow build which runs on my HP ProLiant microserver is here: https://github.com/damianmoore/tensorflow-builder/releases . Running benchmarks to determine the impact against the more optimised one on PyPI.
These are some quick benchmarks of the PyPI version of Tensorflow versus my own build from the comment above (no CPU optimisations). As expected, the unoptimised build performs slower, but not by very much. These were measured using the Object Detection model (which uses this pre-trained model) on a Dell XPS 13 2017 (9370 i7-8550U). I ran 3 object detection predictions with each build, and the test code was from this function. There was a common amount of overhead (collecting tests etc.) that can be removed from all results.
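The measurement described above can be sketched roughly like this: time the same prediction call a fixed number of times under each build and subtract the shared fixed overhead. `run_prediction` and the overhead value are hypothetical stand-ins for the real object-detection call and the test-collection cost, not the actual benchmark code:

```python
import time

def benchmark(fn, runs=3, overhead_seconds=0.0):
    """Time `runs` calls of `fn` and subtract a shared fixed overhead.

    `overhead_seconds` stands in for the common test-collection cost
    mentioned above, measured once and removed from all results.
    """
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) - overhead_seconds

# Illustrative comparison (run_prediction would be the real model call):
# pypi_time = benchmark(run_prediction)     # under the PyPI wheel
# custom_time = benchmark(run_prediction)   # under the custom wheel
# slowdown = custom_time / pypi_time - 1
```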
This shows the custom build that works on all the tested machines takes 13.04% longer than the optimised one on PyPI. Alternatively, you could say the PyPI build completes in 88.45% of the time of the custom build. This seems like a small enough difference in speed that it would be acceptable to use the custom (no CPU extensions) build everywhere. When we have time, we can produce different Tensorflow builds that are downloaded depending on the CPU flags that are detected.
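That flag-based selection could look something like the sketch below: prefer the optimised PyPI package when the CPU supports everything it needs, and fall back to the no-extensions wheel otherwise. The fallback URL here is a placeholder, not a real release asset:

```python
# Sketch of picking a TensorFlow package at container start based on
# detected CPU flags. The no-extensions wheel URL is hypothetical.
OPTIMISED_FLAGS = {"avx", "avx2", "sse4_1", "sse4_2", "fma"}

def choose_package(cpu_flags):
    """Return a pip-installable target appropriate for this CPU."""
    if OPTIMISED_FLAGS <= set(cpu_flags):
        return "tensorflow"  # standard optimised PyPI build
    # Custom build compiled without AVX/FMA etc. (placeholder URL)
    return "https://example.com/tensorflow-noext-linux_x86_64.whl"
```

The container entrypoint would then run `pip install` on whatever `choose_package` returns before starting the app.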
Noticed after running the image that fixed locking issues on the live demo site.