Add benchmark examples to quantify performance overhead of JavaScript functions #628
Comments
I created a benchmark that recognizes 3 images (one scanned page, one mid-sized screencap with images, and one small screencap) 10 times each, and ran it using both tesseract.js and tesseract.js-core. For both the browser and Node.js versions, the JavaScript functions added a non-trivial amount of performance overhead. The Node.js version stands out, with tesseract.js requiring 377% more time than using tesseract.js-core directly. I will have to follow up and troubleshoot.
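For reference, a figure such as "377% more time" is a relative overhead: the extra time taken by the wrapper divided by the baseline time. A minimal sketch of that arithmetic follows. The function name `overheadPercent` and the timings are hypothetical, chosen only to illustrate the calculation, and are not the measurements from this benchmark.

```javascript
// Relative overhead of a wrapper over a baseline, expressed as "percent more time".
// Hypothetical helper for illustration; not part of tesseract.js.
function overheadPercent(wrapperMs, baselineMs) {
  if (!(baselineMs > 0)) {
    throw new RangeError("baselineMs must be a positive number");
  }
  return ((wrapperMs - baselineMs) / baselineMs) * 100;
}

// Made-up timings: a baseline of 10 s and a wrapper total of 47.7 s.
const coreMs = 10000;
const wrapperMs = 47700;

console.log(`${overheadPercent(wrapperMs, coreMs).toFixed(0)}% more time`);
// prints "377% more time"
```

Note that "377% more time" corresponds to a total runtime of roughly 4.8x the baseline, not 3.8x.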
Added benchmark code and assets per #628
#631 appears to have resolved the performance issues seen above with Node.js. When rerun on commit be956cd of tesseract.js and commit 2f0071c of tesseract.js-core (and using the SIMD-supported build), the results are below.
The performance overhead for Node.js is now an acceptably low 8 seconds. However, the tesseract.js overhead for the browser version is unacceptably high, nearly doubling the runtime. This is likely caused by the implementation of exif-based auto-rotation (see #604).
Reverting the exif rotation implementation (#634) from the browser version indeed reduced the overhead of tesseract.js to a reasonable amount, comparable with the Node.js version.
Running the same benchmark using the Tesseract API (not […]
Therefore, when run on a modern desktop (with AVX2 instructions), the WebAssembly version has roughly 2x the runtime of the native version. About half of this difference is attributable to WebAssembly only supporting 128-bit SIMD instructions, and half to other factors. The above numbers apply only to the LSTM model, for which performance is almost entirely driven by how fast a single function can perform matrix multiplication. Performance is much more similar between the native and WebAssembly versions for the Legacy model.
There are currently many complaints regarding slow performance. Sometimes the root cause is the .wasm Tesseract engine (tesseract.js-core), while other times the root cause has been a JavaScript function in this repo. We should implement standardized benchmark code in both this repo and tesseract.js-core. This will quantify the performance overhead of the JavaScript functions (over using the tesseract.js-core functions directly). It should also assist with troubleshooting performance issues and with assessing the performance impact of future merge requests.
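As a rough illustration of what standardized benchmark code could look like, the sketch below times repeated calls to an arbitrary async `recognize(image)` function for a list of images. It is not the benchmark code added for this issue: `benchmark`, `fakeRecognize`, and the image names are hypothetical, and the placeholder workload would need to be replaced by an actual call into tesseract.js or tesseract.js-core. It assumes a global `performance` object (browsers and recent Node.js versions).

```javascript
// Time `runs` sequential calls of an async `recognize(image)` for each image.
// Returns an object mapping each image name to its mean time per run in ms.
async function benchmark(recognize, images, runs = 10) {
  const results = {};
  for (const image of images) {
    const start = performance.now();
    for (let i = 0; i < runs; i++) {
      await recognize(image); // run sequentially so runs do not overlap
    }
    results[image] = (performance.now() - start) / runs;
  }
  return results;
}

// Placeholder workload standing in for a real OCR call (assumption).
const fakeRecognize = (image) =>
  new Promise((resolve) => setTimeout(() => resolve(`text for ${image}`), 1));

// Hypothetical image names; 3 runs each to keep the example quick.
benchmark(fakeRecognize, ["page.png", "medium.png", "small.png"], 3)
  .then((meanMs) => console.log(meanMs));
```

Running the same harness once with a wrapper-level function and once with a core-level function gives two sets of timings whose difference is the overhead of the JavaScript layer.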