Improve performance of arrayBufferToBinaryString #3121
Conversation
* Skip expensive buffer overflows
* Use TextDecoder when possible to speed up conversion
I'm unable to run unit tests on my Mac (10.14), even before my changes.
Thanks for this PR. Is …
I was thinking about that as well but didn't have time to check it yet. I can test that soon.
One thing to note, though: support for TextDecoder is fairly new, so it isn't possible to rely solely on it. And the old functionality processed a 100×20 px image in under a millisecond.
There are a few things I think should be worked out. I've worked with binary strings before, and they should be avoided at all costs, but since it's a bit hard to remove all the uses of binary strings in jsPDF, it's best to maximize performance as much as possible.
src/modules/addimage.js (Outdated)
var decoder = new TextDecoder("ascii");
return decoder.decode(buffer);
Actually, for binary strings, this is not going to work. ASCII only covers 128 code points, i.e. 0x00 through 0x7F. We need 0x00 through 0xFF (the entire byte range). For example, this fails:
const inputBuffer = new Uint8Array([252, 129, 187, 147]);
const binStr = new TextDecoder('ascii').decode(inputBuffer.buffer);
console.log(binStr.split('').map(str => str.charCodeAt(0)));
// [252, 129, 187, 8220]
As it turns out, TextDecoder doesn't work well for binary string encoding, and this case should probably be removed entirely.
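A quick sketch of the failure mode (assuming a WHATWG-compliant TextDecoder, where the "ascii" and "latin1" labels both resolve to windows-1252): bytes in the 0x80–0x9F range get remapped to punctuation code points, while String.fromCharCode preserves the full byte range one-to-one.

```javascript
// "ascii" is just a label for windows-1252 in the Encoding spec, so bytes
// 0x80-0x9F are remapped (e.g. 0x93 -> U+201C), breaking the 1:1 mapping.
const bytes = new Uint8Array([252, 129, 187, 147]);

const viaDecoder = new TextDecoder("ascii").decode(bytes);
const viaCharCode = String.fromCharCode.apply(null, bytes);

console.log([...viaDecoder].map(c => c.charCodeAt(0)));  // [252, 129, 187, 8220]
console.log([...viaCharCode].map(c => c.charCodeAt(0))); // [252, 129, 187, 147]
```

This is why no TextDecoder encoding can stand in for fromCharCode here: a binary string needs every byte mapped to the same code point.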
I'm not sure how this passed the test cases, actually. Any ideas? Since that simple example yields malformed output, I would expect something to have broken.
Not sure how it passed the library tests, but it passed my own visual validation because the color palette was so small. In the end I did find that it produced malformed data: some grays were turning green.
For some reason I had it in my mind that ASCII was 8-bit as well. After some research, TextDecoder still isn't an option for replacing fromCharCode with any of the available text encodings...
src/modules/addimage.js (Outdated)
if (buffer.length < ARRAY_BUFFER_LIMIT) {
  try {
    return atob(btoa(String.fromCharCode.apply(null, buffer)));
  } catch (e) {
    // Buffer was overflown, fall back to other methods
  }
I don't think this works (or at least it shouldn't, if the second argument is an ArrayBuffer). An ArrayBuffer cannot be indexed and therefore cannot be used with .apply. I'm pretty sure you need to convert it to a Uint8Array first.
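A minimal demonstration of the indexing point: .apply reads a length property from its second argument, which an ArrayBuffer lacks, so the call silently produces an empty string rather than throwing.

```javascript
const buf = new ArrayBuffer(3);
const view = new Uint8Array(buf);
view.set([65, 66, 67]); // "A", "B", "C"

// ArrayBuffer has no .length, so apply treats it as an array-like of length 0.
console.log(String.fromCharCode.apply(null, buf));  // ""
// A Uint8Array view is indexable and has .length, so this works.
console.log(String.fromCharCode.apply(null, view)); // "ABC"
```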
Also, I know atob(btoa(str)) was in the previous code, but I don't think atob(btoa(str)) is any different from str. Is there an example where they differ?
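A small check supporting this point (assuming spec-compliant atob/btoa): for any string whose code points all fit in one byte, the base64 round-trip is the identity, so the pair only adds cost.

```javascript
// Round-trip a "binary string" (all code points <= 0xFF) through base64.
const s = String.fromCharCode(0, 127, 200, 255);
console.log(atob(btoa(s)) === s); // true

// btoa only fails for code points above 0xFF, which a valid binary
// string never contains, so the extra encode/decode buys nothing.
try {
  btoa("\u0100");
} catch (e) {
  console.log("btoa rejects code points above 0xFF");
}
```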
Was wondering about that as well.
src/modules/addimage.js (Outdated)
// Heuristic limit after which String.fromCharCode will start to overflow.
// Need to change to StringDecoder.
var ARRAY_BUFFER_LIMIT = 6000000;
This figure doesn't seem right. The maximum frame size is 1 MB to begin with, so 6 MB is too big for sure. On my device the limit is around 100 kB, but I'm pretty sure going with around 16K is safe. Could you verify this in the console?
Seems a bit large for me, too. 16k sounds good.
Some benchmarks with different batch sizes: https://jsbench.me/x5kmnswnmz/1. The sweet spot, at least for Chromium 89, seems to be 4096–8192; there is slight degradation at 16k, and it slows down rapidly after that. Manual testing shows minor GCs turning into major GCs.
Then let's take 8k? :)
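The chunked conversion discussed above could be sketched like this (the function name and the 8192 default are illustrative, not the actual patch):

```javascript
// Convert a buffer to a binary string in fixed-size slices so that
// String.fromCharCode.apply never exceeds the engine's argument limit.
function arrayBufferToBinaryStringChunked(buffer, chunkSize) {
  chunkSize = chunkSize || 8192; // benchmark sweet spot from the thread
  var bytes = buffer instanceof Uint8Array ? buffer : new Uint8Array(buffer);
  var parts = [];
  for (var i = 0; i < bytes.length; i += chunkSize) {
    // subarray() creates a view, not a copy, so this stays cheap.
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
  }
  return parts.join("");
}
```

Collecting the pieces in an array and joining once at the end avoids the repeated reallocations of naive string concatenation.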
By the way @Pantura, have you verified you're using the latest version of …?
I did update, and it didn't really improve things that much. I actually took yet another approach to the data format, as I noticed it was doing a PNG encode/decode for nothing. There will soon be a PR about the possibility to input an RGBA array. This format wouldn't allow passing through the image size (or the width alone would be enough). Would it be OK to add it as an optional parameter?
@Pantura a PR with an RGBA array/buffer as input would be appreciated.
Alright. Thanks for the PR, I will merge it now.
Thanks for the awesome library, which seems to be the only way we can reliably output a snapshot of a DOM node with extra headers and custom scaling options.
However, we stumbled on a problem with CSP rules regarding data URLs when adding an image via its data URL. I switched to using a blob URL for the image, but performance suffered very badly.
After some performance monitoring, the worst culprit was found; it is fixed in this PR.
This PR has two improvements over the old functionality:
Skip expensive buffer overflows
As the blob file size grows, so does the time spent attempting the direct conversion of the buffer. For a 14 MB buffer, my 2019 Core i7 Mac takes 100 ms before throwing an exception; for 44 MB it takes 400+ ms. For massive buffers, skip the attempt altogether. The limit could be tuned, but this is roughly where processing times start to increase and overflows begin.
The image in question is about 3840x2160 px when being added to PDF.
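The "skip the doomed attempt" idea can be sketched as follows (ARRAY_BUFFER_LIMIT's value and the tryDirectConversion name are illustrative; the real limit needs tuning, per the review thread):

```javascript
// If the buffer is clearly too large for a single fromCharCode.apply call,
// bail out immediately instead of paying ~100-400 ms just to throw.
var ARRAY_BUFFER_LIMIT = 8192; // hypothetical tuned limit
function tryDirectConversion(bytes) {
  if (bytes.length >= ARRAY_BUFFER_LIMIT) {
    return null; // caller falls back to a chunked conversion
  }
  try {
    return String.fromCharCode.apply(null, bytes);
  } catch (e) {
    return null; // argument-count RangeError can still occur on some engines
  }
}
```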
Use TextDecoder when possible to speed up conversion
Along with the fix above, using TextDecoder is the best option for binary string conversion. I also tested direct string concatenation (worst) and a predefined array with a final .join (below or above the original solution, depending on GC cycles).
The code in this PR runs arrayBufferToBinaryString about 18 times faster (4700 ms vs 251 ms and 3000 ms vs 153 ms for two of my test pages).
The total time for adding the image still suffers from decodePixels performance, but I couldn't think of a solution for that.