Performance issue on long documents #578
Hi, I am working with long reports (~100 pages), and in some cases the rendering time is a bit overwhelming. At first I thought the issue was related to loading images (about 4 images per page), but after some debugging I noticed the code is extremely CPU intensive.

I did some profiling on the report using the Python profile module on the latest version. It looks like over half the time is spent in text.py. Since I am not familiar with the code, I was wondering if someone could explain a bit more what is happening, and whether there is something I can do to improve the performance for this specific report.

Best,
Most of the time is taken by the text-splitting code.

I think that we'll get better results with 1. than with 2., but it depends on what's in your document. If you have a lot of text in blocks whose size depends on their content (for example floats, tables…), then you generally have to split the text twice per block and per page (once to get the minimum size and once to get the real layout), and thus have to call the splitting code twice as often.

Is it possible to get the HTML+CSS of one of your reports? I'll give you hints about where to start.
Thanks @liZe,
Hi @liZe, it seems like the slow performance is related to the following CSS I am using: `td { … }`
Thanks a lot. Using
That's much better 😄.
Here's what's happening:
Moreover, there's something strange in Pango that makes WeasyPrint render the Pango layout twice. The part of the code that can be optimized is here.
Just an idea. Quoting the docs on pango_cairo_create_layout():
Bold applied by me. This function is called in text.py; the relevant code:

```python
# text.py, line 626
cairo_dummy_context = (
    cairo.Context(cairo.ImageSurface(cairo.FORMAT_ARGB32, 1, 1))
    if hinting else cairo.Context(cairo.PDFSurface(None, 1, 1)))
self.layout = ffi.gc(
    pangocairo.pango_cairo_create_layout(ffi.cast(
        'cairo_t *', cairo_dummy_context._pointer)),
    gobject.g_object_unref)
```

Looks to me like it's always the same dummy context, so it could be created just once.
Saves a loooooot of time when a lot of text is drawn. Related to #578.
It's hard (that hard?) to use the same layout, as we currently use a different layout for each box and don't rely on the powerful but limited options provided by Pango to split lines and use different font sizes / weights. But as a first step, it's easy to create the dummy context used to create this layout only once. I've done this in the commit above.
Please share your ideas whenever you want 😉.
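A minimal sketch of that first step, assuming cairocffi as in the snippet above (`get_dummy_context` is a hypothetical helper name, not WeasyPrint's actual code):

```python
# Illustrative sketch, not WeasyPrint's actual code: cache the dummy
# cairo context at module level instead of recreating it for every
# layout.
import cairocffi as cairo

_DUMMY_CONTEXTS = {}


def get_dummy_context(hinting):
    """Return a shared dummy cairo context, created on first use."""
    if hinting not in _DUMMY_CONTEXTS:
        if hinting:
            # Hinting needs a raster surface; its size is irrelevant.
            surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 1, 1)
        else:
            surface = cairo.PDFSurface(None, 1, 1)
        _DUMMY_CONTEXTS[hinting] = cairo.Context(surface)
    return _DUMMY_CONTEXTS[hinting]
```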
Hoi, we had a similar performance issue with documents that accidentally contained lines consisting of "words" of about 100 kB each. WeasyPrint spent a lot of time rendering these (more than 1.5 hours; I didn't wait for it to finish). To work around this, we added a shortcut that truncates such huge words before layout (sketched below).
I'm not sure this is a good fix for the general case, but at least in our case it reduces the rendering time for these problematic documents to seconds.
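The actual patch isn't shown in this thread, so here is only a hypothetical sketch of its shape; `max_chars` is an illustrative cutoff, not the one we used:

```python
# Hypothetical sketch of the workaround described above: cap the text
# handed to the layout engine so pathological 100 kB "words" don't get
# re-laid-out for every line. Truncated characters are lost entirely.
def truncate_huge_words(text, max_chars=10_000):
    if len(text) > max_chars:
        return text[:max_chars]
    return text
```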
That's a case that's not handled by our little shortcut in text.py (lines 989 to 998 at commit 840c4a6).
This optimisation tries to render only the beginning of the text and checks whether the line has been cut. If it has, it means we're OK with this first line and can keep it. If it hasn't, we cancel the optimisation and render the whole text instead. It doesn't work for you because long words are never broken, so it renders the whole text (minus the already rendered text) for each line.
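A rough sketch of that logic, not the exact code from text.py: `create_layout` stands in for the real Pango layout call, and the 2.5 factor is the heuristic mentioned in this thread.

```python
# Lay out only a slice of the text first; fall back to the whole text
# if the slice was not broken into several lines.
def split_first_line_fast(text, font_size, max_width, create_layout):
    # No more than about max_width / font_size * 2.5 characters can
    # ever fit on the first line.
    expected_length = int(max_width / font_size * 2.5)
    if expected_length < len(text):
        layout = create_layout(text[:expected_length])
        if layout.get_line_count() >= 2:
            # The slice was cut: the first line is complete, keep it.
            return layout
        # The slice was not cut (e.g. one very long word): cancel the
        # optimisation and lay out the whole text below.
    return create_layout(text)
```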
Unfortunately, your shortcut is not a good fix for everyone, for one major reason: you don't render the whole text, meaning that some text can be lost in the operation. For example, if your first line can handle more than (2.5 × font-size) characters, you'll lose the characters after this offset. And even if this lost text would be displayed outside the page boundaries, it still has to be rendered, as it is at least included in the PDF metadata.
The problem is that we're using Pango's high-level functions to find line cuts. These functions are known to be pretty slow, but they're useful for rendering whole paragraphs. What's ironic is that we never use them to render paragraphs, as they're too limited for our use case: Pango doesn't handle all the CSS features through its markup language. A very good explanation can be found on Behdad Esfahbod's blog. Using Pango's low-level functions is probably the next step (before we use only HarfBuzz, as other browsers do 😇). If anyone is interested (and also very, very patient), I can help!
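As a taste of the low-level route, here is a minimal shaping call through uharfbuzz (my own assumption; the thread shows no HarfBuzz code, and this only shapes one run of text — line breaking would still need dedicated logic on top):

```python
import uharfbuzz as hb

blob = hb.Blob.from_file_path('SomeFont.ttf')  # placeholder font file
face = hb.Face(blob)
font = hb.Font(face)

buf = hb.Buffer()
buf.add_str('Hello, world!')
buf.guess_segment_properties()  # guess script, language, direction
hb.shape(font, buf)

# Glyph ids and advances, ready for positioning.
for info, pos in zip(buf.glyph_infos, buf.glyph_positions):
    print(info.codepoint, pos.x_advance)
```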
Hello! If you're interested in better performance, we created a short survey where you can give a boost to this feature and help us improve WeasyPrint 😉 Vote for it! (Edit: the survey is now closed. Thanks for all your answers! We'll share the results soon 😉)
Other observations, from digging through the profiling I did for #1587 a bit more:
I suspect I'd mostly be looking to improve things somewhere inside remake_page, and possibly by seeing if there's anything we can memoize on backgrounds or margin boxes. draw_text is 14% of our weight, and get_first_line takes about 10% total, so if there are obvious things to do on either of those, they could be impactful as well. (I have fantasized about replacing Pango's line breaking with something more purpose-built on HarfBuzz, but I doubt I'll have the time. That said, simply caching get_first_line calls might be of some use; see the sketch below.)

But again, it looks like there won't be many obvious quick wins in a "simple" document. I think there are a few gotchas around multi-column lists that are worth digging into, and they probably exist in some other edge cases too, but that's more an effort of finding documents that hit them than "general" optimization.
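A hypothetical sketch of that caching idea — the real get_first_line takes more context than shown here, and the cache key must cover every input that can change the result:

```python
import functools


def memoize_first_line(split_first_line):
    """Wrap a line-splitting function in an LRU cache.

    Only safe if every argument that influences the result (text,
    resolved style values, available width) is in the key and hashable.
    """
    @functools.lru_cache(maxsize=10_000)
    def cached(text, font_family, font_size, max_width):
        return split_first_line(text, font_family, font_size, max_width)
    return cached
```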
@aschmitz Thanks a lot for finding and sharing these numbers. Before discussing the details, here is some general information about performance in WeasyPrint, for everybody interested in this topic.

WeasyPrint is quite slow by choice and by design, but that doesn't mean at all that we can't improve its speed. It's the major point we're working on for the next release, as it was one of the important topics for WeasyPrint's users in last year's survey.

I personally use the samples of WeasyPerf to profile both speed and memory use. I render a document under cProfile, then use gprof2dot and Graphviz to get a nice SVG graph. There are lots of other tools for that, but that's my favorite stack (so far)! The samples are different enough to give very different results. It's often very upsetting to get a nice 10% improvement with one sample, and then discover that it doesn't change anything (or worse: gives slower results) for the other samples.

It's really nice to have these numbers, probably measured on a large document (like in #1587) with a quite small stylesheet.
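For anyone wanting to reproduce that workflow, a minimal version ('sample.html' is a placeholder for a WeasyPerf sample; gprof2dot and Graphviz run in a shell afterwards):

```python
# Render one document under the profiler and dump pstats data.
import cProfile

from weasyprint import HTML

with cProfile.Profile() as profiler:
    HTML('sample.html').write_pdf('sample.pdf')
profiler.dump_stats('weasyprint.pstats')

# Then, in a shell:
#   gprof2dot -f pstats weasyprint.pstats | dot -Tsvg -o profile.svg
```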
That’s a place where we can definitely improve performance. And that’s actually already the result of a large speed improvement, as we now use lazy dictionaries for CSS attributes that calculate the values (often using the values of their parents) on demand. Before that, we were calculating all the CSS properties’ values, even the ones that we don’t use. 10% is huge, but we call this function almost each time a CSS value has to be calculated. One solution would be to call it less often, maybe by pre-calculating some values that will for sure be useful. And of course, we can find tiny optimizations in the function code.
This value can be much higher (+30%) with large stylesheets (for example with Bootstrap, but even with some Sphinx documentation such as WeasyPrint's own docs). And we could totally improve the styling part once Python is able to provide "real" multithreading :). The latest cssselect2 improvements also give nice results.
Contrary to what one could think from the function name,
I've not seen that with the samples; there are definitely some improvements waiting to be found.
I tried to work on that a few weeks ago, but the speed improvements were not worth the code complexity. I'll probably try again later :).
#1481 is already really nice, but we could find a better way.
That's a function I've tried to improve so many times. I regularly find little optimizations, but I'm never happy, as I feel I've not found THE solution. I have to admit that currently most of the time is actually spent in Pango, that we don't call Pango functions too often relative to the number of lines in a document, and… well. Creating a layout and using it to render only one line is so stupid. But that's also why it's "simple" (@Tontyna wouldn't agree) and at least maintainable.
Cairo didn’t help for that :).
fonttools/fonttools#2467 helped. We have larger percentages when a lot of fonts are embedded.
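For context, WeasyPrint uses fontTools to subset the fonts it embeds; a minimal subsetting call looks roughly like this (file names are placeholders):

```python
# Keep only the glyphs actually used in the document.
from fontTools.subset import Options, Subsetter
from fontTools.ttLib import TTFont

font = TTFont('SomeFont.ttf')
subsetter = Subsetter(Options())
subsetter.populate(text='The quick brown fox')
subsetter.subset(font)
font.save('SomeFont-subset.ttf')
```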
That’s a paragraph I could have written (well, I actually wrote bea6cef).
Multi-column and flex are sometimes terribly broken and terribly slow. But let's keep some work for version 56 :).