Generate PDF files without embedding default PDF fonts #551

Open
FelixSchwarz opened this issue Dec 15, 2017 · 12 comments
Labels
feature New feature that should be supported

Comments

@FelixSchwarz
Contributor

I'm generating a few hundred PDF files with weasyprint and the total size of the PDF files starts to add up. That in turn starts pushing some other legacy systems quite badly so I'm looking for ways to reduce the final PDF size.

I noticed that proprietary PDF tools are able to "optimize" my generated PDFs, which results in much smaller files. Now I'm looking for ways to achieve a similar effect with WeasyPrint.

From what I can see in Evince each of my PDFs has some embedded fonts (even though I don't have any special font requirements). My hope was that I could somehow tell WeasyPrint to use Helvetica which is (AFAIK) a PDF default font which doesn't need to be embedded.

Unfortunately so far I was unable to achieve this. WeasyPrint always embeds a font. Is there any way to tell WeasyPrint it should NOT embed Helvetica/ensure only standard PDF fonts are used?

@liZe added the "feature" label (New feature that should be supported) on Jan 4, 2018
@liZe changed the title from "how to use PDF default fonts (= reduced PDF size)?" to "Generate PDF files without embedding default PDF fonts" on Jan 4, 2018
@brnosouza

Do you guys have any update on this issue? I would love to use this feature

@liZe
Member

liZe commented Mar 27, 2020

> Do you guys have any update on this issue? I would love to use this feature

The best way to get this is to get rid of Cairo (see #841), but that’s hard to do.

@bl-ue

bl-ue commented Jan 25, 2021

Any progress since March?

@FelixSchwarz
Contributor Author

@bl-ue I think #1232 and https://github.com/CourtBouillon/pydyf should provide a path forward so this can be implemented.

@grewn0uille
Member

> @bl-ue I think #1232 and https://github.com/CourtBouillon/pydyf should provide a path forward so this can be implemented.

That’s right!
We replaced Cairo with our own PDF generator on the master branch. There is some work to do before we can make a release. You can follow #1232 to track the progress.
After a release without Cairo, we will be able to work on features like this one!

@bl-ue

bl-ue commented Jan 25, 2021

Wow! Thank you for the fast response!! 🤩

We over at @tldr-pages use WeasyPrint several times a day, and the PDF that contains all of our (4000?) pages is getting a little big... 😄

I'll be sure to track that issue and introduce the rest of the team to it. Again, thank you guys for your extremely fast responses!!!

@liZe
Member

liZe commented Jan 26, 2021

> I'll be sure to track that issue and introduce the rest of the team to it. Again, thank you guys for your extremely fast responses!!!

You’re welcome!

I’ve read your script, and your problem is not the one described by this issue. Your problem is that fonts are embedded multiple times when pages from multiple WeasyPrint documents are put in a single PDF. Could you please open a new issue with a link to your Python script? Thank you!
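A minimal sketch of the general idea behind avoiding duplicate font embedding when pages from several documents end up in one PDF: key each font by a digest of its file contents rather than by a per-context handle, so identical fonts collapse to a single entry. The `FontRegistry` class and the `F1`, `F2` resource names are hypothetical illustrations, not WeasyPrint's actual implementation.

```python
import hashlib


class FontRegistry:
    """Hypothetical registry that stores each distinct font file once."""

    def __init__(self):
        # Maps a content digest to the resource name assigned to that font.
        self._by_digest = {}

    def register(self, font_bytes):
        """Return the resource name for these font bytes, reusing old entries."""
        digest = hashlib.sha256(font_bytes).hexdigest()
        if digest not in self._by_digest:
            # First time this exact font file is seen: assign a new name.
            self._by_digest[digest] = f"F{len(self._by_digest) + 1}"
        return self._by_digest[digest]

    def __len__(self):
        return len(self._by_digest)
```

Hashing every font file costs more than comparing handles, but the result does not depend on which rendering context loaded the font.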

liZe added a commit that referenced this issue Feb 1, 2021
The hb_face value is different when the Pango font is different, and it causes
problems when multiple Pango contexts are used (for example when pages from
multiple documents are mixed).

Related to #551.
@liZe
Member

liZe commented Feb 1, 2021

@bl-ue Your problem is fixed (I hope) with the current master branch. If you can test TLDR with the next version of WeasyPrint, I’d be glad to know if it works for you (and of course get your bug reports 😉). If it doesn’t work, please open a new issue!

@bl-ue

bl-ue commented Feb 1, 2021

Okay @liZe, wonderful! I'm sorry I never got to the issue—I got busy :)

P.S. Your team is really responsive! Good choice to pick WeasyPrint! :D

liZe added a commit that referenced this issue Feb 1, 2021
It’s probably slow, but at least it’s reliable. We can find a better solution
later.

Related to #551.
@liZe
Member

liZe commented Feb 1, 2021

Here’s the result for French pages.

tldr-pages-pydyf.pdf

PDF size was 3.4MB, it’s now 400kB 🎉.

> P.S. Your team is really responsive! Good choice to pick WeasyPrint! :D

❤️

@mailq

mailq commented Nov 4, 2024

I'd like to point again at this issue, as it is almost seven years old and the preconditions were met four years ago.

I'm not a Python developer, but I would argue that this fix should be only a few lines of code in fonts.py. If it encounters one of the 14 default PDF fonts, it should be handled differently. This request is not compatible with PDF/A creation, where all fonts have to be included. But it should be possible to get it working for "normal", "simple" PDFs, where I need this functionality.

@liZe
Member

liZe commented Nov 5, 2024

> I'm not a Python developer, but I would argue that this fix should be only a few lines of code in fonts.py.

I bet it’s not. 😄

Even the test to find if a font matches the embedded font names is probably not trivial. PDF/A is another detail, but there are many open questions to solve.

For example, what happens when a document uses a character in Helvetica that’s outside the Latin character set defined by the PDF specification? We have to embed the Helvetica font, because these characters are not supported by the Helvetica font provided by PDF readers. It means that for each font, we have to check the list of glyphs used in a document to determine whether the font has to be embedded. Of course, the supported encodings depend on the font. And for Latin characters, it’s not one encoding, it’s actually 4 different encodings proposed by the specification.
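To make the per-font check concrete, here is a rough sketch under stated assumptions. The function name `can_skip_embedding` is hypothetical, and Python's `cp1252` codec is used only as a stand-in for one of the Latin encodings (WinAnsiEncoding); the two are close but not identical, so a real implementation would need the exact tables from the PDF specification and would have to consider the other encodings too.

```python
# The 12 Latin-text fonts among the 14 standard PDF fonts. Symbol and
# ZapfDingbats use their own encodings and are left out of this sketch.
STANDARD_LATIN_FONTS = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique",
    "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
})


def can_skip_embedding(font_name, text):
    """Return True if the font might be left unembedded for this text.

    Hypothetical helper: it only checks the font name and whether every
    character fits a single approximated Latin encoding.
    """
    if font_name not in STANDARD_LATIN_FONTS:
        return False
    try:
        # ASSUMPTION: cp1252 approximates WinAnsiEncoding here.
        text.encode("cp1252")
    except UnicodeEncodeError:
        # At least one character is outside the encoding, so embed.
        return False
    return True
```

Even this simplified version shows why the decision depends on the characters actually used in the document and not only on the font name.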

And of course, we’ll need quite a lot of tests. 😄

If someone is interested in opening a pull request, it would be wonderful, we could discuss and improve the PR. But pretending that a bug is "only a few lines of code in fonts.py" minimizes the problem a little bit.


6 participants