Is it possible to create a PDF with UTF-8 character encoding? #181
Comments
@shaolinh84, I am looking into it. It seems that openhtmltopdf is not converting the characters in the HTML. This is the HTML taken from the String variable passed to openhtmltopdf:
<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head><body>
<ul>
<li>Test PDF with Unicode chars: Общие</li>
</ul>
</body></html>
The resulting PDF is: [screenshot not preserved]
It could be some configuration that is missing.
@shaolinh84, it seems that the PDF conversion depends on the fonts that are used and whether they contain the given Unicode characters. You should skip the flexmark-java PDF converter, build your own PDF conversion from the code used in the converter, and add the fonts you need to the PDF. I have not done this yet, so it is a theoretical solution. The code in the PDF converter extension is:
public static void exportToPdf(final OutputStream os, final String html, final String url, final PdfRendererBuilder.TextDirection defaultTextDirection) {
    try {
        // There are more options on the builder than shown below.
        PdfRendererBuilder builder = new PdfRendererBuilder();
        if (defaultTextDirection != null) {
            builder.useUnicodeBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
            builder.useUnicodeBidiReorderer(new ICUBidiReorderer());
            builder.defaultTextDirection(defaultTextDirection); // OR RTL
        }
        org.jsoup.nodes.Document doc;
        doc = Jsoup.parse(html);
        Document dom = DOMBuilder.jsoup2DOM(doc);
        builder.withW3cDocument(dom, url);
        builder.toStream(os);
        builder.run();
    } catch (Exception e) {
        e.printStackTrace();
        // LOG exception
    } finally {
        try {
            os.close();
        } catch (IOException e) {
            // swallow
        }
    }
}
The PDF renderer builder has a function to add a font to the PDF conversion, from what I understand:
public PdfRendererBuilder useFont(FSSupplier<InputStream> supplier, String fontFamily, Integer fontWeight, PdfRendererBuilder.FontStyle fontStyle, boolean subset) {
    this._fonts.add(new PdfRendererBuilder.AddedFont(supplier, fontWeight, fontFamily, subset, fontStyle));
    return this;
}
A better example is in the issues for openhtmltopdf: danfickle/openhtmltopdf#129
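Before picking fonts to add, it may help to know which scripts the HTML actually contains, since each script needs a font that covers it. The following is only a sketch using the JDK's Character.UnicodeScript; the class name ScriptScan and the method scriptsIn are made up for illustration and are not part of flexmark-java or openhtmltopdf.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of flexmark-java or openhtmltopdf.
class ScriptScan {
    /** Collects the Unicode scripts used in the text, skipping shared punctuation, digits and whitespace. */
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // COMMON and INHERITED are shared by all scripts, so they say nothing about font needs.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // The Cyrillic word from the test HTML above, written with escapes to avoid source-encoding problems.
        String html = "<li>Test PDF with Unicode chars: \u041e\u0431\u0449\u0438\u0435</li>";
        System.out.println(scriptsIn(html));
    }
}
```

For the test HTML in this thread the result includes both LATIN and CYRILLIC, so whichever font ends up in the PDF needs glyphs for both.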
I have the same issue...
A solution to the font problem is to define an embedded TrueType font in the style or stylesheet and set the [...]. For example, including Noto Serif/Sans/Mono fonts and adding [...]. However, the PDF converter requires TrueType fonts and [...]. For my test I used [...]. If the installation directory for the fonts is [...]:
@font-face {
    font-family: 'noto-cjk';
    src: url('file:///usr/local/fonts/arialuni.ttf');
    font-weight: normal;
    font-style: normal;
}
@font-face {
    font-family: 'noto-serif';
    src: url('file:///usr/local/fonts/NotoSerif-Regular.ttf');
    font-weight: normal;
    font-style: normal;
}
@font-face {
    font-family: 'noto-serif';
    src: url('file:///usr/local/fonts/NotoSerif-Bold.ttf');
    font-weight: bold;
    font-style: normal;
}
@font-face {
    font-family: 'noto-serif';
    src: url('file:///usr/local/fonts/NotoSerif-BoldItalic.ttf');
    font-weight: bold;
    font-style: italic;
}
@font-face {
    font-family: 'noto-serif';
    src: url('file:///usr/local/fonts/NotoSerif-Italic.ttf');
    font-weight: normal;
    font-style: italic;
}
@font-face {
    font-family: 'noto-sans';
    src: url('file:///usr/local/fonts/NotoSans-Regular.ttf');
    font-weight: normal;
    font-style: normal;
}
@font-face {
    font-family: 'noto-sans';
    src: url('file:///usr/local/fonts/NotoSans-Bold.ttf');
    font-weight: bold;
    font-style: normal;
}
@font-face {
    font-family: 'noto-sans';
    src: url('file:///usr/local/fonts/NotoSans-BoldItalic.ttf');
    font-weight: bold;
    font-style: italic;
}
@font-face {
    font-family: 'noto-sans';
    src: url('file:///usr/local/fonts/NotoSans-Italic.ttf');
    font-weight: normal;
    font-style: italic;
}
@font-face {
    font-family: 'noto-mono';
    src: url('file:///usr/local/fonts/NotoMono-Regular.ttf');
    font-weight: normal;
    font-style: normal;
}
body {
    font-family: 'noto-sans', 'noto-cjk', sans-serif;
    overflow: hidden;
    word-wrap: break-word;
    font-size: 14px;
}
var,
code,
kbd,
pre {
    font: 0.9em 'noto-mono', Consolas, "Liberation Mono", Menlo, Courier, monospace;
}
Sample PdfConverter.java updated. Wiki page with information added: PDF-Renderer-Converter
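If the stylesheet has to be produced in code, for instance because the font directory differs per machine, the rules can be generated and placed in the HTML before it is handed to the converter. This is a rough sketch with plain string handling only; FontFaceCss, fontFace and withStyle are hypothetical names, and it assumes the HTML has a literal <head> tag and that the font directory is an absolute Unix-style path.

```java
// Hypothetical helper, not part of flexmark-java or openhtmltopdf.
class FontFaceCss {
    /** Builds one @font-face rule pointing at a TrueType file in fontDir (an absolute path). */
    static String fontFace(String family, String fontDir, String fileName, String weight, String style) {
        String dir = fontDir.endsWith("/") ? fontDir : fontDir + "/";
        return "@font-face {\n"
                + "    font-family: '" + family + "';\n"
                + "    src: url('file://" + dir + fileName + "');\n"
                + "    font-weight: " + weight + ";\n"
                + "    font-style: " + style + ";\n"
                + "}\n";
    }

    /** Puts the CSS into a <style> element right after the opening <head> tag; HTML without <head> is returned unchanged. */
    static String withStyle(String html, String css) {
        return html.replace("<head>", "<head><style>" + css + "</style>");
    }

    public static void main(String[] args) {
        String fontDir = "/usr/local/fonts"; // same directory as in the stylesheet above
        String css = fontFace("noto-sans", fontDir, "NotoSans-Regular.ttf", "normal", "normal")
                + fontFace("noto-sans", fontDir, "NotoSans-Bold.ttf", "bold", "normal")
                + "body { font-family: 'noto-sans', sans-serif; }\n";
        String html = "<html><head></head><body><p>Test</p></body></html>";
        System.out.println(withStyle(html, css));
    }
}
```

The resulting string would then be passed as the html argument in place of the original HTML.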
This is my failing test in Kotlin:
This is my parser class:
Parser is com.vladsch.flexmark.parser.Parser, HtmlRenderer is com.vladsch.flexmark.html.HtmlRenderer.
As I am just passing an OutputStream to the PdfConverterExtension, I don't have control over how the data is written. Is there a possibility to create a PDF with UTF-8 characters? The HTML content still has the correct HTML encoding.
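One way to rule out the encoding itself is to check that the HTML string survives a round trip through UTF-8 bytes. The sketch below uses only the JDK; EncodingCheck and roundTrips are illustrative names. If the UTF-8 round trip succeeds, the characters are intact in the input, and the problem is more likely in the rendering step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of flexmark-java or openhtmltopdf.
class EncodingCheck {
    /** Returns true if the text is unchanged after encoding to bytes and decoding again with the same charset. */
    static boolean roundTrips(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable characters are replaced, typically with '?'
        return new String(bytes, charset).equals(text);
    }

    public static void main(String[] args) {
        // The Cyrillic word from the test HTML, written with escapes to avoid source-encoding problems.
        String html = "<li>Test PDF with Unicode chars: \u041e\u0431\u0449\u0438\u0435</li>";
        System.out.println("UTF-8 round trip: " + roundTrips(html, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip: " + roundTrips(html, StandardCharsets.ISO_8859_1));
    }
}
```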