Implement target-counter to create table of contents #23
Comments
Indeed, WeasyPrint completely ignores JavaScript. If you’re generating HTML from something else, maybe you can generate a table of contents at the same time. For example, docutils can do this with source files in reStructuredText format. Otherwise, you could parse an HTML document with lxml, manipulate it in Python with the lxml API, and pass the lxml tree to WeasyPrint. Unfortunately, in any case, you won’t get page numbers in this table of contents. That would require something in CSS like target-counter(), which really needs to be in WeasyPrint’s layout engine. |
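A minimal sketch of the "generate the table of contents alongside the HTML" idea from the comment above. It uses the standard library's `html.parser` instead of lxml so that it runs without extra dependencies; `HeadingCollector`, `build_toc` and the sample markup are hypothetical names made up for illustration. As the comment says, the result contains links but no page numbers.

```python
from html import escape
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1-h3 element that has an id."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            anchor = dict(attrs).get("id")
            if anchor:
                self._current = [int(tag[1]), anchor, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == "h%d" % self._current[0]:
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None


def build_toc(html_source):
    """Return an HTML list linking to each heading (no page numbers)."""
    collector = HeadingCollector()
    collector.feed(html_source)
    items = []
    for level, anchor, text in collector.headings:
        items.append(
            '<li class="level-%d"><a href="#%s">%s</a></li>'
            % (level, escape(anchor), escape(text))
        )
    return '<ul class="toc">\n%s\n</ul>' % "\n".join(items)


source = '<h1 id="intro">Introduction</h1><p>Text</p><h2 id="setup">Set &amp; up</h2>'
toc_html = build_toc(source)
```

The generated list would be inserted into the document before handing it to WeasyPrint; how it is styled is left to the stylesheet.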
Okay thanks. |
No, it’s not implemented at all. We’re thinking about how to do it, but it’s not obvious at all. It’s also somewhat low-priority. When you say it throws errors, is it a message logged on stderr or do you get a Python exception with a full traceback? The latter would be a bug. |
I guess there are no other options to create a TOC with page references, then? |
No, it’s not possible without a lot of work in WeasyPrint itself, or without very dirty hacks. I really recommend not to, but if you want to go with the dirty hack look at the API: http://weasyprint.org/docs/api/#weasyprint.document.Document.make_bookmark_tree |
WeasyPrint is almost the perfect solution for me - except the fact that it doesn't support target-counter yet :-( I do use target-counter for getting the page numbers as you have suggested. |
About
|
Hey there. I was wondering if this is still not supported? I'm considering migrating from wkhtmltopdf to WeasyPrint, because of their lousy support for |
@gnapse Indeed, not much progress on this front since the discussion above. In addition to the trickiness described, it’s also a matter of someone doing the work. |
I've tried for fun to implement |
Are there some simple cases that can be attacked first? For example, the table of contents typically appears at the beginning of a document, so it appears to be vulnerable to the LaTeX instability, however the pages at the beginning of a document are frequently numbered with Roman numerals, so the fact that the table of contents is at the beginning doesn't actually matter if it indexes only the main body of a document. I haven't looked at the code in any depth yet but I guess I will soon! Realistically I don't have much time so if it's that complicated I'll fail but I can offer a crate of beer to anyone who gets this in :-) http://comicsagogo.files.wordpress.com/2011/10/asterix-and-british-food2.jpg |
@bitdivine The problem is with finding out the page number for elements that are later in the document (which haven’t been processed yet.) This is unfortunately the common case for table of contents at the beginning of a document. |
How about something like this: Push the header, including the table of contents, to the end of the file. Now page numbers aren't a problem. Finally in the function "layout_document" roll the end back to the beginning: pages = pages[index_of_head:]+pages[:index_of_head]? However what is a nice way of parameterising the shuffle? Can we use omega notation for the page numbers? I can try this when I get home. |
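The page shuffle proposed above can be sketched in isolation. This is only an illustration of the slicing expression from the comment, with strings standing in for laid-out page objects; `rotate_pages` is a hypothetical helper, not part of WeasyPrint.

```python
def rotate_pages(pages, index_of_head):
    """Move pages[index_of_head:] (the header and ToC) to the front."""
    return pages[index_of_head:] + pages[:index_of_head]


# The body is laid out first, so its page numbers are already known
# when the header and table of contents are laid out at the end.
pages = ["body-1", "body-2", "body-3", "title", "toc"]
reordered = rotate_pages(pages, pages.index("title"))
```

The open question from the comment remains: something still has to tell the layout code where `index_of_head` is.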
Here is a proof of concept. It reverses the pages and the first page (in my toy document) is page 14 and labelled as such: Can we replace that reverse with sort(...) of some kind? Is a good way of representing this in HTML to have divs that enclose booklets and order first by booklet, then by page number? |
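One way the "order first by booklet, then by page number" idea could look, as a sketch only. The `Page` tuple and the `booklet_order` mapping are assumptions for illustration; real page objects would need to carry the booklet information somehow.

```python
from collections import namedtuple

# Hypothetical stand-in for a laid-out page.
Page = namedtuple("Page", "booklet number")

# Layout order: the body is laid out first, the front matter last.
laid_out = [
    Page("body", 1),
    Page("body", 2),
    Page("front", 1),
    Page("front", 2),
]

# Desired output position of each booklet.
booklet_order = {"front": 0, "body": 1}

ordered = sorted(
    laid_out,
    key=lambda page: (booklet_order[page.booklet], page.number),
)
```

Each booklet keeps its own page numbering, so moving the front matter ahead of the body does not change the numbers the body pages were laid out with.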
Oops - fix:
|
If you call See details in the documentation: http://weasyprint.org/docs/api/ So yeah, I suppose we could add limited support for This sounds reasonable. It’s just a matter of someone doing the work now. Unfortunately nobody is actively working on WeasyPrint at the moment. |
Right, the |
Hi. Is there any news on implementing support for |
@mzu, please refer to #23 (comment). |
Did anything happen related to the proof of concept @bitdivine posted last year? The docs don't seem to suggest so, this If not, I might hack on it a bit and report back with some toy examples. |
@ingcake The last paragraph of #23 (comment) is still relevant. |
Thanks. I'll have a look at it sometime in the next week. |
I'm afraid I haven't followed up, and I'm unlikely to have time to do so in the near future. I can but wish you bonne chance! |
I think I see a strategy for tackling the ToC issue as well as front matter in general. My concern is that this might fall into the aforementioned "very dirty hack" territory, and it requires fixing named pages. The user could label specific content with
If the above doesn't sound too hackish, I'd be interested in working on named pages (#57). With that worked out, the above should be far easier to implement. It seems like it would have a nontrivial impact on the WeasyPrint spec, though. EDIT: Namely, the user would need to know what IDs WeasyPrint will look for. |
bump? I really need a TOC, any update on named pages? |
@mpicard bump? I really need a patch ;) |
@liZe bring me up to speed? From what I saw named pages is required as well? |
Yes, no offense!
Cool! Adding a table of contents (tables and pages) at the end of the document is easy with only Python and lxml.
@SimonSapin is too shy to admit that his ideas are generally damn good. I really recommend not to follow his "I really recommend not to", just try to play with his awful dirty hack: it's a good way to understand how it currently works, what can be done (adding the ToC at the end) and what's impossible without big changes (getting page numbers before rendering the whole document, getting the titles without lxml, etc.) |
@eenblam send me a message if you decide to dig into this, I need a fix for this as well so I would be willing to help out and discuss with you what I find. |
Too bad you can't handle JS. Phil Schatz's css-polyfills.js shim handles a slew of CSS3 generated content. I have a demo HTML doc with autogenerated TOC, LOF, LOT, and Acronym sections (MIL-STD style). All it needs is leader() for the front matter. |
I do realise that this issue had its name changed to handle the implementation of 'target-counter', however, whenever you search for information about generating a table of contents using WeasyPrint, you land up here, and the contents of this issue thread make it seem as if generating the table of contents isn't straightforward. But I'd just like to emphasize that it is straightforward, making WeasyPrint even more appealing. The 3rd comment by @SimonSapin at the beginning of the thread basically explains how to do it, but I'd just like to outline what I did so that anyone coming back here doesn't leave as disappointed as I originally left, because as I said, it is possible and straightforward (and does not use "very dirty hacks" nor is it "contrived", as @SimonSapin put it):
Note: you'll definitely have to modify this code to format the table of contents in HTML nicely
QED |
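The code from the recipe above did not survive in this copy of the thread. As a hedged sketch of one part of such a recipe, the following turns a bookmark tree into table-of-contents HTML with page numbers. It assumes the tree has the nested `(label, (page_index, x, y), children)` shape with 0-based page indexes, as described for `make_bookmark_tree` in the API documentation linked earlier; newer WeasyPrint releases may return a different structure. `toc_rows`, `toc_html` and the sample data are hypothetical.

```python
from html import escape


def toc_rows(bookmark_tree, depth=0):
    """Flatten a bookmark tree into (depth, label, page_number) rows."""
    rows = []
    for label, (page_index, _x, _y), children in bookmark_tree:
        rows.append((depth, label, page_index + 1))  # 0-based -> 1-based
        rows.extend(toc_rows(children, depth + 1))
    return rows


def toc_html(bookmark_tree):
    """Render the rows as a list, indenting by depth."""
    lines = ["<h1>Table of contents</h1>", "<ul>"]
    for depth, label, page in toc_rows(bookmark_tree):
        lines.append(
            '<li style="margin-left: %dem">%s '
            '<span style="float: right">%d</span></li>'
            % (depth, escape(label), page)
        )
    lines.append("</ul>")
    return "\n".join(lines)


# Hand-written sample data in the assumed shape (not real WeasyPrint output).
sample_tree = [
    ("Introduction", (0, 0, 0), []),
    ("Usage", (2, 0, 0), [("Options", (3, 0, 0), [])]),
]
rows = toc_rows(sample_tree)
snippet = toc_html(sample_tree)
```

In an actual run the tree would come from the rendered main document, the snippet would be rendered as a second document, and the two sets of pages would then be combined, as discussed earlier in the thread.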
@doronhorwitz You are a genius - that little recipe needs to be included in the WeasyPrint documentation, being unable to have tables of contents was one of the reasons I wasn't using this library |
@doronhorwitz how does this approach handle page numbers, say when the table of content is 1, 2, 3 pages long? Can it break the table into several pages? |
From what I can tell from the code, it's pretty much boilerplate, leaving the actual construction of the "Table of Contents" up to whoever is using it. In this case, if it is HTML there are no breaks, so it's a big run-on blob of text. Changing the
The above gets page numbers floated on the right (like most tables of contents), the rest of the styling is left as an exercise for the reader, but it would likely be done in a template rather than in code and is indicative of how it would work. But, the approach I am taking is this:
It's important to note that the "main part of the document" will have its pages numbered normally, with the title pages and table of contents outside that main flow (much like a regular document anyway), so regardless of how big the table of contents gets, its flow doesn't mess up the page count of the "actual document". |
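A small sketch of why keeping the front matter outside the main flow keeps the main page numbers stable. Numbering the front matter with Roman numerals is an assumption borrowed from an earlier comment in this thread, and `to_roman` and `page_labels` are hypothetical helpers.

```python
def to_roman(number):
    """Lowercase Roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    result = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)


def page_labels(front_matter_pages, main_pages):
    """Label front matter i, ii, ... and the main document 1, 2, ..."""
    front = [to_roman(i) for i in range(1, front_matter_pages + 1)]
    main = [str(i) for i in range(1, main_pages + 1)]
    return front + main


labels_short_toc = page_labels(2, 3)
labels_long_toc = page_labels(4, 3)
```

Growing the table of contents from two pages to four changes only the front-matter labels; the main document still starts at page 1 in both cases.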
From what I gather in this issue, the problem is that you'd have to make a second pass to insert the page numbers in the TOC after the rest of the document is laid out. Could we then make the simplified assumption that the page numbers are absolutely positioned (or similar) and thus don't affect the position of the following boxes so we wouldn't have to trigger a reflow? It's not perfect, but it would be very useful... so we wouldn't have to make another layout pass, but simply write some code to insert the (absolutely-positioned) page numbers after all the other layouting is done. btw, here is some HTML/CSS to demonstrate how |
After diving into WeasyPrint's code for several days I finally brought
Conclusion: At the moment my implementation of |
@Tontyna Fixing #652 also fixes this issue, doesn't it? Of course, the original problem is not really solved, as we can't create a TOC in pure CSS. As there's nothing in the spec allowing such a feature, we may just close this issue and add @doronhorwitz's recipe to the documentation. |
You're right. A TOC requires a script, and @doronhorwitz's is a good starting point. Although the page numbers won't be the right ones when With #652 available I'd automate my TOCs with a script that extracts the headings / bookmark-labels and injects an HTML snippet like the |
Indeed. |
I'm closing this issue, as there's nothing more we can do here according to the current CSS specifications. The HTML template engine has to add empty links, see the report sample as an example. |
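A sketch of what "the template engine has to add empty links" might look like in practice. The markup and class name are hypothetical, and the stylesheet is an assumption modelled on the CSS Generated Content for Paged Media functions `target-counter()` and `target-text()`; check the report sample and your WeasyPrint version before relying on either.

```python
from html import escape

# Assumed stylesheet: each empty link is filled in from its target.
# The heading text comes from target-text(), the page number from
# target-counter(), both resolved at layout time.
TOC_CSS = """
.toc a::before { content: target-text(attr(href)); }
.toc a::after { content: target-counter(attr(href), page); float: right; }
"""


def toc_markup(heading_ids):
    """Emit one empty link per heading id; CSS supplies the visible text."""
    items = [
        '<li><a href="#%s"></a></li>' % escape(anchor)
        for anchor in heading_ids
    ]
    return '<ul class="toc">\n%s\n</ul>' % "\n".join(items)


markup = toc_markup(["intro", "usage"])
```

The script or template only needs to know the heading ids; the page numbers never appear in the HTML.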
@liZe could you please link to the code that generates the pdf in the |
I just called |
This is possible and I got it working in my project. Just look at the "report" sample on WeasyPrint's site and you'll see how you can get Page Number and add it to your own, custom, table of contents: |
I'd like to automatically create a table of contents in my document.
I am thinking of using a small piece of JavaScript...
But I have a feeling WeasyPrint doesn't process JavaScript?
Or are there other ways of doing this ?
Sander.