Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the Buf type #47

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Introduce the Buf type #47

wants to merge 26 commits into from

Conversation

LaurenzV
Copy link
Contributor

So, the problem I'm trying to solve is that for PDF 1.4 (and by extension PDF/A1), there are a couple of more architectural limits that need to be upheld, and doing so on the caller-side is very hard.

The idea is simple: Right now, must writers (i.e. mainly the Chunk type, but also the writers for cmaps and postscript code) use a simple vector of bytes to write to). This PR introduces a Buf type, which is a very thin wrapper around a vector of bytes, but in addition to that it holds a struct which tracks the limits of the data types that were used in the buffer. The caller can then collect those and update them for each chunk it creates, using the merge method. And in the very end, we can check whether at any location in the PDF, we've used a limit that is higher than the allowed one. I've already integrated this in krilla, and it seems to work pretty nicely.

It would be nice if the merging would happen in pdf-writer by forcing the user to pass a Buf instead of Vec<u8> wherever possible (for example, when writing a content stream), but the problem is that the user has to be able to add arbirary data to a stream (since any kind of binary data can appear in a stream, instead of just content streams), and the user currently has no way of constructing a Buf manually using a vector of bytes. I could change that (and if I do that I might as well implement DerefMut for Buf so that I don't need to wrap reserve for example), but I think it's fine if for now, the user has to merge the limits themselves manually. Let me know if you think this would be better, though.

If you are happy with the overall approach, I'll try to add some test cases for limits merging, too.

src/buf.rs Show resolved Hide resolved
src/buf.rs Outdated Show resolved Hide resolved
src/buf.rs Outdated Show resolved Hide resolved
src/buf.rs Outdated Show resolved Hide resolved
src/buf.rs Outdated Show resolved Hide resolved
src/chunk.rs Outdated Show resolved Hide resolved
src/chunk.rs Outdated Show resolved Hide resolved
src/renumber.rs Outdated Show resolved Hide resolved
src/content.rs Show resolved Hide resolved
assert_eq!(
chunk.as_bytes(),
b"1 0 obj
<<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserting the PDF in the example is a bit unconventional. Do you think we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print it instead? Or just remove it completely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other examples write it to a file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah but the other examples also create an actually valid PDF file. :p

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it so it’s a valid PDF too, but maybe a bit unnecessary if it’s just about showing how to use the Limits API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Maybe we can just completely ignore it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So remove all asserts? Or just the one from the PDF buffer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants