Find image coordinates and which page an image is part of when extracting images #963

Closed
nealdyrkacz opened this issue Aug 12, 2021 · 1 comment

nealdyrkacz commented Aug 12, 2021

Using the answers from this issue, I can find all the images in a PDF. What I need to know is which page each image is a part of. I'm having trouble working out where to find this in the doc's context once an image has been found. Also, is there a way to determine the coordinates of a found image?

    import { PDFName, PDFRawStream } from 'pdf-lib';

    // pdfDoc is an already loaded PDFDocument.
    const enumeratedIndirectObjects = pdfDoc.context.enumerateIndirectObjects();
    for (const [pdfRef, pdfObject] of enumeratedIndirectObjects) {
      // Images are stored as raw streams; skip every other object type.
      if (!(pdfObject instanceof PDFRawStream)) continue;

      const { dict } = pdfObject;

      const subtype = dict.get(PDFName.of('Subtype'));
      const width = dict.get(PDFName.of('Width'));
      const height = dict.get(PDFName.of('Height'));
      const name = dict.get(PDFName.of('Name'));

      if (subtype === PDFName.of('Image')) {
        // FOUND IMAGE, NOW WHAT PAGE DOES IT EXIST ON
        // push image coordinates into an array for page
      }
    }
@nealdyrkacz nealdyrkacz reopened this Aug 12, 2021
@nealdyrkacz nealdyrkacz changed the title Find which page an image is part of when extracting images Find image coordinates and which page an image is part of when extracting images Aug 12, 2021
Hopding (Owner) commented Sep 22, 2021

Determining an image's coordinates would require parsing and processing the page's content stream, which is not something pdf-lib supports. You can get the raw content stream pretty easily (there are some issues out there where I've provided some examples), but parsing it would be a bit tricky and requires knowledge of the internal structure of PDF files.

It would be nice to provide explicit support for this though. I've added it to the roadmap in #998.
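
The content stream processing described above can be sketched roughly as follows. This is not part of pdf-lib and not the maintainer's method; `findXObjectPlacements` and `multiply` are hypothetical helpers written for illustration. The sketch assumes you already have the page's content stream as decoded (uncompressed) text. It tracks the current transformation matrix through the `q`, `Q`, and `cm` operators and records the matrix in effect at each `Do` operator. An image is painted into the unit square, so transforming that square gives its bounding box on the page. Nested parentheses in strings, inline images, and Form XObjects are not handled, and the caller still has to check which names in the page's resources are actually images.

```javascript
// Multiply two PDF matrices [a b c d e f]; `m` is applied first, then `n`.
function multiply(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ];
}

// Hypothetical helper: returns one entry per `Do` operator found in the
// decoded content stream text, with the bounding box of the unit square
// under the transformation matrix active at that point.
function findXObjectPlacements(content) {
  const text = content
    .replace(/\((?:\\[\s\S]|[^\\()])*\)/g, ' ') // drop simple string literals
    .replace(/%[^\r\n]*/g, ' ');                // drop comments
  const tokenPattern =
    /\/[^\s/\[\]()<>{}%]*|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z'"*][A-Za-z0-9'"*]*/g;
  const tokens = text.match(tokenPattern) || [];

  const identity = [1, 0, 0, 1, 0, 0];
  let ctm = identity;   // current transformation matrix
  const stack = [];     // matrices saved by `q`
  let operands = [];    // operands seen since the last operator
  const placements = [];

  for (const token of tokens) {
    // Names and numbers are operands; everything else is an operator.
    if (token.startsWith('/') || /^[-+.\d]/.test(token)) {
      operands.push(token);
      continue;
    }
    if (token === 'q') {
      stack.push(ctm);
    } else if (token === 'Q') {
      ctm = stack.pop() || identity;
    } else if (token === 'cm' && operands.length >= 6) {
      ctm = multiply(operands.slice(-6).map(Number), ctm);
    } else if (token === 'Do' && operands.length >= 1) {
      const [a, b, c, d, e, f] = ctm;
      // Transform the four corners of the unit square.
      const xs = [e, a + e, c + e, a + c + e];
      const ys = [f, b + f, d + f, b + d + f];
      const x = Math.min(...xs);
      const y = Math.min(...ys);
      placements.push({
        name: operands[operands.length - 1].slice(1),
        x,
        y,
        width: Math.max(...xs) - x,
        height: Math.max(...ys) - y,
        matrix: ctm,
      });
    }
    operands = [];
  }
  return placements;
}

// Example: one image scaled to 200 x 100 and placed at (50, 600).
const sample = 'q 200 0 0 100 50 600 cm /Image1 Do Q';
console.log(findXObjectPlacements(sample));
```

The `name` in each result is the key under which the XObject is registered in the page's resources, so it can be matched against the image streams found by enumerating the document's objects. Coordinates are in PDF user space, with the origin at the bottom left of the page.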
