Find image coordinates and which page an image is part of when extracting images #963

nealdyrkacz · 2021-08-12T15:05:57Z

Using the answers from this issue I can find all the images in a PDF. What I need to know is which page each image is a part of. I'm having trouble working out where to find this in the doc's context at the point where an image is found. Also, is there a way to determine the coordinates of a found image?

    import { PDFName, PDFRawStream } from 'pdf-lib';

    // pdfDoc is assumed to be an already loaded PDFDocument
    const enumeratedIndirectObjects = pdfDoc.context.enumerateIndirectObjects();
    enumeratedIndirectObjects.forEach(([pdfRef, pdfObject]) => {
      // Image XObjects are stored as streams; skip everything else
      if (!(pdfObject instanceof PDFRawStream)) return;

      const { dict } = pdfObject;

      const subtype = dict.get(PDFName.of('Subtype'));
      const width = dict.get(PDFName.of('Width'));
      const height = dict.get(PDFName.of('Height'));
      const name = dict.get(PDFName.of('Name'));

      if (subtype === PDFName.of('Image')) {
        // FOUND IMAGE, NOW WHAT PAGE DOES IT EXIST ON
        // push image coordinates into an array for page
      }
    });


Hopding · 2021-09-22T22:04:25Z

Determining an image's coordinates would require parsing and processing the page's content stream, which is not something pdf-lib supports. You can get the raw content stream pretty easily (there are some issues out there where I've provided some examples), but parsing it would be a bit tricky and requires knowledge of the inner structure of PDF files.

It would be nice to provide explicit support for this though. I've added it to the roadmap in #998.
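
To give a rough idea of what "parsing and processing the content stream" involves, here is a minimal sketch in plain JavaScript. It is not part of pdf-lib. The names `multiply`, `boundingBox`, and `findImagePlacements` are hypothetical, and the input is assumed to be a page's content stream that has already been decoded to text. In a content stream, an XObject is painted with `/Name Do`, where the name is looked up in that page's resources, and it is drawn into a unit square transformed by the current transformation matrix (set with `cm`, saved and restored with `q` and `Q`). The sketch assumes whitespace-separated tokens and ignores string literals, arrays, inline images, and page rotation, so it will not handle every real-world PDF.

```javascript
// Multiply two PDF matrices given as [a, b, c, d, e, f].
// The result applies m first, then n (PDF row-vector convention).
function multiply(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ];
}

// Axis-aligned bounding box of the unit square after applying the matrix.
function boundingBox(ctm) {
  const [a, b, c, d, e, f] = ctm;
  const xs = [e, a + e, c + e, a + c + e];
  const ys = [f, b + f, d + f, b + d + f];
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// Scan a decoded content stream and report where each XObject is painted.
// Assumption: tokens are separated by whitespace (not guaranteed in real PDFs).
function findImagePlacements(contentStream) {
  const tokens = contentStream.trim().split(/\s+/);
  let ctm = [1, 0, 0, 1, 0, 0]; // identity matrix
  const savedStates = [];
  const operands = [];
  const placements = [];

  for (const token of tokens) {
    if (token === 'q') {
      savedStates.push(ctm); // save graphics state
      operands.length = 0;
    } else if (token === 'Q') {
      if (savedStates.length > 0) ctm = savedStates.pop(); // restore state
      operands.length = 0;
    } else if (token === 'cm') {
      const m = operands.slice(-6).map(Number);
      if (m.length === 6 && m.every(Number.isFinite)) ctm = multiply(m, ctm);
      operands.length = 0;
    } else if (token === 'Do') {
      const name = operands[operands.length - 1];
      if (name && name.startsWith('/')) {
        placements.push({ name: name.slice(1), ...boundingBox(ctm) });
      }
      operands.length = 0;
    } else if (token.startsWith('/') || !Number.isNaN(Number(token))) {
      operands.push(token); // name or number operand
    } else {
      operands.length = 0; // some other operator: discard its operands
    }
  }
  return placements;
}

const placements = findImagePlacements('q 200 0 0 100 50 600 cm /Im0 Do Q');
console.log(placements);
```

The coordinates are in the page's user space, with the origin at the bottom-left by default. Note that `Do` can also paint form XObjects, so each reported name would still need to be checked against the page's XObject resources to confirm that its `Subtype` is `Image`. Running this per page also answers the "which page" question, since each page has its own content stream.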

nealdyrkacz closed this as completed Aug 12, 2021

nealdyrkacz reopened this Aug 12, 2021

nealdyrkacz changed the title ~~Find which page an image is part of when extracting images~~ Find image coordinates and which page an image is part of when extracting images Aug 12, 2021

nealdyrkacz mentioned this issue Aug 12, 2021

How can I get string or image from "PDFDocument.context.indirectObjects" #962

Closed

Hopding added the needs-triage label Sep 19, 2021

Hopding closed this as completed Sep 22, 2021

Hopding removed the needs-triage label Sep 22, 2021
