exporting JBIG2 images #46

side2k · 2017-01-20T11:45:59Z

This PR is an adaptation of this one: euske/pdfminer#107

vstoykov · 2017-02-24T09:16:47Z

There is no cStringIO in Python 3.

goulu · 2017-04-18T15:03:04Z

Please check why the tests fail and fix them before we can merge.
Thank you!

vstoykov · 2017-08-01T13:22:23Z

@side2k you can replace input_stream = StringIO() with input_stream = BytesIO() (BytesIO is already imported) and remove the StringIO import. You should probably also rebase.
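A minimal sketch of the suggested swap, assuming Python 3 and a reader that accumulates raw stream bytes in memory (collect_stream_bytes is a hypothetical helper, not pdfminer code): io.BytesIO accepts bytes, io.StringIO only accepts text, and cStringIO does not exist on Python 3 at all.

```python
from io import BytesIO, StringIO


def collect_stream_bytes(chunks):
    """Hypothetical helper: concatenate raw binary chunks in memory."""
    input_stream = BytesIO()  # was: StringIO() / cStringIO.StringIO()
    for chunk in chunks:
        input_stream.write(chunk)
    input_stream.seek(0)  # rewind so callers can read from the start
    return input_stream


stream = collect_stream_bytes([b'\x97JB2', b'\r\n\x1a\n'])
print(stream.read())  # b'\x97JB2\r\n\x1a\n'

# io.StringIO rejects binary data on Python 3.
try:
    StringIO().write(b'\x97JB2')
except TypeError as exc:
    print('StringIO refused bytes:', exc)
```

The same buffer works unchanged when it is later handed to code that expects a binary file object.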

pietermarsman · 2019-07-14T14:02:27Z

This PR needs only a little work. @side2k can you do that?

pietermarsman · 2019-07-14T14:03:10Z

This PR fixes #26

side2k · 2019-07-14T14:31:23Z

@pietermarsman I will look into it within the next couple of hours.

side2k · 2019-07-14T14:52:08Z

@pietermarsman I've rebased and added the fixes. All the tests are passing now.

side2k · 2019-07-14T14:54:25Z

It would be really great to get a code review from someone who is familiar with the current state of pdfminer; I haven't touched it for a couple of years now.

pietermarsman · 2019-07-14T16:05:12Z

Nice, quick response! :) Do you have a pdf and script to test the changes with? That would make it a lot easier for me to review the code and understand what it does and what you have changed.

I don't have a lot of experience with pdfs, nor with pdfminer. But I want to learn that and I can give your code a thorough sanity check.

pietermarsman · 2019-07-16T19:31:19Z

(@side2k , not sure if you've missed my previous message, this is a friendly notification if you did)

Do you have a pdf and sample code to test this with? I can't understand and check this PR if I don't have a pdf that triggers this code.

side2k · 2019-07-16T22:47:13Z

@pietermarsman I think I had one once, but right now I don't have an opportunity to search for it. Maybe later. I am on a business trip right now, sorry.

pietermarsman · 2019-07-22T18:29:05Z

@side2k let us know when you find it.

I think we can close this PR until we have some testing material. Once we have that we can reopen, test, review and merge.

pietermarsman · 2019-08-19T07:33:50Z

@ganeshtata, there is no pdf to test this PR with. Do you agree that we cannot merge this PR if there is nothing to test?

pietermarsman

I've tried testing this with this pdf: jbig2.pdf.

I needed to make a few adjustments to get this working; they are all related to byte strings vs. normal strings.

The output I got has the extension .jb2, but I could not determine whether it is a proper JBIG2 file: I have no viewer for that format and could not find one online, at least not one that shows the actual image. I've also attached the output image (Im1.jb2.zip).
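Without a viewer, one cheap sanity check is to confirm that the exported file at least starts with the 8-byte JBIG2 file header ID that pdfminer/jbig2.py defines as FILE_HEADER_ID (written here as a bytestring). This is a sketch with hypothetical function names; it only checks the magic bytes and does not validate the segments that follow, so passing it is necessary but not sufficient for a well-formed file.

```python
# 8-byte JBIG2 file header ID from pdfminer/jbig2.py, as a bytestring.
FILE_HEADER_ID = b'\x97\x4A\x42\x32\x0D\x0A\x1A\x0A'


def starts_with_jbig2_header(data):
    """Return True if the raw bytes begin with the JBIG2 file header ID."""
    return data[:len(FILE_HEADER_ID)] == FILE_HEADER_ID


def has_jbig2_file_header(path):
    """Hypothetical helper: check the first 8 bytes of an exported .jb2 file."""
    with open(path, 'rb') as fp:  # binary mode, so read() returns bytes
        return starts_with_jbig2_header(fp.read(len(FILE_HEADER_ID)))
```

Usage would be something like has_jbig2_file_header('Im1.jb2') on the extracted image. Note that the comparison only works because both sides are bytes; with the original str literal it would always be False on Python 3.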

pietermarsman · 2019-10-15T09:10:29Z

pdfminer/jbig2.py

+
+# file literals
+
+FILE_HEADER_ID = '\x97\x4A\x42\x32\x0D\x0A\x1A\x0A'


Should be a bytestring

pietermarsman · 2019-10-15T09:11:21Z

pdfminer/jbig2.py

+        return segments
+
+    def is_eof(self):
+        if self.stream.read(1) == '':


Should compare to a bytestring, e.g. self.stream.read(1) == b''
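A sketch of an EOF check along the lines of this suggestion, assuming a seekable binary stream. Probing with read(1) consumes a byte, so this version seeks back one byte when the stream is not exhausted; whether the PR's reader needs that is an assumption, since the rest of is_eof is not shown in the diff. StreamReaderSketch is a hypothetical stand-in, not the actual JBIG2StreamReader.

```python
import os
from io import BytesIO


class StreamReaderSketch(object):
    """Hypothetical stand-in for the reader; only the EOF check is shown."""

    def __init__(self, stream):
        self.stream = stream

    def is_eof(self):
        # Binary streams return b'' (never '') once exhausted.
        if self.stream.read(1) == b'':
            return True
        # Undo the probe so the next read starts at the same byte.
        self.stream.seek(-1, os.SEEK_CUR)
        return False


reader = StreamReaderSketch(BytesIO(b'\x00'))
print(reader.is_eof())        # False, position unchanged
print(reader.stream.read(1))  # b'\x00'
print(reader.is_eof())        # True
```

With the original comparison against '', the first branch never fires on Python 3, so a read loop guarded by is_eof() would never see the end of the stream.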

pietermarsman · 2019-10-15T09:11:36Z

pdfminer/jbig2.py

+        return data_len
+
+    def encode_segment(self, segment):
+        data = ''


Should be a bytestring

pietermarsman · 2019-10-15T09:11:57Z

pdfminer/jbig2.py

+            'flags': {'deferred': False, 'type': SEG_TYPE_END_OF_FILE},
+            'number': seg_number,
+            'page_assoc': 0,
+            'raw_data': '',


Should be a bytestring
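To illustrate why both raw_data here and the data buffer in encode_segment must be bytes, here is a simplified encoder for a segment header without referred-to segments, following the layout in the JBIG2 specification (ITU-T T.88): a 4-byte segment number, 1 flags byte (deferred bit plus segment type), 1 byte for the referred-to segment count and retention flags, a 1-byte page association, and a 4-byte data length, all big-endian. encode_simple_segment is hypothetical and is not the PR's code; the value 51 is the spec's end-of-file segment type, and that SEG_TYPE_END_OF_FILE in the PR holds the same value is an assumption.

```python
import struct

SEG_TYPE_END_OF_FILE = 51  # end-of-file segment type number (T.88)


def encode_simple_segment(number, seg_type, page_assoc, raw_data, deferred=False):
    """Hypothetical encoder for a segment with no referred-to segments.

    Layout (big-endian): segment number (4 bytes), header flags (1 byte),
    referred-to count/retention flags (1 byte), page association (1 byte),
    data length (4 bytes), followed by the segment data itself.
    """
    if not isinstance(raw_data, bytes):
        raise TypeError('raw_data must be bytes, got %s' % type(raw_data).__name__)
    if not 0 <= page_assoc <= 0xFF:
        raise ValueError('this sketch only supports a 1-byte page association')
    flags = (0x80 if deferred else 0x00) | (seg_type & 0x3F)
    referred = 0x00  # zero referred-to segments, no retention bits set
    data = b''       # bytes, not '', so struct.pack output can be appended
    data += struct.pack('>LBBB', number, flags, referred, page_assoc)
    data += struct.pack('>L', len(raw_data))
    return data + raw_data


eof = encode_simple_segment(7, SEG_TYPE_END_OF_FILE, 0, b'')
print(len(eof), eof.hex())  # 11 0000000733000000000000
```

Starting from data = '' instead, the first += would raise TypeError on Python 3, because struct.pack always returns bytes.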

pietermarsman · 2019-10-22T08:49:41Z

I've found a pdf with JBIG2 images in the pdfbox jira:

ItDoesntWorkScan.pdf

And added test for pdf with JBIG2 image. Fixes #26 Closes #46

side2k mentioned this pull request Jan 20, 2017

Add export for JBIG2 images #26

Closed

tataganesh changed the base branch from master to develop November 8, 2018 18:54

eladkehat added a commit to eladkehat/yapdfminer that referenced this pull request May 4, 2019

Add support for JBIG2 images.

6f91413

Adapted from the original at pdfminer/pdfminer.six#46

Leonid Amirov added 11 commits July 14, 2019 17:38

saving JBIG2 streams as .jb2 images

5f09ad2

JBIG2 stream decoding basic routines

2f33f9a

JBIG2 stream writer basic routines(incomplete)

0f70748

finished with basic JBIG2 format encoding routines

dd45b37

JBIG2 file writing routines

aa27c2a

exporting jb2 images from ImageWriter

47f6940

prevent ImageWriter from overwriting images with duplicate names

2c996e8

added read_file() method to JBIG2StreamReader; fixed a bug in segment references reading

5fc5616

fixed bug in JGBIG2StreamReader.encode_retention_flags() causing field corruption

0a055d4

fix rebase issue

c32eb89

replace cStringIO with BytesIO

03a777c

side2k force-pushed the jbig2.six branch from d856ab8 to 03a777c July 14, 2019 14:47

side2k requested a review from goulu July 14, 2019 14:50

cuteufo mentioned this pull request Jul 26, 2019

text extraction while font Encoding is a PDFStream object #279

Closed

pietermarsman added the type: new feature label Oct 13, 2019

pietermarsman requested changes Oct 15, 2019

pietermarsman removed the request for review from goulu October 22, 2019 08:49

pietermarsman mentioned this pull request Oct 22, 2019

Pr jbig2 #311

Merged

pietermarsman closed this in #311 Oct 22, 2019

pietermarsman added a commit that referenced this pull request Oct 22, 2019

Added: extraction of JBIG2 encoded images (#311)

373c6e7
