OCRMyPDF - AttributeError: 'ArrayObject' object has no attribute 'getData' #220 #111
Comments
I took a guess at what the problem is, and I think I fixed it in the develop branch. Could you send me the actual PDF you used (Dropbox or something)? There's something unusual about the particular PDF you tried. At least, it's different from the other PDFs I've tested. I'd also like to add it to the test suite. Thanks! |
Presumed to be fixed |
I am sorry for the delay. I managed to change my partition table so that Ubuntu wouldn't boot any longer, and it took some time to fix it. You'll find the PDF following the link below: Edit: Using the current version, I get the message: xxx.pdf: not a valid PDF, and could not repair it |
Thank you for providing the PDF file. It's very helpful to have examples of all kinds of PDFs out there. In -rc8 I fixed a bug that this PDF file triggered by not having the document info dictionary, which is technically optional but present in almost all PDF files. That problem, however, does not explain the error message you encountered. That error message comes from I think if you upgrade qpdf to >= 5.1.3 and ocrmypdf to -rc8 you should see the problem fixed. |
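For anyone wanting to check the qpdf requirement mentioned above before running ocrmypdf, here is a minimal sketch. It assumes the version tool prints a dotted version number somewhere in its output (for example a line like `qpdf version 5.1.3`); the function names `parse_version` and `meets_minimum` are hypothetical helpers, not part of ocrmypdf or qpdf.

```python
import re


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints.

    Assumption: the text contains something like '5.1.3' or '6.1'.
    """
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no dotted version number found in: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text, minimum=(5, 1, 3)):
    """Return True if the version found in text is at least `minimum`.

    Tuples compare element by element, so (10, 0, 1) >= (5, 1, 3) holds,
    which a plain string comparison would get wrong.
    """
    return parse_version(text) >= minimum
```

The text passed in would typically be captured from the tool's version output, for example with `subprocess.check_output`; that step is left out here so the sketch stays independent of what is installed.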
I upgraded qpdf and ocrmypdf to the current versions and it works as long as I don't use ImageMagick and unpaper (ocrmypdf -i input.pdf output.pdf), but using one of them (ocrmypdf -d -c input.pdf output.pdf) generates the following error: ocrmypdf -c Input.pdf Output.pdf |
ImageMagick is no longer used. What version of unpaper and ocrmypdf are you trying that with? Using your |
I am using unpaper 0.4.2 and ocrmypdf 3.0rc8, with Ubuntu as the OS. Edit: same problem using unpaper 6.1 |
unpaper 0.4.2 is the problem. It seems to produce invalid output files. The Dockerfile shows how to build unpaper 6.1 from source. |
I get the same problem with unpaper 6.1: ocrmypdf -c Input.pdf Output.pdf |
Thanks for your patience with this. What is the output of |
ocrmypdf --version ocrmypdf -v 1 -c Input.pdf Output.pdf Tasks which will be run: Task enters queue = 'ocrmypdf.main.repair_pdf'
Completed Task = 'ocrmypdf.main.repair_pdf' WARNING: Completed Task = 'ocrmypdf.main.generate_postscript_stub'
Copyright (C) 2013 Artifex Software, Inc. All rights reserved. Completed Task = 'ocrmypdf.main.rasterize_with_ghostscript' Original exceptions:
Input.pdf: not a valid PDF, and could not repair it.
|
Should be fixed now (-rc9 and above). |
I have the same problem on a Mac Pro running Yosemite, but I cannot find the -rc9 fix. How would I download it? |
@jbargil Sorry for the slow reply. |
Thank you for your help. Using version 3.1, everything works fine! |
Hey, I am trying to get OCRmyPDF running on some PDFs generated / archived using my scanner. So far the program should run, but when I try to run it on an existing PDF (freshly scanned) I get the following error message:
[code]
/usr/lib/python3/dist-packages/pkg_resources.py:1031: UserWarning: /home/florian/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Traceback (most recent call last):
File "/usr/local/bin/ocrmypdf", line 9, in <module>
load_entry_point('ocrmypdf==3.0rc4', 'console_scripts', 'ocrmypdf')()
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/main.py", line 848, in run_pipeline
cmdline.run(options)
File "/usr/local/lib/python3.4/dist-packages/ruffus-2.6.3-py3.4.egg/ruffus/cmdline.py", line 824, in run
**appropriate_options)
File "/usr/local/lib/python3.4/dist-packages/ruffus-2.6.3-py3.4.egg/ruffus/task.py", line 5938, in pipeline_run
raise job_errors
ruffus.ruffus_exceptions.RethrownJobError:
Original exception:
Exception #1
'builtins.AttributeError('ArrayObject' object has no attribute 'getData')' raised in ...
Task = def ocrmypdf.main.repair_pdf(...):
Job = [source.pdf -> .../com.github.ocrmypdf.gao5vxz1/source.repaired.pdf, <ocrmypdf.main.WrappedLogger>, [], <_thread.lock>]
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/ruffus-2.6.3-py3.4.egg/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
register_cleanup, touch_files_only)
File "/usr/local/lib/python3.4/dist-packages/ruffus-2.6.3-py3.4.egg/ruffus/task.py", line 567, in job_wrapper_io_files
ret_val = user_defined_work_func(*params)
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/main.py", line 332, in repair_pdf
pdfinfo.extend(pdf_get_all_pageinfo(output_file))
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/pageinfo.py", line 137, in pdf_get_all_pageinfo
return [_pdf_get_pageinfo(infile, n) for n in range(pdf.numPages)]
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/pageinfo.py", line 137, in <listcomp>
return [_pdf_get_pageinfo(infile, n) for n in range(pdf.numPages)]
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/pageinfo.py", line 118, in _pdf_get_pageinfo
if _page_has_inline_images(page):
File "/usr/local/lib/python3.4/dist-packages/ocrmypdf-3.0rc4-py3.4.egg/ocrmypdf/pageinfo.py", line 45, in _page_has_inline_images
data = contents.getData()
AttributeError: 'ArrayObject' object has no attribute 'getData'
[/code]
It doesn't matter which scanned file I use. As an example, I attached a file printed and scanned from liquidweb.
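The last frame of the traceback suggests that the page's contents entry was an array of streams rather than a single stream, so calling `getData()` on it directly fails. The sketch below shows one way such a case could be handled. It is not the actual ocrmypdf fix: `FakeStream` and `FakeArray` are hypothetical stand-ins that only mimic the `getData()` and `getObject()` methods seen on PyPDF2 objects, and `read_contents` is an illustrative helper name.

```python
class FakeStream:
    """Hypothetical stand-in for a PDF stream object exposing getData()."""

    def __init__(self, data):
        self._data = data

    def getData(self):
        return self._data

    def getObject(self):
        # A direct object resolves to itself.
        return self


class FakeArray(list):
    """Hypothetical stand-in for an array object (a list of streams)."""

    def getObject(self):
        return self


def read_contents(contents):
    """Return the page content bytes for a single stream or an array of streams."""
    contents = contents.getObject()
    if isinstance(contents, list):
        # An array of streams is treated as one logical stream; joining with
        # whitespace keeps tokens from adjacent streams from running together.
        return b"\n".join(item.getObject().getData() for item in contents)
    return contents.getData()


single = read_contents(FakeStream(b"BT ET"))
multiple = read_contents(FakeArray([FakeStream(b"q"), FakeStream(b"Q")]))
```

With real PyPDF2 objects, the same branch would be taken whenever the contents entry resolves to a list-like array, which is the situation the `AttributeError` above points to.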