This repository has been archived by the owner on Jan 6, 2024. It is now read-only.

convert: Option '-resample' requires an argument or argument is malformed #13

Open
tonyblue2 opened this issue May 7, 2022 · 12 comments


tonyblue2 commented May 7, 2022

Hello,

Please excuse my English; I am not a native speaker.

I installed pmOCR on Ubuntu 20.04.4 LTS. When I try to run it in batch mode, I get the error message "convert: Option '-resample' requires an argument or argument is malformed.".

These were my steps:

apt-get install -f poppler-utils
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libcairo2 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2 libnspr4 libnss3 libopenjp2-7 libpixman-1-0 libpoppler97 libtiff5 libwebp6 libxcb-render0 poppler-data
Suggested packages:
  liblcms2-utils ghostscript fonts-japanese-mincho | fonts-ipafont-mincho fonts-japanese-gothic | fonts-ipafont-gothic fonts-arphic-ukai fonts-arphic-uming fonts-nanum
The following NEW packages will be installed:
  libcairo2 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2 libnspr4 libnss3 libopenjp2-7 libpixman-1-0 libpoppler97 libtiff5 libwebp6 libxcb-render0 poppler-data poppler-utils
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.

apt-get install tesseract-ocr-deu
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  fontconfig libarchive13 libdatrie1 libgif7 libgomp1 libgraphite2-3 libharfbuzz0b liblept5 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libtesseract4 libthai-data libthai0 libwebpmux3 tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
Suggested packages:
  lrzip
The following NEW packages will be installed:
  fontconfig libarchive13 libdatrie1 libgif7 libgomp1 libgraphite2-3 libharfbuzz0b liblept5 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libtesseract4 libthai-data libthai0 libwebpmux3 tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 19 newly installed, 0 to remove and 0 not upgraded.
Need to get 9,340 kB of archives.
After this operation, 28.3 MB of additional disk space will be used.

apt-get install -f git
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  git-man libbrotli1 libcurl3-gnutls liberror-perl libnghttp2-14 librtmp1 libssh-4 patch
Suggested packages:
  git-daemon-run | git-daemon-sysvinit git-doc git-el git-email git-gui gitk gitweb git-cvs git-mediawiki git-svn diffutils-doc
The following NEW packages will be installed:
  git git-man libbrotli1 libcurl3-gnutls liberror-perl libnghttp2-14 librtmp1 libssh-4 patch
0 upgraded, 9 newly installed, 0 to remove and 0 not upgraded.
Need to get 6,379 kB of archives.
After this operation, 41.0 MB of additional disk space will be used.

git clone https://github.com/deajan/pmOCR
Cloning into 'pmOCR'...
remote: Enumerating objects: 2030, done.
remote: Counting objects: 100% (74/74), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 2030 (delta 47), reused 46 (delta 29), pack-reused 1956
Receiving objects: 100% (2030/2030), 1021.31 KiB | 5.58 MiB/s, done.
Resolving deltas: 100% (1385/1385), done.

cd pmOCR

./install.sh
2022-05-07 21:02:07 - Detected systemd.
2022-05-07 21:02:07 - Copying [default.conf] to [/etc/pmocr/default.conf.new].
2022-05-07 21:02:07 - Copied [/pmOCR/default.conf] to [/etc/pmocr/default.conf.new].
2022-05-07 21:02:07 - Copied [/pmOCR/pmocr.sh] to [/usr/local/bin/pmocr.sh].
2022-05-07 21:02:07 - Set file permissions to [755] on [/usr/local/bin/pmocr.sh].
2022-05-07 21:02:07 - Set file ownership on [/usr/local/bin/pmocr.sh] to [root:root].
2022-05-07 21:02:07 - Copied [/pmOCR/pmocr-srv@.service] to [/lib/systemd/system/pmocr-srv@.service].
2022-05-07 21:02:07 - Created [pmocr-srv] service in [/lib/systemd/system] and [/etc/systemd/user].
2022-05-07 21:02:07 - Can be activated with [systemctl start SERVICE_NAME@instance.conf] where instance.conf is the name of the config file in /etc/pmocr.
2022-05-07 21:02:07 - Can be enabled on boot with [systemctl enable pmocr-srv@instance.conf].
2022-05-07 21:02:07 - In userland, active with [systemctl --user start pmocr-srv@instance.conf].
2022-05-07 21:02:07 - pmocr installed. Use with /usr/local/bin/pmocr.sh
2022-05-07 21:02:07 - In order to make usage statistics, the script would like to connect to http://instcount.netpower.fr?program=pmocr&version=1.8.1&os=Linux%205.13.19-6-pve%20x86_64%20x86_64%20GNU%2FLinux%20%28%22Ubuntu%22%20%2220.04.4%20LTS%20%28Focal%20Fossa%29%22%29%2064-bit%20Unix&action=install
No data except those in the url will be send.
Allow [Y/n]

ln -s /usr/local/bin/pmocr.sh /usr/bin/pmocr.sh

apt-file search /usr/bin/convert
bedops: /usr/bin/convert2bed
bitseq: /usr/bin/convertSamples
caffe-tools-cpu: /usr/bin/convert_cifar_data
caffe-tools-cpu: /usr/bin/convert_imageset
caffe-tools-cpu: /usr/bin/convert_mnist_data
caffe-tools-cpu: /usr/bin/convert_mnist_siamese_data
cbflib-bin: /usr/bin/convert_image
cct: /usr/bin/convert_vcf_to_features
cgns-convert: /usr/bin/convert_dataclass
cgns-convert: /usr/bin/convert_location
cgns-convert: /usr/bin/convert_variables
convertall: /usr/bin/convertall
device-tree-compiler: /usr/bin/convert-dtsv0
dvbstreamer: /usr/bin/convertdvbdb
eigensoft: /usr/bin/convertf
findbugs: /usr/bin/convertXmlToText
foxtrotgps: /usr/bin/convert2gpx
foxtrotgps: /usr/bin/convert2osm
graphicsmagick-imagemagick-compat: /usr/bin/convert
imagemagick-6.q16: /usr/bin/convert-im6.q16
imagemagick-6.q16hdri: /usr/bin/convert-im6.q16hdri
ir.lv2: /usr/bin/convert4chan
leptonica-progs: /usr/bin/convertfilestopdf
leptonica-progs: /usr/bin/convertfilestops
leptonica-progs: /usr/bin/convertformat
leptonica-progs: /usr/bin/convertsegfilestopdf
leptonica-progs: /usr/bin/convertsegfilestops
leptonica-progs: /usr/bin/converttopdf
leptonica-progs: /usr/bin/converttops
lilypond: /usr/bin/convert-ly
ncbi-blast+: /usr/bin/convert2blastmask
octomap-tools: /usr/bin/convert_octree
omniorb: /usr/bin/convertior
opendkim-tools: /usr/bin/convert_keylist
phast: /usr/bin/convert_coords
profphd-utils: /usr/bin/convert_seq
python3-oslo.log: /usr/bin/convert-json
python3-potr: /usr/bin/convertkey
rsem: /usr/bin/convert-sam-for-rsem
ruby-shoulda-context: /usr/bin/convert_to_should_syntax
staden-io-lib-utils: /usr/bin/convert_trace
syrthes-tools: /usr/bin/convert2syrthes4
texlive-bibtex-extra: /usr/bin/convertgls2bib
xoreos-tools: /usr/bin/convert2da

apt-get install -f graphicsmagick-imagemagick-compat
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  fonts-droid-fallback fonts-noto-mono fonts-urw-base35 ghostscript graphicsmagick gsfonts libavahi-client3 libavahi-common-data libavahi-common3 libcups2 libgraphicsmagick-q16-3 libgs9 libgs9-common libijs-0.35 libjbig2dec0 libpaper-utils libpaper1 libwmf0.2-7
Suggested packages:
  fonts-noto fonts-freefont-otf | fonts-freefont-ttf fonts-texgyre ghostscript-x graphicsmagick-dbg cups-common libwmf0.2-7-gtk
The following NEW packages will be installed:
  fonts-droid-fallback fonts-noto-mono fonts-urw-base35 ghostscript graphicsmagick graphicsmagick-imagemagick-compat gsfonts libavahi-client3 libavahi-common-data libavahi-common3 libcups2 libgraphicsmagick-q16-3 libgs9 libgs9-common libijs-0.35 libjbig2dec0 libpaper-utils libpaper1 libwmf0.2-7
0 upgraded, 19 newly installed, 0 to remove and 0 not upgraded.
Need to get 16.6 MB of archives.
After this operation, 54.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Here is the log of the error:
2022-05-07 21:25:13 - Beginning PDF OCR recognition of files in [/test] using tesseract.
2022-05-07 21:25:14 - Preparing to process [/test/testpdf.pdf].
2022-05-07 21:25:14 - Preparing to process [/test/testpdf2 - Kopie.pdf].
2022-05-07 21:25:14 - Preparing to process [/test/testpdf3.pdf].
2022-05-07 21:25:14 - Preparing to process [/test/testpdf4.pdf].
2022-05-07 21:25:14 - _ExecTasksPidsCheck called by [OCR_Dispatch] finished monitoring pid [5087] with exitcode [1].
2022-05-07 21:25:14 - Command was [OCR "/test/testpdf.pdf" ".pdf" "pdf" "false"].
2022-05-07 21:25:14 - Truncated output:
2022-05-07 21:25:14 - Processing file [/test/testpdf.pdf].
convert convert: Option '-resample' requires an argument or argument is malformed.
2022-05-07 21:25:14 - /usr/bin/convert intermediary transformation failed.
2022-05-07 21:25:14 - Could not process file [/test/testpdf.pdf] (OCR error code 1). See logs.
2022-05-07 21:25:14 - Truncated OCR Engine Output:

2022-05-07 21:25:14 - Renaming file [/test/testpdf.pdf] to [/test/testpdf_OCR_ERR.pdf] in order to exclude it from next run.
2022-05-07 21:25:14 - Sent mail using mail command.

I have attached my /etc/pmocr/default.conf.

What is wrong?

Thank you

Tony

default.txt

Owner

deajan commented May 8, 2022

Hello Tony,

I found that I had mistyped some arguments for the intermediary PDF transformation, which is required to make searchable PDFs from existing ones.
In the meantime, I re-evaluated those arguments to make preprocessing quicker.
I've updated current master. Could you give it a spin?

Please also update your default.conf with the one from git.
Don't forget to tweak the settings in default.conf for your language (I guess -l deu?) and other options. The optional image preprocessor for tesseract is quite CPU hungry, but it is needed if your scans are skewed or noisy.
You can speed things up by lowering the resolution to your scanner's resolution (I leave 600 dpi as the default so that quality is never lost, but most people don't scan at 600 dpi).
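
Before setting the language to deu, it may be worth confirming that the German language data is actually available to tesseract. The sketch below is only an illustration: the helper name `has_tesseract_lang` is made up for this example, and it assumes the list format printed by `tesseract --list-langs` (a header line followed by one language code per line).

```shell
#!/bin/sh
# has_tesseract_lang CODE: succeed if CODE appears as a whole line in the
# language list read from stdin (hypothetical helper, not part of pmOCR).
has_tesseract_lang() {
    # -x matches whole lines only, -F treats the code as a literal string.
    grep -qxF -- "$1"
}

# Only query tesseract when it is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    if tesseract --list-langs 2>&1 | has_tesseract_lang deu; then
        echo "deu language data is installed"
    else
        echo "deu language data is missing (try: apt-get install tesseract-ocr-deu)"
    fi
fi
```

The whole-line match avoids false positives from codes that merely contain the string, such as a hypothetical deu_frak entry.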

Best regards.

deajan self-assigned this May 8, 2022
deajan added the bug label May 8, 2022
Author

tonyblue2 commented May 8, 2022

Hello deajan,

thank you for your message and your help.

I rolled back my machine and installed it again.

Now I get this error message:

pmocr.sh --batch --target=pdf --skip-txt-pdf --delete-input /test
2022-05-08 20:51:59 - Running pmocr 1.8.2 as batch
2022-05-08 20:51:59 - Beginning PDF OCR recognition of files in [/test] using tesseract.
2022-05-08 20:51:59 - Preparing to process [/test/test.pdf].
2022-05-08 20:52:00 - _ExecTasksPidsCheck called by [OCR_Dispatch] finished monitoring pid [5588] with exitcode [1].
2022-05-08 20:52:00 - Command was [OCR "/test/test.pdf" ".pdf" "pdf" "false"].
2022-05-08 20:52:00 - Truncated output:
2022-05-08 20:51:59 - Processing file [/test/test.pdf].
convert convert: Unrecognized option (-respect-parenthesis).
2022-05-08 20:52:00 - /usr/bin/convert preprocesser failed.
2022-05-08 20:52:00 - Could not process file [/test/test.pdf] (OCR error code 1). See logs.
2022-05-08 20:52:00 - Truncated OCR Engine Output:

2022-05-08 20:52:00 - Renaming file [/test/test.pdf] to [/test/test_OCR_ERR.pdf] in order to exclude it from next run.
2022-05-08 20:52:00 - Sent mail using mail command.
2022-05-08 20:52:01 - Failed OCR_Dispatch run.
2022-05-08 20:52:01 - Batch ended.
2022-05-08 20:52:01 - pmocr stopped instance [MyOCRServer] with pid [5539].

On Ubuntu, version 1.4+really1 of graphicsmagick-imagemagick-compat is installed.

Owner

deajan commented May 8, 2022

Well, that's bad. It looks like the command line for ImageMagick isn't recognized on your system.
Can you give me the output of convert --version?
On my test machine I get the following, which works well:

convert --version
Version: ImageMagick 6.9.10-86 Q16 x86_64 2020-01-13 https://imagemagick.org
Copyright: © 1999-2020 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP(4.5)
Delegates (built-in): bzlib cairo fftw fontconfig freetype gslib gvc jbig jng jp2 jpeg lcms ltdl lzma openexr pangocairo png ps raqm raw rsvg tiff webp wmf x xml zlib
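
Since the apt-file output earlier in this thread lists both graphicsmagick-imagemagick-compat and imagemagick-6.q16 as providers of a convert binary, a quick way to tell them apart is to look at the first line of the version output. This is a minimal sketch; `convert_flavor` is a made-up helper name, and it assumes the product name appears on the first line of the output.

```shell
#!/bin/sh
# convert_flavor: classify the first line of `convert -version` read from
# stdin (hypothetical helper for illustration only).
convert_flavor() {
    # Only the first line carries the product name.
    first_line=$(head -n 1)
    case "$first_line" in
        *GraphicsMagick*) echo graphicsmagick ;;
        *ImageMagick*)    echo imagemagick ;;
        *)                echo unknown ;;
    esac
}

# Only query convert when it is actually installed.
if command -v convert >/dev/null 2>&1; then
    convert -version 2>&1 | convert_flavor
fi
```

GraphicsMagick is tested first because it is the more specific name; the ImageMagick version line does not contain it.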

@tonyblue2
Author

convert --version
convert convert: Request did not return an image.

`convert -version
GraphicsMagick 1.3.35 2020-02-23 Q16 http://www.GraphicsMagick.org/
Copyright (C) 2002-2020 GraphicsMagick Group.
Additional copyrights and licenses apply to this software.
See http://www.GraphicsMagick.org/www/Copyright.html for details.

Feature Support:
Native Thread Safe yes
Large Files (> 32 bit) yes
Large Memory (> 32 bit) yes
BZIP yes
DPS no
FlashPix no
FreeType yes
Ghostscript (Library) no
JBIG yes
JPEG-2000 no
JPEG yes
Little CMS yes
Loadable Modules no
Solaris mtmalloc no
Google perftools tcmalloc no
OpenMP yes (201511 "4.5")
PNG yes
TIFF yes
TRIO no
Solaris umem no
WebP yes
WMF yes
X11 yes
XML yes
ZLIB yes

Host type: x86_64-pc-linux-gnu

Configured using the command:
./configure '--build' 'x86_64-linux-gnu' '--enable-shared' '--enable-static' '--enable-libtool-verbose' '--prefix=/usr' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--docdir=${prefix}/share/doc/graphicsmagick' '--with-gs-font-dir=/usr/share/fonts/type1/gsfonts' '--with-x' '--x-includes=/usr/include/X11' '--x-libraries=/usr/lib/X11' '--without-dps' '--without-modules' '--without-frozenpaths' '--with-webp=yes' '--with-zstd=yes' '--with-perl' '--with-perl-options=INSTALLDIRS=vendor' '--enable-quantum-library-names' '--with-quantum-depth=16' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/graphicsmagick-7OaGZU/graphicsmagick-1.4+really1.3.35=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/graphicsmagick-7OaGZU/graphicsmagick-1.4+really1.3.35=. -fstack-protector-strong -Wformat -Werror=format-security'

Final Build Parameters:
CC = gcc
CFLAGS = -fopenmp -g -O2 -fdebug-prefix-map=/build/graphicsmagick-7OaGZU/graphicsmagick-1.4+really1.3.35=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -pthread
CPPFLAGS = -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/X11 -I/usr/include/freetype2 -I/usr/include/libxml2
CXX = g++
CXXFLAGS = -g -O2 -fdebug-prefix-map=/build/graphicsmagick-7OaGZU/graphicsmagick-1.4+really1.3.35=. -fstack-protector-strong -Wformat -Werror=format-security -pthread
LDFLAGS = -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -L/usr/lib/X11
LIBS = -ljbig -lwebp -lwebpmux -llcms2 -ltiff -lfreetype -ljpeg -lpng16 -lwmflite -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread'`

Author

tonyblue2 commented May 8, 2022

I saw that you have "ImageMagick 6.9.10-86 Q16" installed. I had GraphicsMagick 1.3.35 2020-02-23 Q16 installed.

So I installed ImageMagick:

apt-get install -f imagemagick-6.q16

Now the output looks like the one in your post:

convert --version
Version: ImageMagick 6.9.10-23 Q16 x86_64 20190101 https://imagemagick.org
Copyright: © 1999-2019 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png tiff webp wmf x xml zlib

But now I get this error message:

pmocr.sh --batch --target=pdf --skip-txt-pdf --delete-input /test
2022-05-08 22:00:20 - Running pmocr 1.8.2 as batch
2022-05-08 22:00:20 - Beginning PDF OCR recognition of files in [/test] using tesseract.
2022-05-08 22:00:20 - Preparing to process [/test/test.pdf].
2022-05-08 22:00:20 - _ExecTasksPidsCheck called by [OCR_Dispatch] finished monitoring pid [8301] with exitcode [1].
2022-05-08 22:00:20 - Command was [OCR "/test/test.pdf" ".pdf" "pdf" "false"].
2022-05-08 22:00:20 - Truncated output:
2022-05-08 22:00:20 - Processing file [/test/test.pdf].
convert: attempt to perform an operation not allowed by the security policy 'PDF' @ error/constitute.c/IsCoderAuthorized/408.
convert: no images defined `/test/test.tif' @ error/convert.c/ConvertImageCommand/3258.

2022-05-08 22:00:20 - /usr/bin/convert intermediary transformation failed.
2022-05-08 22:00:20 - Could not process file [/test/test.pdf] (OCR error code 1). See logs.
2022-05-08 22:00:20 - Truncated OCR Engine Output:

2022-05-08 22:00:20 - Renaming file [/test/test.pdf] to [/test/test_OCR_ERR.pdf] in order to exclude it from next run.
2022-05-08 22:00:20 - Sent mail using mail command.
2022-05-08 22:00:20 - Failed OCR_Dispatch run.
2022-05-08 22:00:20 - Batch ended.
2022-05-08 22:00:20 - pmocr stopped instance [MyOCRServer] with pid [8252].
root@ftpserver:/test# ls /usr/bin/convert
convert convert-im6 convert-im6.q16

Author

tonyblue2 commented May 8, 2022

I looked for a solution to the "not allowed" error and found: https://forum.ubuntuusers.de/topic/imagemagick-funktioniert-nicht-2/

So in /etc/ImageMagick-6/policy.xml I changed

<policy domain="coder" rights="none" pattern="PDF" />

to

<policy domain="coder" rights="read|write" pattern="PDF" />
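
The same check can be scripted. The following is a rough sketch, not something taken from pmOCR: `pdf_coder_blocked` is a made-up helper name, and the pattern assumes the attributes appear in the order domain, rights, pattern, as in the line quoted above. The sed command that would apply the change is left commented out on purpose, since it loosens a security policy.

```shell
#!/bin/sh
# pdf_coder_blocked FILE: succeed if FILE contains a policy entry that
# denies all rights to the PDF coder (hypothetical helper).
pdf_coder_blocked() {
    grep -Eq 'domain="coder"[[:space:]]+rights="none"[[:space:]]+pattern="PDF"' "$1"
}

policy=/etc/ImageMagick-6/policy.xml
if [ -r "$policy" ] && pdf_coder_blocked "$policy"; then
    echo "PDF coder is blocked in $policy"
    # Possible fix (needs root, keeps a .bak copy of the original file):
    # sed -i.bak 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' "$policy"
fi
```

Only widen the policy on machines where the PDFs come from a trusted source.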

But now I get the next error message:

pmocr.sh --batch --target=pdf --skip-txt-pdf --delete-input /test
2022-05-08 22:17:22 - Running pmocr 1.8.2 as batch
2022-05-08 22:17:22 - Beginning PDF OCR recognition of files in [/test] using tesseract.
2022-05-08 22:17:22 - Preparing to process [/test/test.pdf].
2022-05-08 22:17:33 - _ExecTasksPidsCheck called by [OCR_Dispatch] finished monitoring pid [9352] with exitcode [1].
2022-05-08 22:17:33 - Command was [OCR "/test/test.pdf" ".pdf" "pdf" "false"].
2022-05-08 22:17:33 - Truncated output:
2022-05-08 22:17:22 - Processing file [/test/test.pdf].
convert: invalid argument for option `-sharpen': -compress @ error/convert.c/ConvertImageCommand/2752.
2022-05-08 22:17:32 - /usr/bin/convert preprocesser failed.

2022-05-08 22:17:32 - Could not process file [/test/test.pdf] (OCR error code 1). See logs.
2022-05-08 22:17:32 - Truncated OCR Engine Output:

2022-05-08 22:17:32 - Renaming file [/test/test.pdf] to [/test/test_OCR_ERR.pdf] in order to exclude it from next run.
2022-05-08 22:17:33 - Sent mail using mail command.
2022-05-08 22:17:33 - Failed OCR_Dispatch run.
2022-05-08 22:17:33 - Batch ended.
2022-05-08 22:17:33 - pmocr stopped instance [MyOCRServer] with pid [9295].

Owner

deajan commented May 8, 2022

Hell, I hate distro compile differences. I'll have to spin up an Ubuntu machine to check that. I'll report back.
In the meantime, could you remove the 'sharpen' argument in default.conf for testing?

Owner

deajan commented May 9, 2022

Well, I actually made a mistake in placing the -compress lzw argument.
I made another commit that should fix the conversion arguments, among a couple of other improvements.
Can you update both pmocr.sh and default.conf from current master and try again?
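
To illustrate this class of mistake: the error "invalid argument for option `-sharpen': -compress" means that -sharpen was followed directly by the next option instead of by its own value. The sketch below is a generic argument check, not pmOCR code; `first_missing_value` is a made-up name, the file names are placeholders, and the list of value-taking options is limited to a few that appear in this thread (-resample, -sharpen, -compress) plus -density.

```shell
#!/bin/sh
# first_missing_value ARGS...: print the first value-taking option that is
# followed by another option or by nothing. Returns 0 if all such options
# have a value, 1 otherwise (hypothetical helper).
first_missing_value() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -sharpen|-compress|-resample|-density)
                # A missing value shows up as end-of-list or as the next option.
                case "${2-}" in
                    ''|-*) echo "$1"; return 1 ;;
                esac
                shift  # skip the value as well
                ;;
        esac
        shift
    done
    return 0
}

# Misplaced arguments: -sharpen swallows -compress as its value.
first_missing_value in.tif -sharpen -compress lzw out.tif || true
# Well-formed arguments: nothing is printed.
first_missing_value in.tif -sharpen 0x1 -compress lzw out.tif
```

The first call prints -sharpen, which mirrors the error reported above; the second prints nothing.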

Owner

deajan commented May 10, 2022

@tonyblue2 Can you confirm that pmOCR works on your system now?

Author

tonyblue2 commented May 10, 2022 via email

Owner

deajan commented May 11, 2022

So far so good: it seems that the intermediary transformation is fixed, but the tesseract OCR engine isn't working.
Would you mind checking for more output in /var/log/pmOCR_MyOCRServer.log?
If that log file contains nothing more, could you run export _DEBUG=true before running your command?
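
Put together, a debug run could look like the sketch below. It reuses the batch command and the /test directory from earlier in this thread; showing the last 50 log lines is an arbitrary choice. The --delete-input option presumably removes the source files, so it is safer to work on copies.

```shell
#!/bin/sh
# Enable debug output for this shell session (variable named above).
export _DEBUG=true

log=/var/log/pmOCR_MyOCRServer.log

# Re-run the same batch command as before, if pmocr.sh is on the PATH.
if command -v pmocr.sh >/dev/null 2>&1; then
    pmocr.sh --batch --target=pdf --skip-txt-pdf --delete-input /test || true
fi

# Show the end of the instance log, if it exists and is readable.
if [ -r "$log" ]; then
    tail -n 50 "$log"
fi
```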

Owner

deajan commented May 15, 2022

@tonyblue2 Would you mind posting the logs?
