[Bug] Pdf2parquet inbuilt ocr error #1042
Comments
@dolfim-ibm Please see above. Is this related to Docling functionality? |
I think you have to prepend the arguments with |
UPDATE: It does work fine, however, if the params are : The OCR engine works for easyocr and tesseract_cli. |
Do you get an error message or something else? |
Mostly this: "Exception creating transform tesserocr is not correctly installed. Please install it via". I uninstalled and reinstalled both tesseract and tesserocr, and it still does not work. |
Ok, then it is the non-trivial tesserocr installation. We described it a bit at https://ds4sd.github.io/docling/installation/. Very likely, all you need is to run this:
pip uninstall tesserocr
pip install --no-binary :all: tesserocr
Unfortunately, this is caused by |
Yeah, running that did not fix it, but the other two are working now, so I am good to go. Just a little recommendation for the docs: maybe mention the prepend thing with an asterisk and it will be awesome! Thanks a ton for the help! |
Search before asking
Component
Documentation
What happened + What you expected to happen
I was trying to run pdf2parquet on Google Colab using the inbuilt OCR, EasyOCR to be specific. It does not work: it throws an error when the parameters are set according to the documentation.
Reproduction script
I have attached images of the errors and of the documentation describing the pdf2parquet parameters.
Anything else
If I remove the do_ocr and ocr_engine parameters, it works just fine, so everything else is working and all dependencies are installed properly.
OS
Other
Python
3.10.x
Are you willing to submit a PR?