Describe the bug
The same code behaves differently on my computer and on an AWS EC2 m5.xlarge instance.
Expected behavior
Both environments should behave the same. The code works on my computer, but when I run it on the instance it cannot load the images generated from the PDF.
AWS Log
Process Process-1:
Traceback (most recent call last):
  File "/opt/build/app/read_contracts.py", line 67, in read_contracts
    text_contract = read_pdf(filepath)
  File "/opt/build/app/read_contracts.py", line 27, in read_pdf
    images_from_path = convert_from_path(pdf_path=pdf,
  File "/usr/local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 218, in convert_from_path
    images += _load_from_output_folder(
  File "/usr/local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 517, in _load_from_output_folder
    images.append(Image.open(os.path.join(output_folder, f)))
  File "/usr/local/lib/python3.9/site-packages/PIL/Image.py", line 3123, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file '/tmp/tmpqo3mn0om/2d473b9f-5b6c-46f0-9220-a4bf51124f6e-03.ppm'
Desktop (please complete the following information):
OS: Ubuntu, m5.xlarge instance.
Version: 22.04
Additional context
Function error
def read_pdf(pdf):
    """
    It takes a pdf file, converts it to images, and then converts those images to text

    :param pdf: The path to the PDF file you want to convert
    :return: A string with the text of the pdf
    """
    full_text = ''
    with tempfile.TemporaryDirectory() as path:
        images_from_path = convert_from_path(pdf_path=pdf,
                                             dpi=350,
                                             output_folder=path)
        for page in tqdm(images_from_path):
            full_text += image_to_text(page, lang='spa')
    return full_text
I printed the filenames to check whether it was a path issue, but they are displayed correctly. Additionally, I am using multiprocessing. Again, it works locally but not on the instance.
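Since the filenames look right, a possible next check is whether the files in the output folder are complete. This is a minimal sketch, not something from the original run: `inspect_output_folder` is a hypothetical helper, and it assumes the generated pages are Netpbm files (PPM and related formats start with a two-byte magic number from `P1` to `P6`). It reports each file's size and whether its header looks valid, which would help tell an empty or truncated file from a path problem.

```python
import os
import tempfile

# Netpbm magic numbers: P1-P6 cover PBM/PGM/PPM in ASCII and binary form.
NETPBM_MAGICS = {b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"}


def inspect_output_folder(folder):
    """Return (name, size_in_bytes, looks_like_netpbm) for each file in folder.

    Hypothetical diagnostic helper; it is not part of pdf2image.
    """
    report = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        if not os.path.isfile(full_path):
            continue
        size = os.path.getsize(full_path)
        # Read only the first two bytes, where the magic number lives.
        with open(full_path, "rb") as handle:
            magic = handle.read(2)
        report.append((name, size, magic in NETPBM_MAGICS))
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as path:
        # Stand-ins for converted pages: one valid 1x1 PPM and one empty file.
        with open(os.path.join(path, "page-01.ppm"), "wb") as f:
            f.write(b"P6\n1 1\n255\n\x00\x00\x00")
        open(os.path.join(path, "page-02.ppm"), "wb").close()
        for row in inspect_output_folder(path):
            print(row)
```

In `read_pdf`, the same helper could be pointed at the temporary `path` from inside each worker process, so that the sizes seen on the instance can be compared with the ones seen locally.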