Removal of all whitespaces during PDF conversion #120
Labels: bug, open for contribution
For one PDF among my test files, markitdown removes all whitespace during conversion. The PDF can be found here: https://aclanthology.org/2024.eacl-long.5.pdf

I run the example code in a Jupyter notebook (Python 3.12.8) like this:
The result looks like this (the head of the paper is preserved, but all whitespace is removed from the body):
Other PDFs that I've tested work fine.
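To tell an affected conversion apart from a good one without reading the output by eye, a rough check is the share of whitespace characters in the converted text. Below is a minimal sketch, assuming the converted text is already available as a plain Python string. The function names and the 5% threshold are illustrative assumptions and are not part of markitdown.

```python
def whitespace_ratio(text: str) -> float:
    """Return the fraction of characters in text that are whitespace."""
    if not text:
        return 0.0
    return sum(ch.isspace() for ch in text) / len(text)


def looks_collapsed(text: str, threshold: float = 0.05) -> bool:
    """Heuristic: flag non-empty text whose whitespace share is below threshold.

    The threshold is an assumption chosen for illustration; ordinary prose
    usually sits well above it, so a very low ratio suggests spaces were lost.
    """
    return bool(text) and whitespace_ratio(text) < threshold
```

Running looks_collapsed on the body of each converted file should separate the problematic PDF from the ones that convert fine, although the threshold may need tuning for documents that are mostly tables or code.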