* fix to #94 (comment)
Now all text will be inferred as strings, and the user can convert it to their desired data type.
* maybe a simpler solution
Co-authored-by: lolipopshock <22512825+lolipopshock@users.noreply.github.com>
Describe the bug
When the code below converts the Tesseract output to a DataFrame, the pandas.DataFrame.infer_objects method is called, which infers any sequence of digits (ZIP codes, SSNs, loan IDs, etc.) as a float data type.
layout-parser/src/layoutparser/ocr/tesseract_agent.py
Line 92 in 29fb2fb
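A minimal sketch of the behavior, independent of layout-parser: it assumes the OCR result arrives as tab-separated text and is parsed with `pandas.read_csv`. The sample data and the variable names (`SAMPLE_TSV`, `inferred`, `as_text`) are made up for illustration and are not taken from `tesseract_agent.py`.

```python
import csv
import io

import pandas as pd

# Hypothetical TSV in the spirit of Tesseract's tabular output (not real OCR data).
# The second row has an empty text cell, as happens for non-word boxes.
SAMPLE_TSV = (
    "level\tconf\ttext\n"
    "5\t96\t02139\n"
    "5\t-1\t\n"
    "5\t95\t123456789\n"
)

# Default type inference: the digit-only text column is parsed as numbers,
# and the missing cell forces the whole column to float, so the leading
# zero of the ZIP code is lost.
inferred = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t", quoting=csv.QUOTE_NONE)

# Forcing the text column to str keeps the digits exactly as recognized;
# the other columns are still inferred as usual.
as_text = pd.read_csv(
    io.StringIO(SAMPLE_TSV),
    sep="\t",
    quoting=csv.QUOTE_NONE,
    dtype={"text": str},
)

print(inferred["text"].tolist())
print(as_text["text"].tolist())
```

With the text column kept as strings, a caller who really wants numbers can still convert explicitly, for example with `pd.to_numeric(as_text["text"], errors="coerce")`.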
To Reproduce
Steps to reproduce the behavior:
Refer to the attached screenshots.
Environment
This bug is platform-independent; it was reproduced on both Windows and Linux.
Screenshots
![bug1](https://user-images.githubusercontent.com/59497032/139570555-e9acf179-2cac-4071-ba9a-b7d1528b0570.PNG)
![bug2](https://user-images.githubusercontent.com/59497032/139570565-8c0ef297-c180-437e-8d81-07b9e0a8dcbe.PNG)
![zip](https://user-images.githubusercontent.com/59497032/139570688-de91a245-a809-483d-ba27-5bc1c13d2620.jpg)
Code
Image
The full image cannot be attached for security reasons.
On another image:
![bug3](https://user-images.githubusercontent.com/59497032/139571526-0bbae501-be85-4a82-b032-ee2128e5a9f3.png)
When the gather_data function is called, it raises an error too.
Traceback