fix to issue #94 #95

kforcodeai · 2021-10-31T06:32:59Z

Fixes #94 (comment)

The issue was that all digit sequences were inferred as float. With this fix, all text (numeric and non-numeric) is inferred as string, and the user can change it to their desired data type.
The drawback is that the user will now have to convert the numeric columns themselves.
I could not find a better solution than this.

Now all text will be inferred as string, and the user can change it to their desired data type.
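A minimal sketch of the behavior described above, assuming a hypothetical two-row, tab-separated sample shaped like OCR output (the column names `conf` and `text` and the values are illustrative, not taken from the PR). With default dtype inference, a `text` column that holds only digits plus an empty cell comes back as float:

```python
import csv
import io

import pandas as pd

# Hypothetical sample: one digit-only token and one empty text cell.
SAMPLE_TSV = "conf\ttext\n95\t2021\n-1\t\n"


def read_with_default_inference(tsv: str) -> pd.DataFrame:
    """Parse the TSV and let pandas infer every column's dtype."""
    return pd.read_csv(io.StringIO(tsv), sep="\t", quoting=csv.QUOTE_NONE)


df = read_with_default_inference(SAMPLE_TSV)

# The empty cell becomes NaN, so the digit-only column is inferred as float.
inferred_dtype = str(df["text"].dtype)
first_token = str(df["text"].iloc[0])
print(inferred_dtype, first_token)
```

In this sketch the token `2021` is no longer recoverable as the text the OCR engine produced, which is the symptom reported in the issue.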

lolipopshock · 2021-11-03T21:35:15Z

src/layoutparser/ocr/tesseract_agent.py

+        _cols.remove('text')
+        for col in _cols:
+            _df[col] = _df[col].astype(int)
+        res['data'] = _df


Can you try the following code:

_data = pytesseract.image_to_data(img_content, lang=self.lang, **self.configs)
df = pd.read_csv(
    io.StringIO(_data), quoting=csv.QUOTE_NONE, encoding="utf-8", sep="\t"
)
df['text'] = df['text'].astype('str')
res["data"] = df

@lolipopshock Sorry, it does not work; I have tried this.
And yes, I get it, the for loop and all that stuff looks ugly :)

here's the screenshot

I see, so the issue comes from the trailing .0 on floating-point numbers, right?
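To illustrate why casting after the read does not help, here is a small sketch under assumed data (the TSV sample and its column names are hypothetical). Once pandas has parsed the column as float, `astype(str)` only stringifies the float, so the `.0` is kept:

```python
import io

import pandas as pd

# Hypothetical sample: a digit-only token followed by an empty text cell.
SAMPLE_TSV = "conf\ttext\n95\t2021\n-1\t\n"

df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t")

# The column is already float at this point; casting converts the float
# value to its string form rather than restoring the original token.
converted = df["text"].astype(str).tolist()
print(converted[0])
```

The conversion has to happen while the file is being parsed, before any numeric inference takes place.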

lolipopshock · 2022-02-02T05:19:48Z

I think the new solution can solve your issue -- see example below:

Let's say we have a csv file test.csv:

Col_A, Col_B
, 1
2, 3
245.0,

And if we read it via:

df = pd.read_csv("test.csv", converters={"Col_A": str})

We have

Col_A	Col_B
	1
2	3
245.0

(There's no .0 appended to the 2 in the 2nd row, 1st column.)
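The same example can be reproduced without a file on disk. This is a sketch that assumes the CSV content is held in an in-memory string (the `CSV_TEXT` name is illustrative) and that drops the spaces after the commas so the column names parse cleanly:

```python
import io

import pandas as pd

# Same rows as test.csv above, without the spaces after the commas.
CSV_TEXT = "Col_A,Col_B\n,1\n2,3\n245.0,\n"

# The converter receives each raw field of Col_A as a string, so pandas
# never gets the chance to infer a float dtype for that column.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"Col_A": str})

col_a = df["Col_A"].tolist()
print(col_a)
```

Applied to the OCR output, the analogous call would pass a converter for the `text` column, leaving the other columns to be inferred as numbers as before.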

fix to Layout-Parser#94 (comment)

9b2fa43

kforcodeai changed the title ~~fix to https://github.com/Layout-Parser/layout-parser/issues/94#issue…~~ fix to issue #94 Oct 31, 2021

lolipopshock reviewed Nov 3, 2021

maybe a simpler solution

09f630e

lolipopshock closed this Feb 2, 2022

lolipopshock reopened this Feb 2, 2022

lolipopshock merged commit 0809fa8 into Layout-Parser:master Feb 2, 2022
