
Segmenting and saving each class as image to allow OCR #9

Open
elnazsn1988 opened this issue Feb 9, 2020 · 41 comments

@elnazsn1988

🚀 Feature

Hi, is there an internal feature that lets each class be saved as a separate segment, or image? I am trying to identify tables, separate them, and then run them through a tabular data analyzer and OCR. So far I am able to get the image predictions with your code, but not the actual annotations/segmented fields for further analysis/OCR.


@hpanwar08
Owner

hpanwar08 commented Feb 10, 2020

@elnazsn1988 You can try the code below. It extracts the bounding boxes, then crops the image to each box and saves the crop.

import numpy as np
import detectron2
from PIL import Image

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)

# convert the numpy array to a PIL image so that crop() is available
img = Image.fromarray(img)

for label, bbox in zip(labels, boxes):
    if label == "table":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"{label}_{bbox}.png")

@elnazsn1988
Author

elnazsn1988 commented Feb 12, 2020

@hpanwar08 thanks for the above, it throws an error because img is extracted as a numpy object:

cropped_img = img.crop(bbox)
AttributeError: 'numpy.ndarray' object has no attribute 'crop'

@hpanwar08
Owner

@elnazsn1988 I have updated the code, it should work now

@akshay94950

akshay94950 commented Mar 27, 2020

@hpanwar08 where should I add the code? In which file?

@hpanwar08
Owner

@akshay94950 You can add this code to a new Python file and run it.

@akshay94950

@hpanwar08
the above code is throwing an error:

49     if label == "text":
50         cropped_img = img.crop(bbox)
---> 51    croppped_img.save(f"{label}_{bbox}.png")

NameError: name 'croppped_img' is not defined

@hpanwar08
Owner

Can you show the entire code?

@akshay94950

akshay94950 commented Apr 2, 2020

import os
import numpy as np
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from PIL import Image

# constants
WINDOW_NAME = "COCO detections"
classes = ['text', 'title', 'list', 'table', 'figure']

cfg = get_cfg()
cfg.merge_from_file("/content/gdrive/My Drive/detectron2-master/configs/DLA_mask_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2  # set threshold for this model
cfg.MODEL.WEIGHTS = os.path.join('/content/gdrive/My Drive/Unt', 'model_final.pth')
cfg.MODEL.DEVICE = 'cpu'

default_predictor = DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image("/content/gdrive/My Drive/unt/page1.jpg", format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)
print('b', boxes)

img = Image.fromarray(img)

for label, bbox in zip(labels, boxes):
    if label == "text":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"/content/gdrive/My Drive/unt/{label}_{bbox}.jpg")

@akshay94950

akshay94950 commented Apr 2, 2020

It was because of a spelling mistake in the above code, thanks @hpanwar08. But it now shows this error:
AttributeError: 'numpy.ndarray' object has no attribute 'crop'

@akshay94950

Thanks bro, it's working @hpanwar08

@elnazsn1988
Author

elnazsn1988 commented Apr 2, 2020

@elnazsn1988 I have updated the code, it should work now

Hi @hpanwar08 - the code is working great, thanks. Do you happen to know how I can annotate a new image and retrain the existing weights on it? I've been trying to download the full dataset with:
wget -O datz/publaynet.tar.gz https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
but the download is unstable, so I can't get the whole dataset to retrain and annotate/add my file.

@hpanwar08
Owner

I did not understand your question. If you want to annotate new images, you can have a look at https://github.com/wkentaro/labelme
It's better to download the dataset in parts (train-0.tar.gz, train-1.tar.gz, etc.) rather than as one huge file.
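Downloading in parts can be scripted. A minimal sketch, with the caveat that the per-part URL pattern below (train-{i}.tar.gz under the same 1.0.0 path) is an assumption based on the part names mentioned here, and `download` is a hypothetical helper:

```python
import os
import urllib.request

# Base path taken from the full-archive URL quoted earlier in the thread.
BASE = "https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0"

def part_urls(num_parts):
    """Build the list of assumed per-part archive URLs."""
    return [f"{BASE}/train-{i}.tar.gz" for i in range(num_parts)]

def download(url, dest_dir="data"):
    """Fetch one archive into dest_dir (network call)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    for url in part_urls(7):
        print(url)  # replace with download(url) to actually fetch
```

Retrying a failed single part is much cheaper than restarting the whole multi-gigabyte archive.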

@elnazsn1988
Author

elnazsn1988 commented Apr 2, 2020

I did not understand your question. If you want to annotate new images, you can have a look at https://github.com/wkentaro/labelme
It's better to download the dataset in parts (train-0.tar.gz, train-1.tar.gz, etc.) rather than as one huge file.

@hpanwar08 is there a way to add a new annotated image and train your pretrained model on it without retraining the whole thing? That is, use the weights from your trained model and retrain on a new image set.

@hpanwar08
Owner

Yes, you can annotate your custom dataset, save it in COCO format, and train using train_net_dla.py
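For reference, a COCO-format annotation file for a custom page can be assembled by hand before training. A minimal sketch (the image name, sizes, and box coordinates are made-up placeholders; the category ids simply follow the five classes used in this thread, and the segmentation is simplified to the bbox polygon):

```python
import json

def make_coco(image_name, width, height, annotations):
    """annotations: list of (category_id, [x, y, w, h]) tuples."""
    categories = [
        {"id": i + 1, "name": n}
        for i, n in enumerate(["text", "title", "list", "table", "figure"])
    ]
    return {
        "images": [{"id": 1, "file_name": image_name,
                    "width": width, "height": height}],
        "annotations": [
            {"id": k + 1, "image_id": 1, "category_id": cat,
             "bbox": box, "area": box[2] * box[3], "iscrowd": 0,
             # segmentation as the bbox corners, a common simplification
             "segmentation": [[box[0], box[1],
                               box[0] + box[2], box[1],
                               box[0] + box[2], box[1] + box[3],
                               box[0], box[1] + box[3]]]}
            for k, (cat, box) in enumerate(annotations)
        ],
        "categories": categories,
    }

# one hypothetical "text" box on a Letter-sized page
coco = make_coco("page1.jpg", 612, 792, [(1, [50, 60, 500, 100])])
print(json.dumps(coco, indent=2)[:120])
```

Tools like labelme can export this format directly; the sketch just shows what train_net_dla.py would expect to load.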

@akshay94950

akshay94950 commented Apr 5, 2020

Can we obtain the prediction output image without the figure class, for use in OCR? How can we use it for text extraction?

@jocelynguo

@elnazsn1988 You can try the below code. It extracts the bounding boxes and then crop the image based on the bounding box and save.


One follow-up question: how do I extract the confidence score for each of the boxes?

@hpanwar08
Owner

Can we obtain the prediction output image without the figure class, for use in OCR? How can we use it for text extraction?

You could directly crop the images using the bounding boxes
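The cropping approach above can feed straight into OCR. A hedged sketch: `pytesseract` plus a Tesseract install are assumptions (they are not part of this repo), and `labels`/`boxes`/`pil_img` stand for the values produced by the earlier snippet:

```python
def nonfigure_regions(labels, boxes):
    """Keep only the detected regions whose label is not 'figure'."""
    return [(l, b) for l, b in zip(labels, boxes) if l != "figure"]

def ocr_regions(pil_img, labels, boxes):
    """Crop every non-figure region and run Tesseract on each crop."""
    import pytesseract  # assumed installed: pip install pytesseract
    texts = []
    for label, bbox in nonfigure_regions(labels, boxes):
        crop = pil_img.crop(tuple(bbox))
        texts.append((label, pytesseract.image_to_string(crop)))
    return texts
```

This skips figure crops entirely, so only text-bearing regions reach the OCR step.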

@hpanwar08
Owner


One follow-up question: how do I extract the confidence score for each of the boxes?

instances.scores will give you the predicted probabilities for each instance.
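For example, each label can be paired with its score and filtered by a threshold. A minimal sketch using plain lists (the `scores` list stands in for `instances.scores.numpy()` from the snippets above; the 0.8 threshold is arbitrary):

```python
def confident(labels, scores, threshold=0.8):
    """Pair labels with scores and keep only confident detections."""
    return [(l, s) for l, s in zip(labels, scores) if s >= threshold]

# e.g. labels = [classes[i] for i in instances.pred_classes]
#      scores = instances.scores.numpy()
print(confident(["text", "table", "figure"], [0.99, 0.75, 0.9]))
```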

@jocelynguo


One follow-up question: how do I extract the confidence score for each of the boxes?

instances.scores will give you the predicted probabilities for each instance.

Thank you!

@akshay94950

@hpanwar08 how do I save the images from the above code in order, so they can later be used for OCR with Tesseract?

@hpanwar08
Owner

These images will be saved in the same directory from which you run the code.

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

What do you want to sort on?

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

Exclude the figure label when iterating over the labels:
if label != "figure": ...

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

Now I get what you are saying.
You may need to sort the segments based on bounding box location, but that depends on the page layout: some pages have a two-column layout, some have one column, some are mixed. You need to write your logic around these constraints.

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

One solution could be to write a classifier that detects the type of page (e.g. single-column, two-column, etc.), then apply sorting logic based on that type (e.g. sorting by x or y).
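A rough sketch of that idea, assuming boxes in [xmin, ymin, xmax, ymax] format: guess the column count from box widths, then sort each column top-to-bottom. The 0.55 width ratio is an arbitrary heuristic invented here, not anything from this repo:

```python
def column_count(boxes, page_width):
    """Guess two columns if every box is narrower than ~half the page."""
    return 2 if all((b[2] - b[0]) < 0.55 * page_width for b in boxes) else 1

def reading_order(boxes, page_width):
    """Sort boxes into an approximate reading order."""
    if column_count(boxes, page_width) == 1:
        return sorted(boxes, key=lambda b: b[1])  # top-to-bottom
    mid = page_width / 2
    left = sorted((b for b in boxes if (b[0] + b[2]) / 2 < mid),
                  key=lambda b: b[1])
    right = sorted((b for b in boxes if (b[0] + b[2]) / 2 >= mid),
                   key=lambda b: b[1])
    return left + right  # whole left column first, then right
```

This covers only the clean single- and two-column cases; mixed layouts would need a more careful region-grouping step.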

@akshay94950

Can you please help me write the logic for a single-column image?

@hpanwar08
Owner

hpanwar08 commented Apr 20, 2020

try this

import numpy as np
import detectron2
from PIL import Image

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)

img = Image.fromarray(img)

# bbox = [xmin, ymin, xmax, ymax]
# sort boxes and labels by ymin (top-to-bottom reading order)
sorted_by_bbox_ymin = sorted(zip(boxes, labels), key=lambda x: x[0][1])
boxes, labels = list(zip(*sorted_by_bbox_ymin))
boxes = list(boxes)
labels = list(labels)

for idx, (label, bbox) in enumerate(zip(labels, boxes)):
    if label == "text":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"{idx}_{label}_{bbox}.png")

@deeptigoyal

Hi, I also want to run OCR on the text-label images obtained after running Detectron, wherein the order of the images is required to obtain meaningful text.

I ran the above sorted code and ended up getting this error:

list expected at most 1 arguments, got 18

If anyone has been able to crack it, please help.

@hpanwar08
Owner

Fixed the code, should work now.

@deeptigoyal

Yooo...thanks

:)

@akshay94950

Thanks @hpanwar08 for this work... It's working nicely on some images. Did you use the complete PubLayNet dataset for training? If not, I'd like to train your model on the remaining data; can you specify the portion of the dataset you didn't use?

@tnav96

tnav96 commented Jul 3, 2020

Hi, I am wondering if you have an example Python script which I could run to test the pretrained Detectron? That would be a huge help.

@hpanwar08
Owner

There is a command in the README.md which you can try.

@tnav96

tnav96 commented Jul 3, 2020

I was looking more for something like this, which I could also manipulate afterwards. I am working in JupyterLab. Do you have a simple example with such a structure that also includes the setup of cfg and all the required libraries?


@hpanwar08
Owner

You could use the above code; it should get the predictions for you. You need to install detectron2 first and copy the config from this repo into your installation.

@hpanwar08
Owner

Thanks @hpanwar08 for this work... It's working nicely on some images. Did you use the complete PubLayNet dataset for training? If not, I'd like to train your model on the remaining data; can you specify the portion of the dataset you didn't use?

I used almost half of the PubLayNet data; it is mentioned in the README file.

@tnav96

tnav96 commented Jul 3, 2020

I installed detectron2 already, but if I just run the code above it says that the library is not defined and cfg is not defined. I guess something like this is missing:

import os
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("detectron2-master/configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = os.path.join('trained', 'model_final_trimmed.pth')
cfg.MODEL.DEVICE = 'cpu'
