
Segmenting and saving each class as image to allow OCR #9

Open
elnazsn1988 opened this issue Feb 9, 2020 · 41 comments

@elnazsn1988

🚀 Feature

Hi, is there an internal feature that lets each class be saved as a separate segment, or image? I am trying to identify tables, separate them, and then run them through a tabular data analyzer and OCR. So far I am able to get the image predictions with your code, but not the actual annotations/segmented fields for further analysis/OCR.


@hpanwar08
Owner

hpanwar08 commented Feb 10, 2020

@elnazsn1988 You can try the code below. It extracts the bounding boxes, then crops the image to each box and saves the crop.

import numpy as np
import detectron2
from PIL import Image

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)

# convert the numpy array to a PIL image so that crop() is available
img = Image.fromarray(img)

for label, bbox in zip(labels, boxes):
    if label == "table":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"{label}_{bbox}.png")

@elnazsn1988
Author

elnazsn1988 commented Feb 12, 2020

@hpanwar08 thanks for the above, it throws an error because img is extracted as a numpy object:

cropped_img = img.crop(bbox)
AttributeError: 'numpy.ndarray' object has no attribute 'crop'

@hpanwar08
Owner

@elnazsn1988 I have updated the code, it should work now

@akshay94950

akshay94950 commented Mar 27, 2020

@hpanwar08 where should I add the code? In which file?

@hpanwar08
Owner

@akshay94950 You can add this code to a new Python file and run it.

@akshay94950

@hpanwar08
the above code is throwing an error:

49     if label == "text":
50         cropped_img = img.crop(bbox)
---> 51    croppped_img.save(f"{label}_{bbox}.png")

NameError: name 'croppped_img' is not defined

@hpanwar08
Owner

Can you show the entire code?

@akshay94950

akshay94950 commented Apr 2, 2020

import os
import numpy as np
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from PIL import Image

# constants
WINDOW_NAME = "COCO detections"
classes = ['text', 'title', 'list', 'table', 'figure']

cfg = get_cfg()
cfg.merge_from_file("/content/gdrive/My Drive/detectron2-master/configs/DLA_mask_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2  # set threshold for this model
cfg.MODEL.WEIGHTS = os.path.join('/content/gdrive/My Drive/Unt', 'model_final.pth')
cfg.MODEL.DEVICE = 'cpu'

default_predictor = DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image("/content/gdrive/My Drive/unt/page1.jpg", format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)
print('b', boxes)

img = Image.fromarray(img)

for label, bbox in zip(labels, boxes):
    if label == "text":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"/content/gdrive/My Drive/unt/{label}_{bbox}.jpg")

@akshay94950

akshay94950 commented Apr 2, 2020

It was because of a spelling mistake in the above code, thanks @hpanwar08. But it now shows this error:
AttributeError: 'numpy.ndarray' object has no attribute 'crop'

@akshay94950

Thanks bro, it's working @hpanwar08

@elnazsn1988
Author

elnazsn1988 commented Apr 2, 2020

@elnazsn1988 I have updated the code, it should work now

Hi @hpanwar08 - the code is working great, thanks. Do you happen to know how I can annotate a new image and retrain the existing weights on it? I've been trying to download the full dataset with:
wget -O datz/publaynet.tar.gz https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
but the download is unstable, so I can't get the whole dataset to retrain and annotate/add my file.

@hpanwar08
Owner

I did not understand your question. If you want to annotate new images, you can have a look at https://github.com/wkentaro/labelme
It's better to download the dataset in parts (train-0.tar.gz, train-1.tar.gz, etc.) rather than as one huge file.
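Downloading in parts can be scripted. A minimal sketch, with the caveat that the per-part URL pattern below (train-{i}.tar.gz under the same 1.0.0 path) is an assumption based on the part names mentioned here, and `download` is a hypothetical helper:

```python
import os
import urllib.request

# Base path taken from the full-archive URL quoted earlier in the thread.
BASE = "https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0"

def part_urls(num_parts):
    """Build the list of assumed per-part archive URLs."""
    return [f"{BASE}/train-{i}.tar.gz" for i in range(num_parts)]

def download(url, dest_dir="data"):
    """Fetch one archive into dest_dir (network call)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    for url in part_urls(7):
        print(url)  # replace with download(url) to actually fetch
```

Retrying a failed single part is much cheaper than restarting the whole multi-gigabyte archive.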

@elnazsn1988
Author

elnazsn1988 commented Apr 2, 2020

I did not understand your question. If you want to annotate new images, you can have a look at https://github.com/wkentaro/labelme
It's better to download the dataset in parts (train-0.tar.gz, train-1.tar.gz, etc.) rather than as one huge file.

@hpanwar08 is there a way to add a new annotated image and train your pretrained model on it without retraining the whole thing? That is, use the weights from your trained model and retrain on a new image set.

@hpanwar08
Owner

Yes, you can annotate your custom dataset, save it in COCO format, and train using train_net_dla.py
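For reference, a COCO-format annotation file for a custom page can be assembled by hand before training. A minimal sketch (the image name, sizes, and box coordinates are made-up placeholders; the category ids simply follow the five classes used in this thread, and the segmentation is simplified to the bbox polygon):

```python
import json

def make_coco(image_name, width, height, annotations):
    """annotations: list of (category_id, [x, y, w, h]) tuples."""
    categories = [
        {"id": i + 1, "name": n}
        for i, n in enumerate(["text", "title", "list", "table", "figure"])
    ]
    return {
        "images": [{"id": 1, "file_name": image_name,
                    "width": width, "height": height}],
        "annotations": [
            {"id": k + 1, "image_id": 1, "category_id": cat,
             "bbox": box, "area": box[2] * box[3], "iscrowd": 0,
             # segmentation as the bbox corners, a common simplification
             "segmentation": [[box[0], box[1],
                               box[0] + box[2], box[1],
                               box[0] + box[2], box[1] + box[3],
                               box[0], box[1] + box[3]]]}
            for k, (cat, box) in enumerate(annotations)
        ],
        "categories": categories,
    }

# one hypothetical "text" box on a Letter-sized page
coco = make_coco("page1.jpg", 612, 792, [(1, [50, 60, 500, 100])])
print(json.dumps(coco, indent=2)[:120])
```

Tools like labelme can export this format directly; the sketch just shows what train_net_dla.py would expect to load.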

@akshay94950

akshay94950 commented Apr 5, 2020

Can we obtain the prediction output image without the figure class, for use in OCR? How can we use it for text extraction?

@jocelynguo

@elnazsn1988 You can try the below code. It extracts the bounding boxes and then crop the image based on the bounding box and save.


One follow-up question: how do I extract the confidence score for each of the boxes?

@hpanwar08
Owner

Can we obtain the prediction output image without the figure class, for use in OCR? How can we use it for text extraction?

You could directly crop the images using the bounding boxes
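The cropping approach above can feed straight into OCR. A hedged sketch: `pytesseract` plus a Tesseract install are assumptions (they are not part of this repo), and `labels`/`boxes`/`pil_img` stand for the values produced by the earlier snippet:

```python
def nonfigure_regions(labels, boxes):
    """Keep only the detected regions whose label is not 'figure'."""
    return [(l, b) for l, b in zip(labels, boxes) if l != "figure"]

def ocr_regions(pil_img, labels, boxes):
    """Crop every non-figure region and run Tesseract on each crop."""
    import pytesseract  # assumed installed: pip install pytesseract
    texts = []
    for label, bbox in nonfigure_regions(labels, boxes):
        crop = pil_img.crop(tuple(bbox))
        texts.append((label, pytesseract.image_to_string(crop)))
    return texts
```

This skips figure crops entirely, so only text-bearing regions reach the OCR step.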

@hpanwar08
Owner


One follow-up question: how do I extract the confidence score for each of the boxes?

instances.scores will give you the predicted probabilities for each instance.
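For example, each label can be paired with its score and filtered by a threshold. A minimal sketch using plain lists (the `scores` list stands in for `instances.scores.numpy()` from the snippets above; the 0.8 threshold is arbitrary):

```python
def confident(labels, scores, threshold=0.8):
    """Pair labels with scores and keep only confident detections."""
    return [(l, s) for l, s in zip(labels, scores) if s >= threshold]

# e.g. labels = [classes[i] for i in instances.pred_classes]
#      scores = instances.scores.numpy()
print(confident(["text", "table", "figure"], [0.99, 0.75, 0.9]))
```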

@jocelynguo


One follow-up question: how do I extract the confidence score for each of the boxes?

instances.scores will give you the predicted probabilities for each instance.

Thank you!

@akshay94950

@hpanwar08 how do I save the images from the above code in order, so they can later be used for OCR with Tesseract?

@hpanwar08
Owner

These images will be saved in the same directory from which you run the code.

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

What do you want to sort on?

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

Exclude the figure label when iterating over the labels:
if label != "figure": ...

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

Now I get what you are saying.
You may need to sort the segments based on bounding box location, but that depends on the page layout: some pages have a two-column layout, some have one column, some are mixed. You need to write your logic around these constraints.

@akshay94950

akshay94950 commented Apr 9, 2020 via email

@hpanwar08
Owner

One solution could be to write a classifier that detects the type of page (e.g. single-column, two-column, etc.), then apply sorting logic based on that type (e.g. sorting by x or y).
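A rough sketch of that idea, assuming boxes in [xmin, ymin, xmax, ymax] format: guess the column count from box widths, then sort each column top-to-bottom. The 0.55 width ratio is an arbitrary heuristic invented here, not anything from this repo:

```python
def column_count(boxes, page_width):
    """Guess two columns if every box is narrower than ~half the page."""
    return 2 if all((b[2] - b[0]) < 0.55 * page_width for b in boxes) else 1

def reading_order(boxes, page_width):
    """Sort boxes into an approximate reading order."""
    if column_count(boxes, page_width) == 1:
        return sorted(boxes, key=lambda b: b[1])  # top-to-bottom
    mid = page_width / 2
    left = sorted((b for b in boxes if (b[0] + b[2]) / 2 < mid),
                  key=lambda b: b[1])
    right = sorted((b for b in boxes if (b[0] + b[2]) / 2 >= mid),
                   key=lambda b: b[1])
    return left + right  # whole left column first, then right
```

This covers only the clean single- and two-column cases; mixed layouts would need a more careful region-grouping step.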

@akshay94950

Can you please help me write the logic for a single-column image?

@hpanwar08
Owner

hpanwar08 commented Apr 20, 2020

try this

import numpy as np
import detectron2
from PIL import Image

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)

img = Image.fromarray(img)

# bbox = [xmin, ymin, xmax, ymax]
# sort boxes and labels by ymin (top-to-bottom reading order)
sorted_by_bbox_ymin = sorted(zip(boxes, labels), key=lambda x: x[0][1])
boxes, labels = list(zip(*sorted_by_bbox_ymin))
boxes = list(boxes)
labels = list(labels)

for idx, (label, bbox) in enumerate(zip(labels, boxes)):
    if label == "text":
        cropped_img = img.crop(bbox)
        cropped_img.save(f"{idx}_{label}_{bbox}.png")

@deeptigoyal

Hi, I also want to run OCR on the text-label images obtained after running Detectron, wherein the order of the images is required to obtain meaningful text.

I ran the above sorted code and ended up getting this error:

list expected at most 1 arguments, got 18

If anyone has been able to crack it, please help.

@hpanwar08
Owner

Fixed the code, should work now.

@deeptigoyal

Yooo...thanks

:)

@akshay94950

Thanks @hpanwar08 for this work... It's working nicely on some images. Did you use the complete PubLayNet dataset for training? If not, I'd like to train your model on the remaining data; can you specify the portion of the dataset you didn't use?

@tnav96

tnav96 commented Jul 3, 2020

Hi, I am wondering if you have an example Python script which I could run to test the pretrained Detectron? That would be a huge help.

@hpanwar08
Owner

There is a command in the README.md which you can try.

@tnav96

tnav96 commented Jul 3, 2020

I was looking more for something like this, which I could also manipulate afterwards. I am working in JupyterLab. Do you have a simple example with such a structure that also includes the setup of cfg and all the required libraries?


@hpanwar08
Owner

You could use the above code; it should get the predictions for you. You need to install detectron2 first and copy the config from this repo into your installation.

@hpanwar08
Owner

Thanks @hpanwar08 for this work... It's working nicely on some images. Did you use the complete PubLayNet dataset for training? If not, I'd like to train your model on the remaining data; can you specify the portion of the dataset you didn't use?

I used almost half of the PubLayNet data; it is mentioned in the README file.

@tnav96

tnav96 commented Jul 3, 2020

I installed detectron2 already, but if I just run the code above it says that the library is not defined and cfg is not defined. I guess something like this is missing:

import os
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("detectron2-master/configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = os.path.join('trained', 'model_final_trimmed.pth')
cfg.MODEL.DEVICE = 'cpu'
