
No speed advantage when using batches. #58

Open
Dario-Mantegazza opened this issue Apr 15, 2024 · 8 comments

Comments

@Dario-Mantegazza

Dario-Mantegazza commented Apr 15, 2024

I did some tests using detection+recognition on a set of 30 images, and I saw no speed improvement when using batches.
So I checked the code, and if I understand your implementation correctly,

tamil_ocr/ocr_tamil/ocr.py

Lines 527 to 536 in 71a91db

# To handle multiple images
if isinstance(image,list):
    text_list = []
    if self.detect:
        for img in image:
            temp = self.read_image_input(img)
            exported_regions,updated_prediction_result = self.craft_detect(temp)
            inter_text_list,conf_list = self.text_recognize_batch(exported_regions)
            final_result = self.output_formatter(inter_text_list,conf_list,updated_prediction_result)
            text_list.append(final_result)
you split the batch into single images, pass each image to CRAFT to get the bounding boxes, and pass those to ParSeq.

I'm not an expert in ParSeq, but if it can already handle batches of bounding boxes, why not simply take all the bounding boxes from the whole batch and pass those as a single input to ParSeq?

To recap my suggestion: why don't you do something like the following?

bbs = []
for image in batch:
    bb_preds = craft(image)      # detect text regions in each image
    bbs.append(bb_preds)
texts = parseq_read_batch(bbs)   # recognize all regions in one call

This should be faster, as you call ParSeq only once per batch rather than once per image. It has a larger memory cost, but that can be dealt with via the batch size parameter.

Obviously, even better would be to do something like:

bbs=craft_batch(batch)
texts=parseq_batch(bbs)
@Dario-Mantegazza
Author

Apparently CRAFT can run in batches, here

> I think running the inference in parallel is difficult due to the post-processing step, which is performed in CPU unless you use multi-processing technique. However, the batch-processing of deep networks is possible within a memory limit.

clovaai/CRAFT-pytorch#44 (comment)

and in other comments in the issue section of CRAFT's GitHub it is stated that batch prediction is feasible.
It would be interesting if the batch functionality of ocr_tamil exploited this.

@Dario-Mantegazza
Author

Also, I think it would make more sense to decouple the batch size used by ParSeq for text recognition from the ocr_tamil batch size parameter; these should be two separate numbers.
I like this library, please keep working on it :)

@gnana70
Owner

gnana70 commented Apr 15, 2024

Hi @Dario-Mantegazza , thanks for your feedback. I will try to include batch mode for CRAFT text detection in the coming weeks.

@Dario-Mantegazza
Author

Hi again @gnana70, in the meantime I will make a fork and see if I can implement a temporary workaround. I will keep you posted.
Cheers

@gnana70
Owner

gnana70 commented Apr 16, 2024

Hi @Dario-Mantegazza , thanks for your help. Please share your workaround once done.

@Dario-Mantegazza
Author

So I tried to change the code in the simplest, hackiest way, but for now I don't get better performance; I think something is broken in my edited version, and while all the models accept batched input, something else is curbing the performance gain. I will upload my partially working version to my fork, but due to work deadlines I don't think I can spend more time on this.

@gnana70
Owner

gnana70 commented Apr 16, 2024

@Dario-Mantegazza , no problem. I will investigate and fix it up.

@JamesDConley
Contributor

Most of the processing time appears to be spent in the cv2/numpy code that extracts the detected word images from the main image. I swapped this code out for a simple min/max rectangle crop, and a page I was testing went from 360s to under 15s.

For images with larger numbers of bounding boxes this will be an even more drastic speedup, since it reduces this step from 1-2 seconds per bounding box to around 1/100000 of a second per bounding box.

The only downside is that this doesn't straighten the text; it just pulls out an axis-aligned bounding box. That works for my use case, though, since I am extracting from documents without any tilted text.

Here are the timings before and after for the portion of the code I was working in:

Before

Timer started!
Read Image took 0.00 seconds (0.00 seconds total)
Timer started!
	Got size took 0.00 seconds (0.00 seconds total)
	Got prediction took 11.34 seconds (11.34 seconds total)
	Transformed bboxes initial took 0.00 seconds (11.34 seconds total)
	Sorted bounding boxes took 0.00 seconds (11.34 seconds total)
	Updated prediction results took 0.00 seconds (11.34 seconds total)
	**Exported file paths took 348.48 seconds** (359.82 seconds total)
	Updated prediction results again took 0.00 seconds (359.82 seconds total)

After

Timer started!
Read Image took 0.00 seconds (0.00 seconds total)
Timer started!
	Got size took 0.00 seconds (0.00 seconds total)
	Got prediction took 11.08 seconds (11.08 seconds total)
	Transformed bboxes initial took 0.00 seconds (11.08 seconds total)
	Sorted bounding boxes took 0.00 seconds (11.08 seconds total)
	Updated prediction results took 0.00 seconds (11.08 seconds total)
	**Exported file paths took 0.01 seconds** (11.09 seconds total)
	Updated prediction results again took 0.00 seconds (11.09 seconds total)

Code is at https://github.com/JamesDConley/faster_tamil_ocr
Got a bit of debugging/testing left to do, but I'll likely have a PR up tomorrow or the following night.
