add slice op demo for quickstart (PaddlePaddle#12439)
GreatV authored May 25, 2024
1 parent c364821 commit 739400f
Showing 3 changed files with 101 additions and 0 deletions.
40 changes: 40 additions & 0 deletions doc/doc_ch/quickstart.md
@@ -253,6 +253,46 @@ for idx in range(len(result)):
im_show.save('result_page_{}.jpg'.format(idx))
```

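* Detection and recognition using a sliding window

To run optical character recognition (OCR) with a sliding window, use the following code snippet: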

```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont

# Initialize the OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")

img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)

# Load the image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust size as needed

# Process and draw the results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
        # Convert four corners to two corners
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw text above the box

# Save the result
image.save("result.jpg")

```

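This example initializes a PaddleOCR instance with angle classification enabled and sets the language to English. It then calls the `ocr` method with several parameters that customize detection and recognition, including the `slice` parameter for handling image slices.

For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice.md).

## 3. Summary

After working through this section, you should be comfortable using the PaddleOCR whl package and have obtained initial results.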
21 changes: 21 additions & 0 deletions doc/doc_ch/slice.md
@@ -0,0 +1,21 @@
# Slice Operation

To run PaddleOCR detection and recognition on a very large image or document, you can use the slice operation as follows:

```python
ocr_inst = PaddleOCR(**ocr_settings)
results = ocr_inst.ocr(img, det=True, rec=True, slice=slice, cls=False, bin=False, inv=False, alpha_color=False)
```

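where
`slice = {'horizontal_stride': h_stride, 'vertical_stride': v_stride, 'merge_x_thres': x_thres, 'merge_y_thres': y_thres}`

Here, `h_stride`, `v_stride`, `x_thres`, and `y_thres` are user-configurable parameters that must be set manually. The slice operator works by running a sliding window over the large image to create slices of it, and then running the OCR algorithm on those slices.

The fragmented slice-level results are then merged to produce image-level detection and recognition results. The horizontal and vertical strides cannot be set below a certain limit, because values that are too low create so many slices that the computation becomes very time-consuming. For example, for an image of size 6616x14886, the following parameters are recommended: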

```python
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
```
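
To get a feel for why small strides are expensive, the sketch below counts the tiles a sliding window would produce. It is only an illustration under stated assumptions: tiles are taken to be non-overlapping with a size equal to the stride, 6616x14886 is read as width x height, and `iter_tiles` and `count_tiles` are hypothetical helper names, not part of the PaddleOCR API.

```python
import math
from typing import Iterator, Tuple


def iter_tiles(width: int, height: int,
               h_stride: int, v_stride: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, right, bottom) for each tile, row by row.

    Assumption: tile size equals the stride, so tiles do not overlap.
    Tiles on the right and bottom edges are clipped to the image.
    """
    if h_stride <= 0 or v_stride <= 0:
        raise ValueError("strides must be positive")
    for top in range(0, height, v_stride):
        bottom = min(top + v_stride, height)
        for left in range(0, width, h_stride):
            right = min(left + h_stride, width)
            yield left, top, right, bottom


def count_tiles(width: int, height: int, h_stride: int, v_stride: int) -> int:
    """Number of tiles produced by iter_tiles, without enumerating them."""
    return math.ceil(width / h_stride) * math.ceil(height / v_stride)


if __name__ == "__main__":
    # Strides from the example above versus a much smaller stride.
    print(count_tiles(6616, 14886, 300, 500))
    print(count_tiles(6616, 14886, 100, 100))
```

Under these assumptions the recommended strides give 23 x 30 = 690 tiles, while a 100 x 100 stride would give 67 x 149 = 9983 tiles, each of which needs its own OCR pass.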

All slice-level detections whose bounding boxes lie within `merge_x_thres` and `merge_y_thres` of each other are merged together.
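
The merging step can be pictured with a small sketch. This is not PaddleOCR's implementation: it assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, treats two boxes as "close" when their horizontal gap is at most `x_thres` and their vertical gap is at most `y_thres`, and ignores the recognized text. The names `gap`, `is_close`, and `merge_boxes` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def gap(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Distance between two 1-D intervals; 0 when they overlap or touch."""
    return max(0, max(a_min, b_min) - min(a_max, b_max))


def is_close(a: Box, b: Box, x_thres: float, y_thres: float) -> bool:
    """Assumed rule: both the horizontal and the vertical gap are within threshold."""
    return (gap(a[0], a[2], b[0], b[2]) <= x_thres
            and gap(a[1], a[3], b[1], b[3]) <= y_thres)


def merge_boxes(boxes: List[Box], x_thres: float, y_thres: float) -> List[Box]:
    """Repeatedly fuse any pair of close boxes into their bounding box."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if is_close(merged[i], merged[j], x_thres, y_thres):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since merged[i] grew
    return merged


if __name__ == "__main__":
    # Two fragments of one text line split across a slice boundary at x = 300,
    # plus an unrelated line far below.
    fragments = [(0, 0, 290, 40), (310, 5, 600, 45), (0, 500, 200, 540)]
    print(merge_boxes(fragments, x_thres=50, y_thres=35))
```

In this example the first two fragments are 20 pixels apart horizontally and overlap vertically, so they fuse into one box, while the third stays separate because it is far more than 35 pixels away vertically.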
40 changes: 40 additions & 0 deletions doc/doc_en/quickstart_en.md
@@ -266,6 +266,46 @@ for idx in range(len(result)):
im_show.save('result_page_{}.jpg'.format(idx))
```

* Detection and Recognition Using Sliding Windows

To perform OCR using sliding windows, use the following code snippet:

```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont

# Initialize OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")

img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)

# Load image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20) # Adjust size as needed

# Process and draw results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
        # Convert four corners to two corners
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw text above the box

# Save result
image.save("result.jpg")

```

This example initializes the PaddleOCR instance with angle classification enabled and sets the language to English. The `ocr` method is then called with several parameters to customize the detection and recognition process, including the `slice` parameter for handling image slices.

For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice_en.md).

<a name="3"></a>

## 3. Summary