add slice op demo for quickstart (PaddlePaddle#12439)

GreatV · May 25, 2024 · 739400f
1 parent c364821
Showing 3 changed files with 101 additions and 0 deletions.
diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md
@@ -253,6 +253,46 @@ for idx in range(len(result)):
     im_show.save('result_page_{}.jpg'.format(idx))
 ```
 
+* Detection and recognition using sliding windows
+
+To run OCR with a sliding window, use the following code snippet:
+
+```Python
+from paddleocr import PaddleOCR
+from PIL import Image, ImageDraw, ImageFont
+
+# Initialize the OCR engine
+ocr = PaddleOCR(use_angle_cls=True, lang="en")
+
+img_path = "./very_large_image.jpg"
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+results = ocr.ocr(img_path, cls=True, slice=slice)
+
+# Load the image
+image = Image.open(img_path).convert("RGB")
+draw = ImageDraw.Draw(image)
+font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust size as needed
+
+# Process and draw the results
+for res in results:
+    for line in res or []:  # res may be None when no text is detected
+        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
+        # Reduce the four corners to the top-left and bottom-right corners
+        box = [(min(point[0] for point in box), min(point[1] for point in box)),
+               (max(point[0] for point in box), max(point[1] for point in box))]
+        txt = line[1][0]
+        draw.rectangle(box, outline="red", width=2)  # Draw the rectangle
+        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw the text above the box
+
+# Save the result
+image.save("result.jpg")
+
+```
+
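+This example initializes a PaddleOCR instance with angle classification enabled and the language set to English. It then calls the `ocr` method with `cls=True` and a `slice` dictionary that sets the sliding-window strides and merge thresholds.
+
+For more detail on the slice operation, see the [slice operation documentation](./slice.md).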
+
 ## 3. Summary
 
 After working through this section, you should be familiar with the PaddleOCR whl package and have obtained initial results.

diff --git a/doc/doc_ch/slice.md b/doc/doc_ch/slice.md
@@ -0,0 +1,21 @@
+# Slice Operation
+
+To run PaddleOCR detection and recognition on a very large image or document, you can use the slice operation, as shown below:
+
+```python
+ocr_inst = PaddleOCR(**ocr_settings)
+results = ocr_inst.ocr(img, det=True, rec=True, slice=slice, cls=False, bin=False, inv=False, alpha_color=False)
+```
+
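+where
+`slice = {'horizontal_stride': h_stride, 'vertical_stride': v_stride, 'merge_x_thres': x_thres, 'merge_y_thres': y_thres}`
+
+Here `h_stride`, `v_stride`, `x_thres`, and `y_thres` are user-configurable parameters that must be set manually. The slice operator runs a sliding window over the large image, creates slices of the image, and runs the OCR algorithm on each slice.
+
+The fragmented slice-level results are then merged to produce image-level detection and recognition results. The horizontal and vertical strides should not be set below a certain limit, because very low values create so many slices that computation becomes very time-consuming. For example, for an image of size 6616x14886, the following parameters are recommended: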
+
+```python
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+```
+
+All slice-level detections whose bounding boxes lie within `merge_x_thres` and `merge_y_thres` of one another are merged together.
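
To make the sliding-window and merge steps described in `slice.md` concrete, here is a minimal sketch in plain Python. It is not PaddleOCR's implementation: the helper names `slice_origins`, `is_close`, and `merge_boxes` are hypothetical, boxes are simplified to axis-aligned rectangles, and the merge rule (union two boxes when the gap between them is within both thresholds) is an assumption about what "close" means.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), an assumed layout


def slice_origins(width: int, height: int, h_stride: int, v_stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of the windows, stepping by the given strides."""
    return [(x, y) for y in range(0, height, v_stride) for x in range(0, width, h_stride)]


def is_close(a: Box, b: Box, x_thres: int, y_thres: int) -> bool:
    """True when the gap between two boxes is within both thresholds."""
    x_gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when the boxes overlap in x
    y_gap = max(a[1], b[1]) - min(a[3], b[3])  # negative when the boxes overlap in y
    return x_gap <= x_thres and y_gap <= y_thres


def merge_boxes(boxes: List[Box], x_thres: int, y_thres: int) -> List[Box]:
    """Greedily union close boxes, restarting the scan after every merge."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if is_close(merged[i], merged[j], x_thres, y_thres):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Window counts for the 6616x14886 example under this sketch's stepping rule
n_recommended = len(slice_origins(6616, 14886, 300, 500))
n_halved = len(slice_origins(6616, 14886, 150, 250))

# Two fragments of one text line plus one distant box
fragments = [(0, 0, 100, 30), (120, 5, 200, 32), (1000, 1000, 1100, 1030)]
merged = merge_boxes(fragments, x_thres=50, y_thres=35)
```

Under this stepping rule the recommended strides give 690 windows for the 6616x14886 image, and halving both strides gives 2700, which illustrates why very small strides become expensive. The counts describe this sketch only, not the number of slices PaddleOCR itself produces.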
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md
@@ -266,6 +266,46 @@ for idx in range(len(result)):
     im_show.save('result_page_{}.jpg'.format(idx))
 ```
 
+* Detection and Recognition Using Sliding Windows
+
+To perform OCR with a sliding window, use the following code snippet:
+
+```Python
+from paddleocr import PaddleOCR
+from PIL import Image, ImageDraw, ImageFont
+
+# Initialize OCR engine
+ocr = PaddleOCR(use_angle_cls=True, lang="en")
+
+img_path = "./very_large_image.jpg"
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+results = ocr.ocr(img_path, cls=True, slice=slice)
+
+# Load image
+image = Image.open(img_path).convert("RGB")
+draw = ImageDraw.Draw(image)
+font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust size as needed
+
+# Process and draw results
+for res in results:
+    for line in res or []:  # res may be None when no text is detected
+        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
+        # Convert four corners to two corners
+        box = [(min(point[0] for point in box), min(point[1] for point in box)),
+               (max(point[0] for point in box), max(point[1] for point in box))]
+        txt = line[1][0]
+        draw.rectangle(box, outline="red", width=2)  # Draw rectangle
+        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw text above the box
+
+# Save result
+image.save("result.jpg")
+
+```
+
+This example initializes a PaddleOCR instance with angle classification enabled and the language set to English. It then calls the `ocr` method with `cls=True` and a `slice` dictionary that sets the sliding-window strides and merge thresholds.
+
+For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice_en.md).
+
 <a name="3"></a>
 
 ## 3. Summary
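
The box handling in the quickstart drawing loop can be factored into small helpers that run without PaddleOCR or an image, which makes the corner conversion easy to check. This is a sketch only: `quad_to_rect` and `iter_lines` are hypothetical names, the `[quad, (text, score)]` layout is inferred from how the snippet indexes `line[0]` and `line[1][0]`, and the `None` page entry is an assumption about what is returned when no text is found.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Sequence[float]
Rect = List[Tuple[float, float]]  # [(x_min, y_min), (x_max, y_max)]


def quad_to_rect(quad: Sequence[Point]) -> Rect:
    """Reduce a four-corner box to its axis-aligned top-left and bottom-right corners."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [(min(xs), min(ys)), (max(xs), max(ys))]


def iter_lines(results) -> Iterator[Tuple[Rect, str]]:
    """Yield (rect, text) pairs, skipping pages that hold no detections."""
    for res in results or []:
        for line in res or []:
            yield quad_to_rect(line[0]), line[1][0]


# Hand-written sample in the assumed result layout; the second page is empty
sample_results = [
    [[[[10, 20], [110, 22], [108, 52], [8, 50]], ("hello", 0.98)]],
    None,
]
pairs = list(iter_lines(sample_results))
```

Each `rect` can be passed directly to `ImageDraw.Draw.rectangle`, as the `box` variable is in the snippet.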