Commit

docs: Update the pdf file path in the operation demonstration (Paddle…
Gmgge authored Aug 2, 2024
1 parent 9c19e6d commit d69bf81
Showing 8 changed files with 26 additions and 26 deletions.
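Every hunk below makes the same one-line substitution: `ppstructure/recovery/UnrealText.pdf` becomes `ppstructure/docs/recovery/UnrealText.pdf`. The sketch below shows one way such a sweep could be checked or applied mechanically. The helper `update_pdf_path` and its `dry_run` switch are hypothetical and are not part of PaddleOCR. It also assumes the affected docs are UTF-8 `*.md` files under the given root.

```python
import pathlib

# Old and new locations of the demo PDF, taken from this commit.
OLD_PATH = "ppstructure/recovery/UnrealText.pdf"
NEW_PATH = "ppstructure/docs/recovery/UnrealText.pdf"


def update_pdf_path(root, dry_run=True):
    """Hypothetical helper: report (and optionally fix) stale references.

    Walks every *.md file under `root` and counts occurrences of OLD_PATH.
    Returns {file path as str: number of stale occurrences}. When dry_run
    is False, each affected file is rewritten with NEW_PATH in place.
    """
    stale = {}
    for md_file in sorted(pathlib.Path(root).rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        count = text.count(OLD_PATH)
        if count:
            stale[str(md_file)] = count
            if not dry_run:
                md_file.write_text(text.replace(OLD_PATH, NEW_PATH), encoding="utf-8")
    return stale
```

With the default `dry_run=True`, running it against a checkout only lists Markdown files that still mention the old location. Running it again after a real run is safe, because the new path does not contain the old one as a substring.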
4 changes: 2 additions & 2 deletions docs/ppstructure/model_train/recovery_to_doc.en.md
@@ -86,14 +86,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# install paddleocr
pip3 install "paddleocr>=2.6"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash linenums="1"
python3 predict_system.py \
- --image_dir=ppstructure/recovery/UnrealText.pdf \
+ --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
6 changes: 3 additions & 3 deletions docs/ppstructure/model_train/recovery_to_doc.md
@@ -84,14 +84,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# install paddleocr (version 2.6 is recommended)
pip3 install "paddleocr>=2.6"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash linenums="1"
python3 predict_system.py \
- --image_dir=ppstructure/recovery/UnrealText.pdf \
+ --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@@ -117,7 +117,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

### 4.1 Download models
6 changes: 3 additions & 3 deletions docs/ppstructure/quick_start.en.md
@@ -76,7 +76,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):

```bash linenums="1"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -171,7 +171,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -193,7 +193,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
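The Python snippets in these docs name their results with `os.path.basename(img_path).split('.')[0]`. That expression only looks at the file name, so moving the PDF into `ppstructure/docs/recovery/` should not change the name handed to `save_structure_res`. The sketch below checks this with plain `os.path` calls. The helper names `result_name` and `result_name_splitext` are illustrative only. It also shows `os.path.splitext` as one possible alternative, which gives a different result only for file names that contain more than one dot.

```python
import os

# Paths before and after this commit.
OLD_IMG_PATH = "ppstructure/recovery/UnrealText.pdf"
NEW_IMG_PATH = "ppstructure/docs/recovery/UnrealText.pdf"


def result_name(img_path):
    # Same expression the doc snippets pass to save_structure_res:
    # keep the file name, then cut at the FIRST dot.
    return os.path.basename(img_path).split('.')[0]


def result_name_splitext(img_path):
    # Alternative: strip only the LAST extension, so "a.b.pdf" -> "a.b".
    return os.path.splitext(os.path.basename(img_path))[0]


if __name__ == "__main__":
    # Both paths yield the same result name, so output naming is unaffected.
    print(result_name(OLD_IMG_PATH), result_name(NEW_IMG_PATH))
```

The choice between the two matters only if a user substitutes their own PDF with a dotted name such as `my.report.pdf`, where the first-dot split would shorten the name to `my`.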
10 changes: 5 additions & 5 deletions docs/ppstructure/quick_start.md
@@ -76,7 +76,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery by using PDF parsing (only PDF input is supported):

```bash linenums="1"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -89,7 +89,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery by using PDF parsing (only PDF input is supported):

```bash linenums="1"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -100,7 +100,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

### 2.2 Usage in Python scripts
@@ -189,7 +189,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -211,7 +211,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
10 changes: 5 additions & 5 deletions ppstructure/docs/quickstart.md
@@ -99,7 +99,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery by using PDF parsing (only PDF input is supported):

```bash
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -112,7 +112,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery by using PDF parsing (only PDF input is supported):

```bash
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -123,7 +123,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

<a name="22"></a>
@@ -217,7 +217,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -239,7 +239,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
6 changes: 3 additions & 3 deletions ppstructure/docs/quickstart_en.md
@@ -101,7 +101,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):

```bash
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -200,7 +200,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -222,7 +222,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
- img_path = 'ppstructure/recovery/UnrealText.pdf'
+ img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
4 changes: 2 additions & 2 deletions ppstructure/recovery/README.md
@@ -110,14 +110,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# install paddleocr
pip3 install "paddleocr>=2.6"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash
python3 predict_system.py \
- --image_dir=ppstructure/recovery/UnrealText.pdf \
+ --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
6 changes: 3 additions & 3 deletions ppstructure/recovery/README_ch.md
@@ -106,14 +106,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# install paddleocr (version 2.6 is recommended)
pip3 install "paddleocr>=2.6"
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash
python3 predict_system.py \
- --image_dir=ppstructure/recovery/UnrealText.pdf \
+ --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@@ -142,7 +142,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
- paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+ paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

<a name="4.1"></a>