Fix the extra blank line between every line of the files produced by the recognition train/test split script #11280

DingHsun · 2023-11-21T01:59:56Z

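After recognition data is split with gen_ocr_train_val_test.py, the generated train.txt, val.txt, and test.txt contain an extra blank line (\n) between every label line, and this causes errors during training. Once the extra \n is removed, training runs normally.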
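The PR does not show the script's code, so the following is only a minimal sketch of one common way such doubled newlines arise. It assumes the split script writes lines obtained from readlines() (which keep their trailing '\n') and then appends another '\n'. The helper names render_buggy, render_fixed, and write_label_file are hypothetical and are not taken from gen_ocr_train_val_test.py.

```python
import io


def render_buggy(lines):
    """Join label lines the buggy way: append a newline to every line.

    Lines returned by readlines() already end with a newline, so every
    record ends up followed by an empty line.
    """
    return "".join(line + "\n" for line in lines)


def render_fixed(lines):
    """Join label lines so that each record ends with exactly one newline."""
    # Drop any existing trailing '\n' or '\r\n' before adding a single '\n'.
    return "".join(line.rstrip("\r\n") + "\n" for line in lines)


def write_label_file(path, lines):
    """Hypothetical writer for train.txt / val.txt / test.txt."""
    # newline="\n" writes '\n' as-is instead of translating it to '\r\n' on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(render_fixed(lines))


# Label lines as they would come out of readlines(), each ending in '\n'.
lines = io.StringIO("img_1.jpg\thello\nimg_2.jpg\tworld\n").readlines()
print(repr(render_buggy(lines)))  # each record is followed by an empty line
print(repr(render_fixed(lines)))  # one record per line
```

Normalizing every line before adding exactly one terminator makes the writer correct whether or not its input lines already carry a newline.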

The extra blank lines cause the following error:
[2023/11/21 09:58:27] ppocr ERROR: When parsing line D:\PaddleOCR\train_data\rec\train\FAB06_input_Win 2000_crop_1.jpg l
, error happened with msg: Traceback (most recent call last):
File "D:\PaddleOCR\ppocr\data\simple_dataset.py", line 252, in __getitem__
data['ext_data'] = self.get_ext_data()
File "D:\PaddleOCR\ppocr\data\simple_dataset.py", line 124, in get_ext_data
label = substr[1]
IndexError: list index out of range
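The traceback fails at label = substr[1]: a blank line contains no delimiter, so splitting it yields a single empty string and index 1 does not exist. The sketch below reproduces that failure with a simplified, hypothetical parser (parse_label_line and load_label_lines are illustrative names, and a tab delimiter is assumed) and shows one defensive option, skipping blank lines on load. It is not PaddleOCR's implementation; the merged fix instead stops the split script from writing the blank lines in the first place.

```python
def parse_label_line(line, delimiter="\t"):
    """Split one 'image_path<delimiter>label' line into (image_path, label).

    Simplified, hypothetical re-creation of the failing step in the
    traceback; not PaddleOCR's actual code.
    """
    substr = line.strip("\n").split(delimiter)
    file_name = substr[0]
    # A blank line splits into [''], so index 1 does not exist -> IndexError.
    label = substr[1]
    return file_name, label


def load_label_lines(text, delimiter="\t"):
    """Parse every record, skipping blank lines instead of failing on them."""
    return [
        parse_label_line(line, delimiter)
        for line in text.splitlines()
        if line.strip()  # ignore empty or whitespace-only lines
    ]


# A label file as written by the buggy split: a blank line after each record.
buggy_text = "img_1.jpg\thello\n\nimg_2.jpg\tworld\n\n"

try:
    parse_label_line("\n")
except IndexError as err:
    print("blank line:", err)  # blank line: list index out of range

print(load_label_lines(buggy_text))
# [('img_1.jpg', 'hello'), ('img_2.jpg', 'world')]
```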


CLAassistant · 2023-11-21T02:00:01Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Wayne Huang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

shiyutang · 2023-11-22T11:54:41Z

Hello, thanks for the contribution. Please sign the CLA.

shiyutang

LGTM

xiaozhou0311 · 2023-12-05T08:12:41Z

023/12/05 15:58:17] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 3 iterations
[2023/12/05 15:58:33] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:37] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2720649282331695
[2023/12/05 15:58:37] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2023/12/05 15:58:37] ppocr INFO: best metric, hmean: 0.9411764705882353, is_float16: False, precision: 0.8888888888888888, recall: 1.0, fps: 1.2720649282331695, best_epoch: 1
[2023/12/05 15:58:53] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:57] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2869010204678446

Why does the label = substr[1] error not show up when the data is loaded, but only during training?

Fix the extra blank line between every line of the files produced by the recognition train/test split script

a830237


shiyutang added the Contributor PR is merged label Nov 22, 2023

shiyutang approved these changes Nov 22, 2023


shiyutang self-assigned this Nov 22, 2023

JiaXiao243 merged commit 80459f5 into PaddlePaddle:dygraph Nov 22, 2023
1 of 2 checks passed

paddle-bot bot added the contributor label Mar 8, 2024

paddle-bot bot assigned Sunting78 Mar 8, 2024
