Fix the extra blank line between rows in files generated by the recognition train/test split script #11280

Merged: 1 commit, Nov 22, 2023


DingHsun
Contributor

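After splitting recognition data with gen_ocr_train_val_test.py, the generated train.txt, val.txt, and test.txt contain an extra blank line (\n) after each label line, which causes errors during training. Removing the extra \n lets training run normally.

The extra blank line causes the following error: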
[2023/11/21 09:58:27] ppocr ERROR: When parsing line D:\PaddleOCR\train_data\rec\train\FAB06_input_Win 2000_crop_1.jpg l , error happened with msg: Traceback (most recent call last):
  File "D:\PaddleOCR\ppocr\data\simple_dataset.py", line 252, in __getitem__
    data['ext_data'] = self.get_ext_data()
  File "D:\PaddleOCR\ppocr\data\simple_dataset.py", line 124, in get_ext_data
    label = substr[1]
IndexError: list index out of range
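For label files that were already generated with the extra blank lines, a small cleanup pass is one possible workaround. The following is only a sketch, not the change made in this PR: the function name `drop_blank_lines` is hypothetical, and it assumes the files are UTF-8 text with one sample per line.

```python
def drop_blank_lines(label_path):
    """Rewrite a label file in place, keeping only non-empty lines.

    Hypothetical helper; returns the number of lines that were removed.
    """
    with open(label_path, encoding="utf-8") as f:
        lines = f.readlines()

    # A line counts as blank if it holds nothing but whitespace.
    kept = [line for line in lines if line.strip()]

    with open(label_path, "w", encoding="utf-8") as f:
        for line in kept:
            # Make sure every kept line ends with exactly one newline.
            f.write(line if line.endswith("\n") else line + "\n")

    return len(lines) - len(kept)
```

Calling `drop_blank_lines("train.txt")`, and likewise for val.txt and test.txt, would strip the empty rows before training. The paths here are placeholders for wherever the split script wrote its output.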

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Wayne Huang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@shiyutang
Collaborator

Hello, thank you for the contribution. Please sign the CLA.

Collaborator

@shiyutang shiyutang left a comment

LGTM

@shiyutang shiyutang self-assigned this Nov 22, 2023
@JiaXiao243 JiaXiao243 merged commit 80459f5 into PaddlePaddle:dygraph Nov 22, 2023
1 of 2 checks passed
@xiaozhou0311

[2023/12/05 15:58:17] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 3 iterations
[2023/12/05 15:58:33] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:37] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2720649282331695
[2023/12/05 15:58:37] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2023/12/05 15:58:37] ppocr INFO: best metric, hmean: 0.9411764705882353, is_float16: False, precision: 0.8888888888888888, recall: 1.0, fps: 1.2720649282331695, best_epoch: 1
[2023/12/05 15:58:53] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:57] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2869010204678446

Why does the `label = substr[1]` error not appear when the data is loaded, but only during training?
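The tracebacks in this thread place the failing `label = substr[1]` inside `__getitem__`, which suggests that each line is split only when a sample is fetched during training or evaluation, not when the file is first read. A label file can be checked ahead of time with something like the sketch below. The name `find_malformed_lines` is hypothetical, and the tab delimiter between image path and label is an assumption about the label format; adjust it to match your configuration.

```python
def find_malformed_lines(lines, delimiter="\t"):
    """Return 1-based line numbers that would not split into path and label.

    Hypothetical pre-flight check; the delimiter default is an assumption.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        # Blank lines and lines without the delimiter yield fewer than
        # two fields, which is what makes substr[1] raise IndexError.
        if len(fields) < 2 or not fields[0].strip():
            bad.append(number)
    return bad
```

Running this over the lines of train.txt and the evaluation label file before starting a run would list the rows that need fixing, such as the empty line shown in the log above.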
