Fix common issues when exporting table annotations
Software version for reproduction
Resources for reproducing the issue
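Issue 1: Incorrect colspan and rowspan when exporting table annotations
When table annotations are exported, the table layout is read from the Excel file, and the conditions that decide when to add colspan and rowspan to merged columns or merged rows were wrong.
For example, in a table with 5 rows and 5 columns where columns 2 and 3 of the first row are merged and columns 4 and 5 are merged, the first row of the html_list built by the for loop in the code was incorrect.
After the fix, the exported annotations for 60 table images were checked: every image met the model-training requirements, and model training was completed on them.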
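The corrected loop itself is not shown here, so the following is only a hypothetical sketch of the idea. It assumes each merged region is given as an inclusive, 1-indexed `(min_row, min_col, max_row, max_col)` range, the same bounds an Excel merged-cell range reports. The helper names `build_grid` and `render_rows` are invented for illustration. The key point is that colspan and rowspan are computed from each merged range independently, and only the top-left cell of a range is emitted.

```python
from typing import List, Optional, Tuple

# Merged range as (min_row, min_col, max_row, max_col), 1-indexed and inclusive
# (assumed input format, mirroring Excel merged-cell bounds).
MergeRange = Tuple[int, int, int, int]
# (colspan, rowspan) for a visible cell; None for a cell hidden by a merge.
Cell = Optional[Tuple[int, int]]


def build_grid(n_rows: int, n_cols: int, merges: List[MergeRange]) -> List[List[Cell]]:
    """Hypothetical helper: mark each cell as visible (with its spans) or hidden."""
    grid: List[List[Cell]] = [[(1, 1)] * n_cols for _ in range(n_rows)]
    for min_row, min_col, max_row, max_col in merges:
        # Hide every cell the merged range covers...
        for r in range(min_row - 1, max_row):
            for c in range(min_col - 1, max_col):
                grid[r][c] = None
        # ...then restore the top-left cell, carrying both spans.
        grid[min_row - 1][min_col - 1] = (max_col - min_col + 1, max_row - min_row + 1)
    return grid


def render_rows(grid: List[List[Cell]]) -> List[str]:
    """Hypothetical helper: turn the grid into one <tr>...</tr> string per row."""
    rows = []
    for row in grid:
        cells = []
        for cell in row:
            if cell is None:
                continue  # covered by a merged cell: emit nothing
            colspan, rowspan = cell
            attrs = ""
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            cells.append(f"<td{attrs}></td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return rows
```

For the 5x5 example above, `render_rows(build_grid(5, 5, [(1, 2, 1, 3), (1, 4, 1, 5)]))[0]` gives a first row of three cells: a plain `<td>` followed by two `<td colspan="2">` cells.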
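Issue 2: Non-compliant HTML tags in the gt attribute of the exported gt file
In the rebuild_html_from_ppstructure_label method, the colspan and rowspan values in the newly generated HTML did not conform to the HTML standard, so the table could not be displayed directly. colspan and rowspan values must be numbers. The HTML content is generated in convert_token, and there the colspan and rowspan values are both produced as strings (presumably for model training).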
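A minimal sketch of one way to guarantee numeric span values while rebuilding the HTML. It is not the actual rebuild_html_from_ppstructure_label code. It assumes structure tokens in the PubTabNet style (`<td`, ` colspan="2"`, `>`, `</td>`), with each cell's text inserted after the `<td>` or `>` token that closes its opening tag. The quoting variants accepted by the regular expression, and the names `normalize_span` and `rebuild_html`, are assumptions for illustration.

```python
import re
from html import escape
from typing import List

# Matches span tokens such as ' colspan="2"', ' rowspan=3' or ' colspan="\'2\'"';
# the accepted quoting variants are an assumption for illustration.
SPAN_RE = re.compile(r"""^\s*(colspan|rowspan)\s*=\s*["']*(\d+)["']*\s*$""")


def normalize_span(token: str) -> str:
    """Re-emit a span token with a plain integer value, e.g. ' colspan="2"'."""
    m = SPAN_RE.match(token)
    if m is None:
        return token  # not a span token: leave it untouched
    name, value = m.group(1), int(m.group(2))
    return f' {name}="{value}"'


def rebuild_html(structure: List[str], cells: List[str]) -> str:
    """Hypothetical sketch: insert each cell's text after the token that closes
    its opening <td> ('<td>' or '>'), normalizing span attributes on the way."""
    out: List[str] = []
    cell_iter = iter(cells)
    for token in structure:
        out.append(normalize_span(token))
        if token in ("<td>", ">"):
            out.append(escape(next(cell_iter, "")))
    return "<html><body><table>" + "".join(out) + "</table></body></html>"
```

Normalizing at rebuild time leaves the training tokens untouched and only changes the HTML that is meant for display.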
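Annotations were exported for 60 images, and the gt attribute from gt.txt was copied into files with an .html extension. When these files were opened directly in a browser, every table was reconstructed correctly.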
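Alongside the manual browser check, the span attributes can also be checked automatically. The hypothetical helpers below use only the standard library `html.parser`. `check_spans` reports every `colspan`/`rowspan` on a `<td>`/`<th>` whose value is not a positive integer, and `write_preview` saves the gt HTML to an `.html` file for viewing. How the gt string is read out of gt.txt is left open, since that file's exact layout is not described here.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SpanChecker(HTMLParser):
    """Collects colspan/rowspan values on <td>/<th> that are not positive integers."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ("td", "th"):
            return
        for name, value in attrs:
            if name in ("colspan", "rowspan") and not re.fullmatch(r"[1-9][0-9]*", value or ""):
                self.problems.append(f"{tag} {name}={value!r}")


def check_spans(html: str) -> List[str]:
    """Hypothetical check: return a description of every invalid span attribute."""
    checker = SpanChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


def write_preview(html: str, path: str) -> None:
    """Hypothetical helper: write the gt HTML to a file that a browser can open."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
```

An empty list from `check_spans` means every span value is a plain number. That is a necessary condition for the browser to lay the merged cells out as intended, but it does not replace looking at the rendered table.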