Chinese training labels and preprocessing of horizontal and vertical samples #16

Open
CC-todo opened this issue Mar 9, 2022 · 1 comment

Comments

CC-todo commented Mar 9, 2022

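Hello, I have two questions:
1. What encoding should Chinese labels use for training? When I train, the output indices are wrong: they are shifted by 3 or 4 positions, and the offset is not constant.
2. In the code, training preprocessing resizes both horizontal and vertical samples to width 256 and height 64. Won't that distort the vertical samples?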

RuijieJ (Owner) commented Mar 9, 2022

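  1. Chinese labels use the same format as English ones: each line is "path text", encoded in UTF-8.
  2. Training on multi-oriented text requires some changes to the code. My approach is to build two dataloaders that normalize horizontal samples to 64×256 and vertical samples to 256×64 (height×width), then draw batches from both dataloaders during training. Roughly, the code looks like this: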
import random

# h_loader and v_loader are the dataloaders for horizontal and vertical samples
count_h, count_v = len(self.h_loader), len(self.v_loader)
h_iter, v_iter = iter(self.h_loader), iter(self.v_loader)
while count_h > 0 or count_v > 0:
    # Pick a loader with probability proportional to its remaining batches
    if random.random() < count_h / (count_h + count_v):
        ims, texts = next(h_iter)
        count_h -= 1
    else:
        ims, texts = next(v_iter)
        count_v -= 1
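
For anyone who wants to try the sampling loop outside the trainer class, here is a minimal standalone sketch. `interleave_batches` is a hypothetical helper name (not part of this repository), and plain lists stand in for the two dataloaders; anything that supports `len()` and `iter()` should work the same way.

```python
import random


def interleave_batches(h_loader, v_loader, rng=None):
    """Yield (orientation, batch) pairs until both loaders are exhausted.

    At each step a loader is picked with probability proportional to the
    number of batches it still has left, so every batch is yielded once.
    """
    if rng is None:
        rng = random.Random()
    count_h, count_v = len(h_loader), len(v_loader)
    h_iter, v_iter = iter(h_loader), iter(v_loader)
    while count_h > 0 or count_v > 0:
        if rng.random() < count_h / (count_h + count_v):
            yield "h", next(h_iter)
            count_h -= 1
        else:
            yield "v", next(v_iter)
            count_v -= 1


if __name__ == "__main__":
    # Stand-ins for DataLoaders: each element is one (images, texts) batch.
    h_batches = [("h_ims_%d" % i, "h_texts_%d" % i) for i in range(3)]
    v_batches = [("v_ims_%d" % i, "v_texts_%d" % i) for i in range(2)]
    for orientation, (ims, texts) in interleave_batches(h_batches, v_batches):
        print(orientation, ims, texts)
```

Because every batch comes from a single loader, all images inside one batch share the same shape, so horizontal and vertical samples never need to be stacked together.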

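On the label format, the following is a rough sketch of reading a "path text" file as UTF-8. `read_labels` is a hypothetical helper (not taken from this repository), and it assumes that image paths contain no spaces. It is not a diagnosis of the index offset reported above; it only shows the file being decoded to `str` before any character indexing, so that each Chinese character counts as one symbol rather than as several UTF-8 bytes.

```python
from typing import List, Tuple


def read_labels(label_file: str) -> List[Tuple[str, str]]:
    """Parse lines of the form "<image path> <text>" from a UTF-8 file."""
    samples = []
    # "utf-8-sig" reads plain UTF-8 and also drops a leading BOM if present.
    with open(label_file, encoding="utf-8-sig") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            # Split on the first space only, so the text itself may contain spaces.
            path, _, text = line.partition(" ")
            samples.append((path, text))
    return samples
```

A call such as `read_labels("train.txt")` (the file name is only an example) returns a list of `(path, text)` tuples that a dataset class can index.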