I have read the README carefully.
I have pulled the latest code from the main branch and run it again, and the problem still exists.
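The language_id_score_filter operator can keep either Chinese or English samples, but what should I do if I want to keep both Chinese and English at the same time? Obviously, chaining two copies of language_id_score_filter, filtering for English in the first step and for Chinese in the second, does not work, because the Chinese samples are already filtered out in the first step.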
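The failure mode described in the question can be illustrated with a minimal Python sketch. It does not call Data-Juicer; the `keep_single_lang` helper and the pre-labelled `lang` and `score` fields are hypothetical stand-ins for what a single-language filter would compute and check.

```python
# Hypothetical samples with a pre-computed language label and confidence score.
samples = [
    {"text": "Hello world", "lang": "en", "score": 0.98},
    {"text": "你好,世界", "lang": "zh", "score": 0.97},
    {"text": "Bonjour le monde", "lang": "fr", "score": 0.95},
]


def keep_single_lang(data, lang, min_score):
    """Keep only samples labelled `lang` with a score of at least `min_score`."""
    return [s for s in data if s["lang"] == lang and s["score"] >= min_score]


# Step 1 keeps English only, so the Chinese sample is already gone.
after_en = keep_single_lang(samples, "en", 0.9)
# Step 2 looks for Chinese in an English-only set and finds nothing.
after_zh = keep_single_lang(after_en, "zh", 0.9)

print(len(after_en))  # 1
print(len(after_zh))  # 0
```

Because each step intersects with the previous result, two single-language filters in sequence can only ever return an empty set.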
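Hi, thanks for your suggestion!
At the moment, language_id_score_filter can indeed keep only a single language. We think your suggestion is a very good one, and we will consider making this operator support keeping samples in multiple languages in the future, although this may take some development time.
Please stay tuned!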
Hello, in the latest code on the main branch, the language_id_score_filter operator now supports keeping multiple languages at the same time. Here is an example:
process:
  - language_id_score_filter:
      lang: [en, zh]  # list of languages to keep
      min_score: 0.9
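As a rough illustration of what such a configuration means, here is a minimal Python sketch. It assumes that a sample is kept when its detected language is in the `lang` list and its score is at least `min_score`; the `keep_langs` helper and the sample fields are hypothetical and are not Data-Juicer's actual implementation.

```python
def keep_langs(data, lang, min_score):
    """Keep samples whose label is in `lang` (a string or a list of strings)
    and whose score is at least `min_score`. Hypothetical helper."""
    allowed = {lang} if isinstance(lang, str) else set(lang)
    return [s for s in data if s["lang"] in allowed and s["score"] >= min_score]


# Hypothetical samples with a pre-computed language label and confidence score.
samples = [
    {"text": "Hello world", "lang": "en", "score": 0.98},
    {"text": "你好,世界", "lang": "zh", "score": 0.97},
    {"text": "Bonjour le monde", "lang": "fr", "score": 0.95},
    {"text": "ok", "lang": "en", "score": 0.50},
]

kept = keep_langs(samples, ["en", "zh"], 0.9)
print([s["lang"] for s in kept])  # ['en', 'zh']
```

Accepting either a single string or a list in the sketch mirrors the idea that the single-language form keeps working alongside the list form.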
Thanks again for your suggestion!
HYLcool