-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FastTokenizer] Add unittest for fast_tokenizer #4339
Conversation
Thanks for your contribution! |
Codecov Report
@@ Coverage Diff @@
## develop #4339 +/- ##
===========================================
+ Coverage 36.30% 38.69% +2.38%
===========================================
Files 419 422 +3
Lines 59254 59661 +407
===========================================
+ Hits 21514 23087 +1573
+ Misses 37740 36574 -1166
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
单测CI 通过
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
高质量的PR,点赞!几个小问题
@@ -68,10 +73,38 @@ def get_input_output_texts(self, tokenizer): | |||
|
|||
def test_full_tokenizer(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
所有的单测我建议把tokenizer和fast_tokenizer分开,即分test_tokenizer和test_fast_tokenizer两个method. 如果有错,比较容易debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done,新增test_fast_and_python_full_tokenizer单测,用于比较fast和python的tokenizer的输出是否一致。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个PR我ok的,@wj-Mcat 也看看
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其它我觉得都 OK,就包名的小问题。
另外,fast-tokenizer的自动化打包发包的事情后续是否有必要添加到github action workflow中去吗?github action 中支持os matrix 来提供多平台自动化打包编译,不过最后还是看你们的安排。
pytest-xdist | ||
fast_tokenizer_python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我查了下,包名应该是:fast-tokenizer-python
,具体可见:https://pypi.org/project/fast-tokenizer-python/
我本地运行: pip install fast_tokenizer_python
会直接报错。
PR types
Bug fixes
PR changes
Others
Description
Add unittest for fast_tokenizer