Refactor Tokenizer #220

Open
Argmaster opened this issue Jun 22, 2024 · 0 comments


Currently, the tokenizer and token representation classes do not support dumping to an easy-to-compare format that could be used for testing. Additionally, there is no simple way to add a native implementation of the tokenizer that would optionally speed up tokenization. The current idea is to extract the token-filling logic from the token classes and make them as simple containers as possible, so that they can be constructed by a C/C++/Rust implementation of the tokenizer.
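A minimal sketch of the proposed split, assuming a Python code base. Tokens become plain frozen containers with no parsing logic, a separate function fills them, and a dump/load pair yields a stable text format for test comparisons. `Token`, `tokenize`, `dump_tokens`, `load_tokens`, `get_tokenizer` and the `_native_tokenizer` extension module are all hypothetical names for illustration, not the project's actual API, and the whitespace-splitting tokenizer is only a stand-in for the real grammar.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class Token:
    """Plain data container: holds values only, no filling/parsing logic."""
    kind: str
    value: str
    offset: int


def tokenize(source: str) -> Iterator[Token]:
    """Hypothetical pure-Python tokenizer that fills Token containers.

    Splits on whitespace and classifies each chunk as NUMBER or WORD.
    """
    i, n = 0, len(source)
    while i < n:
        if source[i].isspace():
            i += 1
            continue
        start = i
        while i < n and not source[i].isspace():
            i += 1
        text = source[start:i]
        kind = "NUMBER" if text.isdigit() else "WORD"
        yield Token(kind=kind, value=text, offset=start)


def dump_tokens(tokens: List[Token]) -> str:
    """Serialize tokens to deterministic JSON, easy to diff in tests."""
    return json.dumps([asdict(t) for t in tokens], sort_keys=True, indent=2)


def load_tokens(text: str) -> List[Token]:
    """Rebuild Token containers from the dumped JSON."""
    return [Token(**item) for item in json.loads(text)]


# Hypothetical optional native backend (C/C++/Rust extension module).
# Because Token is a simple container, native code only has to construct
# Token(kind, value, offset) objects; it never needs Python-side logic.
try:
    from _native_tokenizer import tokenize as _native_tokenize  # hypothetical
except ImportError:
    _native_tokenize = None


def get_tokenizer() -> Callable[[str], Iterator[Token]]:
    """Prefer the native tokenizer when installed, else fall back to Python."""
    return _native_tokenize or tokenize
```

Since both backends would emit the same `Token` containers, a test suite could compare `dump_tokens(...)` output from the Python and native implementations directly.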
