Add documented but missing methods for some tokenizers #1664

Merged 1 commit on Jun 13, 2024

@SamanehSaadat commented Jun 13, 2024

`get_vocabulary`, `id_to_token`, and `token_to_id` are documented in the keras.io guides for `ByteTokenizer` and `UnicodeCodepointTokenizer`, but neither tokenizer actually implements them. This PR adds the missing methods.

Fixes #1631.
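
For readers who want a feel for what the three methods do, here is a minimal sketch for a byte-level tokenizer. It is not the code merged in this PR: `SketchByteTokenizer` is a hypothetical stand-in class, and the 256-entry vocabulary, the `chr`/`ord` mapping, and the range check are all assumptions made for illustration.

```python
class SketchByteTokenizer:
    """Hypothetical stand-in, NOT the KerasNLP ByteTokenizer."""

    def vocabulary_size(self):
        # Assumption: one id per possible byte value.
        return 256

    def get_vocabulary(self):
        # Map each single-character token to its integer id.
        return {chr(i): i for i in range(self.vocabulary_size())}

    def id_to_token(self, id):
        # Reject ids outside the assumed vocabulary range.
        if not 0 <= id < self.vocabulary_size():
            raise ValueError(
                f"id must be in [0, {self.vocabulary_size() - 1}], got {id}"
            )
        return chr(id)

    def token_to_id(self, token):
        # Inverse of id_to_token for single-character tokens.
        return ord(token)


tokenizer = SketchByteTokenizer()
vocab = tokenizer.get_vocabulary()
round_trip = tokenizer.id_to_token(tokenizer.token_to_id("a"))
```

In this sketch `id_to_token` and `token_to_id` are inverses of each other, and `get_vocabulary` simply materializes the same mapping as a dict. A codepoint-level tokenizer could follow the same pattern with a larger vocabulary size.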

@mattdangerw left a comment

LGTM!

Pro tip: you can put something like `Fixes #1234` in the PR description if you want issue 1234 to close automatically when the PR is merged.

@SamanehSaadat

@mattdangerw Thanks for the review and the tip! Updated the description.

@SamanehSaadat merged commit 5a734f9 into keras-team:master on Jun 13, 2024
12 checks passed
@SamanehSaadat deleted the id-token branch on June 13, 2024 23:40
Development

Successfully merging this pull request may close these issues.

Documented id_to_token doesn't exist for UnicodeCodepointTokenizer
2 participants