Convert to bytes #11

Open
NoviScl opened this issue Sep 13, 2021 · 1 comment

Comments

NoviScl commented Sep 13, 2021

Hi,

Could you share the exact script that you used to convert strings into UTF-8 bytes?

stefan-it commented Sep 27, 2021

Hi @NoviScl,

Really good question. After "some" searching, I'm 100% sure that seqio is used, more precisely its ByteVocabulary implementation:

https://github.com/google/seqio/blob/3fd3175537540f8e0ce7579d9ae7936721adc05d/seqio/vocabularies.py#L349

This class is then instantiated in the byt5 library:

byt5/byt5/tasks.py, lines 44 to 45 at commit 2f46814:

"inputs": t5.data.Feature(vocabulary=t5.data.ByteVocabulary()),
"targets": t5.data.Feature(vocabulary=t5.data.ByteVocabulary())
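
For anyone who wants to reproduce the conversion without installing seqio, here is a minimal pure-Python sketch of what ByteVocabulary appears to do. It is not the original script. It assumes, based on my reading of the linked implementation, that the string is encoded as UTF-8 and every byte value is shifted by 3 to make room for three reserved IDs (padding, EOS, UNK). The names `NUM_SPECIAL_TOKENS`, `encode_to_byte_ids`, and `decode_from_byte_ids` are hypothetical; please check the offset against the pinned commit before relying on it.

```python
from typing import List

# Assumption: IDs 0, 1 and 2 are reserved (pad, EOS, UNK), so byte values are
# shifted by 3. Verify this against seqio's ByteVocabulary before relying on it.
NUM_SPECIAL_TOKENS = 3
BYTE_SIZE = 256


def encode_to_byte_ids(text: str) -> List[int]:
    """Encode a string as UTF-8 and shift each byte past the reserved IDs."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")]


def decode_from_byte_ids(ids: List[int]) -> str:
    """Drop IDs outside the byte range, undo the shift, and decode as UTF-8."""
    raw = bytes(
        i - NUM_SPECIAL_TOKENS
        for i in ids
        if NUM_SPECIAL_TOKENS <= i < NUM_SPECIAL_TOKENS + BYTE_SIZE
    )
    # Ignore invalid byte sequences instead of raising (a choice made for this sketch).
    return raw.decode("utf-8", errors="ignore")
```

With these definitions, `encode_to_byte_ids("hi")` gives `[107, 108]`, and a non-ASCII character such as "é" becomes two IDs because it occupies two bytes in UTF-8.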
