tokenization principles #119
Unanswered
arademaker asked this question in Q&A
Replies: 0 comments
In Portuguese, Monday is segunda-feira. All weekdays are hyphenated compounds. They are semantically compositional, although feira is a Latin word.
How do you tokenize them? Is there any justification for breaking them into ADJ NOUN?
On a related note, how should productive prefixes like ex- and vice- be handled?
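To make the alternatives concrete, here is a minimal Python sketch of the two options I have in mind. It is only an illustration: the function names, the regular expression, and the choice to drop the hyphen when splitting compounds (but keep it attached to a prefix) are my own assumptions, not the behavior of any existing tokenizer.

```python
import re

# A word, optionally continued by hyphen-joined words, or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

def tokenize_whole(text):
    """Option A: keep every hyphenated form as one token."""
    return TOKEN.findall(text)

def tokenize_split(text, prefixes=("ex", "vice")):
    """Option B: split hyphenated forms into their parts.

    Assumption: the hyphen is dropped for compounds (segunda-feira ->
    segunda, feira) and kept on productive prefixes (ex-presidente ->
    ex-, presidente). The prefix list is hypothetical.
    """
    tokens = []
    for tok in TOKEN.findall(text):
        # A standalone "-" would split into empty strings, so fall back to the token itself.
        parts = [p for p in tok.split("-") if p] or [tok]
        if len(parts) > 1 and parts[0].lower() in prefixes:
            tokens.append(parts[0] + "-")
            tokens.extend(parts[1:])
        else:
            tokens.extend(parts)
    return tokens

sentence = "Na segunda-feira, o ex-presidente falou."
print(tokenize_whole(sentence))
print(tokenize_split(sentence))
```

Under option B, segunda and feira would then be tagged ADJ and NOUN, which is the analysis I am asking about.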