Tokenizer for ConveRT #4978
Labels
area:rasa-oss 🎡
Anything related to the open source Rasa framework
type:enhancement ✨
Additions of new features or changes to existing ones, should be doable in a single PR
Implement a tokenizer for ConveRT that allows us to use ConveRT embeddings in a sequence fashion, for example, for the CRFEntityExtractor.
Problem: ConveRT tokenizes words into subwords and adds special characters. Thus, the token start and end positions do not match the entity boundaries. We need to work on an alignment so that the tokens from ConveRT match the entities.
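One possible shape for the alignment, as a rough sketch only: split the text on whitespace first so that each token keeps the character offsets the entity annotations refer to, then run the subword tokenizer per word and remember which subwords belong to which word. Everything below is hypothetical and is not the ConveRT or Rasa API: `toy_subword_tokenize` is a stand-in that cuts words into 3-character pieces and marks the word start with `▁`, and `Token`, `align_tokens` and `subword_to_word_index` are illustrative names.

```python
import re
from typing import Callable, List, NamedTuple


class Token(NamedTuple):
    text: str            # the whitespace-delimited word as it appears in the message
    start: int           # character offset of the first character
    end: int             # character offset one past the last character
    subwords: List[str]  # subword pieces produced for this word


def toy_subword_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer (assumption):
    cut the word into pieces of up to 3 characters and mark the word start."""
    pieces = [word[i:i + 3] for i in range(0, len(word), 3)]
    pieces[0] = "▁" + pieces[0]
    return pieces


def align_tokens(text: str, subword_tokenize: Callable[[str], List[str]]) -> List[Token]:
    """Tokenize on whitespace so offsets refer to the original text,
    then attach the subword pieces of each word to that word."""
    return [
        Token(m.group(), m.start(), m.end(), subword_tokenize(m.group()))
        for m in re.finditer(r"\S+", text)
    ]


def subword_to_word_index(tokens: List[Token]) -> List[int]:
    """For every subword in the flattened sequence, give the index of the word
    it came from. Subword embeddings could then be pooled per word."""
    return [i for i, tok in enumerate(tokens) for _ in tok.subwords]


tokens = align_tokens("I live in Berlin", toy_subword_tokenize)
mapping = subword_to_word_index(tokens)
```

With this layout, an entity annotated at characters 10 to 16 ("Berlin") lines up with exactly one token, regardless of how many subwords that word was split into. How the subword embeddings are combined per word (mean, first piece, and so on) is left open here.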