This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fixes pretrained embeddings for transformers that don't have end tokens #4732

Merged
merged 13 commits into master from NoEndToken
Nov 11, 2020

Conversation

dirkgr
Member

@dirkgr dirkgr commented Oct 15, 2020

Fixes #4649.

@dirkgr dirkgr added this to the 1.2 milestone Oct 15, 2020
@dirkgr dirkgr requested a review from AkshitaB October 15, 2020 22:35
@dirkgr
Member Author

dirkgr commented Oct 15, 2020

Unfortunately this turns out not to be so easy, because the Huggingface tokenizer for T5 is a little broken. I'll convert this to a draft and tackle it in 1.3.
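To illustrate the underlying problem: Huggingface tokenizers differ in which special tokens they add around a sequence (BERT-style tokenizers wrap input in `[CLS]`/`[SEP]`, while others add fewer or none), so code cannot assume both a start and an end token exist. The sketch below is illustrative only, not AllenNLP's actual implementation; it probes a tokenizer-like `encode` callable with a sentinel sequence to count how many special tokens are prepended and appended. The function and mock tokenizers are hypothetical stand-ins.

```python
def detect_special_tokens(encode, probe=(7, 8, 9)):
    """Return (num_prepended, num_appended) special-token counts.

    `encode(ids, add_special_tokens)` is assumed to behave like a
    Huggingface tokenizer's input-building step: with
    add_special_tokens=True it returns `ids` with the tokenizer's
    special tokens added around it. The probe ids are assumed not to
    collide with any special-token ids.
    """
    plain = list(probe)
    with_special = encode(plain, add_special_tokens=True)
    start = with_special.index(plain[0])
    return start, len(with_special) - start - len(plain)

# Two toy "tokenizers" for demonstration:
# one BERT-like (wraps input in [CLS]=101 ... [SEP]=102),
# one that adds no special tokens at all (so there is no end token).
def bert_like(ids, add_special_tokens):
    return [101] + ids + [102] if add_special_tokens else list(ids)

def no_special_like(ids, add_special_tokens):
    return list(ids)

print(detect_special_tokens(bert_like))        # (1, 1)
print(detect_special_tokens(no_special_like))  # (0, 0)
```

With counts detected this way, downstream embedding code can strip or account for exactly the special tokens that are actually present, rather than unconditionally assuming an end token.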

@dirkgr dirkgr modified the milestones: 1.2, 1.3 Oct 15, 2020
@dirkgr dirkgr marked this pull request as draft October 15, 2020 23:01
@dirkgr
Member Author

dirkgr commented Nov 10, 2020

This is the issue on the huggingface side: huggingface/transformers#7840

@dirkgr dirkgr marked this pull request as ready for review November 11, 2020 00:14
@dirkgr dirkgr mentioned this pull request Nov 11, 2020
@dirkgr dirkgr merged commit f27ef38 into master Nov 11, 2020
@dirkgr dirkgr deleted the NoEndToken branch November 11, 2020 00:44
Development

Successfully merging this pull request may close these issues.

Problem with PretrainedTransformerEmbedder and models such as T5