Dimension mismatch after setting max sequence length #1154

FanW123 · 2019-11-20T20:46:39Z

Summary:
TokenTensorizer and ByteTokenTensorizer handle max sequence length differently. This usually causes no problem, as long as a model does not use both tensorizers to process its inputs and targets.
The smart keyboard model, however, uses TokenTensorizer to process labels and ByteTokenTensorizer to process text inputs. This causes a dimension mismatch whenever a sentence is longer than the max sequence length.

TokenTensorizer: len(<BOS> + tokens + <EOS>) <= max sequence length
ByteTokenTensorizer: len(tokens) <= max sequence length

This diff changes ByteTokenTensorizer so that it truncates text to the max sequence length the same way TokenTensorizer does.
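
The length mismatch described above can be illustrated with a minimal standalone sketch. This is not the PyText implementation: the function names are hypothetical, and it assumes both sides wrap the tokens in BOS/EOS markers and that the limit is at least 2.

```python
BOS, EOS = "<BOS>", "<EOS>"


def truncate_counting_markers(tokens, max_seq_len):
    """Hypothetical label-side rule: BOS and EOS count toward the limit."""
    # Reserve two slots for the markers (assumes max_seq_len >= 2).
    kept = tokens[: max(max_seq_len - 2, 0)]
    return [BOS] + kept + [EOS]


def truncate_ignoring_markers(tokens, max_seq_len):
    """Hypothetical input-side rule before this change: only raw tokens are capped."""
    kept = tokens[:max_seq_len]
    return [BOS] + kept + [EOS]


tokens = "the quick brown fox jumps over".split()
max_seq_len = 4

labels = truncate_counting_markers(tokens, max_seq_len)
inputs_before = truncate_ignoring_markers(tokens, max_seq_len)
inputs_after = truncate_counting_markers(tokens, max_seq_len)

# Labels and inputs only line up once both sides use the same rule.
print(len(labels), len(inputs_before), len(inputs_after))  # prints: 4 6 4
```

For sentences shorter than the limit, both rules produce the same length, which is why the mismatch only appears on long inputs.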

Reviewed By: psuzhanhy

Differential Revision: D18566684

facebook-github-bot · 2019-11-20T20:46:56Z

This pull request was exported from Phabricator. Differential Revision: D18566684

…h#1154)

Summary:
Pull Request resolved: facebookresearch#1154

TokenTensorizer and ByteTokenTensorizer handle max sequence length differently. This usually causes no problem, as long as a model does not use both tensorizers to process its inputs and targets.
The smart keyboard model, however, uses TokenTensorizer to process labels and ByteTokenTensorizer to process text inputs. This causes a dimension mismatch whenever a sentence is longer than the max sequence length.

```
TokenTensorizer: len(<BOS> + tokens + <EOS>) <= max sequence length
ByteTokenTensorizer: len(tokens) <= max sequence length
```

This diff changes ByteTokenTensorizer so that it truncates text to the max sequence length the same way TokenTensorizer does.

Reviewed By: psuzhanhy

Differential Revision: D18566684

fbshipit-source-id: c5f6e0668c383bdc8eec1cf108466b356cd9adcb

facebook-github-bot · 2019-11-21T00:37:24Z

This pull request was exported from Phabricator. Differential Revision: D18566684

facebook-github-bot · 2019-11-21T20:46:44Z

This pull request has been merged in 98e6761.

facebook-github-bot added the CLA Signed and fb-exported labels Nov 20, 2019

FanW123 force-pushed the export-D18566684 branch from 69bfb1e to e097852 November 21, 2019 00:37

facebook-github-bot closed this in 98e6761 Nov 21, 2019

facebook-github-bot added the Merged label Nov 21, 2019
