This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

BERTTensorizerBaseImpl to reimplement BERTTensorizerBase to be TorchScriptable #1163


chenyangyu1988
Contributor

Summary:
BERTTensorizerBaseImpl reimplements BERTTensorizerBase so that it is TorchScriptable.
Overall design:

A PyText Tensorizer (for example, RoBERTaTensorizer) delegates its numberize and tensorize logic to a scripted tensorizer implementation (for example, RoBERTaTensorizerImpl).

This requires reimplementing the numberize() and tensorize() logic in TorchScript. The good news is that such an implementation already exists in pytext/torchscript/tensorizer, so only minor changes are needed.

On the PyText Tensorizer side, numberize and tensorize delegate to tensorizer_impl:

def numberize(self, row: Dict) -> Tuple[Any, ...]:
    per_sentence_tokens = [
        self.tokenizer.tokenize(row[column]) for column in self.columns
    ]
    return self.tensorizer_impl.numberize(per_sentence_tokens)

def tensorize(self, batch) -> Tuple[torch.Tensor, ...]:
    tokens, segment_labels, seq_lens, positions = zip(*batch)
    return self.tensorizer_impl.tensorize(
        tokens, segment_labels, seq_lens, positions
    )
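
For readers who want to see the delegation pattern end to end, here is a minimal, dependency-free sketch. It is not the code from this PR: `ToyTensorizer`, `ToyTensorizerImpl`, the whitespace tokenizer, the vocabulary handling, and the four returned fields are all hypothetical stand-ins, and plain Python lists take the place of `torch.Tensor` so the example runs without PyTorch. The point it tries to show is only the split of responsibilities: the outer tensorizer reads columns and tokenizes, while the impl works purely on token lists and typed containers, which is the kind of interface that lends itself to scripting.

```python
from typing import Dict, List, Sequence, Tuple


def pad_2d(rows: Sequence[Sequence[int]], max_len: int, pad_value: int) -> List[List[int]]:
    """Right-pad every row to max_len with pad_value."""
    return [list(row) + [pad_value] * (max_len - len(row)) for row in rows]


class ToyTensorizerImpl:
    """Hypothetical stand-in for a scripted impl such as RoBERTaTensorizerImpl."""

    def __init__(self, vocab: Dict[str, int], pad_idx: int = 0, unk_idx: int = 1):
        self.vocab = vocab
        self.pad_idx = pad_idx
        self.unk_idx = unk_idx

    def numberize(
        self, per_sentence_tokens: List[List[str]]
    ) -> Tuple[List[int], List[int], int, List[int]]:
        tokens: List[int] = []
        segment_labels: List[int] = []
        # Concatenate all sentences; the segment label records the source sentence.
        for segment, sentence in enumerate(per_sentence_tokens):
            for token in sentence:
                tokens.append(self.vocab.get(token, self.unk_idx))
                segment_labels.append(segment)
        seq_len = len(tokens)
        positions = list(range(seq_len))
        return tokens, segment_labels, seq_len, positions

    def tensorize(
        self,
        tokens: Sequence[Sequence[int]],
        segment_labels: Sequence[Sequence[int]],
        seq_lens: Sequence[int],
        positions: Sequence[Sequence[int]],
    ) -> Tuple[List[List[int]], List[List[int]], List[List[int]], List[List[int]]]:
        # Assumption: the batch is returned as (tokens, pad_mask, segment_labels,
        # positions). Nested lists stand in for tensors in this sketch.
        max_len = max(seq_lens)
        pad_mask = [[1] * n + [0] * (max_len - n) for n in seq_lens]
        return (
            pad_2d(tokens, max_len, self.pad_idx),
            pad_mask,
            pad_2d(segment_labels, max_len, 0),
            pad_2d(positions, max_len, 0),
        )


class ToyTensorizer:
    """Hypothetical outer tensorizer: handles rows and tokenization, then delegates."""

    def __init__(self, columns: List[str], tensorizer_impl: ToyTensorizerImpl):
        self.columns = columns
        self.tensorizer_impl = tensorizer_impl

    def tokenize(self, text: str) -> List[str]:
        # Assumption: lowercase whitespace tokenization replaces the real tokenizer.
        return text.lower().split()

    def numberize(self, row: Dict[str, str]):
        per_sentence_tokens = [self.tokenize(row[column]) for column in self.columns]
        return self.tensorizer_impl.numberize(per_sentence_tokens)

    def tensorize(self, batch):
        tokens, segment_labels, seq_lens, positions = zip(*batch)
        return self.tensorizer_impl.tensorize(tokens, segment_labels, seq_lens, positions)
```

In this sketch the impl never touches a row dictionary or a tokenizer object, so swapping it for a scripted module would not change the outer class.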

Differential Revision: D18651538

@facebook-github-bot facebook-github-bot added CLA Signed Do not delete this pull request or issue due to inactivity. fb-exported labels Nov 22, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D18651538

…criptable (facebookresearch#1163)

Summary:
Pull Request resolved: facebookresearch#1163

BERTTensorizerBaseImpl reimplements BERTTensorizerBase so that it is TorchScriptable.
Overall design:

A PyText Tensorizer (for example, RoBERTaTensorizer) delegates its numberize and tensorize logic to a scripted tensorizer implementation (for example, RoBERTaTensorizerImpl).

This requires reimplementing the numberize() and tensorize() logic in TorchScript. The good news is that such an implementation already exists in pytext/torchscript/tensorizer, so only minor changes are needed.

On the PyText Tensorizer side, numberize and tensorize delegate to tensorizer_impl:
```
def numberize(self, row: Dict) -> Tuple[Any, ...]:
    per_sentence_tokens = [
        self.tokenizer.tokenize(row[column]) for column in self.columns
    ]
    return self.tensorizer_impl.numberize(per_sentence_tokens)

def tensorize(self, batch) -> Tuple[torch.Tensor, ...]:
    tokens, segment_labels, seq_lens, positions = zip(*batch)
    return self.tensorizer_impl.tensorize(
        tokens, segment_labels, seq_lens, positions
    )
```

Reviewed By: rutyrinott

Differential Revision: D18651538

fbshipit-source-id: fdb5bb099cd3a4894f90df460650398516177220

@facebook-github-bot
Contributor

This pull request has been merged in 39467dc.
