Refactor Processor Stages in FARM #645

Closed
8 tasks done
tholor opened this issue Dec 1, 2020 · 1 comment

@tholor
Member

tholor commented Dec 1, 2020

We currently have dict_to_samples and samples_to_features as two strictly separate stages.
For some use cases (e.g. fast tokenizers), this separation is not ideal. Let's redesign this crucial part of FARM.

  • @Timoeller: implement dataset_from_dicts and do basic cleanup for SQuAD, then add dataset_from_dicts boilerplate for the other processors and get rid of dict_to_samples etc. so that everything just calls dataset_from_dicts
  • @brandenchan: refactor text classification
  • @tholor: refactor LM finetuning
  • @Timoeller: identify speed bottlenecks besides tokenization
  • @tholor @Timoeller @brandenchan: optimize those bottlenecks (vectorization / batching)
  • @brandenchan: refactor NER
  • @brandenchan: benchmark NER
  • split the NQ and SQuAD processors in a cleaner way
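
To make the idea behind the list above concrete, here is a minimal sketch of the difference between the two-stage layout and a merged dataset_from_dicts. It is not FARM's actual code: the signatures are simplified, toy_batch_tokenize is a hypothetical stand-in for a fast tokenizer, and to_input_ids, the vocab and the example dicts are made up for illustration. The point is only that the merged stage can hand all texts to the tokenizer in one batched call instead of one call per dict.

```python
from typing import Dict, List

PAD_ID, UNK_ID = 0, 1  # hypothetical special-token ids


def toy_batch_tokenize(texts: List[str]) -> List[List[str]]:
    """Hypothetical stand-in for a fast tokenizer that accepts a whole batch."""
    return [text.lower().split() for text in texts]


def to_input_ids(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, then pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Two-stage layout (simplified): the tokenizer is called once per dict.
def dict_to_samples(d: dict) -> List[dict]:
    tokens = toy_batch_tokenize([d["text"]])[0]
    return [{"tokens": tokens, "label": d["label"]}]


def samples_to_features(samples: List[dict], vocab: Dict[str, int], max_len: int) -> List[dict]:
    return [
        {"input_ids": to_input_ids(s["tokens"], vocab, max_len), "label": s["label"]}
        for s in samples
    ]


# Merged layout (simplified): one batched tokenizer call for all dicts.
def dataset_from_dicts(dicts: List[dict], vocab: Dict[str, int], max_len: int) -> List[dict]:
    token_lists = toy_batch_tokenize([d["text"] for d in dicts])
    return [
        {"input_ids": to_input_ids(tokens, vocab, max_len), "label": d["label"]}
        for d, tokens in zip(dicts, token_lists)
    ]


dicts = [
    {"text": "FARM is fast", "label": 1},
    {"text": "Tokenizers are fast too", "label": 0},
]
vocab = {"farm": 2, "is": 3, "fast": 4, "tokenizers": 5}

two_stage = [
    feature
    for d in dicts
    for feature in samples_to_features(dict_to_samples(d), vocab, max_len=4)
]
merged = dataset_from_dicts(dicts, vocab, max_len=4)
```

Both paths produce the same features here; the merged one just leaves room to batch the tokenization and to vectorize later steps.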
@Timoeller
Contributor

Fixed with #649

3 participants