Refactor Processor Stages in FARM #645

tholor · 2020-12-01T15:28:27Z

We currently have dict_to_samples and samples_to_features as two strictly separate stages.
For some use cases (e.g. fast tokenizers) this split is not ideal. Let's redesign this crucial part of FARM.
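To make the problem concrete, here is a minimal sketch under stated assumptions. It is not FARM code. `ToyBatchTokenizer`, `two_stage` and `single_stage` are hypothetical names. The toy tokenizer only stands in for the batch-oriented interface of a fast tokenizer and counts how often it is called. In the two-stage flow every sample is tokenized on its own, so the batch-capable tokenizer only ever sees batches of size one. A single pass over all dicts lets it process the whole batch in one call.

```python
from typing import Dict, List


class ToyBatchTokenizer:
    """Hypothetical stand-in for a fast tokenizer that accepts a batch of texts."""

    def __init__(self) -> None:
        self.calls = 0  # number of times the tokenizer was invoked

    def __call__(self, texts: List[str]) -> List[List[str]]:
        self.calls += 1
        # Toy tokenization: lowercase and split on whitespace.
        return [t.lower().split() for t in texts]


def two_stage(dicts: List[Dict[str, str]], tok: ToyBatchTokenizer) -> List[List[str]]:
    # Stage 1 (dict_to_samples-like): each dict becomes a sample on its own.
    samples = [{"text": d["text"]} for d in dicts]
    # Stage 2 (samples_to_features-like): samples are tokenized one at a time,
    # so the tokenizer is called once per sample.
    return [tok([s["text"]])[0] for s in samples]


def single_stage(dicts: List[Dict[str, str]], tok: ToyBatchTokenizer) -> List[List[str]]:
    # One pass over all dicts: the tokenizer is called once for the whole batch.
    return tok([d["text"] for d in dicts])


if __name__ == "__main__":
    dicts = [{"text": "Hello World"}, {"text": "FARM is fast"}]
    tok_a, tok_b = ToyBatchTokenizer(), ToyBatchTokenizer()
    print(two_stage(dicts, tok_a) == single_stage(dicts, tok_b))  # True
    print(tok_a.calls, tok_b.calls)  # 2 1
```

Both variants produce the same tokens. Only the number of tokenizer invocations differs, and that difference is what a real fast tokenizer's batching can exploit.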

@Timoeller: implement dataset_from_dicts and do basic cleanup for SQuAD, resulting in (1) boilerplate for dataset_from_dicts in the other processors and (2) removal of dict_to_samples etc., so that callers just call dataset_from_dicts
@brandenchan: refactor text classification
@tholor: refactor LM finetuning
@Timoeller: identify speed bottlenecks besides tokenization
@tholor @Timoeller @brandenchan: optimize those bottlenecks (vectorization / batching)
@brandenchan: refactor NER
@brandenchan: benchmark NER
Split the NQ and SQuAD processors more cleanly
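A single-stage dataset_from_dicts could look roughly like the following sketch. Only the name dataset_from_dicts comes from this issue. The parameters (`vocab`, `max_seq_len`, `indices`), the whitespace tokenizer and the returned tuple of features, tensor names and problematic ids are illustrative assumptions, not FARM's actual signature. The idea is that raw dicts go straight to padded features in one function. All texts are tokenized in a single batched step, and any dict that cannot be converted is recorded instead of raising an error.

```python
from typing import Dict, List, Optional, Set, Tuple


def dataset_from_dicts(
    dicts: List[Dict[str, str]],
    vocab: Dict[str, int],
    max_seq_len: int = 8,
    indices: Optional[List[int]] = None,
) -> Tuple[List[List[List[int]]], List[str], Set[int]]:
    """Hypothetical single-stage conversion: raw dicts -> padded features."""
    if indices is None:
        indices = list(range(len(dicts)))
    input_ids: List[List[int]] = []
    padding_mask: List[List[int]] = []
    problematic_ids: Set[int] = set()

    # Tokenize all texts in one batched step (toy whitespace tokenizer).
    tokenized = [d.get("text", "").lower().split() for d in dicts]

    for idx, tokens in zip(indices, tokenized):
        if not tokens:
            problematic_ids.add(idx)  # record dicts that yield no tokens
            continue
        # Map tokens to ids, falling back to [UNK], then truncate.
        ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens][:max_seq_len]
        pad = max_seq_len - len(ids)
        input_ids.append(ids + [vocab["[PAD]"]] * pad)
        padding_mask.append([1] * len(ids) + [0] * pad)

    tensor_names = ["input_ids", "padding_mask"]
    # Feature-major layout: one list per tensor name.
    return [input_ids, padding_mask], tensor_names, problematic_ids


if __name__ == "__main__":
    vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}
    dicts = [{"text": "Hello world"}, {"text": ""}, {"text": "hello there"}]
    features, names, bad = dataset_from_dicts(dicts, vocab, max_seq_len=4)
    print(names)        # ['input_ids', 'padding_mask']
    print(features[0])  # [[2, 3, 0, 0], [2, 1, 0, 0]]
    print(bad)          # {1}
```

Returning the ids of problematic dicts, rather than failing on the first bad input, lets a caller such as a multiprocessing data silo log or skip them while still building a dataset from the rest.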

Timoeller · 2021-01-06T16:41:11Z

Fixed with #649

tholor assigned brandenchan, tholor and Timoeller Dec 2, 2020

Timoeller closed this as completed Jan 6, 2021
