Evaluator column loading #200

lvwerra · 2022-07-24T11:30:23Z

Instead of loading the pipeline inputs into memory, this PR wraps the original dataset in a `DatasetColumn` class. This makes it possible to evaluate on huge datasets (e.g. ImageNet) without out-of-memory (OOM) errors.
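To illustrate the idea, here is a minimal sketch of a column wrapper that fetches values row by row instead of copying the whole column into a list. The name `LazyColumn` and the toy `rows` data are assumptions made for this example; the `DatasetColumn` class added in the PR may differ in its details.

```python
from typing import Any, Iterator, Mapping, Sequence


class LazyColumn:
    """Expose one column of a row-indexable dataset without copying it.

    Hypothetical stand-in for the idea behind `DatasetColumn`; the class
    merged in the PR may differ.
    """

    def __init__(self, dataset: Sequence[Mapping[str, Any]], key: str) -> None:
        self.dataset = dataset
        self.key = key

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, i: int) -> Any:
        # Fetch a single row on demand and return only the requested field.
        return self.dataset[i][self.key]

    def __iter__(self) -> Iterator[Any]:
        # Yield values one row at a time instead of building a full list.
        return (self.dataset[i][self.key] for i in range(len(self)))


# Toy rows standing in for a real dataset (assumption for illustration).
rows = [{"image": "img0", "label": 3}, {"image": "img1", "label": 7}]
inputs = LazyColumn(rows, "image")
print(len(inputs), inputs[1], list(inputs))
```

Because the wrapper only keeps a reference to the dataset and a column name, its own memory footprint stays constant regardless of how many rows the dataset has.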

cc @fxmarty

HuggingFaceDocBuilderDev · 2022-07-24T11:33:56Z

The documentation is not available anymore as the PR was closed or merged.

fxmarty · 2022-07-24T14:06:25Z

Great, I'll test tomorrow morning!

fxmarty · 2022-07-25T07:40:17Z

src/evaluate/evaluator/token_classification.py

@@ -119,7 +121,8 @@ def prepare_data(self, data: Union[str, Dataset], input_column: str, label_colum
            references = data[label_column]

        metric_inputs = {"references": references}
-        pipeline_inputs = [join_by.join(element) for element in data[input_column]]
+        data = data.map(lambda x: {input_column: join_by.join(x[input_column])})


Out of curiosity: is this faster than a list comprehension?

I don't think so, but it will not load the data into memory.

* add `DatasetColumn` class
* add `__iter__`
* make style
* adapt QA
* adapt NER

leandro added 5 commits July 22, 2022 13:39

add DatasetColumn class

91b5f1e

add __iter__

90c0e84

make style

6a2f8f7

adapt QA

181b256

adapt NER

b418e90

fxmarty reviewed Jul 25, 2022


lvwerra merged commit 4a78290 into main Jul 25, 2022

lvwerra deleted the evaluator-column-loading branch July 25, 2022 09:21

mathemakitten pushed a commit that referenced this pull request Aug 3, 2022

Evaluator column loading (#200)

68354e7

