About time in results dataframe #647

toncho11 · 2024-09-23T08:12:22Z

I think the documentation of MOABB is not very clear on the "time" column. Please point me to a source if I am wrong.

I have 2 pipelines that take 53 minutes to run on many datasets and subjects. I use the "WithinSession" evaluation, so each row in the results is a session. The "time" column covers both training and classification (which is not obvious).

However, the time is reported per fold, not per session (thanks @gcattan). It is set on this line:

moabb/moabb/evaluations/evaluations.py

Line 262 in 0ee8eb6

"time": duration / 5.0, # 5 fold CV

So if I sum the "time" column in the results I get something like 5 minutes. I then need to multiply this by 5 to get the total time spent on training and classification, which gives 25 minutes. So 53 minutes of total run minus 25 minutes leaves 28 minutes.
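The arithmetic above can be sketched as follows. This is only an illustration: the dataframe is invented toy data rather than real MOABB output, and `N_FOLDS = 5` is an assumption taken from the hard-coded `duration / 5.0` in the line quoted above.

```python
import pandas as pd

# Toy stand-in for a MOABB results dataframe (all values are invented).
results = pd.DataFrame({
    "pipeline": ["A", "A", "B", "B"],
    "time": [0.18, 0.22, 0.40, 0.44],  # seconds, averaged over the CV folds
})

N_FOLDS = 5  # assumption: matches the hard-coded `duration / 5.0`

# Each row holds the mean time of one fold, so multiplying by the number
# of folds recovers the train + test time of the whole session.
total_per_pipeline = results.groupby("pipeline")["time"].sum() * N_FOLDS
total_seconds = results["time"].sum() * N_FOLDS

print(total_per_pipeline)
print(total_seconds)
```

Subtracting `total_seconds` from the wall-clock duration of the run then gives the time spent outside training and classification.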

Is the above reasoning correct? Does multiplying by 5 really give us the total time spent on training and classification?
So what are these 28 minutes, then? Is this time spent on IO (loading data), filtering (for the paradigm), and maybe other pre-processing steps?
And do we usually get the mean time of a fold with:
print(results.groupby("pipeline").mean("score")[["score", "time"]])
because the mean of the "time" column is a better estimate than the total time?


PierreGtch · 2024-09-24T20:19:21Z

Hi @toncho11,

The “time” corresponds to the average time it takes to train and test the pipeline on one CV fold.
Indeed, loading the data is NOT counted in this time column, so the remaining 28 minutes are for loading and pre-processing the data. Note that loading and pre-processing the data is only done once for all the pipelines.

This “time” column allows you to compare the different pipelines with each other; it is not meant for planning how long an experiment will take.
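To make the distinction concrete, here is a minimal sketch of this kind of measurement. It is not MOABB's actual code: `load_data` and `fit_and_score` are hypothetical stand-ins, and the only detail taken from this thread is that the timed duration is divided by the number of folds.

```python
import time

N_FOLDS = 5  # assumption: same fold count as the `duration / 5.0` line


def load_data():
    # Hypothetical stand-in for loading and pre-processing (not timed).
    return list(range(1000))


def fit_and_score(data, fold):
    # Hypothetical stand-in for training and testing on one CV fold.
    return (sum(data) % (fold + 2)) / (fold + 2)


data = load_data()  # happens before the timer starts

start = time.perf_counter()
scores = [fit_and_score(data, fold) for fold in range(N_FOLDS)]
duration = time.perf_counter() - start

# Only the cross-validation loop is timed, then averaged per fold.
row = {"score": sum(scores) / N_FOLDS, "time": duration / N_FOLDS}
print(row)
```

Because the timer wraps only the cross-validation loop, anything done in `load_data` never shows up in the `time` value.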

gcattan · 2024-09-25T07:16:58Z

Hi @PierreGtch. Thanks for your answer. It would be great if this were documented in the evaluation classes!

toncho11 · 2024-09-25T12:48:58Z

Thank you @PierreGtch! Confirming all this was very important!

I just wanted to correct my previous query. It should be:

print(results.groupby("pipeline").mean()[["score", "time"]])
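A slightly more defensive variant, as a sketch on invented toy data: selecting the numeric columns before calling `mean()` avoids trying to average string columns, which recent pandas versions refuse to do. The `dataset` column and all values below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for a MOABB results dataframe (all values are invented).
results = pd.DataFrame({
    "pipeline": ["A", "A", "B", "B"],
    "dataset": ["d1", "d2", "d1", "d2"],  # non-numeric column
    "score": [0.80, 0.90, 0.70, 0.60],
    "time": [0.18, 0.22, 0.40, 0.44],  # seconds per fold
})

# Select the numeric columns first, then average per pipeline.
summary = results.groupby("pipeline")[["score", "time"]].mean()
print(summary)
```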

toncho11 · 2024-11-29T21:02:42Z

Also, time is reported in seconds. For example, 0.18 in the "time" column means 180 milliseconds (average time per fold).

PierreGtch closed this as completed Sep 24, 2024

PierreGtch mentioned this issue Sep 25, 2024

Document the meaning of the different result columns #649

Open
