
performance-test: suggest a more comparable loading time for non-eager and eager execution #1319

Open
wants to merge 2 commits into
base: main
Conversation

@dafnapension dafnapension commented Nov 2, 2024

Currently, the Load Time reported in the performance test summary is the cumulative time spent in loader.load_data(). This works for all types of loaders.
However, still within Loader.load_data(), immediately after the dataset is downloaded (from HF, for example, in the case of LoadHF), comes MultiStream.from_iterables, the first introduction of the downloaded data into unitxt's recipe. At this point, eager execution differs from non-eager execution:
In eager mode, unitxt loops over each and every instance to include it in the ListStreams constituting the MultiStream being generated. In non-eager mode, unitxt merely creates generators for the GeneratorStreams constituting the MultiStream, and returns from load_data().
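The difference described above can be sketched as follows. This is a hypothetical simplification for illustration only: make_multistream_eager and make_multistream_lazy are stand-in names, not unitxt's actual classes or functions.

```python
# Hypothetical sketch of eager vs. non-eager MultiStream construction.
# These helpers are simplified stand-ins, not unitxt's implementation.

def make_multistream_eager(iterables):
    # Eager mode: loop over each and every instance right now,
    # materializing every split into a list (analogous to ListStreams).
    return {split: list(instances) for split, instances in iterables.items()}

def make_multistream_lazy(iterables):
    # Non-eager mode: only wrap each split in a generator factory
    # (analogous to GeneratorStreams); no instance is touched until
    # the stream is actually consumed.
    return {split: (lambda it=instances: (x for x in it))
            for split, instances in iterables.items()}

iterables = {"train": range(3)}
eager = make_multistream_eager(iterables)   # per-instance cost paid here
lazy = make_multistream_lazy(iterables)     # nothing iterated yet
assert eager["train"] == [0, 1, 2]
assert list(lazy["train"]()) == [0, 1, 2]   # cost paid only on consumption
```

The point of the sketch is that the eager variant pays a per-instance cost inside the load call itself, while the lazy variant defers that cost to whoever consumes the stream.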

Now, with load_iterables recently introduced into loaders.py by @elronbandel, we can cleanly measure the net load time of a dataset, excluding the introduction of each and every instance into unitxt, which happens only in eager mode.
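A minimal sketch of how the net load time could be separated out, assuming the two-phase structure described above. ToyLoader and its methods are illustrative stand-ins, not unitxt's actual loader or the performance test's actual timing code.

```python
import time

# Illustrative toy loader with the two-phase structure described above:
# load_iterables fetches the raw data; load_data additionally introduces
# instances into streams when running in eager mode.
class ToyLoader:
    def load_iterables(self):
        # Stands in for the raw dataset download (e.g., from HF).
        return {"train": range(100_000)}

    def load_data(self, eager):
        iterables = self.load_iterables()
        if eager:
            # Eager: materialize every instance now.
            return {s: list(it) for s, it in iterables.items()}
        # Non-eager: just hand back generator factories.
        return {s: (lambda i=it: iter(i)) for s, it in iterables.items()}

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

loader = ToyLoader()
net_load = timed(loader.load_iterables)                  # net load time only
eager_total = timed(lambda: loader.load_data(eager=True))
lazy_total = timed(lambda: loader.load_data(eager=False))
# eager_total includes the per-instance loop; lazy_total stays close to net_load.
```

Reporting net_load instead of the full load_data time is what makes the eager and non-eager numbers comparable.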

Here are snakeviz profiles demonstrating this.
For eager mode, more time is spent in Loader.load_data() than in LoadHF.load_iterables:
[snakeviz profile image]

Whereas for non-eager mode the two are the same (the time to create the generators is negligible):
[snakeviz profile image]

@dafnapension commented Nov 2, 2024

Hi @elronbandel ,
I am not sure how important the issue raised here is. We usually compare the performance of a PR against main with both runs eager or both non-eager, so why bother? On the other hand, why not be accurate, and have each mode 'pay' for its pass over the fresh instances in its reported Net Time? What is your view on this?

@dafnapension dafnapension force-pushed the pure_loading branch 6 times, most recently from 3983657 to b070ca1 Compare November 7, 2024 15:49
… execution

Signed-off-by: dafnapension <dafnashein@yahoo.com>
@dafnapension dafnapension force-pushed the pure_loading branch 2 times, most recently from b0e7350 to 5c95302 Compare November 7, 2024 18:29
…e of dataset, excluding the introduction of each and every instance into unitxt that is only done in eager mode

Signed-off-by: dafnapension <dafnashein@yahoo.com>