
ev: Batch data for faster evaluation. #3051

Merged 4 commits into awslabs:dev from lostella:faster-eval on Nov 13, 2023
Conversation

@lostella (Contributor) commented on Nov 10, 2023

Description of changes: Add batching to gluonts.model.evaluation, reaching speedups of ~6x in some cases (the speedup also depends on how many metrics are being evaluated: more metrics, more speedup). The following example uses the m4_daily dataset:

4227it [00:46, 91.83it/s]
evaluation (batch_size=1): 46.04444639701978
4227it [00:27, 152.03it/s]
evaluation (batch_size=2): 27.80498409701977
4227it [00:15, 277.03it/s]
evaluation (batch_size=5): 15.259674712986453
4227it [00:11, 371.23it/s]
evaluation (batch_size=10): 11.387762713013217
4227it [00:09, 451.05it/s]
evaluation (batch_size=20): 9.372666671988554
4227it [00:08, 512.34it/s]
evaluation (batch_size=50): 8.252159972995287
4227it [00:07, 546.00it/s]
evaluation (batch_size=100): 7.743695029988885
4227it [00:07, 561.83it/s]
evaluation (batch_size=200): 7.524872001988115
4227it [00:07, 576.23it/s]
evaluation (batch_size=500): 7.336895904998528

Code:

import timeit
import numpy as np
import pandas as pd

from gluonts.ev.metrics import (
    SMAPE,
    MASE,
    NRMSE,
    ND,
    MeanWeightedSumQuantileLoss,
    AverageMeanScaledQuantileLoss,
    MAECoverage,
)
from gluonts.dataset.repository import get_dataset
from gluonts.dataset.split import split
from gluonts.model.evaluation import evaluate_forecasts
from gluonts.model import SampleForecast


quantile_levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

METRICS = [
    SMAPE(),
    MASE(),
    NRMSE(),
    ND(),
    MeanWeightedSumQuantileLoss(quantile_levels=quantile_levels),
    AverageMeanScaledQuantileLoss(quantile_levels=quantile_levels),
    MAECoverage(quantile_levels=quantile_levels),
]


if __name__ == "__main__":
    # Hold out the last 14 observations of each series as the test window.
    data = list(get_dataset("m4_daily").test)
    test_data = split(data, offset=-14)[1].generate_instances(
        prediction_length=14
    )
    inputs = list(test_data.input)
    labels = list(test_data.label)

    # Dummy forecasts: 100 random sample paths per 14-step label window.
    forecasts = [
        SampleForecast(
            start_date=entry["start"],
            samples=np.random.normal(size=(100, 14)),
        )
        for entry in labels
    ]

    batch_sizes = [1, 2, 5, 10, 20, 50, 100, 200, 500]

    # Use the batch_size=1 run as the reference and check that larger
    # batch sizes produce identical results.
    ref = None

    for batch_size in batch_sizes:
        t0 = timeit.default_timer()
        res = evaluate_forecasts(
            forecasts,
            test_data=test_data,
            metrics=METRICS,
            batch_size=batch_size,
            seasonality=7,
        )
        t1 = timeit.default_timer()
        if ref is None:
            ref = res
        else:
            pd.testing.assert_frame_equal(ref, res)
        print(f"evaluation ({batch_size=}): {t1 - t0}")

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

@lostella added the enhancement (New feature or request) and BREAKING (This is a breaking change; one of the PR's required labels) labels on Nov 10, 2023
  mask_invalid_label: bool = True,
  allow_nan_forecast: bool = False,
- seasonality: Optional[int] = None,
+ seasonality: int = 1,
Reviewer (Contributor):

Why is the default being changed to 1?

lostella (Author):

Good point: I had initially done this because I was also computing the seasonal error in batches, and that didn't play well with potentially different seasonalities within a batch; it turns out batching there is problematic anyway because of differing series lengths, so I should indeed be able to maintain the current behavior.
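
To make the length issue concrete, here is a rough per-entry sketch of the seasonal error behind MASE (a stand-in for illustration, not this PR's code): the past series in a batch can have different lengths, so they cannot simply be stacked into one array.

import numpy as np

# Stand-in sketch of a per-entry seasonal error (the MASE denominator).
# Series in a batch may differ in length and seasonality, which is why
# this stays a per-entry computation rather than a batched one.
def seasonal_error(past: np.ndarray, seasonality: int) -> float:
    # Fall back to a one-step difference when the series is shorter
    # than one full season.
    lag = seasonality if seasonality < len(past) else 1
    return float(np.mean(np.abs(past[lag:] - past[:-lag])))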

lostella (Author):

Done

label_batches = batcher(test_data.label, batch_size=batch_size)
forecast_batches = batcher(forecasts, batch_size=batch_size)

pbar = tqdm()
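
For context, batcher simply groups an iterable into fixed-size lists (gluonts provides such a helper in gluonts.itertools); a minimal stand-in with the same semantics:

from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Minimal stand-in for the batching helper: yields consecutive lists of
# up to batch_size items; the final batch may be shorter.
def batcher(iterable: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch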
Reviewer (Contributor):

Can we also set the total length here?

lostella (Author):

We could, but depending on what dataset type one is using, this may end up iterating the dataset just to get its length. I don't think the current code does that, so maybe let's do it in a separate change.

Reviewer (Contributor):

I guess you could try to call len on it as long as it doesn't consume it?

lostella (Author):

Yes, that's the idea. However, this may still take time (say the dataset is huge and len needs to iterate it), which I would like to avoid, at least in this PR.
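
For reference, the reviewer's suggestion would look roughly like the sketch below (a possible follow-up, not part of this PR; test_data is assumed to be in scope):

from tqdm import tqdm

# Try to set the progress-bar total; len() may be unavailable (TypeError)
# or, for some dataset types, require iterating the whole dataset, which
# is the concern raised above.
try:
    total = len(test_data.label)
except TypeError:
    total = None

pbar = tqdm(total=total)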

@jaheba changed the title from "Speed up evaluation 6x" to "Speed up evaluation by batching data." on Nov 13, 2023
@jaheba changed the title from "Speed up evaluation by batching data." to "ev: Batch data for faster evaluation." on Nov 13, 2023
@lostella merged commit 56eec11 into awslabs:dev on Nov 13, 2023 (19 checks passed)
@lostella deleted the faster-eval branch on November 13, 2023 at 10:13