-
Notifications
You must be signed in to change notification settings - Fork 761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ev: Batch data for faster evaluation. #3051
Conversation
src/gluonts/model/evaluation.py
Outdated
mask_invalid_label: bool = True, | ||
allow_nan_forecast: bool = False, | ||
seasonality: Optional[int] = None | ||
seasonality: int = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the default being changed to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: I had initially done this because I was also computing the seasonal error in batches, that didn’t play well with potentially different seasonalities in a batch; turns out batching there is anyway problematic because of different length, so I should indeed be able to maintain the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
label_batches = batcher(test_data.label, batch_size=batch_size) | ||
forecast_batches = batcher(forecasts, batch_size=batch_size) | ||
|
||
pbar = tqdm() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also set the total length here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but depending on what dataset type one is using, this may end up iterating the dataset to get its length. I don’t think the current code does it, so maybe let’s do it in a separate change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could try to call len
on it as long as it doesn't consume it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the idea. However this may still take time (say the dataset is huge and len
needs to iterate it), which I would like to avoid at least in this PR
b9ebe92
to
b64abb2
Compare
Description of changes: Add batching to
gluonts.model.evaluation
, reaching speedups of ~6x in some cases (speedup depends also on how many metrics are being evaluated, more metrics => more speedup). The following example is usingm4_daily
data:Code:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup