Added brier score for p-metrics in forecasting #209
Conversation
Thanks for the PR, @jagjeet-singh. Would you mind adding a few sentences here explaining the motivation for the change, and links to any relevant literature? Maybe those could go into the docstrings as well? Which scenarios have we tested so far?
}
assert_metrics_almost_equal(expected_metrics, metrics)

# Case 5: Top-2, 2 forecast, uniform probabilities
probably would be cleaner to make this 5 separate tests, with a pytest fixture?
I was on the fence about it but ended up refactoring the tests anyway. It's more organized now.
Thanks for the push.
"minFDE": 1.0,
"DAC": 1.0,
"MR": 0.0,
"p-minADE": 1.16,
These numbers seem a bit hard to work out by hand -- can we come up with a simple example, e.g. straight lines or something, where we can work it out by hand?
I guess the final metric values need not be easy to work out by hand as long as the test example is self-explanatory. The test example covers a few cases that straight-line examples would miss.
Actually, this particular case can even be worked out by hand: the forecast and ground truth are 1.0 m apart in each frame, so it is no more complicated than a straight line.
The other case, where the forecast is a turning trajectory, might not be that straightforward.
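To illustrate why this case is hand-workable, here is a minimal sketch of a constant 1.0 m offset between forecast and ground truth. Note the probability value 0.6 is an assumption chosen here for illustration (it reproduces the 1.16 in the test above but is not stated in the PR):

```python
import numpy as np

# Ground truth: a straight line along x; forecast: the same line offset by 1.0 m in y.
gt = np.stack([np.arange(10, dtype=float), np.zeros(10)], axis=1)
forecast = gt + np.array([0.0, 1.0])

# Every frame is exactly 1.0 m apart, so both displacement errors are 1.0.
min_ade = np.linalg.norm(forecast - gt, axis=1).mean()  # average displacement error
min_fde = np.linalg.norm(forecast[-1] - gt[-1])         # final displacement error

p = 0.6  # assumed probability assigned to the best forecast
p_min_ade = (1 - p) ** 2 + min_ade  # (0.4)^2 + 1.0 = 1.16
print(min_ade, min_fde, p_min_ade)
```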
)
+ curr_min_fde
)
prob_min_ade.append((1 - pruned_probabilities[min_idx]) ** 2 + curr_min_ade)
What was the rationale for using -log before?
Maybe a reference to papers that use the Brier score would be useful here.
Added a reference to the original Brier score in the PR description. To my knowledge, no one has used the Brier score with displacement error, so there's no reference for that.
The Brier score is a commonly used metric for probability calibration.
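For context, the original Brier score is the mean squared difference between predicted probabilities and binary outcomes; a minimal sketch (function name is illustrative):

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A perfectly confident, correct forecaster scores 0.0; chance-level 0.5 scores 0.25.
print(brier_score([1.0, 0.0, 1.0], [1, 0, 1]))  # 0.0
print(brier_score([0.5, 0.5], [1, 0]))          # 0.25
```

Lower is better, and the score directly rewards well-calibrated probabilities, which is the property the new metrics borrow.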
This PR adds Brier-score-based metrics to forecasting. The two new metrics are:
brier-minFDE = (1.0 - p)^2 + minFDE
and brier-minADE = (1.0 - p)^2 + minADE
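The two formulas above can be sketched directly; the function and variable names here are illustrative, not the PR's actual API:

```python
def brier_min_ade(min_ade: float, p: float) -> float:
    """Average displacement error plus the squared probability shortfall."""
    return (1.0 - p) ** 2 + min_ade

def brier_min_fde(min_fde: float, p: float) -> float:
    """Final displacement error plus the squared probability shortfall."""
    return (1.0 - p) ** 2 + min_fde

# e.g. a forecast with minFDE = 2.0 m and probability 0.8 scores 2.0 + 0.04 = 2.04
print(brier_min_fde(2.0, 0.8))
```

Because the penalty term is bounded in [0, 1], it nudges methods toward calibrated probabilities without letting the probability term dominate the displacement error.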
Motivation for using the Brier score for assessing uncertainty:
Top methods from the EvalAI leaderboard: