Ignorance Score binning schemes #21

Open
kvelleby opened this issue Oct 12, 2023 · 2 comments

Comments

@kvelleby
Contributor

One comment in the workshop was that we should be looking into different binning schemes for the country- and grid-level.

I think this is a fair comment. The current binning scheme is the same for both levels. It is hard-coded in evaluate_submissions.py, but you can vary it using CompetitionEvaluation.calculate_metrics:

bins = [0, 0.5, 2.5, 5.5, 10.5, 25.5, 50.5, 100.5, 250.5, 500.5, 1000.5]

At the country level, capping the bins at 1000 does seem low. We would want models that can differentiate between a conflict with 1,000 deaths and one with 100,000 deaths. Conversely, requiring models to differentiate between 1-2, 3-5, 6-10, and 11-25 deaths at the country level might be a really unfair challenge.

At the priogrid level, one could also question whether it is important to punish a model that predicts 3 fatalities when the outcome is 1 or 6. The issue is of course always present at the bin edges, but it is particularly visible when the bins are narrow.
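To make the edge effect concrete, here is a minimal sketch of how a fatality count maps onto the bins listed above. It is not the code in evaluate_submissions.py: the helper `bin_index` is hypothetical, the bins are assumed to be half-open intervals, and counts beyond the last edge are assumed to be clamped into the final bin, which may not match what the evaluation actually does.

```python
from bisect import bisect_right

# Bin edges quoted in this issue.
BINS = [0, 0.5, 2.5, 5.5, 10.5, 25.5, 50.5, 100.5, 250.5, 500.5, 1000.5]


def bin_index(count, edges=BINS):
    """Return i such that count lies in the half-open bin [edges[i], edges[i+1]).

    Assumption (not taken from the repository): counts at or beyond the
    last edge are clamped into the final bin.
    """
    i = bisect_right(edges, count) - 1
    return min(max(i, 0), len(edges) - 2)


for observed in (1, 3, 6, 1000, 100_000):
    print(observed, "-> bin", bin_index(observed))
```

Under these assumptions, 1, 3, and 6 land in three different bins (1, 2, and 3), while 1000 and 100 000 both land in the last bin (9), so the scheme separates the former but not the latter.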

@hhegre
Collaborator

hhegre commented Oct 13, 2023

We have discussed this at length. The logic of the scheme is that the bins are roughly evenly spaced on a log scale, which I find to be optimal. For that reason I do not think the challenge is unfair; the vast majority of observations also fall in the lower bins. But I agree that we might want more bins at the right end. Calculating the number of actuals in each bin would be a good part of the justification.

I would not mind specifying a supplementary version of this with another binning scheme, of course.
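The two checks mentioned above (how even the bins are on a log scale, and how many actuals fall in each bin) could be sketched roughly as follows. The helpers `log10_widths` and `counts_per_bin` are hypothetical names, the bins are assumed to be half-open, and `toy_actuals` is made-up illustrative data, not real outcomes.

```python
import math
from bisect import bisect_right

# Bin edges quoted in this issue.
BINS = [0, 0.5, 2.5, 5.5, 10.5, 25.5, 50.5, 100.5, 250.5, 500.5, 1000.5]


def log10_widths(edges=BINS):
    """Width of each bin in log10 units, skipping the zero bin [0, 0.5)."""
    return [math.log10(hi / lo) for lo, hi in zip(edges[1:-1], edges[2:])]


def counts_per_bin(actuals, edges=BINS):
    """Count observations per half-open bin [edges[i], edges[i+1]).

    Values at or beyond the last edge are tallied separately as overflow
    (an assumption; the real evaluation may treat them differently).
    """
    counts = [0] * (len(edges) - 1)
    overflow = 0
    for value in actuals:
        if value < edges[0]:
            raise ValueError(f"value {value} is below the first bin edge")
        i = bisect_right(edges, value) - 1
        if i >= len(counts):
            overflow += 1
        else:
            counts[i] += 1
    return counts, overflow


# Made-up toy data for illustration only.
toy_actuals = [0, 0, 0, 0, 1, 2, 4, 12, 30, 700, 1500]
counts, overflow = counts_per_bin(toy_actuals)
print("log10 widths:", [round(w, 2) for w in log10_widths()])
print("counts per bin:", counts, "overflow:", overflow)
```

Apart from the first non-zero bin (0.5 to 2.5, about 0.7 log10 units), every bin spans roughly 0.3 to 0.4 log10 units, which is consistent with the "roughly even on a log scale" description. Running the same count on historical actuals would show how sparse the upper bins are and how many observations exceed the last edge.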

@kvelleby
Contributor Author

Another possibility is to use the Bayesian blocks approach on historical data to define the bins for us.
