
Rework ev. #2914

Merged: 9 commits merged into awslabs:dev from rework-ev on Jun 14, 2023
Conversation

@jaheba (Contributor) commented Jun 12, 2023

Changes:

  • All metrics now have canonical names; for example, RMSE is now RMSE[mean], indicating that we compare against the mean prediction.
  • Renamed Evaluator to Metric, and Metric to MetricDefinition.
  • Added two classes, MetricCollection and MetricDefinitionCollection, to group the respective objects.
  • Added an evaluate function to run evaluations on metrics end-to-end.

Usage:

from gluonts.ev import nd, rmse, evaluate

evaluate(nd + rmse, data_batches, axis=1)
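
To make the data layout concrete, here is a sketch of a full call. The batch key names ("label", "mean", "0.5") and array shapes are illustrative assumptions, not taken from this PR:

import numpy as np

from gluonts.ev import nd, rmse, evaluate

# hypothetical batches: key names and shapes are assumptions
data_batches = [
    {
        "label": np.random.rand(32, 24),  # ground-truth targets
        "mean": np.random.rand(32, 24),   # mean forecasts
        "0.5": np.random.rand(32, 24),    # median forecasts
    }
    for _ in range(4)
]

# aggregate along axis 1; the result maps canonical metric names
# (e.g. "RMSE[mean]") to the aggregated values
result = evaluate(nd + rmse, data_batches, axis=1)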

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

@lostella added the BREAKING label on Jun 12, 2023
@lostella (Contributor) commented:

Do we need three "layers" for this: MetricDefinition, Metric, and Evaluator? In a sense, the Evaluator appears to be a collection of Metric objects, just like MetricCollection is a collection ("sum") of MetricDefinition objects, so I'm wondering if everything could be more compact.

@jaheba (Contributor, Author) commented Jun 13, 2023

I think we need it. Otherwise we lose the flexibility that we wanted in the first place.

A derived metric updates multiple, simpler metrics independently and in
the end combines their results as defined in `post_process`."""

evaluators: Dict[str, Metric]
Review comment (Contributor):
would metrics be a better name, to avoid confusion?
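
For illustration, here is a minimal, self-contained sketch of the derived-metric pattern that the docstring above describes; the class names, batch keys, and helper structure are hypothetical, not the PR's actual API:

from dataclasses import dataclass
from typing import Callable, Dict, Mapping

import numpy as np


@dataclass
class SumMetric:
    # hypothetical simple metric: accumulates one number per batch
    fn: Callable[[Mapping[str, np.ndarray]], float]
    total: float = 0.0

    def update(self, data: Mapping[str, np.ndarray]) -> None:
        self.total += self.fn(data)

    def get(self) -> float:
        return self.total


@dataclass
class DerivedMetric:
    # updates multiple simpler metrics independently and combines
    # their results at the end, as defined in post_process
    evaluators: Dict[str, SumMetric]
    post_process: Callable[..., float]

    def update(self, data: Mapping[str, np.ndarray]) -> None:
        for evaluator in self.evaluators.values():
            evaluator.update(data)

    def get(self) -> float:
        return self.post_process(
            **{name: ev.get() for name, ev in self.evaluators.items()}
        )


# ND = sum(|label - forecast|) / sum(|label|), built from two running sums
nd_sketch = DerivedMetric(
    evaluators={
        "abs_error": SumMetric(
            fn=lambda d: float(np.abs(d["label"] - d["mean"]).sum())
        ),
        "abs_label": SumMetric(fn=lambda d: float(np.abs(d["label"]).sum())),
    },
    post_process=lambda abs_error, abs_label: abs_error / abs_label,
)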

Comment on lines 54 to 56
def update_all(self, stream: Iterator[Mapping[str, np.ndarray]]) -> None:
    for element in stream:
        self.update(element)
Review comment (Contributor):
This could return self maybe
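
A sketch of that suggestion; the return annotation assumes the method lives on MetricCollection:

def update_all(
    self, stream: Iterator[Mapping[str, np.ndarray]]
) -> "MetricCollection":
    # returning self makes the call chainable
    for element in stream:
        self.update(element)
    return self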

Comment on lines 112 to 115
def evaluate(metrics, data_batches, axis=None):
    evaluator = metrics(axis)
    evaluator.update_all(data_batches)
    return evaluator.get()
Review comment (Contributor):
If update_all returned self, this would be `metrics(axis).update_all(data_batches).get()`
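
In other words, evaluate would collapse to a single chained expression:

def evaluate(metrics, data_batches, axis=None):
    # possible one-liner if update_all returned self
    return metrics(axis).update_all(data_batches).get()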

Comment on lines 50 to 54
def update(self, data: Mapping[str, np.ndarray]) -> None:
    for metric in self.metrics:
        metric.update(data)

def update_all(self, stream: Iterator[Mapping[str, np.ndarray]]) -> None:
Review comment (Contributor):
These methods could use a docstring I guess
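
For example, docstrings along these lines would address the comment (wording is a suggestion, not from the PR):

def update(self, data: Mapping[str, np.ndarray]) -> None:
    """Feed a single data batch to every metric in the collection."""
    for metric in self.metrics:
        metric.update(data)

def update_all(self, stream: Iterator[Mapping[str, np.ndarray]]) -> None:
    """Feed each batch from ``stream`` to every metric in the collection."""
    for element in stream:
        self.update(element)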

@@ -34,115 +43,250 @@


@dataclass
class MetricCollection:
Review comment (Contributor):
Could this have Metric as base class?
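
A self-contained sketch of what that could look like; the minimal Metric interface shown here is assumed for illustration:

from dataclasses import dataclass
from typing import List, Mapping

import numpy as np


class Metric:
    # assumed minimal interface, for illustration only
    def update(self, data: Mapping[str, np.ndarray]) -> None:
        raise NotImplementedError


@dataclass
class MetricCollection(Metric):
    # deriving from Metric lets a collection be passed anywhere a
    # single metric is accepted (the composite pattern)
    metrics: List[Metric]

    def update(self, data: Mapping[str, np.ndarray]) -> None:
        for metric in self.metrics:
            metric.update(data)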

@jaheba merged commit 1c383af into awslabs:dev on Jun 14, 2023
@jaheba deleted the rework-ev branch on June 14, 2023 at 13:50
Labels: BREAKING (This is a breaking change; one of the PR required labels)