Fix sync_all_reduce to consider update->compute->update case #2803

@sadra-barikbin sadra-barikbin commented Dec 21, 2022

Problem Description:
We get a wrong result in the situation below, assuming a distributed configuration in which SomeMetric decorates its compute method with sync_all_reduce:

metric = SomeMetric() 
metric.update(y_pred, y)
correct_result = metric.compute()
metric.update(y_pred, y)
wrong_result = metric.compute()

This happens because sync_all_reduce overwrites the state attributes with the values accumulated across all ranks. That leaves them ready for compute, but a later update then operates on already-reduced state, so the next compute returns a wrong value.
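To make the failure concrete, here is a minimal single-process sketch that mimics two ranks with plain dictionaries. The helpers `update` and `buggy_compute` are hypothetical stand-ins, not ignite code, and the in-place sum only approximates what an all-reduce does to the state.

```python
# Single-process simulation of two ranks; no torch.distributed is involved.
# `update` and `buggy_compute` are hypothetical helpers for illustration only.


def update(ranks, value):
    """Every simulated rank accumulates the same local value."""
    for state in ranks:
        state["total"] += value


def buggy_compute(ranks):
    """Mimic an in-place all-reduce (SUM) followed by compute on every rank."""
    reduced = sum(state["total"] for state in ranks)
    for state in ranks:
        state["total"] = reduced  # local state is overwritten with the global sum
    return reduced


ranks = [{"total": 0.0}, {"total": 0.0}]

update(ranks, 1.0)
first = buggy_compute(ranks)   # 2.0: correct, two ranks each saw 1.0

update(ranks, 1.0)             # adds to the already-reduced totals (2.0 -> 3.0)
second = buggy_compute(ranks)  # 6.0, although four samples of 1.0 should give 4.0

print(first, second)
```

The second value is too large because each rank's local state already contained the global sum when the second reduction ran.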

Solution (By @vfdev-5 ):
In sync_all_reduce, store the state attributes in temporary variables. Then do the collective ops on the state attributes. After that, call the decorated function. Finally, restore the state attributes and return the result of the decorated function.
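The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: `sync_all_reduce_sketch`, `fake_all_reduce`, `SumMetric` and the `peers` list are hypothetical, and the collective op is simulated in a single process. Using `try`/`finally` for the restore step is a choice made in this sketch.

```python
import functools


def fake_all_reduce(values):
    """Hypothetical collective SUM over simulated ranks; lists sum element-wise."""
    if isinstance(values[0], list):
        return [sum(column) for column in zip(*values)]
    return sum(values)


def sync_all_reduce_sketch(*attrs):
    def decorator(compute):
        @functools.wraps(compute)
        def wrapper(self):
            # 1. Stash the local state. References are enough here because the
            #    reduction assigns new objects instead of mutating in place.
            saved = {name: getattr(self, name) for name in attrs}
            try:
                # 2. Replace each state attribute by its reduced value.
                for name in attrs:
                    local_values = [getattr(peer, name) for peer in self.peers]
                    setattr(self, name, fake_all_reduce(local_values))
                # 3. Call the decorated function on the reduced state.
                return compute(self)
            finally:
                # 4. Restore the local state so later updates stay correct.
                for name, value in saved.items():
                    setattr(self, name, value)

        return wrapper

    return decorator


class SumMetric:
    """Toy metric with a tensor-like (list) and a scalar state attribute."""

    def __init__(self, peers):
        self.sums = [0.0, 0.0]
        self.count = 0
        self.peers = peers  # simulated process group
        peers.append(self)

    def update(self, values):
        self.sums = [a + b for a, b in zip(self.sums, values)]
        self.count += 1

    @sync_all_reduce_sketch("sums", "count")
    def compute(self):
        return self.sums, self.count


peers = []
m0, m1 = SumMetric(peers), SumMetric(peers)

m0.update([1.0, 2.0])
m1.update([3.0, 4.0])
first = m0.compute()   # ([4.0, 6.0], 2)

m0.update([1.0, 2.0])
m1.update([3.0, 4.0])
second = m0.compute()  # ([8.0, 12.0], 4): update -> compute -> update works
```

After each `compute`, `m0.sums` and `m0.count` hold the local values again, which is what makes the second result correct.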

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: metrics Metrics module label Dec 21, 2022
@vfdev-5 vfdev-5 left a comment

@sadra-barikbin let's write a test that fails on master and is fixed with this PR. In the test we have to check Metric attributes of both types: tensor and scalar.

vfdev-5 commented Feb 16, 2023

@sadra-barikbin the code is definitely broken, so there is no point in updating the branch every time

vfdev-5 commented Feb 16, 2023

@sadra-barikbin let's also remove self._is_reduced, as we no longer need it.

@sadra-barikbin

Shall I remove reinit__is_reduced as well?

@sadra-barikbin

Or you keep it for BC and just set _result to None if it exists?

vfdev-5 commented Feb 16, 2023

> Or you keep it for BC and just set _result to None if it exists?

Yes, we should keep it for BC and also for resetting the cached value. Later we can rename it through a deprecation cycle.
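As a rough sketch of what "keep it for BC and reset the cached value" could look like: the decorator stays on `update`, and its only remaining job is to clear `_result` when that attribute exists. The names `reinit_sketch` and `CachedSum` are hypothetical, and the actual ignite implementation may differ.

```python
import functools


def reinit_sketch(func):
    """Hypothetical BC-preserving wrapper: drop any cached result after `func`."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        func(self, *args, **kwargs)
        if hasattr(self, "_result"):
            self._result = None  # invalidate the cached compute() value

    return wrapper


class CachedSum:
    """Toy metric that caches its compute() result in `_result`."""

    def __init__(self):
        self.total = 0.0
        self._result = None

    @reinit_sketch
    def update(self, value):
        self.total += value

    def compute(self):
        if self._result is None:
            self._result = self.total
        return self._result


metric = CachedSum()
metric.update(1.0)
a = metric.compute()  # 1.0, now cached in metric._result
metric.update(2.0)    # the wrapper clears the cache
b = metric.compute()  # 3.0, recomputed from the updated state
```

Metrics that never define `_result` are unaffected, so existing subclasses using the decorator keep working.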

From now on, compute after compute does the whole thing again
…to-consider-compute-update-compute-case' into Fix-sync_all_reduce-decorator-to-consider-compute-update-compute-case
sadra-barikbin and others added 2 commits February 17, 2023 15:31
Co-authored-by: vfdev <vfdev.5@gmail.com>
vfdev-5 commented Feb 17, 2023

@sadra-barikbin please resolve the conflict for this PR

@vfdev-5 vfdev-5 left a comment

LGTM, thanks @sadra-barikbin !

@vfdev-5 vfdev-5 merged commit 67c6709 into pytorch:master Feb 17, 2023