Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move contrib metrics files #3220

Merged
merged 11 commits into from
Mar 28, 2024
Merged

Conversation

leej3
Copy link
Collaborator

@leej3 leej3 commented Mar 27, 2024

Same migration pattern as #3204:

  • Move files from contrib.metrics to metrics
  • add deprecation warnings and tests for them
  • update rst files
  • make sure all tests past
  • migrate tests from tests/ignite/contrib/metrics to metrics and update as required

references are updated as part of this PR to avoid failures in building the docs

@github-actions github-actions bot added docs module: metrics Metrics module module: contrib Contrib module examples Examples labels Mar 27, 2024
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 27, 2024

@leej3
Copy link
Collaborator Author

leej3 commented Mar 27, 2024

@leej3 can you check why https://github.com/pytorch/ignite/actions/runs/8451157627/job/23149407599?pr=3220 is failing?

looking through it now...

@github-actions github-actions bot added the module: engine Engine module label Mar 27, 2024
@leej3 leej3 force-pushed the move-contrib-metrics-files branch from 1617863 to a3dfd96 Compare March 27, 2024 13:51
@leej3
Copy link
Collaborator Author

leej3 commented Mar 27, 2024

I refactored the commits to make it easier to spot the modifications to the files that were in ignite.contrib.metrics and are now in ignite.metrics.

I can't spot what might be causing the failure of the TPU tests though. The diff is small but persists across reruns:

    def test_distrib_single_device_xla():
        device = idist.device()
>       _test_distrib_compute(device)

tests/ignite/metrics/regression/test_mean_error.py:233: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/ignite/metrics/regression/test_mean_error.py:127: in _test_distrib_compute
    _test("cpu")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

metric_device = device(type='cpu')

    def _test(metric_device):
        metric_device = torch.device(metric_device)
        m = MeanError(device=metric_device)
    
        y_pred = torch.rand(size=(100,), device=device)
        y = torch.rand(size=(100,), device=device)
    
        m.update((y_pred, y))
    
        y_pred = idist.all_gather(y_pred)
        y = idist.all_gather(y)
    
        np_y = y.cpu().numpy()
        np_y_pred = y_pred.cpu().numpy()
    
        np_sum = (np_y - np_y_pred).sum()
        np_len = len(np_y_pred)
        np_ans = np_sum / np_len
    
>       assert m.compute() == pytest.approx(np_ans)
E       assert 0.003967249393463134 == 0.003967256546020508 ± 4.0e-09
E         
E         comparison failed
E         Obtained: 0.003967249393463134
E         Expected: 0.003967256546020508 ± 4.0e-09

@github-actions github-actions bot added the ci CI label Mar 27, 2024
this test fails intermittently. the main difference between this branch and master is the a chance
difference in the order the tests are run which reliably triggers the failure.
@leej3 leej3 force-pushed the move-contrib-metrics-files branch from 79ef36f to f08dfd3 Compare March 27, 2024 15:14
@leej3
Copy link
Collaborator Author

leej3 commented Mar 27, 2024

I didn't discover anything especially useful from ssh-ing into the tpu tests machine. Some of the tests in ignite/metrics/regression fail intermittently on master and this branch. The failures are deterministic but failures are triggered by things like the order the tests are run. For example running pytest regression/test_mean_error.py reliably passes on both branches but pytest regression reliably fails on both branchs.

I have adjusted the tolerance of the comparisons to make it pass more reliably.

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, John! LGTM

@vfdev-5 vfdev-5 merged commit b8fc451 into pytorch:master Mar 28, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci CI docs examples Examples module: contrib Contrib module module: engine Engine module module: metrics Metrics module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants