
update epoch metrics to use collections #1758

Closed · wants to merge 37 commits

Conversation

Moh-Yakoub
Contributor

Fixes #1757

Description: As per the title. The main idea is to allow EpochMetric to use collections of tensors.

This is a WIP to gather feedback about the approach; once it's approved, I will clean up the implementation and add more test cases.

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: metrics Metrics module label Mar 8, 2021
Collaborator

@vfdev-5 vfdev-5 left a comment


Thanks for the PR @Moh-Yakoub! The approach is good; let's simplify a few things and it will be fine.

Review threads (resolved): ignite/metrics/epoch_metric.py (x2), ignite/utils.py
@github-actions github-actions bot removed the module: utils Utils module label Mar 13, 2021
@Moh-Yakoub
Contributor Author

@vfdev-5 I've noticed that specifying the output type of the sequence/mapping caused all tests to fail with
TypeError: 'ABCMeta' object is not subscriptable. Any idea what may be causing this?

I am not able to reproduce this locally; all my tests pass locally.

@vfdev-5
Collaborator

vfdev-5 commented Mar 15, 2021

@vfdev-5 I've noticed that specifying the output type of the sequence/mapping caused all tests to fail with
TypeError: 'ABCMeta' object is not subscriptable. Any idea what may be causing this?

I am not able to reproduce this locally; all my tests pass locally.

Maybe this is the answer: #1758 (comment). Otherwise, please try to Google it.
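A likely cause (an assumption on my part, not confirmed in this thread): subscripting a collections.abc class such as Sequence is only supported from Python 3.9 on (PEP 585), so CI jobs running an older interpreter fail with exactly this error while a local 3.9+ interpreter does not. A minimal reproduction:

```python
import sys
import typing
from collections.abc import Sequence

# Subscripting the collections.abc ABC directly only works on Python >= 3.9
# (PEP 585); on older interpreters it raises:
#   TypeError: 'ABCMeta' object is not subscriptable
if sys.version_info >= (3, 9):
    print(Sequence[int])  # fine on 3.9+
else:
    try:
        Sequence[int]
    except TypeError as exc:
        print(exc)

# typing.Sequence is subscriptable on every supported version:
print(typing.Sequence[int])
```

Using the typing module aliases instead of the ABCs keeps annotations working on every interpreter the CI matrix covers.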

@vfdev-5
Collaborator

vfdev-5 commented Mar 15, 2021

@Moh-Yakoub can we fix this one as a priority, please?

@Moh-Yakoub Moh-Yakoub changed the title [WIP] update epoch metrics to use collections update epoch metrics to use collections Mar 15, 2021
ignite/metrics/epoch_metric.py Show resolved Hide resolved
ignite/metrics/epoch_metric.py Outdated Show resolved Hide resolved
@Moh-Yakoub
Contributor Author

@vfdev-5 I am getting a lot of RuntimeError: connect() timed out test failures, and on Horovod another one: Engine run is terminating due to exception: Mismatched data types: One rank had type int64, but another rank had type float32. Do you suspect it's related to the broadcasting code?

@vfdev-5
Collaborator

vfdev-5 commented Mar 16, 2021

@Moh-Yakoub yes, the failure is real; we actually got the implementation wrong when broadcasting tensors.

The issue is with data types:

        result = 0.0  # <----- every proc has result as a float scalar
        if idist.get_rank() == 0:
            # Run compute_fn on zero rank only
            result = self.compute_fn(_prediction_tensor, _target_tensor)  # <---- now result is a tensor (list of tensors, etc.) on rank 0

        # compute_fn outputs: scalars, tensors, tuple/list/mapping of tensors.
        if not _is_scalar_or_collection_of_tensor(result):
            raise TypeError(
                "output not supported: compute_fn should return scalar, tensor, tuple/list/mapping of tensors"
            )

        if ws > 1:
            # broadcast result to all processes
            # !!!! BUG: WE TRY TO BROADCAST A TENSOR INTO SCALAR PLACEHOLDERS ON THE OTHER PROCS...
            return apply_to_type(  # type: ignore
                result, (torch.Tensor, float, int), partial(idist.broadcast, src=0),
            )

I have to think about that...
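The mismatch can be sketched without a distributed backend. Below, apply_to_type and fake_broadcast are toy, hypothetical stand-ins for ignite's apply_to_type and idist.broadcast (not the real implementations); they show why the per-rank sequences of collective calls diverge:

```python
from functools import partial

def apply_to_type(obj, types, fn):
    # Minimal sketch: apply fn to leaves of a matching type,
    # recurse into lists/tuples/dicts, leave everything else untouched.
    if isinstance(obj, types):
        return fn(obj)
    if isinstance(obj, (list, tuple)):
        return type(obj)(apply_to_type(o, types, fn) for o in obj)
    if isinstance(obj, dict):
        return {k: apply_to_type(v, types, fn) for k, v in obj.items()}
    return obj

calls = []
def fake_broadcast(x, src=0):
    # Record each would-be collective call instead of communicating.
    calls.append(type(x).__name__)
    return x

# Rank 0 computed a mapping of two values (floats stand in for tensors);
# every other rank still holds the scalar placeholder 0.0.
rank0_result = {"precision": 0.5, "recall": 0.7}
other_result = 0.0

calls.clear()
apply_to_type(rank0_result, (float, int), partial(fake_broadcast, src=0))
rank0_calls = list(calls)   # two broadcast calls on rank 0

calls.clear()
apply_to_type(other_result, (float, int), partial(fake_broadcast, src=0))
other_calls = list(calls)   # only one broadcast call on the other ranks

print(rank0_calls, other_calls)
```

Rank 0 issues two collective calls while the other ranks issue one, so the processes desynchronize and the backend reports mismatched types or hangs.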

@vfdev-5 vfdev-5 mentioned this pull request Mar 22, 2021
@sdesrozis
Contributor

sdesrozis commented Mar 22, 2021

@Moh-Yakoub yes, the failure is real; we actually got the implementation wrong when broadcasting tensors.

I understand the need, but I'm quite surprised. It means that every process would not know the type being handled. That looks weird to me; maybe it's because I'm familiar with strongly typed languages.

Could the return type of self.compute_fn be a union of possible types?

@vfdev-5
Collaborator

vfdev-5 commented Mar 22, 2021

Could the return type of self.compute_fn be a union of possible types?

@sdesrozis it is a union of known types: a scalar or a sequence/mapping/tuple of tensors.

@vfdev-5
Collaborator

vfdev-5 commented Apr 20, 2021

@Moh-Yakoub I merged PR #1839 and it should unblock this one. So we can now safely write it like this:

        result = None
        if idist.get_rank() == 0:
            # Run compute_fn on zero rank only
            result = self.compute_fn(_prediction_tensor, _target_tensor)

        # compute_fn outputs: scalars, tensors, tuple/list/mapping of tensors.
        if not _is_scalar_or_collection_of_tensor(result):
            raise TypeError(
                "output not supported: compute_fn should return scalar, tensor, tuple/list/mapping of tensors"
            )

        if ws > 1:
            # broadcast result to all processes
            return apply_to_type(  # type: ignore
                result, (torch.Tensor, float, int), partial(idist.broadcast, src=0, safe_mode=True),
            )
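For illustration, here is a single-process toy model of what safe_mode buys (the real idist.broadcast is a collective across processes; values_per_rank and safe_broadcast below are hypothetical names, not ignite API): non-source ranks may hold None placeholders and still end up with the source's value.

```python
def safe_broadcast(values_per_rank, src=0):
    # Toy model of a safe-mode broadcast: the src rank's value is
    # replicated to every rank, regardless of what the other ranks
    # held locally (here, None placeholders).
    src_value = values_per_rank[src]
    return [src_value for _ in values_per_rank]

# Rank 0 computed a result; ranks 1 and 2 hold None.
ranks = {0: 3.14, 1: None, 2: None}
out = safe_broadcast([ranks[r] for r in sorted(ranks)], src=0)
print(out)  # every rank ends up with the rank-0 value
```

With safe_mode, the non-zero ranks no longer need a type-correct placeholder, which is exactly what the buggy scalar-placeholder pattern above lacked.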

@Moh-Yakoub
Contributor Author

@vfdev-5 Thanks a lot for the info and sorry for the late reply. I will work on this now.

Collaborator

@vfdev-5 vfdev-5 left a comment


@Moh-Yakoub I saw you updated the code according to #1758 (comment), but it still does not use safe_mode to broadcast... Please update the PR and remove the comments. Thanks!

@Moh-Yakoub
Contributor Author

@vfdev-5 I've already removed the comments related to broadcasts.

I'm already using

return apply_to_type(  # type: ignore
    result, (torch.Tensor, float, int), partial(idist.broadcast, src=0, safe_mode=True),
)

So safe_mode is passed. Is there anything else needed to get it through to the broadcast method?

@vfdev-5
Collaborator

vfdev-5 commented Apr 28, 2021

@Moh-Yakoub any idea why CI is still failing?

@@ -22,7 +22,8 @@ def test_no_sklearn(mock_no_sklearn):
RocCurve()


def test_roc_curve():
# TODO uncomment those once #1700 is merge
Collaborator


Please remove all these comments!

@@ -287,7 +287,7 @@ def test_distrib_gpu(distributed_context_single_node_nccl):

@pytest.mark.distributed
@pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
def test_distrib_cpu(distributed_context_single_node_gloo):
def _test_distrib_cpu(distributed_context_single_node_gloo):
Collaborator


This was a temporary way to disable the test; let's re-enable these tests once CI is passing on the epoch metric distributed tests.

@@ -282,7 +282,7 @@ def test_distrib_gpu(distributed_context_single_node_nccl):

@pytest.mark.distributed
@pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
def test_distrib_cpu(distributed_context_single_node_gloo):
def _test_distrib_cpu(distributed_context_single_node_gloo):
Collaborator


Same here

@@ -25,7 +25,7 @@ def test_no_sklearn(mock_no_sklearn):
pr_curve.compute()


def test_precision_recall_curve():
def _test_precision_recall_curve():
Collaborator


Same here

# compute_fn outputs: scalars, tensors, tuple/list/mapping of tensors.
if not _is_scalar_or_collection_of_tensor(result):
raise TypeError(
"output not supported: compute_fn should return scalar, tensor, tuple/list/mapping of tensors"
Collaborator


Suggested change:
-                "output not supported: compute_fn should return scalar, tensor, tuple/list/mapping of tensors"
+                "output not supported: compute_fn should return scalar, tensor, tuple/list/mapping of tensors, "
+                f"got {type(result)}"

Comment on lines +141 to +142
# compute_fn outputs: scalars, tensors, tuple/list/mapping of tensors.
if not _is_scalar_or_collection_of_tensor(result):
Collaborator


This check should be inside the if idist.get_rank() == 0: block, I think.
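A sketch of what moving the check under the rank-0 branch could look like. FakeTensor, _is_scalar_or_collection_of_tensor, and compute_on_rank_zero below are illustrative stand-ins written for this sketch, not the PR's actual code (FakeTensor replaces torch.Tensor so the example runs without torch):

```python
class FakeTensor:
    """Stand-in for torch.Tensor so this sketch runs without torch installed."""

def _is_scalar_or_collection_of_tensor(result, tensor_type=FakeTensor):
    # Hypothetical validator mirroring the helper discussed in this PR:
    # accept scalars, tensors, and non-empty tuple/list/mapping of tensors.
    if isinstance(result, (float, int, tensor_type)):
        return True
    if isinstance(result, (list, tuple)) and result:
        return all(isinstance(r, tensor_type) for r in result)
    if isinstance(result, dict) and result:
        return all(isinstance(r, tensor_type) for r in result.values())
    return False

def compute_on_rank_zero(rank, compute_fn):
    result = None
    if rank == 0:
        result = compute_fn()
        # Validate only on the rank that actually computed the result;
        # the other ranks still hold the None placeholder, which would
        # otherwise fail the check spuriously.
        if not _is_scalar_or_collection_of_tensor(result):
            raise TypeError(
                "output not supported: compute_fn should return scalar, "
                "tensor, tuple/list/mapping of tensors"
            )
    return result

print(compute_on_rank_zero(0, lambda: {"p": FakeTensor(), "r": FakeTensor()}))
print(compute_on_rank_zero(1, lambda: None))  # no validation on non-zero ranks
```

Validating inside the rank-0 branch avoids raising on ranks whose placeholder was never meant to be a valid compute_fn output.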

@vfdev-5
Collaborator

vfdev-5 commented May 8, 2021

@Moh-Yakoub do you plan to finish this PR?

@Moh-Yakoub
Contributor Author

@Moh-Yakoub do you plan to finish this PR?

@vfdev-5 this PR has lagged behind by now. Are there other PRs that have solved the issue? Otherwise, I can continue with this one.

@vfdev-5
Collaborator

vfdev-5 commented Jul 29, 2021

@Moh-Yakoub we actually figured out that it is not possible to do what we want here this way. I asked @KickItLikeShika to work on a simplified version of this feature. I propose you close this one and maybe tackle something else from the list: https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22

What do you think?

@Moh-Yakoub
Contributor Author

@Moh-Yakoub we actually figured out that it is not possible to do what we want here this way. I asked @KickItLikeShika to work on a simplified version of this feature. I propose you close this one and maybe tackle something else from the list: https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22

What do you think?

Sure, that sounds good; closing this now.

@Moh-Yakoub Moh-Yakoub closed this Jul 30, 2021
Labels: module: metrics, module: utils
Linked issue: More EpochMetric's compute_fn output types
4 participants