
Updated precision_recall_curve.py #2490

Merged: 30 commits into pytorch:master, Mar 8, 2022
Conversation

@sayantan1410 (Contributor) commented Feb 24, 2022

Fixes #1284

Updated the precision_recall_curve.py file.
For testing, I ran the tests in tests/ignite/contrib/metrics/test_precision_recall_curve.py and all of them passed, but I haven't separately tested in a distributed environment, so it could still contain some issues.

@github-actions bot added the module: contrib (Contrib module) label on Feb 24, 2022
@sdesrozis (Contributor) commented Feb 24, 2022

@sayantan1410 Thanks! I left a few comments.

@sayantan1410 (Contributor, Author) commented Feb 24, 2022

@sdesrozis Corrected! Please check.
Also, to solve the DDP issue, I have to rewrite the module from scratch using the reset, update, and compute methods along with the necessary decorators, right?
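
(For reference, a minimal sketch of that pattern, following ignite's documented custom-metric API; the accumulator below is a hypothetical example, not this PR's code.)

    import torch
    from ignite.metrics.metric import Metric, reinit__is_reduced, sync_all_reduce

    class SumOfSquaredErrors(Metric):
        @reinit__is_reduced
        def reset(self) -> None:
            self._sum = torch.tensor(0.0, device=self._device)
            self._num_examples = 0

        @reinit__is_reduced
        def update(self, output) -> None:
            y_pred, y = output
            self._sum += torch.sum((y_pred - y) ** 2).to(self._device)
            self._num_examples += y.shape[0]

        @sync_all_reduce("_sum", "_num_examples")
        def compute(self) -> float:
            # after the all-reduce, _sum and _num_examples hold totals across all processes
            return self._sum.item() / self._num_examples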

@sdesrozis (Contributor) commented Feb 24, 2022

Good! However, the issue is not solved yet. The tests should be refactored like the Cohen's kappa ones; comparing the two, you can see the DDP tests are missing here. Could you try to improve the tests?

@sayantan1410 (Contributor, Author):

Yes, sure, I can do that, no problem!

@sdesrozis (Contributor):

> Yes, sure, I can do that, no problem!

Thanks! It should be very similar to tests/ignite/contrib/metrics/test_cohen_kappa.py!

@sayantan1410 (Contributor, Author):

@vfdev-5 I have updated the epoch_metric so that it can broadcast tuples. Let me know what to correct.

@sdesrozis (Contributor):

@sayantan1410 Thanks for the update. You should modify the tests to validate your improvement. See the file tests/ignite/metrics/test_epoch_metric.py.

@sayantan1410 (Contributor, Author):

> @sayantan1410 Thanks for the update. You should modify the tests to validate your improvement. See the file tests/ignite/metrics/test_epoch_metric.py.

Okay, will check it out.

@sayantan1410 (Contributor, Author):

Hey, can you tell me why all the tests are not run when autopep8 makes a commit?

@sdesrozis (Contributor) commented Mar 6, 2022

I don't know. Did you apply all the tools (black, flake8, etc.) before pushing?

Anyway, have a look here:

https://github.com/pytorch/ignite/runs/5441042415?check_suite_focus=true

It seems that a few things do not work in the code. Please format the code correctly and push.

@sayantan1410 (Contributor, Author):

It is asking me to add a blank line between lines 162 and 163 in test_precision_recall_curve.py, but that was added by the autopep8 commit just after; I cannot figure out how to run the check for that commit. The problem arose because I hadn't formatted the code with the linter before pushing.

@sayantan1410 (Contributor, Author):

@sdesrozis @vfdev-5 The Windows, TPU, and macOS tests pass in the distributed environment, but the Ubuntu and HVD tests are failing. Can you help me with this?

@sdesrozis (Contributor):

@sayantan1410 I think your tests are failing somewhere. The test test_precision_recall_curve::test_distrib_gloo_cpu_or_gpu is the first one that fails, and since it is a distributed one, it causes all the other distributed tests to fail.

Did you try to launch the parallel tests on your own machine?

@sdesrozis (Contributor) commented Mar 7, 2022

I pushed a fix (I hope). I used the safe mode of idist.broadcast(), and the NumPy arrays are converted to tensors beforehand.

Let's see.
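
(A minimal sketch of that approach, assuming ignite's idist.broadcast with its safe_mode flag; the array here is purely illustrative.)

    import numpy as np
    import torch
    import ignite.distributed as idist

    # convert the NumPy result to a tensor first; with safe_mode=True,
    # non-source ranks may pass None and still receive the broadcast value
    precision = np.array([0.5, 0.75, 1.0])
    tensor = torch.from_numpy(precision) if idist.get_rank() == 0 else None
    tensor = idist.broadcast(tensor, src=0, safe_mode=True)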

@sdesrozis (Contributor) commented Mar 7, 2022

@sayantan1410 Could you fix the tests? The results are not equal to scikit-learn's because the tensors were not converted to NumPy in the asserts. Thanks!
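
(A hypothetical helper illustrating the fix; prc is assumed to be the PrecisionRecallCurve instance and the sk_* values the scikit-learn references from the test.)

    import numpy as np

    def check_against_sklearn(prc, sk_precision, sk_recall, sk_thresholds):
        precision, recall, thresholds = prc.compute()
        # convert the metric's tensors to NumPy before comparing with scikit-learn
        assert np.allclose(precision.cpu().numpy(), sk_precision)
        assert np.allclose(recall.cpu().numpy(), sk_recall)
        assert np.allclose(thresholds.cpu().numpy(), sk_thresholds)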

@sayantan1410 (Contributor, Author):

> @sayantan1410 I think your tests are failing somewhere. The test test_precision_recall_curve::test_distrib_gloo_cpu_or_gpu is the first one that fails, and since it is a distributed one, it causes all the other distributed tests to fail.
>
> Did you try to launch the parallel tests on your own machine?

Yes, and the distributed tests have passed locally on CPU as well, but I cannot run the tests on GPU as I do not have a GPU on my machine.

@sayantan1410 (Contributor, Author):

> @sayantan1410 Could you fix the tests? The results are not equal to scikit-learn's because the tensors were not converted to NumPy in the asserts. Thanks!

Thank you so much for the help, I will surely do that.

@sdesrozis (Contributor):

> Yes, and the distributed tests have passed locally on CPU as well, but I cannot run the tests on GPU as I do not have a GPU on my machine.

You didn't run the distributed tests because they didn't work; no GPU is needed. In the tests folder, you must run run_cpu_tests.py locally instead of pytest. Explore the script to see how to run the distributed tests 😉

@sdesrozis (Contributor) left a review comment:

@sayantan1410 You did it! Thanks a lot for this work!

@sdesrozis sdesrozis merged commit ccbd6af into pytorch:master Mar 8, 2022
@vfdev-5 (Collaborator) commented Mar 8, 2022

@sayantan1410 @sdesrozis there is an issue with this PR, check this GPU CI job: https://app.circleci.com/pipelines/github/pytorch/ignite/2510/workflows/ced3dff0-05c4-4c92-b695-9c325b48af93/jobs/7720

____________________________ test_distrib_nccl_gpu _____________________________

distributed_context_single_node_nccl = {'local_rank': 0, 'rank': 0, 'world_size': 1}

    @pytest.mark.distributed
    @pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
    @pytest.mark.skipif(torch.cuda.device_count() < 1, reason="Skip if no GPU")
    def test_distrib_nccl_gpu(distributed_context_single_node_nccl):
    
        device = idist.device()
>       _test_distrib_compute(device)

tests/ignite/contrib/metrics/test_precision_recall_curve.py:251: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/ignite/contrib/metrics/test_precision_recall_curve.py:191: in _test_distrib_compute
    _test(y_pred, y, batch_size, idist.device())
tests/ignite/contrib/metrics/test_precision_recall_curve.py:168: in _test
    res = prc.compute()
ignite/contrib/metrics/precision_recall_curve.py:103: in compute
    precision, recall, thresholds = self.compute_fn(_prediction_tensor, _target_tensor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

y_preds = tensor([1, 1, 0, 1, 1, 0, 1, 1, 0, 0], device='cuda:0')
y_targets = tensor([0, 1, 0, 1, 1, 1, 0, 1, 0, 0], device='cuda:0')

    def precision_recall_curve_compute_fn(y_preds: torch.Tensor, y_targets: torch.Tensor) -> Tuple[Any, Any, Any]:
        try:
            from sklearn.metrics import precision_recall_curve
        except ImportError:
            raise RuntimeError("This contrib module requires sklearn to be installed.")
    
>       y_true = y_targets.numpy()
E       TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

ignite/contrib/metrics/precision_recall_curve.py:16: TypeError
_________________________ test_distrib_gloo_cpu_or_gpu _________________________

distributed_context_single_node_gloo = {'local_rank': 0, 'rank': 0, 'world_size': 1}

    @pytest.mark.distributed
    @pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
    def test_distrib_gloo_cpu_or_gpu(distributed_context_single_node_gloo):
    
        device = idist.device()
>       _test_distrib_compute(device)

tests/ignite/contrib/metrics/test_precision_recall_curve.py:260: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/ignite/contrib/metrics/test_precision_recall_curve.py:191: in _test_distrib_compute
    _test(y_pred, y, batch_size, idist.device())
tests/ignite/contrib/metrics/test_precision_recall_curve.py:168: in _test
    res = prc.compute()
ignite/contrib/metrics/precision_recall_curve.py:103: in compute
    precision, recall, thresholds = self.compute_fn(_prediction_tensor, _target_tensor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

y_preds = tensor([1, 1, 0, 1, 1, 0, 1, 1, 0, 0], device='cuda:0')
y_targets = tensor([0, 1, 0, 1, 1, 1, 0, 1, 0, 0], device='cuda:0')

    def precision_recall_curve_compute_fn(y_preds: torch.Tensor, y_targets: torch.Tensor) -> Tuple[Any, Any, Any]:
        try:
            from sklearn.metrics import precision_recall_curve
        except ImportError:
            raise RuntimeError("This contrib module requires sklearn to be installed.")
    
>       y_true = y_targets.numpy()
E       TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

@sdesrozis (Contributor):

I'm on it.

@vfdev-5 (Collaborator) left a review comment:

@sayantan1410 Thanks for the PR, and sorry for the delay. Please send another PR to address my review comments.


    def compute(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        if len(self._predictions) < 1 or len(self._targets) < 1:
            raise NotComputableError("EpochMetric must have at least one example before it can be computed.")
@vfdev-5 (Collaborator):

- "EpochMetric must have ..."
+ "PrecisionRecallCurve must have ..."

@sayantan1410 (Contributor, Author):

Fixing it soon.

Comment on lines +104 to +108

        precision = torch.Tensor(precision)
        recall = torch.Tensor(recall)
        # thresholds can have negative strides, not compatible with torch tensors
        # https://discuss.pytorch.org/t/negative-strides-in-tensor-error/134287/2
        thresholds = torch.Tensor(thresholds.copy())
@vfdev-5 (Collaborator):

Tensor creation should be done with torch.tensor and not torch.Tensor
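
(The difference in one example: torch.Tensor is the legacy float32 constructor, while torch.tensor infers the dtype from its data.)

    import torch

    torch.Tensor([1, 2, 3])  # tensor([1., 2., 3.]) -- always float32, ints are silently cast
    torch.tensor([1, 2, 3])  # tensor([1, 2, 3])    -- dtype inferred as int64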

@sayantan1410 (Contributor, Author):

Okay, will change it.

@sayantan1410 (Contributor, Author):

@sdesrozis Thank you for all the help, I got to learn a lot!


def _test_distrib_integration(device):

    rank = idist.get_rank()
@vfdev-5 (Collaborator) commented Mar 8, 2022

This test case does not use rank, and each distributed process generates its own random preds and true values. The PrecisionRecallCurve implementation should gather all data from all processes, but it is checked against the local process computation:

        np_y_true = y_true.cpu().numpy().ravel()
        np_y_preds = y_preds.cpu().numpy().ravel()
        sk_precision, sk_recall, sk_thresholds = precision_recall_curve(np_y_true, np_y_preds)

I would expect this to fail, but it looks like it is passing. @sayantan1410 can you check why that is?
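
(A sketch of what a matching reference computation could look like, gathering the per-process data with idist.all_gather before calling scikit-learn; names and sizes here are illustrative.)

    import torch
    import ignite.distributed as idist
    from sklearn.metrics import precision_recall_curve

    # per-process random data, as in the test
    y_true = torch.randint(0, 2, size=(50,))
    y_preds = torch.rand(50)

    # gather across all ranks so the scikit-learn reference sees the same
    # concatenated data that the metric's compute() gathered
    np_y_true = idist.all_gather(y_true).cpu().numpy().ravel()
    np_y_preds = idist.all_gather(y_preds).cpu().numpy().ravel()
    sk_precision, sk_recall, sk_thresholds = precision_recall_curve(np_y_true, np_y_preds)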

@sayantan1410 (Contributor, Author):

Yeah, sure, will check it!

@sdesrozis (Contributor) commented Mar 8, 2022

Actually, the error comes from the following test, which seems wrong (and needs a fix too).

@sayantan1410 Have a look at this correct implementation:

def _test(n_epochs, metric_device):

@sayantan1410 (Contributor, Author):

@sdesrozis Will check it soon!

@sayantan1410 (Contributor, Author):

@sdesrozis A small question:

def _test(n_epochs, metric_device):

Here we also don't have anything like idist.all_gather, so how is the data being gathered from all the processes?
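
(For context: the gathering happens inside the metric itself. ignite's EpochMetric, which PrecisionRecallCurve extends, calls idist.all_gather on its accumulated tensors in compute(); a simplified sketch of that flow, not the exact library code.)

    import torch
    import ignite.distributed as idist

    def compute(self):
        prediction_tensor = torch.cat(self._predictions, dim=0)
        target_tensor = torch.cat(self._targets, dim=0)
        if idist.get_world_size() > 1:
            # every process receives the concatenation of all processes' data
            prediction_tensor = idist.all_gather(prediction_tensor)
            target_tensor = idist.all_gather(target_tensor)
        return self.compute_fn(prediction_tensor, target_tensor)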

Labels: module: contrib (Contrib module)

Successfully merging this pull request may close these issues: Ensure that all core and contrib metrics can work in DDP

3 participants