Fixed test in test_precision_recall_curve and tests/ignite/contrib/metrics/regression #2511
Conversation
@sayantan1410 PR seems good. However, we need to understand why the tests were passing previously. Please fetch all the data and see why it was passing. Thanks

I think there are some other tests broken like this one, in regression.

@vfdev-5 Yeah, trying to look for the reason!

@sdesrozis Should I change them in this PR only?

@sayantan1410 It would be great if you could look for other broken tests and fix them if you find any. I would say they are mainly in contrib/metrics/regression. You will just have to rename the PR if other fixes are done. Thanks a lot for your help!

Yeah sure, will do that.
Probable reason why

There is a difference in value between `precision`, `recall`, `thresholds` and `sk_precision`, `sk_recall`, `sk_thresholds` in a distributed configuration; however, the difference was small and `pytest.approx` was taking care of it, so the tests were passing. Code to reproduce:

```python
import torch

import ignite.distributed as idist
from ignite.contrib.metrics import PrecisionRecallCurve
from ignite.engine import Engine
from sklearn.metrics import precision_recall_curve

rank = idist.get_rank()
torch.manual_seed(12)
device = idist.device()


def _test(n_epochs, metric_device):
    metric_device = torch.device(metric_device)
    n_iters = 80
    size = 151
    offset = n_iters * size
    y_true = torch.randint(0, 2, size=(offset * idist.get_world_size(),)).to(device)
    y_preds = torch.randint(0, 2, size=(offset * idist.get_world_size(),)).to(device)

    def update(engine, i):
        return (
            y_preds[i * size + rank * offset : (i + 1) * size + rank * offset],
            y_true[i * size + rank * offset : (i + 1) * size + rank * offset],
        )

    engine = Engine(update)
    prc = PrecisionRecallCurve(device=metric_device)
    prc.attach(engine, "prc")

    data = list(range(n_iters))
    engine.run(data=data, max_epochs=n_epochs)

    assert "prc" in engine.state.metrics
    precision, recall, thresholds = engine.state.metrics["prc"]

    np_y_true = y_true.cpu().numpy().ravel()
    np_y_preds = y_preds.cpu().numpy().ravel()
    sk_precision, sk_recall, sk_thresholds = precision_recall_curve(np_y_true, np_y_preds)

    print(f"precision: {precision}")
    print(f"recall: {recall}")
    print(f"thresholds: {thresholds}")
    print(f"sk_precision: {sk_precision}")
    print(f"sk_recall: {sk_recall}")
    print(f"sk_thresholds: {sk_thresholds}")


metric_devices = ["cpu"]
if device.type != "xla":
    metric_devices.append(idist.device())
for metric_device in metric_devices:
    for _ in range(1):
        print("new test")
        _test(n_epochs=1, metric_device=metric_device)
```

Command I used to run the code:
@vfdev-5 @sdesrozis Hey, the tests for the metrics in contrib/metrics/regression are probably fixed now, can you check once?

@sayantan1410 your repro code example does not reproduce the issue and does not work with DDP, as there is no communication group created.

Yeah, I realized this now.

@vfdev-5 @sdesrozis this was the change I was thinking about, please check once.
```python
offset = n_iters * size
y_true = torch.rand(size=(offset * idist.get_world_size(),)).to(device)
y_preds = torch.rand(size=(offset * idist.get_world_size(),)).to(device)
```
@sayantan1410 either revert these modifications or make them coherent with the others, and rename the PR.
Yeah, making them coherent with the others. Also, some tests are failing; fixing them as well.
Have you updated the others? Do not forget to rename the PR title to reflect that you changed not only test_precision_recall_curve.py.
Yes, in the other tests this has been changed and I have changed the PR title as well.
Why don't you add

```python
rank = idist.get_rank()
torch.manual_seed(rank)
```

here and in the other regression files?
I added this to precision_recall_curve and was waiting for all the tests to pass for it. Now I will add it to the other files as well.
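The effect of the suggested per-rank seeding can be sketched without torch, using Python's standard `random` module (`make_rank_data` is a hypothetical helper for illustration, not part of ignite):

```python
import random


def make_rank_data(rank, n=5):
    # Seeding the generator with the rank (as suggested above) gives each
    # process its own random data; seeding every rank with the same constant
    # would make all processes generate identical values, which can hide
    # distributed bugs.
    rng = random.Random(rank)
    return [rng.random() for _ in range(n)]


# Deterministic per rank, but different across ranks:
print(make_rank_data(0) == make_rank_data(0))  # True
print(make_rank_data(0) == make_rank_data(1))  # False
```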
@vfdev-5 yeah, now I understand it better than at that time :)

@vfdev-5 Hey, any idea why the hvd test is failing? The other tests are passing now.

@sayantan1410 looks like the majority of tests are failing and not only hvd... Check

@vfdev-5 Hey, that is from the old tests; I have made a commit after that and it shows the tests passing except for hvd. Probably you need to refresh the page.

Oh, I see, sorry. I reran the tests; the output looks strange.
```diff
@@ -186,7 +188,7 @@ def update(engine, i):
 e = np.abs(np_y_true - np_y_preds) / np.abs(np_y_true - np_y_true.mean())
 np_res = np.median(e)

-assert pytest.approx(res) == np_res
+assert pytest.approx(res, rel=1e-3) == np_res
```
Why did you increase the rel tolerance? If it starts failing, then there can be something wrong in the code as well.
Yes, there could be, but the values were slightly different for all the tests. I looked into the code but couldn't figure out a reason, so I thought of changing the tolerance.
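For reference, a rough, simplified model of how `pytest.approx`'s relative tolerance behaves (the real `pytest.approx` also applies an absolute tolerance, and `within_rel` is a hypothetical helper, not pytest API):

```python
# Simplified model of pytest.approx(expected, rel=r): accept `actual` when
# |actual - expected| <= r * |expected|. The real check also allows a small
# absolute tolerance; pytest's default relative tolerance is 1e-6.
def within_rel(actual, expected, rel=1e-6):
    return abs(actual - expected) <= rel * abs(expected)


print(within_rel(0.5001, 0.5))            # False: fails at the default tolerance
print(within_rel(0.5001, 0.5, rel=1e-3))  # True: rel=1e-3 masks the gap
print(within_rel(0.9, 0.5, rel=1))        # True: rel=1 accepts almost any value
```

A tolerance of 1 means any value within 100% of the expected one passes, which effectively disables the comparison.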
```diff
@@ -185,7 +187,7 @@ def update(engine, i):
 e = np.abs(np_y_true - np_y_preds)
 np_res = np.median(e)

-assert pytest.approx(res) == np_res
+assert pytest.approx(res, rel=1) == np_res
```
Same remark here: a relative error of 1 is too high!
Yeah, I thought so too; I am trying to find out why there is this difference in values.
@vfdev-5 Hey, I am getting an assertion error if I do not increase the tolerance. Any idea how I should fix this?

@sayantan1410 can we split this PR so that the improved tests that are passing can be merged, and we can investigate the failing ones case by case? Otherwise you have a stuck block of modifications...

@vfdev-5 I could do that. Basically, only precision_recall_curve is working. If possible, can we have a meeting? I can show you where I am stuck and maybe we can solve the issue.

Closing this PR in favor of #2655. Thanks for your work @sayantan1410!
Related to #2490
I couldn't figure out why the test was passing initially; I probably needed to gather data from all the processes with idist.all_gather. But as per the conversation in the mentioned PR, I have made the changes to it. Let me know if anything needs to be changed, and I will be happy to fix it.
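The gathering idea above can be sketched in plain Python without a real process group (`all_gather` here is a toy stand-in for `ignite.distributed.all_gather`, and `metric` is a toy metric; both are hypothetical):

```python
# Each "rank" computes over its own shard, while the sklearn reference is
# computed on the full data; the values only match once all shards are
# gathered and the metric is computed over the combined data.
def all_gather(shards):
    # Stand-in for idist.all_gather: concatenate every rank's shard in order.
    return [x for shard in shards for x in shard]


def metric(xs):
    # Toy metric (the mean); precision/recall over shards behave analogously.
    return sum(xs) / len(xs)


full = list(range(8))
shards = [full[0:4], full[4:8]]  # two "ranks", offset = 4

print(metric(shards[0]) == metric(full))           # False: one shard alone disagrees
print(metric(all_gather(shards)) == metric(full))  # True once shards are gathered
```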