-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add internal counters to send only the latest datapoints to studio #788
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #788 +/- ##
==========================================
+ Coverage 95.53% 95.55% +0.02%
==========================================
Files 55 55
Lines 3559 3580 +21
Branches 319 319
==========================================
+ Hits 3400 3421 +21
Misses 111 111
Partials 48 48 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Do you think there are any additional tests that would be helpful to add?
Sure! I took a bad habit the last few years not to write tests. It needs to become a habit once again. Sorry for that, I'll add them :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have a proper test, from the brief description I don't quite understand in what situations we have a bug - it would be helpful to have a test that makes it obvious / clear
…gger (another thread)
13415e1
to
c0c6199
Compare
@shcheklein Let me know what you think :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the description, I think I understand it better now.
Just a few small questions to clarify re the test. Otherwise I think it should be fine (though I'm not an expert in the details re the data points management)
@daavoo it would be great if you can take a look :) |
* add test for repeated step in studio * drop outdated lightning test comments
thanks for the update @AlexandreKempf ! can we add a test / change this test a bit to have more datapoints with a different cadence of updates. Correct me folks, if I'm wrong but we keep counter per metric path, right? some metric can be updated a few times before the next step (and even before sync), some once. And we need in all cases make sure that on |
@AlexandreKempf Are you ready to merge this one? |
Following a bug detected during this feature development.
Using the
step
value to send the latest data to Studio can lead to weird behavior because thestep
is poorly defined in some loggers (eg. pytorch lightning logger). Because the step definition is poorly defined in lightning we used a hack to ensure thelog_metrics
calls by the lightning trainer were correct. But callinglog_metrics
from outside the lightning trainer (a separate thread for instance) leads to data not being sent to Studio or duplicates data.This PR introduces a counter for each metric that increments when Studio receives the data points. Instead of using the
step
property as a proxy for which data has been sent to studio, we literally count them now. This way, when we want to send data points to Studio, we can only send the points it hasn't received yet.The test added in the PR fails in the
main
branch becauselogs[test_metric]
iswhich is expected. But
test_calls
isWith this PR, both values
logs[test_metric]
andtest_calls
are the same and don't contain duplicated elements.❗ I have followed the Contributing to DVCLive guide.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.