[CDC + Metrics] cdcsdk_sent_lag_micros metrics has sudden peaks #14881
shamanthchandra-yb added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on Nov 7, 2022
yugabyte-ci added the kind/bug (This issue is a bug), priority/medium (Medium priority issue), and area/cdcsdk (CDC SDK) labels and removed the area/docdb (YugabyteDB core features) label on Nov 7, 2022
Sumukh-Phalgaonkar pushed a commit that referenced this issue on Nov 11, 2022
…s sudden peaks.
Summary: This diff addresses the issue of "cdcsdk_sent_lag_micros" showing sudden peaks. This was because the "cdcsdk_last_sent_physicaltime" metric was dropping to zero, and as a result a large value was assigned to the "cdcsdk_sent_lag_micros" metric (//cdcsdk_last_sent_physicaltime// is used to calculate //cdcsdk_sent_lag_micros//). This produces peaks in the graphs for the lag metric. The issue occurs when CDC doesn't get any records from WAL/IntentDB: in that case //cdcsdk_last_sent_physicaltime// was being set to zero. We are fixing this by passing in the existing value of the //cdcsdk_last_sent_physicaltime// metric, which is returned when CDC doesn't find any records in WAL/IntentDB.
Test Plan: Running existing C and Java tests
Reviewers: sdash, rvenkatesh, vkushwaha, abharadwaj, skumar
Reviewed By: abharadwaj, skumar
Subscribers: ycdcxcluster
Differential Revision: https://phabricator.dev.yugabyte.com/D20964
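To make the failure mode concrete, here is a minimal Python sketch. It assumes the lag is computed as the current physical time minus cdcsdk_last_sent_physicaltime; the commit message only says the latter "is used to calculate" the former, so the exact formula is an assumption. The function names and timestamps are hypothetical and are not YugabyteDB's actual code. The sketch contrasts the old behaviour (an empty batch resets the last sent time to zero) with the fix (an empty batch keeps the existing value).

```python
from typing import Optional

# Hypothetical, simplified model of the behaviour described in the commit
# message. None of these names are YugabyteDB's real functions.


def sent_lag_micros(now_micros: int, last_sent_physicaltime: int) -> int:
    """Lag derived from the last sent physical time (assumed: now - last_sent)."""
    return now_micros - last_sent_physicaltime


def last_sent_buggy(batch_commit_time: Optional[int]) -> int:
    """Before the fix: a batch with no WAL/IntentDB records yields 0."""
    return batch_commit_time if batch_commit_time is not None else 0


def last_sent_fixed(batch_commit_time: Optional[int], existing: int) -> int:
    """After the fix: a batch with no records keeps the existing metric value."""
    return batch_commit_time if batch_commit_time is not None else existing


# Physical times in microseconds since the Unix epoch (made-up values).
NOW = 1_668_000_000_000_000  # about 9 November 2022
PREVIOUS = NOW - 2_000_000   # the last record was sent 2 seconds earlier

empty_batch = None  # this poll found no new records

buggy_lag = sent_lag_micros(NOW, last_sent_buggy(empty_batch))
fixed_lag = sent_lag_micros(NOW, last_sent_fixed(empty_batch, PREVIOUS))

print(buggy_lag)  # 1668000000000000, i.e. roughly 52.9 years of "lag"
print(fixed_lag)  # 2000000, i.e. 2 seconds
```

Under this assumption, every empty poll makes the lag jump to the full time since the epoch. That matches the sudden peaks in the graph, and keeping the previous value lets the lag track the real delay instead.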
Sumukh-Phalgaonkar pushed a commit that referenced this issue on Nov 11, 2022
…os" metric shows sudden peaks.
Summary: Original commit: 98db092 / D20964
This diff addresses the issue of "cdcsdk_sent_lag_micros" showing sudden peaks. This was because the "cdcsdk_last_sent_physicaltime" metric was dropping to zero, and as a result a large value was assigned to the "cdcsdk_sent_lag_micros" metric (//cdcsdk_last_sent_physicaltime// is used to calculate //cdcsdk_sent_lag_micros//). This produces peaks in the graphs for the lag metric. The issue occurs when CDC doesn't get any records from WAL/IntentDB: in that case //cdcsdk_last_sent_physicaltime// was being set to zero. We are fixing this by passing in the existing value of the //cdcsdk_last_sent_physicaltime// metric, which is returned when CDC doesn't find any records in WAL/IntentDB.
Test Plan: Running existing C and Java tests
Reviewers: sdash, rvenkatesh, vkushwaha, skumar, abharadwaj
Reviewed By: abharadwaj
Subscribers: ycdcxcluster
Differential Revision: https://phabricator.dev.yugabyte.com/D21042
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue on Dec 7, 2022
Jira Link: DB-4162
Description
The cdcsdk_sent_lag_micros metric sometimes shows sudden peaks. A peak was seen at the start of the run, and a series of peaks appeared when the ingest ended, while CDC was still streaming. We should address this.
Ideally, if there is some lag when the TPCC ingest completes, the lag graph should decrease gradually from that level, indicating that the lagging records are now being read. Instead, we see a series of sudden peaks.