[CDC + Metrics] cdcsdk_sent_lag_micros metrics has sudden peaks #14881

Closed
shamanthchandra-yb opened this issue Nov 7, 2022 · 0 comments
Labels
area/cdcsdk CDC SDK kind/bug This issue is a bug priority/medium Medium priority issue

shamanthchandra-yb commented Nov 7, 2022

Jira Link: DB-4162

Description

cdcsdk_sent_lag_micros occasionally shows sudden peaks. They were seen at the start of the run, and again as a series of peaks after the workload ended while CDC was still streaming. We should address this.

Ideally, if some lag has built up, then once the TPCC ingest is complete the lag graph should decrease gradually from its existing level, indicating that the lagging records are now being read. Instead, we see many sudden peaks.

Screenshot 2022-10-28 at 2 22 07 PM (1)
Screenshot 2022-10-28 at 2 10 44 PM (1)

@shamanthchandra-yb shamanthchandra-yb added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Nov 7, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue area/cdcsdk CDC SDK and removed area/docdb YugabyteDB core features labels Nov 7, 2022
Sumukh-Phalgaonkar pushed a commit that referenced this issue Nov 11, 2022
…s sudden peaks.

Summary:
This diff addresses the issue of "cdcsdk_sent_lag_micros" showing sudden peaks. This was because the "cdcsdk_last_sent_physicaltime" metric was dropping to zero, and as a result a large value was assigned to the "cdcsdk_sent_lag_micros" metric (//cdcsdk_last_sent_physicaltime// is used to calculate //cdcsdk_sent_lag_micros//). This produces peaks in the graphs for the lag metric.

The issue comes up when CDC doesn't get any records from WAL/IntentDB; in that case //cdcsdk_last_sent_physicaltime// was being set to zero. We are fixing this by passing in the existing value of the //cdcsdk_last_sent_physicaltime// metric, which is returned when CDC doesn't find any record from WAL/IntentDB.
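
The behavior described in this summary can be sketched roughly as follows. This is an illustration only, not the actual YugabyteDB code: the function names, the `keep_previous` flag, and the exact formula (current time minus last sent physical time) are assumptions, since the summary only says that one metric is used to calculate the other.

```python
from typing import Optional


def sent_lag_micros(now_micros: int, last_sent_physicaltime: int) -> int:
    # Assumed formula: lag is the gap between "now" and the last sent time.
    return now_micros - last_sent_physicaltime


def next_last_sent_time(previous: int,
                        newest_record_time: Optional[int],
                        keep_previous: bool) -> int:
    # Hypothetical helper: picks the value reported for the last-sent-time metric.
    if newest_record_time is not None:
        return newest_record_time
    # No records came back from WAL/IntentDB.
    # Old behavior (keep_previous=False): the metric falls to zero.
    # Fixed behavior (keep_previous=True): the existing value is carried over.
    return previous if keep_previous else 0


now = 1_668_000_000_000_000      # illustrative wall-clock time in microseconds
previous = now - 2_000_000       # last record was sent 2 seconds ago

buggy = sent_lag_micros(now, next_last_sent_time(previous, None, keep_previous=False))
fixed = sent_lag_micros(now, next_last_sent_time(previous, None, keep_previous=True))

print(buggy)  # 1668000000000000, which shows up as a peak on the graph
print(fixed)  # 2000000
```

With the value carried over, an empty poll no longer makes the lag jump to the full timestamp, which matches the fix described above.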

Test Plan: Running existing C and Java tests

Reviewers: sdash, rvenkatesh, vkushwaha, abharadwaj, skumar

Reviewed By: abharadwaj, skumar

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D20964
Sumukh-Phalgaonkar pushed a commit that referenced this issue Nov 11, 2022
…os" metric shows sudden peaks.

Summary:
Original commit: 98db092 / D20964
This diff addresses the issue of "cdcsdk_sent_lag_micros" showing sudden peaks. This was because the "cdcsdk_last_sent_physicaltime" metric was dropping to zero, and as a result a large value was assigned to the "cdcsdk_sent_lag_micros" metric (//cdcsdk_last_sent_physicaltime// is used to calculate //cdcsdk_sent_lag_micros//). This produces peaks in the graphs for the lag metric.

The issue comes up when CDC doesn't get any records from WAL/IntentDB; in that case //cdcsdk_last_sent_physicaltime// was being set to zero. We are fixing this by passing in the existing value of the //cdcsdk_last_sent_physicaltime// metric, which is returned when CDC doesn't find any record from WAL/IntentDB.

Test Plan: Running existing C and Java tests

Reviewers: sdash, rvenkatesh, vkushwaha, skumar, abharadwaj

Reviewed By: abharadwaj

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D21042
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue Dec 7, 2022
…ric shows sudden peaks.

Summary:
This diff addresses the issue of "cdcsdk_sent_lag_micros" showing sudden peaks. This was because the "cdcsdk_last_sent_physicaltime" metric was dropping to zero, and as a result a large value was assigned to the "cdcsdk_sent_lag_micros" metric (//cdcsdk_last_sent_physicaltime// is used to calculate //cdcsdk_sent_lag_micros//). This produces peaks in the graphs for the lag metric.

The issue comes up when CDC doesn't get any records from WAL/IntentDB; in that case //cdcsdk_last_sent_physicaltime// was being set to zero. We are fixing this by passing in the existing value of the //cdcsdk_last_sent_physicaltime// metric, which is returned when CDC doesn't find any record from WAL/IntentDB.

Test Plan: Running existing C and Java tests

Reviewers: sdash, rvenkatesh, vkushwaha, abharadwaj, skumar

Reviewed By: abharadwaj, skumar

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D20964
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Dec 15, 2022