[vpj][controller] Emit push job status metrics from controller #1185
I still need to go through the tests, but putting a preliminary review out
Thanks @nisargthakkar for the review. Addressed the comments.
Thanks @nisargthakkar. Addressed the comments.
Thanks @nisargthakkar, addressed the comments.
Looks good overall. A couple of nitpicks
LGTM. Thanks @m-nagarajan!
Thanks @nisargthakkar

VPJ communicates with the controller to write `PushJobDetails` to the `PUSH_JOB_DETAILS_STORE_NAME` system store. This PR introduces new metrics emitted by the parent controller for push job success/failure.

New Metrics Added (`Count` and `CountSinceLastMeasurement` added via Tehuti PR, hence using `tehuti:0.12.2`):
- `batch_push_job_success`, `batch_push_job_failed_user_error`, `batch_push_job_failed_non_user_error`
- `incremental_push_job_success`, `incremental_push_job_failed_user_error`, `incremental_push_job_failed_non_user_error`
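
The six metric names above follow a regular `<push type>_push_job_<outcome>` pattern. A minimal sketch of that naming scheme is shown here; the class `PushJobMetricNames` and its two enums are hypothetical names used only for illustration and are not taken from the PR's code.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical, not part of the PR) that derives the metric names listed above. */
public final class PushJobMetricNames {
  /** Hypothetical enum for the two push types that appear in the metric names. */
  public enum PushType { BATCH, INCREMENTAL }

  /** Hypothetical enum for the three outcomes that appear in the metric names. */
  public enum Outcome { SUCCESS, FAILED_USER_ERROR, FAILED_NON_USER_ERROR }

  private PushJobMetricNames() {
  }

  /** Builds a name such as "batch_push_job_failed_user_error" from a push type and an outcome. */
  public static String metricName(PushType pushType, Outcome outcome) {
    return (pushType.name() + "_push_job_" + outcome.name()).toLowerCase(Locale.ROOT);
  }
}
```

For example, `metricName(PushType.INCREMENTAL, Outcome.SUCCESS)` yields `incremental_push_job_success`.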

Current flow: VPJ checks the `push.job.status.upload.enable` config and sends `PushJobDetails` to the `/send_push_job_details` path in `Venice-controller`, which writes it to the push job details system store.

Approaches considered to emit metrics:
- ... `push.job.status.upload.enable` config. This config is enabled everywhere.
- ... `PushJobDetails` to hold the `push.job.status.upload.enable` config and move the logic of determining success/failure status to `VenicePushJob`.

Chose approach 1 as it's the simplest: unlike the other options, it doesn't require deployment ordering (controllers -> VPJ), and no schema evolution is needed.
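
With the chosen approach, the success/failure decision is made on the controller side when the push job details arrive. A rough sketch of such a classification follows, assuming a failed job is attributed to user error when its last checkpoint is in a configured set. `PushJobOutcomeClassifier`, its `Outcome` enum, and the string checkpoint names are hypothetical; the actual logic lives in `VeniceHelixAdmin.java` and may differ.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative classifier (hypothetical, not the PR's code) for a reported push job result. */
public final class PushJobOutcomeClassifier {
  public enum Outcome { SUCCESS, FAILED_USER_ERROR, FAILED_NON_USER_ERROR }

  // Checkpoint names that should count as user errors (assumed to be plain strings here).
  private final Set<String> userErrorCheckpoints;

  public PushJobOutcomeClassifier(Set<String> userErrorCheckpoints) {
    // Defensive copy so later changes to the caller's set do not affect classification.
    this.userErrorCheckpoints = Collections.unmodifiableSet(new HashSet<>(userErrorCheckpoints));
  }

  /**
   * @param succeeded whether the push job reported success
   * @param lastCheckpoint name of the last checkpoint the job reached; may be null
   */
  public Outcome classify(boolean succeeded, String lastCheckpoint) {
    if (succeeded) {
      return Outcome.SUCCESS;
    }
    return userErrorCheckpoints.contains(lastCheckpoint)
        ? Outcome.FAILED_USER_ERROR
        : Outcome.FAILED_NON_USER_ERROR;
  }
}
```

Keeping this decision in the controller means VPJ can keep sending the same `PushJobDetails` payload, which is consistent with the stated goal of avoiding deployment ordering and schema evolution.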

Configs introduced:
Added parent controller config `push.job.failure.checkpoints.to.define.user.error` to provide a custom list of checkpoints that define user errors, depending on the use case, so that the metrics are emitted accordingly. `DEFAULT_PUSH_JOB_USER_ERROR_CHECKPOINTS` will be used by default.

For Reviewers:
Main code changes are in `VeniceHelixAdmin.java`.
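
As a reading aid for the `push.job.failure.checkpoints.to.define.user.error` config, here is a hedged sketch of resolving a custom checkpoint list with a fallback to defaults. It assumes the value is a comma-separated list of checkpoint names, which the PR description does not state. The class `UserErrorCheckpointConfig` and the placeholder default entries are hypothetical and do not reflect the real contents of `DEFAULT_PUSH_JOB_USER_ERROR_CHECKPOINTS`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

/** Illustrative config resolution (hypothetical, not the PR's code). */
public final class UserErrorCheckpointConfig {
  public static final String KEY = "push.job.failure.checkpoints.to.define.user.error";

  // Placeholder defaults for illustration only; the real default list is defined in the PR.
  static final Set<String> PLACEHOLDER_DEFAULTS = Collections.unmodifiableSet(
      new LinkedHashSet<>(Arrays.asList("EXAMPLE_CHECKPOINT_A", "EXAMPLE_CHECKPOINT_B")));

  private UserErrorCheckpointConfig() {
  }

  /** Returns the configured checkpoint names, or the defaults when the config is absent or empty. */
  public static Set<String> resolve(Properties props) {
    String raw = props.getProperty(KEY);
    if (raw == null) {
      return PLACEHOLDER_DEFAULTS;
    }
    Set<String> names = new LinkedHashSet<>();
    // Assumption: the value is a comma-separated list; blank entries are ignored.
    for (String token : raw.split(",")) {
      String name = token.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names.isEmpty() ? PLACEHOLDER_DEFAULTS : Collections.unmodifiableSet(names);
  }
}
```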
How was this PR tested?
GH CI
Does this PR introduce any user-facing changes?