feat: add retry count value of custom metric #11927

Merged (3 commits) on Dec 14, 2023


@moonyoungCHAE (Contributor) commented Oct 2, 2023

Fixes #7705

Motivation

Currently, there is no way to obtain the retry count of each step. Exposing the retry count as a custom metric would make it easier to identify problematic steps, such as steps that fail and are retried repeatedly.

Modifications

I made the retry count available as a custom metric value, so a template's `metrics` definition can reference `{{retries}}`.
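As a rough mental model, assume a metric's `value` is a template string whose `{{...}}` placeholders are substituted from the node's variables and then parsed as a number. Under that assumption, the resolution step might look like the Python sketch below. `resolve_metric_value` is a hypothetical name used for illustration only; it is not Argo's implementation.

```python
import re
from typing import Dict

# Hypothetical helper (not Argo code): matches placeholders such as
# "{{retries}}" or "{{ retries }}".
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve_metric_value(template: str, variables: Dict[str, str]) -> float:
    """Substitute {{name}} placeholders from `variables`, then parse a float."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            # An unresolved variable leaves no usable number to report.
            raise KeyError(f"unresolved variable: {name}")
        return variables[name]

    resolved = _PLACEHOLDER.sub(substitute, template)
    return float(resolved)


# Assumption: a retried node exposes its retry number as the "retries" variable.
print(resolve_metric_value("{{retries}}", {"retries": "3"}))  # 3.0
```

Raising on an unresolved variable is a design choice of this sketch: it makes a missing value explicit instead of silently reporting zero.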

Verification

I verified the change with the workflow below.
The `success` step has no retryStrategy and succeeds on its first attempt, so its retry metric should not appear. The `fail` step fails and is retried, so its retry metric should appear.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-with-steps-
spec:
  entrypoint: retry-with-steps
  templates:
    - name: retry-with-steps
      steps:
        - - name: success
            template: random-success
          - name: fail
            template: random-fail
    - name: random-success
      container:
        image: python:alpine3.6
      metrics:
        prometheus:
          - name: retry_count_success
            help: "retry count of success step that would not be appear, as it would not be retried"
            counter:
              value: "{{retries}}"
    - name: random-fail
      retryStrategy:
        limit: "3"
      container:
        image: python:alpine3.6
        command: [python, -c]
        # Always exits with 1 so that every attempt fails and is retried.
        args: ["import sys; print('retries: {{retries}}'); sys.exit(1)"]
      metrics:
        prometheus:
          - name: retry_count_fail
            help: "retry count of fail and retry step that would be appear, as it would be retried"
            counter:
              value: "{{retries}}"
[Screenshot 2023-10-02 at 4:25:11 PM]

The `fail` step was retried 3 times.
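Why 3 retries means 4 runs: assuming `limit` is the maximum number of retries after the first attempt, and that `{{retries}}` is 0 on the first attempt and increases by one per retry, the retry numbers seen by the `fail` step can be sketched in Python. `attempt_retry_numbers` is a hypothetical helper that models this assumption; it is not Argo code.

```python
from typing import Callable, List


def attempt_retry_numbers(succeeds: Callable[[int], bool], limit: int) -> List[int]:
    """Return the retry number seen by each attempt, in order.

    Hypothetical model: the first attempt sees retries == 0, each retry adds
    one, at most `limit` retries follow the first attempt, and attempts stop
    at the first success.
    """
    numbers = []
    for retries in range(limit + 1):
        numbers.append(retries)
        if succeeds(retries):
            break
    return numbers


# The "fail" step never succeeds, so with limit 3 it runs 4 times in total.
fail_attempts = attempt_retry_numbers(lambda _: False, limit=3)
print(fail_attempts)      # [0, 1, 2, 3]
print(fail_attempts[-1])  # 3, the retry number of the last attempt
```

Under this model, the last attempt's retry number (3) agrees with the value reported in the metrics output.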

The metrics endpoint at localhost:9090/metrics then returned:

# HELP argo_workflows_retry_count_fail retry count of fail and retry step that would be appear, as it would be retried
# TYPE argo_workflows_retry_count_fail counter
argo_workflows_retry_count_fail 3
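To check this result from a script instead of by eye, you can search the scraped text for the sample line. The sketch below uses only the standard library and handles only samples without labels or timestamps. `read_metric` is a hypothetical helper name. For real use, the `text_string_to_metric_families` parser in the `prometheus_client` package handles the full format.

```python
from typing import Optional


def read_metric(exposition: str, name: str) -> Optional[float]:
    """Return the value of the unlabeled sample `name`, or None if absent.

    Minimal sketch of the Prometheus text format: skips blank lines and
    "# HELP" / "# TYPE" comment lines; labeled samples never match `name`.
    """
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])
    return None


scraped = """\
# HELP argo_workflows_retry_count_fail retry count of fail and retry step that would be appear, as it would be retried
# TYPE argo_workflows_retry_count_fail counter
argo_workflows_retry_count_fail 3
"""
print(read_metric(scraped, "argo_workflows_retry_count_fail"))     # 3.0
print(read_metric(scraped, "argo_workflows_retry_count_success"))  # None
```

The missing `success` sample matches the expectation that a step which is never retried does not report the metric.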

Signed-off-by: moonyoungCHAE <xpf_fl@naver.com>
@juliev0 (Contributor) commented Nov 23, 2023

So sorry, I'm going through old PRs and I see that unfortunately this one has been forgotten. I will try to review it soon.

Argo Workflows needs more people approving PRs unfortunately. If your company is by chance interested, I am gauging interest in this idea: #12229

@juliev0 juliev0 self-assigned this Dec 14, 2023
@juliev0 juliev0 enabled auto-merge (squash) December 14, 2023 05:07
@juliev0 juliev0 merged commit dee7ec5 into argoproj:main Dec 14, 2023
27 checks passed
@agilgur5 agilgur5 added this to the v3.6.0 milestone May 4, 2024
@agilgur5 agilgur5 added the area/retryStrategy Template-level retryStrategy label May 4, 2024
Labels: area/metrics, area/retryStrategy (Template-level retryStrategy)
Projects: None yet
Development: merging this pull request may close the issue "Count retries of steps in a workflow"
3 participants