Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please support something like "allow-failure" for a given job #2347

Closed
mvo5 opened this issue Apr 1, 2020 · 166 comments
Closed

Please support something like "allow-failure" for a given job #2347

mvo5 opened this issue Apr 1, 2020 · 166 comments
Labels
enhancement New feature or request external

Comments

@mvo5
Copy link

mvo5 commented Apr 1, 2020

Edit from @vanZeben:

As this is a GitHub Actions platform feature request, discussion for this feature has been moved to https://github.com/orgs/community/discussions/15452, please go and put your support behind that community discussion instead of this issue.


We use github actions in our "snapd" project and we love them.

One small feature we would love to see is a way to mark a test job as "allow-failure" (or a term along these lines) [0]. This would simply mean that the overall /pulls overview page would show the PR as with the little green tick-mark (and maybe in the tooltip 5/6 OK, 1 ignored). It would still show as a failure in the details view (maybe with a different icon?).

Our use-case is that we have some CI environments that fail frequently because of external factors like repository mirrors that are our of sync etc. We still want to run the CI on these systems but not get distracted too much by these out-of-our-control issues.

Hope this makes sense.

Thanks!
Michael

[0] E.g.

jobs:
  spread:
    runs-on: ubuntu-latest
    allow-failure: true
    steps:
    - do-some-flaky-stuff
@mvo5 mvo5 added the enhancement New feature or request label Apr 1, 2020
@thboop
Copy link
Collaborator

thboop commented Apr 1, 2020

Hey @mvo5 Thanks for the feature request!

We do support marking a step to allow failure via continue-on-error.

We also support marking specific checks as required
That will allow you to merge pr's without a particular job succeeding.

I think the latter should solve your issue at the job level, is there a reason that doesn't well for you?

@mvo5
Copy link
Author

mvo5 commented Apr 1, 2020

Hey @thboop. Thanks for your quick reply!

Yeah, it's really just about the little green tick at the pull-request overview page. AFAICT when one job (even if it's not required) fails the overview PR list will show this PR as failed. Having a way to mark certain jobs as not rquired would still show the pulls as green (or yellow?) instead of red.

But I do understand this is a bit of a specific request, so feel free to close it if you think it's a bit too odd. We had it with our old CI system and I liked it.

@fingolfin
Copy link

I don't think it is that specific. People have requested this before: https://git.luolix.topmunity/t5/GitHub-Actions/continue-on-error-allow-failure-UI-indication/td-p/37033

I just came here via Google as I was surprised I couldn't find anything like this in the documentation. It's standard with e.g. Travis CI

@jhenstridge
Copy link

I guess one way to handle this would be for jobs with an allow-failure flag set to generate a neutral conclusion on failure instead of failure. As I understand it, this would let the overall check suite conclusion be success if such a job fails.

@jadephilipoom
Copy link

I'm also an Actions user who would love to see this feature. My team's project has a long build and a few jobs that are flaky and frequently take a while to fix. We do just ignore failures on those for merging PRs, but being able to make the green tick agree with that convention would be really nice.

@jakesylvestre
Copy link

This would be great

@ghostsquad
Copy link

in the interim, what if there was a step that simply posted the "allowed failures" in a comment (that same comment updated each time the workflow ran). Similar CodeCov.

according to the docs, this is a detectable condition:

https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#steps-context

steps..outcome string The result of a completed step before continue-on-error is applied. Possible values are success, failure, cancelled, or skipped. When a continue-on-error step fails, the outcome is failure, but the final conclusion is success.
steps..conclusion string The result of a completed step after continue-on-error is applied. Possible values are success, failure, cancelled, or skipped. When a continue-on-error step fails, the outcome is failure, but the final conclusion is success.

@henryiii
Copy link

I thought that jobs.<job_id>.continue-on-error sounded exactly like this feature, but it doesn't seem to work like that. For me, at least, setting a job to "continue-on-error" still causes a red x for the workflow; it doesn't seem to have any effect at all.

@dopplershift
Copy link

By default, when any job within a workflow fails, all the other jobs are cancelled (unless 'fail-fast' is False).continue-on-error specifies that a certain job failing should not trigger cancelling other jobs. (I would definitely love some way to keep a job from signalling a failing check.)

@ramsey
Copy link

ramsey commented Sep 10, 2020

Not only does continue-on-error not work as expected (as mentioned by everyone on this thread), but the example in the documentation appears to result in a syntax error that I cannot figure out how to get around. As a result, I've been trying all sorts of permutations to figure out if this syntax error is causing continue-on-error not to work as expected.

Maybe it is?

Here's the documentation example verbatim:

runs-on: ${{ matrix.os }}
continue-on-error: ${{ matrix.experimental }}
strategy:
  fail-fast: false
  matrix:
    node: [11, 12]
    os: [macos-latest, ubuntu-18.04]
    experimental: [false]
    include:
      - node: 13
        os: ubuntu-18.04
        experimental: true

My version looks similar:

unit-tests:
  name: "Unit Tests"

  runs-on: ${{ matrix.operating-system }}
  continue-on-error: ${{ matrix.experimental }}

  strategy:
    fail-fast: false
    matrix:
      dependencies:
        - "lowest"
        - "highest"
      php-version:
        - "7.4"
        - "8.0"
      operating-system:
        - "ubuntu-latest"
      experimental: [false]
      include:
        - php: "8.0"
          composer-options: "--ignore-platform-reqs"
          experimental: true

When this runs, it fails at the unit-tests job, due to the following error. (See the failure here.)

Error when evaluating 'runs-on' for job 'unit-tests'. (Line: 121, Col: 14): Unexpected value ''

Somehow, it's getting an empty string for runs-on, or at least, that's what this error seems to indicate to me.

When I remove the line that defines experimental: [false] (and only this line), the build runs successfully (see here).

So, maybe the successful build is not actually taking into account the continue-on-error property? Does this syntax error help point to some kind of culprit that could resolve this issue for everyone on this thread?

@nschonni
Copy link

@ramsey your issue doesn't seem related to the thread. Your job is failing because your include doesn't have a value for operating-system

@ramsey
Copy link

ramsey commented Sep 11, 2020

Why does my include work without operating-system when I remove experimental: [false] from the matrix?

@ramsey
Copy link

ramsey commented Sep 17, 2020

@nschonni You were right. Turns out, I had my matrix messed up. Sorry for the churn here, y'all. 😀

@szepeviktor
Copy link

szepeviktor commented Sep 18, 2020

This is one solution

      - name: Install dependencies
        id: composer-run
        continue-on-error: true
        run: composer update --${{ matrix.dependency-version }} --prefer-dist --no-interaction --no-suggest

      - name: Execute tests
        if: steps.composer-run.outcome == 'success' && steps.composer-run.conclusion == 'success'
        run: vendor/bin/phpunit

continue-on-error causes difference in outcome and conclusion!

@zackees
Copy link

zackees commented Aug 19, 2024

I am a maintainer of FastLED and we have a bunch of platforms that we don't support yet like the esp32c2.

I want to track which builds are expected to succeed and which are allowed to fail. However we don't have this feature yet so I just don't run those optional builds on all pulls.

This seems like an easy feature to put in and it seems like Microsoft is wasting more time dealing with the community requesting this than actually implementing the feature.

Just implement the feature and stop wasting time on dealing with the community requesting it!

@EwoutH
Copy link

EwoutH commented Aug 19, 2024

image

image

At some point the GitHub Actions runners will get a new product owner and be like - What? Really?! Why?! HOW?!?!

@LecrisUT
Copy link

You see, the key point is to make sure it's marked closed so that nobody actually looks into it. I believe last I read about this topic the main people who were working on this were laid-off or something to that extent.

On a brighter note, this is probably one of those issues that everyone who has managed github actions is present (insert relevant xkcd here - 2363?), so you can use it to give a quick message to everyone.


Hello everyone, hope you will have a lovely day! Thank you for your work on your github workflows implementations, it is great to see what enginious approaches you've developed, they've been a great inspiration!

@EwoutH
Copy link

EwoutH commented Aug 19, 2024

You know what, let's get some current GitHub runner developers in the party!

Hi @TingluoHuang, @ericsciple, @rentziass, @nikola-jokic, @joshmgross, @AllanGuigou and @luketomlinson!

I'm tagging you all from the most upvoted issue on this repo ever! Who doesn't want the bragging rights that (s)he unblocked the most requested feature in GitHub Actions history?

What do you say, that's not enough? Go implement it, it's probably only a couple hours. And then you can rightfully claim you made the largest amount of users happy in one single blow!

You guys rock and you can do this! 🚀🚀🚀

@nebuk89
Copy link

nebuk89 commented Aug 21, 2024

Sorry I missed this one as it was marked as closed (I am not an admin yet so cannot re-open!). We are currently going through adding paper cut sized items to our backlog for this year (you will see me/ @Steve-Glass and @lkfortuna trying to get through the backlog :) )

I have added this and will dig in with the team today how big this is/any concerns to see when we can get this one added. Where I am trying to comment on a bunch of these issues my GitHub notifications are now 🔥 so please be patient as I try to get back round here :)

dead-claudia added a commit to MithrilJS/mithril.js that referenced this issue Aug 25, 2024
Tests still appear to fail per #2898. Unfortunately, I need actions/runner#2347 to ignore the test failures properly - I need them to be warnings, not hard errors.
noonio added a commit to cardano-scaling/hydra that referenced this issue Aug 29, 2024
<!-- Describe your change here -->

This PR enhances the [demo
tutorial](https://hydra.family/head-protocol/docs/getting-started/) by
enabling `hydra-cluster` benchmarks to run on an active Hydra cluster.

**usage**

See the newly introduced `network-test.yaml` for the related invocations of pumba and the hydra clients. Supposing they are running, you simply run:

```sh
nix run .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e -- \
          demo \
          --output-directory=$(pwd)/benchmarks \
          --scaling-factor=100 \
          --timeout=1000s \
          --testnet-magic 42 \
          --node-socket=${NETWORK_DIR}/node.socket \
          --hydra-client=localhost:4001 \
          --hydra-client=localhost:4002 \
          --hydra-client=localhost:4003
```

and you will get some statistics on txns confirmed, time taken, etc.

**prerequisites**
- A Cardano node must be running on specified `node-socket`.
- Hydra nodes must be operational on provided `hydra-client` hosts.
- There’s no need to pre-seed the keys, as the bench-demo script will
automatically fund them using the faucet.
- Note that the reference scripts should already be published, and the
Hydra nodes must be running with those scripts.


**Todo**

- [x] Fix the `FIXME` about `> 33`
- [x] Remove duplicate seeding
- [x] Make sure the entire CI process doesn't fail when the pumba causes
the network to fail
- [x] Make it so that if it _fails_ the head is closed.
- [x] Quick little matrix to run a few different scenarios
- [x] Make the bench-e2e fail if it didn't submit all the txns ( ideally
would also be able to see visually in the job list; but Github is
missing a feature see also actions/runner#2347
)
- [x] Get docker info via `docker inspect` instead of parsing yaml (!)
- [x] Make sure `results.csv` is written to the `outputDirectory` not
the tmp directory
- [x] Upload the results as part of the artifacts
- [x] Write the summary out even when it failed

---

<!-- Consider each and tick it off one way or the other -->
* [x] CHANGELOG updated or not needed
* [x] Documentation updated or not needed
* [x] Haddocks updated or not needed
* [x] No new TODOs introduced or explained herafter

---------

Co-authored-by: Noon van der Silk <noon.vandersilk@iohk.io>
Co-authored-by: Sebastian Nagel <sebastian.nagel@ncoding.at>
ch1bo added a commit to cardano-scaling/hydra that referenced this issue Sep 13, 2024
A bit of a hack to only run the network tests we expect to succeed. This
at least ensures we don't get any worse, even if it doesn't directly
allow us to track if we're getting better.

See also actions/runner#2347
dead-claudia added a commit to dead-claudia/mithril.js that referenced this issue Sep 23, 2024
Tests still appear to fail per MithrilJS#2898. Unfortunately, I need actions/runner#2347 to ignore the test failures properly - I need them to be warnings, not hard errors.
@mathis-la-debrouille
Copy link

Drone and others have it, and while it may seem minor, good UI for CI is incredibly important. I hope to see this 'on' soon.

@wsanchez
Copy link

I got sufficiently tired of this that I came up with a work-around. Posting it here in case others find it useful:

https://github.com/burningmantech/ranger-ims-server/pull/1347/files

What I did gets me:

  • a green checkmark for the job (not great, but this is the trade-off we are making)
  • a green checkmark for the workflow (the goal here)
  • a comment in the PR indicating that an optional build failed (replacement for red checkmark for the job)
Screenshot 2024-10-31 at 10 42 26 AM

@Caellian
Copy link

@wsanchez Thank you. That support for a yellow exclamation mark instead of a green chackmark is basically the "good UI for CI is incredibly important" part.

So until we get the (apparently) super difficult to implement different status value in DB, complicated backend support for if (step.allow_failure && step.failed) { status = warn } else { existing logic }, and advanced yellow color support the likes of which even CIE can't comprehend in frontend, that's sadly the best DX that can be supported.

Management is probably too busy bringing 3 new LM models to Copilot - a gimmick for investors that will wow them for precisely maybe another quarter - to push for actually useful features to the product, that most devs want and need.

@pboling
Copy link

pboling commented Nov 16, 2024

GitHub is actively anti-developer at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request external
Projects
None yet
Development

No branches or pull requests