
ci: "publish bleeding edge" failures often don't have issues opened for them #40414

Closed
tbg opened this issue Sep 3, 2019 · 9 comments

@tbg
Member

tbg commented Sep 3, 2019

This build is pretty red. Not having it green means these binaries aren't stageable, and I think it also impacts the nightly roachtest builds, which have the bleeding edge build as a prerequisite. We should not lose this kind of test coverage at this point.

https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_MergeToMaster?branch=%3Cdefault%3E&buildTypeTab=overview

Nobody really checks this page, so we need to fix the issue posting ASAP.

Examples here: https://cockroachlabs.slack.com/archives/CJ0H8Q97C/p1567069131004000

Epic CRDB-10439

Jira issue: CRDB-5537

@tbg
Member Author

tbg commented Sep 6, 2019

https://teamcity.cockroachdb.com/admin/editBuildRunners.html?id=buildType:Cockroach_MergeToMaster

Here's the build step that should do this.

Here's a build where that claims to have worked but didn't: https://teamcity.cockroachdb.com/viewLog.html?buildId=1471297&buildTypeId=Cockroach_MergeToMaster&tab=buildLog&_focus=167#_state=167

I would start by running that script locally against that build to see if the problem is in posting the issue or in detecting the failed tests.

@bobvawter
Member

I've been chasing the TeamCity API calls made by the teamcity-post-failures.py script, and they all still appear to line up with what the script expects. At this point, I think the next step is to check in some debugging changes and wait for it to fail to report again.
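For reference, the kind of TeamCity REST query being checked here can be reproduced locally. This is a minimal sketch only; the helper name and server host below are illustrative, not taken from teamcity-post-failures.py. The endpoint shape is the standard TeamCity REST API for listing test occurrences:

```python
def failed_tests_url(server: str, build_id: int) -> str:
    """Build the TeamCity REST URL listing failed test occurrences for
    a given build. `failed_tests_url` is a hypothetical helper, and
    `server` is a placeholder host, used here only for illustration."""
    locator = f"build:(id:{build_id}),status:FAILURE"
    return f"{server}/app/rest/testOccurrences?locator={locator}"

# e.g. failed_tests_url("https://teamcity.example.com", 1471297)
```

Fetching that URL (with credentials) and comparing the failures it returns against the issues actually posted would narrow down whether the bug is in detection or in posting.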

@jordanlewis
Member

> and I think it also impacts the nightly roachtest builds which has the bleeding edge build as a prereq.

I changed the nightlies to not depend on publish bleeding edge. They just pull the most recently generated binaries from the same branch. So at least that problem is solved.

@tbg
Member Author

tbg commented Sep 27, 2019

FWIW, this issue is still going wild:

https://teamcity.cockroachdb.com/viewLog.html?buildId=1511685&buildTypeId=Cockroach_UnitTests

None of these failures had issues opened for them. They're new failures, so it's especially unfortunate that we don't get to see them. I'm not aware of anyone else monitoring TeamCity the way I feel forced to.

@jordanlewis
Member

I'll take a look at this very soon. Maybe the script has bitrotted.

@kenliu

kenliu commented Oct 15, 2019

@jordanlewis did you get a chance to look into this?

@tbg tbg unassigned kenliu and bobvawter Dec 4, 2019
craig bot pushed a commit that referenced this issue Dec 4, 2019
42565: build: use json output and avoid go-test-teamcity r=jordanlewis a=tbg

Previously, our tests would produce text output and go-test-teamcity
would be charged with translating from that to TeamCity output.
This was buggy, error-prone and undertested. Parsing messy test output
is hard and there are all kinds of corner cases.

Instead of trying to solve this ourselves, rely on the built-in json
output that the go toolchain provides. This output is far from perfect
- I'm pretty sure it has some bugs as well - but it will become better
over time without any involvement on our end. Additionally, TeamCity
knows how to parse this format already, so go-test-teamcity goes away,
and - hopefully - with each TeamCity upgrade the way the tests are
handled is going to improve.

There is one thing that go-test-teamcity did that we would like to keep,
namely filtering the test output to omit the output for tests that
passed. Our tests are really chatty, and we don't want to pollute our
TeamCity machines with 200mb log files.

So I wrote a little tool that does *just* this filtering (called
testfilter) which we'll invoke instead of go-test-teamcity.

The pipeline looks something like this:

```
make test JSON=true ... \
    | testfilter -mode=strip \
    | tee tmp.json
testfilter -mode=omit < tmp.json | testfilter -mode=convert > failures.txt
rm tmp.json
```

where

- `mode=strip` removes the output for non-failing tests (but not the
  start-stop events). TeamCity will thus learn about all tests, but won't
  be burdened with retaining (or even parsing) test output that is
  irrelevant.
- `mode=omit` only keeps failing tests (i.e. will never even mention any
  others). We pipe this to:
- `mode=convert` to get back to human-readable text, in case someone
  would like to download an artifact with only relevant lines in it.
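The strip mode described above boils down to buffering `go test -json` events per test and deciding at the pass/fail event whether to emit the buffered output. A toy sketch of that idea (this is not the actual testfilter implementation, which is a Go tool; `strip_passing` is a name invented here):

```python
import json

def strip_passing(lines):
    """Drop buffered 'output' events for tests that pass, but keep their
    run/pass events so the consumer still learns about every test.
    Failing tests are flushed in full. Illustrative sketch only."""
    buffered = {}  # test name -> list of raw event lines
    out = []
    for line in lines:
        ev = json.loads(line)
        name, action = ev.get("Test"), ev.get("Action")
        if name is None:
            out.append(line)  # package-level events pass straight through
            continue
        buffered.setdefault(name, []).append(line)
        if action == "pass":
            # keep the start/stop events, discard the chatty output
            out.extend(l for l in buffered.pop(name)
                       if json.loads(l).get("Action") != "output")
        elif action == "fail":
            out.extend(buffered.pop(name))
    return out
```

Omit mode is the same idea, except passing tests are dropped entirely rather than reduced to their start/stop events.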

I don't expect anyone to really need the artifact, because TeamCity
lets you copy the "stack trace" (=test output) to clipboard from
the UI, but you never know.

The very nature of how robust our CI pipeline is dictates that I will
have broken something, somewhere. I'll be watching out for fallout when
this merges.
However there are also at least two improvements:

- the "stress pull request" failures are now properly recognized by
  TeamCity; previously one had to sift through the build log.
- teamcity-post-failures.py is no more (actually it still is, but
  it's a noop, just to make sure we don't have to maintain two
  separate TeamCity configs for before/after this change). Issue
  posting is now done via github-post.

Fixes #15034.
Informs #40414.

Release note: None

42921: Added myself to AUTHORS file. r=aaron-crl a=aaron-crl



Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Aaron Blum <aaron@cockroachlabs.com>
@tbg
Member Author

tbg commented Dec 4, 2019

teamcity-post-failures.py is now a noop on master; we use the same tool that nightly stress uses to post issues from release branches now. Will close this once I've confirmed it working.

@tbg tbg self-assigned this Dec 4, 2019
@jordanlewis
Member

Re-ping on this. I haven't done anything on it, but it would be great if this script could do something on failure - I've gotten too used to polling #teamcity on Slack to find failures.

For example, check out this recent failure: https://teamcity.cockroachdb.com/viewLog.html?buildId=1995033&tab=buildResultsDiv&buildTypeId=Cockroach_MergeToMaster

I think this is happening during mkrelease for darwin?

@tbg
Copy link
Member Author

tbg commented Jun 17, 2020

I don't think we ever opened issues on master when the build itself failed (or did teamcity-post-failures.py do that as some sort of accidental side effect?); it was always about posting issues for failing Go tests.

@tbg tbg closed this as completed May 18, 2022