Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Github rate limits cause transient service test failures in CI #979

Closed
paulmelnikow opened this issue Apr 30, 2017 · 24 comments · Fixed by #1105
Closed

Github rate limits cause transient service test failures in CI #979

paulmelnikow opened this issue Apr 30, 2017 · 24 comments · Fixed by #1105
Labels
bug Bugs in badges and the frontend developer-experience Dev tooling, test framework, and CI

Comments

@paulmelnikow
Copy link
Member

There are a couple problems with GIthub API auth on Travis related to the service tests (#937).

(1) The service test runner, which runs as part of every build, fails transiently about 10% of the time. It's because the runner fetches the PR title from github, to decide which tests to run, and these requests are unauthenticated, so they often fail due to rate limits. The Github API allows 60 requests per hour for unauthenticated requests and 5,000 per hour for authorized requests.

> gh-badges@1.3.0 test:services /home/travis/build/badges/shields
> mocha --delay service-tests/runner/cli.js "--pr"
PR: badges/shields#974
Error: 403 Forbidden
    at Error (native)
    at fetch.then.res (/home/travis/build/badges/shields/service-tests/runner/cli.js:31:15)
    at process._tickCallback (internal/process/next_tick.js:109:7)

(That's from https://travis-ci.org/badges/shields/builds/226980312#L760-L769)

(2) Service tests for github (e.g. #745, #976) will intermittently fail for the same reason. This won't affect every build, only the nightly scheduled builds on master, and PRs tagged with [github].

Reliable builds are important, so let's find a way to fix this.

@paulmelnikow paulmelnikow added bug Bugs in badges and the frontend developer-experience Dev tooling, test framework, and CI labels Apr 30, 2017
@Daniel15
Copy link
Member

The Github API allows 60 requests per hour for unauthenticated requests and 5,000 per hour for authorized requests.

Sounds like using authenticated requests is the way to go. Does Travis have some secure way of storing secrets like GitHub auth tokens? I guess it's tricky to allow pull requests to have access to the auth token in a secure way though (as someone could submit a PR with a modified build script that echos the auth token).

I guess GitHub's webhook would provide all the pull request information to Travis, including the title of the PR. Is there a way for the build script to grab the info from Travis?

@Daniel15
Copy link
Member

Might be able to use $TRAVIS_COMMIT_MESSAGE:

TRAVIS_COMMIT_MESSAGE: The commit subject and body, unwrapped.

https://docs.travis-ci.com/user/environment-variables/

@webcaetano
Copy link
Contributor

@paulmelnikow seems the obvious fix is to authenticated these requests on cli.js

@paulmelnikow
Copy link
Member Author

Does Travis have some secure way of storing secrets like GitHub auth tokens? I guess it's tricky to allow pull requests to have access to the auth token in a secure way though (as someone could submit a PR with a modified build script that echos the auth token).

They do, though you are correct. We could create an account (e.g. shields-integration-tests) with no repo access, and just use it for the tests. If someone snags it they could use up our quota, and maybe if they're being really malicious get that account shut down. (Annoying, though hardly the worst thing.) The whole thing would appear in the Travis logs, so it's not like someone would really "get away with it." Somewhere I've seen Github say they are okay with companies creating a robot account here and there. I had two such accounts at my last company.

Might be able to use $TRAVIS_COMMIT_MESSAGE:

Indeed. Travis does --depth=10 on checkout, so we could also check a few commits further back. Perhaps TRAVIS_COMMIT_RANGE would be helpful to make sure we're only checking commits.

The commit message is not quite as convenient a place as the PR title. The advantage of the PR title is that it's quite visible, and easily edited by either the originator or a maintainer. It only needs to be set once, not repeated in subsequent commits.

Heh, I just ran a test, and to my surprise TRAVIS_COMMIT_RANGE gets populated with a merge commit Travis creates to run the test. Meh. (Reference)

TRAVIS_COMMIT_MESSAGE=Merge 21f8d54e748582aecb8ee8988d1266c58c45d667 into 896b2aab2ffe727d3c0cac79eb39bf424cb738dc

Annnd… we also want Github tests to work.

I'd suggest we make a robot account, take reasonable steps to secure the credentials in Travis, and monitor how it goes.

@Daniel15
Copy link
Member

Daniel15 commented May 7, 2017

take reasonable steps to secure the credentials in Travis

How could we do this? If the access token is readable in pull requests (as it would need to be in order for PRs to use it), someone could simply edit the build script to echo it and then send a pull request, which would reveal the token. 😕

The safest approach I can think of is to add an endpoint to shields.io that gets the metadata. For example, https://shields.io/github-pull-request-metadata/1234, where 1234 is the pull request number. It could return something like {"service":"foo"}. That way the token can be private, while still allowing pull requests to get the data they need. The build script would just hit that URL to get the metadata.

@paulmelnikow
Copy link
Member Author

How could we do this? If the access token is readable in pull requests (as it would need to be in order for PRs to use it), someone could simply edit the build script to echo it and then send a pull request, which would reveal the token. 😕

You're right, though that would be in the Travis records. The worst case scenario is not that bad, though. I'm not terribly concerned.

By reasonable steps, I mean let's secure it in the Travis config, so someone would have to go out of their way to see it. Let's not check it into the repository in cleartext, for example.

The safest approach I can think of is to add an endpoint to shields.io that gets the metadata

That is a nice idea for dealing with the metadata, though it wouldn't help with the Github tests. 😕

@paulmelnikow
Copy link
Member Author

I'm thinking I will go ahead and address this. I'll make a github account for shields CI and configure Travis with the token. It's not a perfect solution, though the worst case – someone stealing our token – is no worse than today!

@Daniel15
Copy link
Member

You can likely use app ID + app secret rather than token. I think that increases the request limits too, and app ID + secret leaking doesn't grant the user access to anything (since nobody will actually 'log in' to the app)

@RedSparr0w
Copy link
Member

RedSparr0w commented Sep 25, 2017

should just be able to set the environment variable on Travis-CI:
image
And should be able to access it from the script using something like this in github-auth.js:

autosave(githubUserTokensFile, {data:[]}).then(function(f) {
  githubUserTokens = typeof process.env.TRAVIS != 'undefined' &&  process.env.TRAVIS ? process.env.github_token : f;

with 'display value in build log': off shows like:
image

edit: also if someone logs the key to console it is replaced with [secure] there are ways around this, modifying the key would show it, but as you said earlier a 'bot' account with no real access perms wouldn't matter much, more of a annoyance than anything else and don't think it would be a common occurrence.

@paulmelnikow
Copy link
Member Author

Just seeing your comments. Indeed, the app ID + secret gives you access to extra requests. (I'm pretty sure this is how Shields works! 😄) Agreed with @RedSparr0w that it doesn't matter whether we use a token or an app ID + secret. Github lets you choose which permissions a personal access token receives, and this CI account will have no permissions anyway. There's no way we can protect the key, since a contributor can run whatever code they might want in our CI environment.

@paulmelnikow
Copy link
Member Author

Woohoo! I am going to go re-trigger a bunch of builds and see if we can't get some of these PRs to green :)

@paulmelnikow
Copy link
Member Author

#1095 fixed rate limits for the server, but not where the test harness uses Github. Have addressed that in #1105.

@paulmelnikow
Copy link
Member Author

Argh, still seeing failures.

I checked the build settings, and the GITHUB_TOKEN setting had been removed. Maybe done as a security measure by Travis? Have restored it. Will see what happens.

@RedSparr0w
Copy link
Member

Strange, hopefully it sticks this time! 🤞

@RedSparr0w
Copy link
Member

Looks like this is the issue,

Travis CI makes encrypted variables and data available only to pull requests coming from the same repository. These are considered trustworthy, as only members with write access to the repository can send them.

@paulmelnikow
Copy link
Member Author

That's what I thought, though I'm using an environment variable set in the UI, not an encrypted variable.

@RedSparr0w
Copy link
Member

Looks like it also affects the secure variables, based on the first line, but they don't mention it again further on at all so may be wrong.

The most important restriction for pull requests is about secure environment variables and encrypted data.

Only other work around i could think of is setup a .php page or something to handle all the test request as then the api key would be hidden.

@Daniel15
Copy link
Member

Only other work around i could think of is setup a .php page or something to handle all the test request as then the api key would be hidden.

It could be an endpoint on the shields.io service itself. That's something I suggested in my comment from May: #979 (comment)

@paulmelnikow
Copy link
Member Author

My response then:

That is a nice idea for dealing with the metadata, though it wouldn't help with the Github tests.

The Github service makes arbitrary calls to Github, though we need metadata, which is predictable, for every test.

It's better to have super flaky Github service tests than all our tests be super flaky. I agree, let's do this. It fixes 90% of our problems. After we do that, and merge #1111, we'll be in good shape for everything but [Github].

@paulmelnikow
Copy link
Member Author

paulmelnikow commented Sep 29, 2017

Related to #1012. I will ping the OP and get started on a branch that covers both.

Meanwhile, let's get #1111 reviewed so the change to pull metadata from img.shields.io won't conflict.

Added: And #1112, which the new branch depends on.

@paulmelnikow
Copy link
Member Author

Opened #1114.

paulmelnikow added a commit that referenced this issue Oct 2, 2017
This adds badges for Github issues and pull requests. You can display the state, title, username, number of comments, age, time since last update, and state of checks.

Provides an endpoint the Shields CI can use to fetch PR titles for #979 and resolves #1011.
@paulmelnikow
Copy link
Member Author

Opened #1120 with the tooling change while #1114 is waiting to be deployed.

paulmelnikow added a commit that referenced this issue Oct 6, 2017
Use the endpoint on img.shields.io added in #1114 to fetch the PR title.

This sidesteps the authentication requirement, and helps with #979 – except for PRs that need to run the Github service tests, which will need a separate solution.
@paulmelnikow
Copy link
Member Author

This may actually be fixed in #1338.

paulmelnikow added a commit that referenced this issue Dec 6, 2017
I don’t like that our build goes red on master all the time due to flaky service tests. I thought I’d look into other CI services that would make it possible to run the scheduled tests nightly without causing those messages to show up.

CircleCI, Heroku CI, and Codeship were obvious choices. Heroku CI wasn’t free and I didn’t have any experience with Codeship, so I looked into CircleCI. I’ve used their 1.0 system a lot though this was my first time on their 2.0 system. As with earlier versions, they’ve put a lot of work into making the build fast – perhaps more than any other CI system I’ve seen.

I had such good results, my goal shifted from scheduled daily builds (that don’t litter our commit history with red builds) to improving the CI experience as a whole.

This change made a big impact:

- Build logs load much, much faster. In the test I just ran, 22 seconds to < 2 seconds, a 90% improvement.
- Status of each step shows up right in the GitHub UI, which makes it much faster to see exactly what’s failed.
- Builds run about 50-75% faster on account of parallelism.
- GitHub service tests are fixed. This has been a long-standing issue.
- Ability to ssh into a build container to debug failures.

Here’s what I did:

- Created custom Docker images with our dependencies. To be honest, I’m not even sure these are necessary, only to install the greenkeeper-lockfile. We could get dejavu from npm. They make startup very fast.
- Created an npm-install stage which loads all dependencies into node_modules and caches them.
- Created separate stages for our main tests, service tests, and frontend tests, and stages to run the main tests and service tests in Node 6. These run in parallel, up to four at a time.
- Separated service test ID output from the service test results themselves. (I check these often during the PR process, when I confirm that service tests actually ran. Because the production Shields server caches the title, after updating it you can’t tell whether the update is taking effect.)
- Added a personal access token for the shields-ci user. This should actually fix the long-standing issue #979. CircleCI provides an option to “Pass secrets to builds from forked pull requests,” which means unlike Travis, they’ll give us enough rope to shoot ourselves in the foot.
- Schedule a daily build, which runs all the service tests.
@paulmelnikow
Copy link
Member Author

The rate limits are no longer causing problems. 🙌

Tests are still failing occasionally. However this is because of timeouts. This is worth addressing. Feel free to open new issues to discuss, as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs in badges and the frontend developer-experience Dev tooling, test framework, and CI
Projects
None yet
4 participants