Service testing strategy #927

Closed
paulmelnikow opened this issue Mar 31, 2017 · 7 comments
Labels
developer-experience (Dev tooling, test framework, and CI) · operations (Hosting, monitoring, and reliability for the production badge servers)

Comments

@paulmelnikow
Member

I'd like to suggest a goal for the project: automated tests of the vendor badges.

As Thaddée mentioned in #411:

Testing is of course imperfect (as in, manual). The CLI and backend engine have tests, but the server does not. What would be helpful however isn't tests here; it would be automated error reporting. We know that a list of badges should work. Having a program that goes through them and reports what the vendor server sent back if the badge failed would be great. However, it should absolutely not complexify the existing logic of the server.

I agree that tests like these would be valuable for end users. They would detect API problems when they occur, rather than leaving us to hope that someone eventually shows up and reports them.

They'd be relatively simple to write. They would help badge developers, and push more of the testing responsibility from the reviewer to the contributor.

In PR reviews, I spend a significant amount of time manually testing various combinations of inputs. After requesting changes, I really should start all over again. Tests would make this faster and much more reliable.

I'm skilled enough to spot many common problems in this type of code; however, unlike Thaddée, I'm not an expert in this codebase. I don't have the eye to recognize all the problematic patterns that he would. I can't tell whether the code is obviously right or obviously wrong just by looking at it.

Given how easy it is to write vendor badges, and what's sure to be endless growth of available repositories and tools, PR review will remain a bottleneck. With automatic tests, testing responsibility can be scaled out to a wide net of contributors. Writing tests for existing vendors is another easy way for contributors to put love into the project, even those who are content with the existing feature set.

Some downsides:

  1. The tests would be slow, due to slow external servers. We'd want to easily run a subset.
  2. The tests would be somewhat flaky due to flaky external servers. When running the tests, we'd want to include some amount of retrying.
  3. Since assertions depend on external servers and external data, they would need to be somewhat flexible: a valid regex for semver, for example, or for a build which may be passing, failing, or errored (see the sketch after this list). We risk making the assertions so complex that they allow false negatives: assertions which pass when they should fail.
  4. They would cover something pretty fundamental that can’t be tested any other way: changes to a vendor's API.
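As a rough sketch of what "flexible but bounded" assertions could look like, assuming Joi is used for matching (the values here are illustrative, not from the actual codebase):

'use strict';

const Joi = require('joi');

// A semver-ish version: any vX.Y.Z value passes, anything else fails.
const isSemver = Joi.string().regex(/^v\d+\.\d+\.\d+$/);

// A build status: only the statuses we expect the vendor to report.
const isBuildStatus = Joi.string().regex(/^(passing|failing|error)$/);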

There are a few important attributes of tests like these:

  • Easy to read and write
  • Simple assertions – no false negatives
  • Retriable – so it’s possible to run automatically (see the retry sketch below)
  • Non-invasive – they should not add complexity to the server

We could call these integration tests.
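A minimal retry sketch, assuming mocha is the test runner (mocha ≥ 3 supports this.retries()); the suite and test names are hypothetical:

'use strict';

describe('live CRAN badge', function () {
  // Retry each flaky, network-bound test up to two extra times before failing.
  this.retries(2);

  it('returns a version badge', function () {
    // ... hit the real vendor server and assert on the response ...
  });
});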

There’s another, more ambitious kind of test I'd also like to see: tests which record vendor responses and play them back later. They would inject mock vendor responses to handle cases like malformed requests, which can’t be tested any other way. Fundamentally, they could test that the server code is doing exactly what it is supposed to, regardless of whether a package still exists or a vendor server is temporarily down.

Important attributes:

  • Easy to debug – simple flow control, minimal magic, good error messages
  • Blazingly fast – whole suite runs in a few seconds at most

We could run the whole suite of tests on every PR. That ensures a change doesn't break other features, speeds up reviews, and lets us merge with confidence.

Running the whole suite is especially useful for refactoring: making potentially significant changes to the implementation which should not affect most of the server’s behavior.

The tool I've been experimenting with is Nock Back. Working tests are reliable and it's pretty good overall. However it doesn't have the most active maintenance, I've run into some bugs (nock/nock#870), and I've found failing tests to be tricky to debug (nock/nock#781, nock/nock#869).

We could call these recorded tests.
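As a rough illustration, a recorded test could be wired up with Nock Back roughly like this; the fixture name, URL, and use of node-fetch are assumptions for the sketch, not the project's actual setup:

'use strict';

const nock = require('nock');
const fetch = require('node-fetch');

// Fixtures live next to the tests; whether this records or replays is
// controlled by nock.back.setMode ('record' vs 'lockdown').
nock.back.fixtures = `${__dirname}/fixtures`;

nock.back('cran-version.json', nockDone => {
  fetch('http://crandb.r-pkg.org/devtools')
    .then(res => res.json())
    .then(json => {
      // ... assert on the response, recorded or replayed ...
      nockDone();
    });
});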

Checking code coverage is important not just for improving the code quality, but also the quality of the service. In my experience writing just a few of these, I found behaviors which should be improved, code paths that do not make any callback, and other errors in error-handling code.

However, recording responses is a really big bite to take, especially with some uncertainty about the current state of the tooling. So I'd suggest we leave recording for another day.

For now, the goals should be the following:

  • Badge developers can easily test their work
  • They can commit those tests so reviewers can see them
  • Reviewers can run a subset of the tests (or CI can infer the affected services and do this automatically)
  • Developers can TDD bug fixes
  • Full suite runs periodically and/or on pushes to master, and we find out when badges have broken

This would include:

  1. Tests using real vendor servers (true integration tests)
  2. Tests of error handling using mock responses (functional tests)
  3. Optional grouping by vendor (sketched below)
  4. Retrying transient failures
  5. Maybe down the line…
    a. Automatically inferring which vendors to run
    b. Reporting code coverage
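As a very rough sketch of what grouping and subsetting could look like — the file layout and command-line handling are hypothetical, assuming mocha as the runner:

'use strict';

// Hypothetical runner: run all service specs, or only those whose filename
// matches a service name passed on the command line.
const glob = require('glob');
const Mocha = require('mocha');

const only = process.argv[2]; // e.g. `node runner.js cran` to run only CRAN

const mocha = new Mocha();
glob.sync('service-tests/*.js')
  .filter(file => !only || file.includes(only))
  .forEach(file => mocha.addFile(file));

mocha.run(failures => { process.exitCode = failures ? 1 : 0; });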

Thoughts? Concerns?

And importantly, would anyone else like to join in the fun? 😀

@paulmelnikow
Member Author

I worked up a third POC:

The vendor spec looks like this:

'use strict';

const Joi = require('joi');
const ServiceTester = require('../service-tester');

const t = new ServiceTester('CRAN', '/cran');
module.exports = t;

t.create('version')
  .get('/v/devtools.json')
  .expectJSONTypes(Joi.object().keys({
    name: Joi.equal('cran'),
    value: Joi.string().regex(/^v\d+\.\d+\.\d+$/)
  }));

t.create('specified license')
  .get('/l/devtools.json')
  .expectJSON({ name: 'license', value: 'GPL (>= 2)' });

t.create('unknown package')
  .get('/l/some-bogus-package.json')
  .expectJSON({ name: 'cran', value: 'not found' });

t.create('unknown info')
  .get('/z/devtools.json')
  .expectStatus(404)
  .expectJSON({ name: 'badge', value: 'not found' });

t.create('malformed response')
  .get('/v/foobar.json')
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .reply(200))
  .expectJSON({ name: 'cran', value: 'invalid' });

t.create('connection error')
  .get('/v/foobar.json')
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .replyWithError({ code: 'ECONNRESET' }))
  .expectJSON({ name: 'cran', value: 'inaccessible' });

t.create('unspecified license')
  .get('/l/foobar.json')
  // JSON without License.
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .reply(200, {}))
  .expectJSON({ name: 'license', value: 'unknown' });

This provides close to 100% code coverage for the CRAN/METACRAN badges. Overall, I'm happy with it, though I find the Joi syntax too verbose. This is the syntax from my sketch, which I prefer:

t.create('version')
  .get('/v/devtools.json')
  .expectJSON({ name: 'cran', value: v => v.should.match(/^v\d+\.\d+\.\d+$/) });

The implementation of that was more complicated than I expected so I'm going to defer it. This is good enough to start writing tests. We can always improve it later.

paulmelnikow mentioned this issue Apr 8, 2017
@espadrine
Member

Somewhat tangentially, I always thought it would be welcome to have a (non-test) script that would simply record a known-good output of all .json badges into a JSON file, along with all network outputs from using try.html. Then we could also have a (test) script that would be able to check that all the .json badges remain the same, ensuring that PRs don't cause regressions.

@paulmelnikow
Member Author

That's a nice idea. It would help prevent regressions in the code, and could run fast, on every commit. It's a nice complement to tests which hit the vendors' servers, which are designed to catch issues like #939 that lie with the services themselves.

A problem we'd need to solve would be how to maintain the battery of recorded outputs when a dev wants to make additions or changes. When a dev adds new functionality, they need to record a few new requests, but keep the rest intact. When changing code to accommodate API changes upstream, they need to replace a few requests, but keep the rest intact.
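If Nock Back (or something like it) were used for the recordings, one rough approach — purely a sketch, with the env-var name being an assumption — would be one fixture file per service, with the mode flipped so that only missing (or deliberately deleted) fixtures get re-recorded while the rest stay intact:

'use strict';

const nock = require('nock');

nock.back.fixtures = `${__dirname}/fixtures`;

// In 'lockdown' mode, existing fixtures are replayed and unexpected requests
// fail the test, so the recorded battery stays stable. Setting RECORD=1 (a
// hypothetical env var) switches to 'record' mode, which only records fixtures
// that don't exist yet — so a dev can delete the fixtures they need to refresh
// (or add new tests) without touching the rest.
nock.back.setMode(process.env.RECORD ? 'record' : 'lockdown');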

paulmelnikow added the developer-experience (Dev tooling, test framework, and CI) and operations (Hosting, monitoring, and reliability for the production badge servers) labels Apr 17, 2017
@paulmelnikow
Member Author

Re: #1286 (comment) ^^ @chris48s

I do think it's important that we hit the real services during PRs, and also nightly, so we can detect breaking changes in the upstream APIs. The current setup is not perfect. I don't like that the build breaks on master when there are problems upstream. And we still don't have a good solution for our GitHub service tests. (#979)

@chris48s
Member

Unsurprisingly, it seems like you've already thought about this quite thoroughly :)

Having read this thread, I see that, whereas you would usually want to mock out interactions with an external service, the nature of this project means you do want some tests in your suite that will start failing if an external service goes away or makes a backwards-incompatible API change.

I guess the situation I am thinking of is not so much API breaks (which are important to detect), but if you think about this test, for example:

https://github.com/badges/shields/blob/master/service-tests/npm.js#L34-L36

This test could start failing because Express changes its licence, rather than because there is a regression in the code under test or NPM has changed its API. Obviously, in writing tests we can try to make good choices about the packages/repos we test against, but it is always an issue.
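For instance, a looser assertion — just a sketch reusing the ServiceTester/Joi style from the POC above, with the regex being illustrative — would keep the test green through a licence change while still catching a broken badge:

t.create('license')
  .get('/l/express.json')
  .expectJSONTypes(Joi.object().keys({
    name: Joi.equal('license'),
    // Accept any plausible licence string instead of pinning 'MIT',
    // so an upstream licence change doesn't fail the test.
    value: Joi.string().regex(/^[A-Za-z0-9()+., -]+$/)
  }));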

I think given where the project/test suite is now, prioritising the integration tests is most important but adding what you've described as 'recorded tests' has value too. Do you have any services where you also have 'recorded tests' in place?

paulmelnikow changed the title from "Automated tests of the vendor badges" to "Service testing strategy" Dec 6, 2017
@paulmelnikow
Member Author

We don't have any recorded tests right now.

Think this issue is a good record, though not sure there's anything actionable right now. Let's open a new issue when there is!
