
Integrating continuous fuzzing by way of OSS-Fuzz #771

Closed · DavidKorczynski opened this issue Jan 18, 2021 · 11 comments

@DavidKorczynski
Contributor

Hi,

I was thinking that it would be nice to set up continuous fuzzing of jsonschema by way of OSS-Fuzz. In google/oss-fuzz#4996 I have done exactly that: created the necessary logic, from an OSS-Fuzz perspective, to integrate jsonschema. This includes developing initial fuzzers as well as the OSS-Fuzz-side integration.

Essentially, OSS-Fuzz is a free service run by Google that performs continuous fuzzing of important open source projects. The only expectation of integrating into OSS-Fuzz is that bugs will be fixed. This is not a "hard" requirement in the sense that no one enforces it; the main point is that if bugs are not fixed, running the fuzzers is a waste of resources, which we would like to avoid.

If you would like to integrate, could I please have one or more email addresses that will get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and other stats? Note that the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.

@Julian
Member

Julian commented Jan 20, 2021

Hi there. Thanks for the offer / sending the PR.

So -- all of jsonschema's code is pure Python at the minute, so I'd be curious whether OSS-Fuzz could say anything interesting without at least some information about how to generate objects shaped like JSON Schema specifications. But I'm happy to see what pops out too.

The email address that's in the SECURITY.md file is a decent place to send these.

CC @Zac-HD in case you're interested or have opinions :)

And thanks for raising!

@Julian
Member

Julian commented Jan 20, 2021

I didn't know this (though surely @Zac-HD will), but it looks like OSS-Fuzz supports generating data via Hypothesis.

Something like that would be a big improvement over random dictionary poking, I think. Zac may already be doing that himself as part of hypothesis-jsonschema? But if not, @DavidKorczynski, I think that'd be the right kind of integration with OSS-Fuzz.
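A rough sketch of what I mean (the schema, strategy, and test body here are made-up illustrations, not anything from the PR): any Hypothesis @given test exposes a .hypothesis.fuzz_one_input hook that a libFuzzer-style driver such as atheris can call with raw bytes.

import sys

import atheris
import jsonschema
from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(), st.none() | st.booleans() | st.integers() | st.text()))
def fuzz_validate(instance):
    try:
        # ValidationError is a documented outcome; any other exception is a finding.
        jsonschema.validate(instance=instance, schema={"type": "object", "minProperties": 1})
    except jsonschema.exceptions.ValidationError:
        pass

if __name__ == "__main__":
    atheris.instrument_all()  # give the fuzzer coverage feedback
    atheris.Setup(sys.argv, fuzz_validate.hypothesis.fuzz_one_input)
    atheris.Fuzz()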

@DavidKorczynski
Contributor Author

Am happy to set up a property-based approach with Hypothesis if you are happy to integrate with OSS-Fuzz! If I go ahead with the integration I can submit the fuzzers upstream in this repository instead of keeping them on OSS-Fuzz, then we can also get the property-based testing going - does that sound good?

@Julian
Member

Julian commented Jan 20, 2021

Sounds good to me!

@Zac-HD
Member

Zac-HD commented Jan 21, 2021

I do indeed have opinions! (and wrote the integration docs over the weekend 😉) The real trick here is that Hypothesis supports driving arbitrary tests using a traditional fuzzer, which naturally includes OSS-Fuzz's various backends.

In short, I'd be very surprised if you can discover interesting bugs by parsing strings into JSON and nothing else - though I can imagine this working OK if you had all the JSON tokens ({}[]",0123456789 etc) and schema keywords in a dictionary to randomly splice in; and then started from a decent seed corpus like the upstream test suite plus schemastore.org schemas.

Fortunately though, hypothesis-jsonschema exists specifically to be the better way to do this! I even have some internal test tooling to generate arbitrary schemas: https://github.com/Zac-HD/hypothesis-jsonschema/blob/master/tests/gen_schemas.py which cover basically everything except $ref, I think 😁
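To make that concrete, a minimal sketch of the hypothesis-jsonschema route (the schema here is a made-up example): from_schema() only generates instances that are valid by construction, so validate() raising anything at all points to a bug in one of the two libraries.

import jsonschema
from hypothesis import given
from hypothesis_jsonschema import from_schema

EXAMPLE_SCHEMA = {"type": "array", "items": {"type": "integer", "minimum": 0}, "maxItems": 5}

@given(from_schema(EXAMPLE_SCHEMA))
def test_generated_instances_validate(instance):
    # Should never raise, ValidationError included.
    jsonschema.validate(instance=instance, schema=EXAMPLE_SCHEMA)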

The main tricks will be that:

  1. Not all good property-based tests make good fuzz targets; for this you want an integration-test which can in principle exercise the whole codebase. This makes assertions in the code (vs in the test) particularly powerful.
  2. My tooling, both in hypothesis-jsonschema and in the test code above, generally assumes that jsonschema mostly works. I have discovered several bugs using it, but they are more likely to come in the form of odd crashes (assert FTW) than clear test failures, so a quick code audit with this in mind might be useful. That said, Hypothesis' built-in shrinking tends to make the cause of errors quite immediately obvious!

Also happy to collaborate and split any integration reward, or direct it to charity (e.g. the PSF or one of https://www.givewell.org/charities/top-charities).

@Julian
Member

Julian commented Jan 21, 2021

@Zac-HD I think I followed that (and very helpful as usual).

I can't say I yet know what sorts of fuzzing would be useful here, but as you say, you've certainly found issues this way before, so maybe there are more gaps to fill...

Also happy to collaborate and split any integration reward, or direct it to charity (e.g. the PSF or one of https://www.givewell.org/charities/top-charities).

This sounds great to me too, yeah. We should maybe take it offline to discuss, but you know I still have a soft spot for PyPy, so throwing some dollars at them to make Hypothesis+PyPy support even better is attractive :) but so is the PSF.

@DavidKorczynski
Contributor Author

My general view on refining the fuzzer to rely on more structural approaches is that it should be verified empirically. The argument is that the coverage-guided aspects of the fuzzing engine will be good at coming up with inputs that satisfy the various input structures of the target application. I think this is particularly true in a case like this, where the execution speed is high, the structural complexity of JSON is relatively low (say, in comparison to PDFs or image formats), and OSS-Fuzz will throw significant CPU power at it. The original fuzzer starts hitting jsonschema within seconds. Based on this, my personal view is to refine only after we get empirical results: if we get results then that's great, and if not then we should refine. Naturally I respect the view of the maintainers, but my personal advice would be to either not refine at first or have both.

The perspective the original fuzzer takes was simply to follow the pattern described at https://pypi.org/project/jsonschema/, i.e. the comment in the code: "A sample schema, like what we'd get from json.load()".
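Roughly, the harness looks like this (a simplified sketch, not the exact fuzzer from the PR; the schema is the one from the README):

import json
import sys

import atheris
import jsonschema

# A sample schema, like what we'd get from json.load()
SCHEMA = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
    },
}

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    try:
        instance = json.loads(fdp.ConsumeUnicodeNoSurrogates(len(data)))
    except json.JSONDecodeError:
        return
    try:
        jsonschema.validate(instance=instance, schema=SCHEMA)
    except jsonschema.exceptions.ValidationError:
        pass  # expected for most random instances

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()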

@Julian
Member

Julian commented Jan 21, 2021

Based on this, my personal view is to refine only after we get empirical results

The question to me, though, is what results we are expecting. For a normal fuzzing process run by OSS-Fuzz, that often seems to be "the software doesn't crash", especially when fuzzing code in memory-unsafe languages.

If a fuzzer is to say anything useful about JSON Schema (and this library, jsonschema), though, I think it needs to know sets of invariants we believe to be true; then it can go find counterexamples to them, if they exist.

If I'm understanding your comment, I think you're saying that we should test the one invariant "valid JSON doesn't blow up jsonschema". I don't know that I expect fuzzing to produce much that's interesting there, but who knows...

I think if I follow @Zac-HD's comment:

Not all good property-based tests make good fuzz targets; for this you want an integration-test which can in principle exercise the whole codebase. This makes assertions in the code (vs in the test) particularly powerful.

That looks more like what I'd expect: if the key invariants are "valid pairs of schemas and instances produce successful jsonschema output" and "invalid pairs of schemas and instances produce unsuccessful jsonschema output", there's likely to be more bang for the buck there.

But as I say, I'm also willing to go with what the experts say :) You, @DavidKorczynski, are more familiar with OSS-Fuzz, and I know @Zac-HD is more familiar with property testing in general, so I'd be willing to defer.

@DavidKorczynski
Contributor Author

DavidKorczynski commented Jan 21, 2021

Ah right - now I understand. The bugs that I am after are unhandled exceptions. validate promises to throw ValidationError or SchemaError; that's the guarantee I am after, i.e. whether we can trigger other types of exceptions, assuming we give input acceptable to validate.
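Expressed as code, the contract is something like this (a sketch only; the function name and JSON plumbing are hypothetical):

import json

import jsonschema
from jsonschema.exceptions import SchemaError, ValidationError

def check_validate_contract(schema_text, instance_text):
    try:
        schema = json.loads(schema_text)
        instance = json.loads(instance_text)
    except json.JSONDecodeError:
        return  # not input acceptable to validate(); skip
    try:
        jsonschema.validate(instance=instance, schema=schema)
    except (ValidationError, SchemaError):
        pass  # the two documented failure modes
    # Anything else propagates, and the fuzzer reports it as a crash.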

@Julian
Member

Julian commented Jan 21, 2021

Probably you know this but validate only makes that "guarantee" for some subset of objects, yes? E.g. if you give it the working equivalent of:

class Foo(dict):
    def __getitem__(self, key):
        if key == "12":
            raise ZeroDivisionError()
        # behave like a normal dict for every other key
        return super().__getitem__(key)

you indeed may get ZeroDivisionError (i.e. it does not wrap other exceptions emitted during validation).

But for suitable subsets of objects, ones I assume you'll use as fuzzing input, then yeah.

(And fair enough, maybe we start there.)

@Zac-HD
Member

Zac-HD commented Jan 22, 2021

Re: donation of any integration reward

I'd be very happy to direct it to PyPy for use at their discretion... and to include a note suggesting that efficient code coverage would be great for fuzzers 😉

Fuzzing with consume_unicode

I think this could be effective - it's certainly worth including both this and the Hypothesis-based harness (see e.g. Levels of Fuzzing). I just think that including a "dictionary" of tokens and a decent seed corpus would make it considerably more efficient.

More sophisticated tests

I'd love to write a harness which generates a random valid schema, then random known-valid objects, and checks that they validate as expected. I already know via #686 that this isn't always the case! Metamorphic testing also has some super-powerful ideas. For example, given a valid schema and instance, deleting a constraint from the schema should never make the instance invalid... and with time we could think of many more, I'm sure.
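A sketch of that metamorphic property (the schema is an assumed example; a real harness would draw schemas from something like the gen_schemas.py tooling above): removing a constraint only ever widens a schema, so an instance valid under the full schema must stay valid under the relaxed one.

import jsonschema
from hypothesis import given
from hypothesis_jsonschema import from_schema

FULL_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "minProperties": 1,
}

@given(from_schema(FULL_SCHEMA))
def test_deleting_a_constraint_never_invalidates(instance):
    # Drop one constraint keyword; the instance must still validate.
    relaxed = {k: v for k, v in FULL_SCHEMA.items() if k != "minProperties"}
    jsonschema.validate(instance=instance, schema=relaxed)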

But this is a conversation for later 🙂

Julian closed this as completed Mar 13, 2021