Integrating continuous fuzzing by way of OSS-Fuzz #771
Hi there. Thanks for the offer and for sending the PR. So, all of jsonschema's code is pure Python at the minute, so I'd be curious whether OSS-Fuzz could say anything interesting without at least some information about how to generate JSON Schema specification-like objects. But happy to see what pops out too. The email address in the SECURITY.md file is a decent place to send these. CC @Zac-HD in case you're interested or have opinions :) And thanks for raising!
I didn't know this (though surely @Zac-HD will), but it looks like OSS-Fuzz supports generating data via Hypothesis. Something like that would be a big improvement over random dictionary poking, I think. Zac may already be doing that himself as part of hypothesis-jsonschema? But if not, @DavidKorczynski, I think that'd be the right kind of integration with OSS-Fuzz.
I'm happy to set up a property-based approach with Hypothesis if you are happy to integrate with OSS-Fuzz! If I go ahead with the integration, I can submit the fuzzers upstream to this repository instead of keeping them in OSS-Fuzz, and then we can also get the property-based testing going. Does that sound good?
Sounds good to me!
I do indeed have opinions! (And I wrote the integration docs over the weekend 😉) The real trick here is that Hypothesis supports driving arbitrary tests using a traditional fuzzer, which naturally includes OSS-Fuzz's various backends. In short, I'd be very surprised if you can discover interesting bugs by parsing strings into JSON and nothing else, though I can imagine this working OK if you had all the JSON tokens (…). Fortunately though, … The main tricks will be that: …
Also happy to collaborate and split any integration reward, or direct it to charity (e.g. the PSF or one of https://www.givewell.org/charities/top-charities).
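To make the "drive Hypothesis tests with a traditional fuzzer" idea concrete, here is a minimal sketch. It assumes `hypothesis` and `jsonschema` are installed; the `json_values` strategy, the fixed `SCHEMA` (borrowed from the jsonschema README) and the test name are illustrative choices, not anything from the PR. Hypothesis attaches a `.hypothesis.fuzz_one_input` hook to every `@given` test; it accepts a raw byte buffer, which is what a coverage-guided engine can call.

```python
import jsonschema
from hypothesis import given, settings, strategies as st

# JSON-compatible values only: JSON has no NaN or infinity.
json_values = st.recursive(
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=25,
)

# Illustrative fixed schema (the README example); a fuller harness would
# probably generate schemas as well as instances.
SCHEMA = {
    "type": "object",
    "properties": {"price": {"type": "number"}, "name": {"type": "string"}},
}


@settings(max_examples=200, deadline=None)
@given(instance=json_values)
def test_validate_only_raises_validation_error(instance):
    try:
        jsonschema.validate(instance=instance, schema=SCHEMA)
    except jsonschema.ValidationError:
        pass  # rejecting an invalid instance is expected, not a bug


# Entry point for an external fuzzer: the engine's bytes are replayed as
# Hypothesis choices, so the structure above still shapes every input.
fuzz_target = test_validate_only_raises_validation_error.hypothesis.fuzz_one_input
```

Under a Python engine such as Atheris, the hook would typically be registered with `atheris.Setup(sys.argv, fuzz_target)` followed by `atheris.Fuzz()`, while the same function still runs under pytest as an ordinary property test.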
@Zac-HD I think I followed that (and it was very helpful, as usual). I can't say I know yet what sorts of fuzzing seem useful here, but as you say, you've certainly found issues this way before, so maybe there are more gaps to fill...
This sounds great to me too, yeah. Maybe we should take it offline to discuss, but you know I still have a soft spot for PyPy, so throwing some dollars at them to make Hypothesis+PyPy support even better is attractive :) But so is the PSF.
My general view on refining the fuzzer to rely on more structural approaches is that it should be verified empirically. The argument is that the coverage-guided aspects of the fuzzing engine will be great at coming up with inputs that satisfy the various input structures of the target application. I think this is particularly true in a case like this, where the execution speed is high, the structural complexity of JSON is relatively low (say, in comparison to PDFs or image formats), and OSS-Fuzz will throw significant CPU power at it. The original fuzzer starts hitting jsonschema's code within seconds. Based on this, my personal view is to refine only after we get empirical results, i.e. if we get results then that's great, and if not then we should refine. Naturally I respect the view of the maintainers, but my personal advice would be to either not refine at first or have both. The perspective the original fuzzer takes was simply to follow the pattern described at https://pypi.org/project/jsonschema/, i.e. the comment in the code …
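A bytes-in harness in the spirit of that approach might look like the sketch below. It is not the fuzzer from google/oss-fuzz#4996; it assumes the Atheris-style convention of a `TestOneInput(data: bytes)` entry point and reuses the sample schema from the jsonschema README. Inputs that are not JSON, and instances the schema rejects, are treated as expected outcomes, so only unexpected exceptions surface as crashes.

```python
import json

import jsonschema

# Sample schema from the usage example in the jsonschema README.
SCHEMA = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
    },
}


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: treat raw bytes as a JSON instance and validate it."""
    try:
        instance = json.loads(data)
    except (ValueError, RecursionError):
        # JSONDecodeError and UnicodeDecodeError are both ValueErrors;
        # RecursionError comes from very deep nesting in the stdlib parser.
        # None of these say anything about jsonschema, so skip the input.
        return
    try:
        jsonschema.validate(instance=instance, schema=SCHEMA)
    except jsonschema.ValidationError:
        # Rejecting an invalid instance is correct behaviour, not a bug.
        pass
    # Any other exception propagates and the fuzzer reports it as a crash.
```

Registering it would look roughly like `atheris.Setup(sys.argv, TestOneInput)` followed by `atheris.Fuzz()`. Filtering parse failures up front keeps crash reports about jsonschema rather than about the `json` module.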
The question to me, though, is what results we are expecting. For a normal fuzzing process that OSS-Fuzz runs, it seems to me that this is often "the software doesn't crash", especially when fuzzing code in memory-unsafe languages. If a fuzzer is to say anything useful about JSON Schema (and this library … If I'm understanding your comment, I think you're saying that we should test one invariant: "valid JSON doesn't blow up … I think if I follow @Zac-HD's comment: …
That looks more like what I'd expect: if the key invariants are "valid pairs of schemas and instances produce successful jsonschema output" and "invalid pairs of schemas and instances produce unsuccessful jsonschema output", then there's likely to be more bang for the buck there. But as I say, I'm also willing to go with what the experts say :) @DavidKorczynski, you may be more familiar with OSS-Fuzz, and I know @Zac-HD is more familiar with property testing in general, so I'd be willing to defer.
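Checking that invariant needs an oracle that knows the right answer independently of jsonschema. A minimal sketch, assuming jsonschema's `Draft7Validator` and a deliberately tiny, hypothetical schema family `{"type": "integer", "minimum": m}`; it uses the standard library's `random` in place of Hypothesis only to keep the example short, and the helper names are made up for illustration.

```python
import random

from jsonschema import Draft7Validator


def make_schema(minimum: int) -> dict:
    """Hypothetical schema family: integers with a lower bound."""
    return {"type": "integer", "minimum": minimum}


def expected_valid(instance, minimum: int) -> bool:
    """Oracle written independently of jsonschema."""
    # bool is a subclass of int in Python, but JSON Schema does not treat
    # true/false as integers, so compare the exact type.
    return type(instance) is int and instance >= minimum


def random_instance(rng: random.Random):
    """Pick a JSON-compatible value of a random kind."""
    kind = rng.randrange(5)
    if kind == 0:
        return rng.randint(-100, 100)
    if kind == 1:
        return rng.random() < 0.5
    if kind == 2:
        return None
    if kind == 3:
        return str(rng.randint(-100, 100))
    return [rng.randint(-100, 100)]


def check_invariant(trials: int = 1000, seed: int = 0) -> int:
    """Assert that jsonschema's verdict matches the oracle for every pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        minimum = rng.randint(-50, 50)
        instance = random_instance(rng)
        actual = Draft7Validator(make_schema(minimum)).is_valid(instance)
        assert actual == expected_valid(instance, minimum), (minimum, instance)
    return trials
```

With Hypothesis, `random_instance` would become a strategy and a fuzzer could drive the same check. The hard part for richer schemas is the oracle, which is where generating instances from a schema (as hypothesis-jsonschema does, so every generated instance should be valid) helps.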
Ah right, now I understand. The bugs that I am after are unhandled exceptions.
Probably you know this, but with a class like:

```python
class Foo(dict):
    def __getitem__(self, key):
        if key == "12":
            raise ZeroDivisionError()
        # Delegate every other lookup to normal dict behaviour.
        return super().__getitem__(key)
```

you indeed may get … But for suitable subsets of objects, ones I assume you'll use as fuzzing input, then yeah. (And fair enough, maybe we start there.)
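The caveat about arbitrary objects can be reproduced directly. A minimal sketch, assuming a recent jsonschema whose `properties` keyword looks up `instance[key]` for each listed key present in the instance: the `ZeroDivisionError` from `Foo.__getitem__` escapes `validate()` untouched, while a plain `dict` only ever yields success or `ValidationError`. `exception_from_validate` is a hypothetical helper written for this illustration.

```python
import jsonschema


class Foo(dict):
    """A dict whose lookup of the key "12" raises an unrelated exception."""

    def __getitem__(self, key):
        if key == "12":
            raise ZeroDivisionError()
        return super().__getitem__(key)


def exception_from_validate(instance, schema):
    """Hypothetical helper: return the type validate() raised, or None."""
    try:
        jsonschema.validate(instance=instance, schema=schema)
    except Exception as exc:  # deliberately broad, to observe what escapes
        return type(exc)
    return None


# The "properties" keyword reads instance["12"], reaching Foo.__getitem__.
SCHEMA = {"properties": {"12": {"type": "integer"}}}
```

This is why "no unhandled exceptions" is only a meaningful invariant when fuzz inputs are restricted to plain JSON values (dicts, lists, strings, numbers, booleans and `None`), e.g. the output of `json.loads`.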
Re: donation of any integration reward: I'd be very happy to direct it to PyPy for use at their discretion, and to include a note suggesting that efficient code coverage would be great for fuzzers 😉

Fuzzing with …
Hi,
I was thinking that it would be nice to set up continuous fuzzing of jsonschema by way of OSS-Fuzz. In this PR, google/oss-fuzz#4996, I have done exactly that: I created the necessary logic from an OSS-Fuzz perspective to integrate jsonschema. This includes developing initial fuzzers as well as integrating them into OSS-Fuzz.
Essentially, OSS-Fuzz is a free service run by Google that performs continuous fuzzing of important open source projects. The only expectation of integrating into OSS-Fuzz is that bugs will be fixed. This is not a "hard" requirement, in that no one enforces it; the point is that if bugs are not fixed, running the fuzzers is a waste of resources, which we would like to avoid.
If you would like to integrate, could I please have one or more email addresses that will get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and other stats? Note that the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.