Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuous fuzzing #55

Merged
merged 7 commits into from
Sep 4, 2019
Merged

Continuous fuzzing #55

merged 7 commits into from
Sep 4, 2019

Conversation

bookmoons
Copy link
Contributor

Proposing to integrate continuous fuzzing. Fuzzit gives free service for open source projects.

This targets Parse and ParseStrict by input + New by fuzzing the timestamp and entropy.

Setup is like this:

  • In Fuzzit create targets ulid-new ulid-parse ulid-parse-strict.
  • In Fuzzit settings grab an API key. On the repo settings in Travis paste it to envvar FUZZIT_API_KEY.

Thanks for considering.

@peterbourgon @tsenart

@coveralls
Copy link

coveralls commented Aug 22, 2019

Coverage Status

Coverage remained the same at 97.959% when pulling 27fd369 on bookmoons:master into e51a56f on oklog:master.

@bookmoons
Copy link
Contributor Author

It looks like the Travis build failed due to missing the API key. I've run successful tests on my own Travis account. If you decide to go for this, getting the API key in should make that work.

@peterbourgon
Copy link
Member

Interesting. I'm going to try to build this up in a separate PR. Will update with results.

@peterbourgon
Copy link
Member

Oops.

@peterbourgon peterbourgon reopened this Aug 22, 2019
@peterbourgon
Copy link
Member

peterbourgon commented Aug 24, 2019

So I think this package is a great candidate for fuzzing, but I'd like to do a few things before committing any kind of change to master. Some of this will seem really basic if you're already familiar with fuzzing, but please bear with me :)

First, and ignoring the continuous fuzzing stuff for now, I want make sure we all understand exactly the value proposition. It's clear that Fuzz{New,Parse,ParseStrict} exercise the robustness of the various parsers, but are we using the input []byte the right way? Should we be parsing it in different ways? And what do the return codes imply?

Second, what should we do with the "interesting" inputs discovered by fuzzing? For example, I ran it locally, and it found something that I converted into this unit test:

func TestFuzzCorpus(t *testing.T) {
	for _, testcase := range []struct {
		name    string
		ms      []byte
		entropy []byte
		want    error
	}{
		{
			name:    "1",
			ms:      []byte{0x45, 0x6f, 0xef, 0xbf, 0xbd, 0x01, 0x00, 0x00},
			entropy: []byte{0x00, 0x10, 0xbf, 0x80, 0xef, 0xbf, 0xbd, 0x6e, 0x66},
			want:    io.ErrUnexpectedEOF, // need at least 10 bytes of entropy
		},
	} {
		t.Run(testcase.name, func(t *testing.T) {
			var ms uint64
			if err := binary.Read(bytes.NewReader(testcase.ms), binary.LittleEndian, &ms); err != nil {
				t.Fatal(err)
			}
			_, err := ulid.New(ms, bytes.NewReader(testcase.entropy))
			if want, have := testcase.want, err; want != have {
				t.Fatalf("error: want %v, have %v", want, have)
			}
		})
	}
}

Is that the goal? Should we be doing something more? Something different, i.e. keeping the interesting data files in a subdir that we commit and parsing them from that?

Finally, besides the ulid_fuzz.go file, I'd like to find a way to sequester all of the required scaffolding—the updated build environment, fuzz scripts, corpus data, etc.—separately from the existing CI stuff. Is that feasible? Maybe @bookmoons you can provide a link to some other repos where this stuff is integrated that I look to for examples?

@tsenart any opinions?

@bookmoons
Copy link
Contributor Author

bookmoons commented Aug 24, 2019

Thanks for looking at it @peterbourgon. Going to run through answers to all of these questions.


Maybe @bookmoons you can provide a link to some other repos where this stuff is integrated that I look to for examples?

coredns recently got Fuzzit set up. So did systemd.


It's clear that Fuzz{New,Parse,ParseStrict} exercise the robustness of the various parsers, but are we using the input []byte the right way? Should we be parsing it in different ways? And what do the return codes imply?

Fuzzit is using go-fuzz. It's a genetic fuzzer. So it instruments your binary, detects coverage changes with new inputs, and uses that to decide what to mutate next and what to keep the same. It uses this kind of natural selection model to try to evolve knowledge about your program and automatically walk all the branches. So the idea is you just let it watch what happens with each input and build up that information. It will try to hit all possible behavior.

Return 0 means there's no special meaning to this case so use your normal coverage logic. Return 1 means increase priority, it wants that signal if something parses successfully. If you look at the go-fuzz readme these values are near the top.

Data is a random input generated by go-fuzz, note that in most cases it is invalid. The function must return 1 if the fuzzer should increase priority of the given input during subsequent fuzzing (for example, the input is lexically correct and was parsed successfully); -1 if the input must not be added to corpus even if gives new coverage; and 0 otherwise; other values are reserved for future use.


Second, what should we do with the "interesting" inputs discovered by fuzzing? For example, I ran it locally, and it found something that I converted into this unit test

The Fuzzit CI setup has 2 stages. 1 is the main fuzzing that tries to find new crashes. The other is a fuzz regression test that just runs past crashes. Fuzzit keeps a log of these past crashes and the regression test is a quick little run in CI of that list of past crashes.

It also tracks a corpus, all the interesting cases that led to new branches in the program. So future fuzzing runs build on the knowledge developed by past runs.

It'll send you an email if there's a new crash. Basically I think you take it as a bug report, get the bug fixed, and then it won't crash again in fuzz regression tests.


Finally, besides the ulid_fuzz.go file, I'd like to find a way to sequester all of the required scaffolding—the updated build environment, fuzz scripts, corpus data, etc.—separately from the existing CI stuff. Is that feasible?

I'm not sure if there's a way to do this. I think Fuzzit only runs through Travis CI. You could set up go-fuzz to run manually, but then you're maybe not getting it on every commit.

All the corpus data should be tracked by Fuzzit, so I don't think you'll end up with that in the repo. But that CI config seems to be the channel they're using.

@bookmoons
Copy link
Contributor Author

bookmoons commented Aug 25, 2019

There's a successful run under my Travis account if you'd like to see what it looks like.

Fuzz regression tests pull data from Fuzzit and run in Travis. It's a short little test. The main Fuzz test submits to Fuzzit to do a longer run and ends the build as soon as it's in the queue.

https://travis-ci.org/bookmoons/ulid/builds/575119636

@bookmoons
Copy link
Contributor Author

This fasthttp package also just got fuzzit setup.

valyala/fasthttp#634

@tsenart
Copy link
Contributor

tsenart commented Sep 3, 2019

@bookmoons
Copy link
Contributor Author

I think so, will look into that.

@bookmoons
Copy link
Contributor Author

Added a new target that tests construction by passing fuzz through a MonotonicEntropy. I've tried to exercise the monotonicity logic by constructing 10 ULIDs each iteration.

@bookmoons
Copy link
Contributor Author

Fuzzit have kindly made a ulid org for the project. I've preconfigured all the targets and added @tsenart and @peterbourgon to the org. A test build succeeded and I see fuzzing jobs in the queue:
https://travis-ci.org/bookmoons/ulid/builds/580569854

Final setup is to get the API key into CI:

  • Grab an API key from Fuzzit.
  • Paste to FUZZIT_API_KEY in Travis.

Copy link
Contributor

@tsenart tsenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@tsenart tsenart merged commit a41d229 into oklog:master Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants