
Friday, March 13, 2020

The Value of FuzzBench

A means to support debugging code and searching for code vulnerabilities.  We used it in the early days.  Google uses it and recently published some public capabilities.  See a past article here on this: Fuzzing for Testing Coding Security Vulnerabilities, from the ACM.

Below was brought to my attention by Steve Gibson's Security Now podcast, episode #758.

The Fuzzy Bench
Posted last week to the Google Open Source Blog:

“FuzzBench: Fuzzer Benchmarking as a Service”
Monday, March 2, 2020

Wikipedia defines fuzzing this way:
     “Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.” .... 
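To make that definition concrete, here is a minimal mutation-fuzzer sketch in Python. Everything in it is hypothetical: a toy key=value parser with a deliberately planted assertion bug stands in for the program under test, and random byte edits of a valid seed play the role of the "semi-valid" inputs described above.

```python
import random

random.seed(0)  # deterministic run, for illustration only

def parse_kv(data: bytes) -> dict:
    """Toy program under test: parses lines of the form b"key=value".
    It carries a planted bug: an assertion that fires on long keys."""
    result = {}
    for line in data.split(b"\n"):
        if not line:
            continue
        key, _, value = line.partition(b"=")
        assert len(key) <= 5, "key too long"   # the planted bug
        result[key.decode("latin-1")] = value.decode("latin-1")
    return result

def mutate(seed: bytes, n_mutations: int = 3) -> bytes:
    """Make a semi-valid input: mostly the seed, with a few random edits."""
    data = bytearray(seed)
    for _ in range(n_mutations):
        i = random.randrange(len(data))
        op = random.choice(("flip", "dup", "insert"))
        if op == "flip":
            data[i] ^= random.randrange(1, 256)    # change one byte
        elif op == "dup":
            data.insert(i, data[i])                # duplicate one byte
        else:
            data.insert(i, random.randrange(256))  # inject a random byte
    return bytes(data)

def fuzz(seed: bytes, trials: int = 10_000) -> list:
    """Feed mutated inputs to the parser and collect the crashing ones."""
    crashes = []
    for _ in range(trials):
        candidate = mutate(seed)
        try:
            parse_kv(candidate)
        except AssertionError:
            crashes.append(candidate)
    return crashes

crashes = fuzz(b"host=local\nport=80\n")
print(f"{len(crashes)} of 10000 mutated inputs crashed the parser")
```

Real fuzzers such as AFL and libFuzzer add coverage feedback on top of this loop, keeping mutants that reach new code as fresh seeds instead of discarding them.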

So, fuzzing is an interesting way to uncover bugs in programs. Rather than having a single highly-skilled hacker with knowledge of all past vulnerabilities carefully and methodically trying this or that in an attempt to exploit something that might be possible, fuzzing is like the thousand monkeys all pounding on typewriters to see whether any of them might, by pure happenstance, hit upon something novel and useful.

In another of Google’s “working to make the world a better place because we have plenty of money, so why not?” moves, last week they posted the explanation of their latest initiative: We are excited to launch FuzzBench, a fully automated, open source, free service for evaluating fuzzers. The goal of FuzzBench is to make it painless to rigorously evaluate fuzzing research and make fuzzing research easier for the community to adopt.   .... 

Fuzzing is an important bug finding technique. At Google, we’ve found tens of thousands of bugs with fuzzers like libFuzzer and AFL. There are numerous research papers that either improve upon these tools (e.g. MOpt-AFL, AFLFast, etc) or introduce new techniques (e.g. Driller, QSYM, etc) for bug finding. However, it is hard to know how well these new tools and techniques generalize on a large set of real world programs. Though research normally includes evaluations, these often have shortcomings—they don't use a large and diverse set of real world benchmarks, use few trials, use short trials, or lack statistical tests to illustrate if findings are significant. This is understandable since full scale experiments can be prohibitively expensive for researchers. For example, a 24-hour, 10-trial, 10 fuzzer, 20 benchmark experiment would require 2,000 CPUs to complete in a day.
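The 2,000-CPU figure quoted above follows directly from multiplying out the experiment's dimensions; a quick sanity check of the arithmetic:

```python
trials = 10         # trials per fuzzer/benchmark pair
fuzzers = 10
benchmarks = 20
hours_per_trial = 24

total_trials = trials * fuzzers * benchmarks   # 2,000 independent runs
cpu_hours = total_trials * hours_per_trial     # 48,000 CPU-hours of work

# Each trial occupies one CPU for the full 24 hours, so finishing
# "in a day" means running every trial concurrently on its own CPU:
cpus_needed = total_trials
print(total_trials, cpu_hours, cpus_needed)  # 2000 48000 2000
```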

To help solve these issues the OSS-Fuzz team is launching FuzzBench, a fully automated, open source, free service. FuzzBench provides a framework for painlessly evaluating fuzzers in a reproducible way. To use FuzzBench, researchers can simply integrate a fuzzer and FuzzBench will run an experiment for 24 hours with many trials and real world benchmarks. Based on data from this experiment, FuzzBench will produce a report comparing the performance of the fuzzer to others and give insights into the strengths and weaknesses of each fuzzer. This should allow researchers to focus more of their time on perfecting techniques and less time setting up evaluations and dealing with existing fuzzers.

Integrating a fuzzer with FuzzBench is simple, as most integrations are less than 50 lines of code. Once a fuzzer is integrated, it can fuzz almost all 250+ OSS-Fuzz projects out of the box. We have already integrated ten fuzzers, including AFL, LibFuzzer, Honggfuzz, and several academic projects such as QSYM and Eclipser.
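For a sense of what such a sub-50-line integration looks like: FuzzBench drives each fuzzer through a small Python module exposing build() and fuzz() hooks. The sketch below is a rough illustration only — the hook shape follows FuzzBench's documentation as I understand it, but the tool names (afl-clang-fast, afl-fuzz) are stand-ins for whatever your fuzzer ships, and a real integration delegates the build to FuzzBench's shared helpers.

```python
# fuzzer.py -- illustrative sketch of a FuzzBench integration.
# The build()/fuzz() hook shape follows FuzzBench's docs; the compiler
# wrappers and command line below are stand-in assumptions.
import os
import subprocess

def build():
    """Compile the benchmark with this fuzzer's instrumentation by
    pointing the build at the fuzzer's compiler wrappers."""
    os.environ["CC"] = "afl-clang-fast"
    os.environ["CXX"] = "afl-clang-fast++"
    # A real integration would now invoke FuzzBench's shared build helper.

def fuzz(input_corpus, output_corpus, target_binary):
    """Run the fuzzer on the instrumented target until the trial ends."""
    os.makedirs(output_corpus, exist_ok=True)
    subprocess.call(["afl-fuzz", "-i", input_corpus, "-o", output_corpus,
                     "--", target_binary])
```

The appeal of such a thin interface is that the same two hooks work whether the fuzzer is AFL-derived or something entirely new.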


Reports include statistical tests to give an idea how likely it is that performance differences between fuzzers are simply due to chance, as well as the raw data so researchers can do their own analysis. Performance is determined by the amount of covered program edges, though we plan on adding crashes as a performance metric. You can view a sample report here:

(Check out the sample... Fuzzers were run against all sorts of well-known code bases: curl, FreeType, jsoncpp, libjpeg, libpcap, libpng, libxml2, openssl, openthread, php, sqlite3, vorbis, woff2.)
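The kind of statistical test those reports describe can be sketched with nothing but the standard library. Below is a two-sided Mann-Whitney U test using the normal approximation, applied to invented per-trial edge-coverage counts for two hypothetical fuzzers; whether FuzzBench uses exactly this test is an assumption on my part, but its reports do compare per-trial coverage samples between fuzzers in this spirit.

```python
import math
from itertools import chain

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, no tie
    correction) -- adequate for samples of ~10-20 fuzzing trials."""
    pooled = sorted(chain(a, b))
    # Assign each value its rank, averaging ranks across ties.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j + 1) / 2   # 1-based average rank
        i = j
    n_a, n_b = len(a), len(b)
    u = sum(rank_of[x] for x in a) - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value
    return u, p

# Invented per-trial edge-coverage counts for two fuzzers:
fuzzer_a = [1510, 1490, 1530, 1475, 1520]
fuzzer_b = [1400, 1420, 1390, 1430, 1410]
u, p = mann_whitney_u(fuzzer_a, fuzzer_b)
print(f"U = {u}, p = {p:.4f}")   # p < 0.05: unlikely to be chance
```

A small p-value says the coverage gap between the two fuzzers is unlikely to be an artifact of run-to-run randomness, which is exactly the question short, few-trial evaluations fail to answer.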

How to Participate
Our goal is to develop FuzzBench with community contributions and input so that it becomes the gold standard for fuzzer evaluation. We invite members of the fuzzing research community to contribute their fuzzers and techniques, even while they are in development. Better evaluations will lead to more adoption and greater impact for fuzzing research.

We also encourage contributions of better ideas and techniques for evaluating fuzzers. Though we have made some progress on this problem, we have not solved it and we need the community’s help in developing these best practices.

So, yeah... another big tip of the hat to Google for so willingly and usefully giving back to the community and, really, to the world. Though everyone knows that fuzzing is not the be-all and end-all solution for hardening our software and making it bulletproof, it inarguably provides another avenue toward real-world code security.

Efficient fuzzing is not easy. It’s the province of university academics and there’s still a lot of progress to be made. So having an open platform for testing and comparing fuzzing innovations can only help academics test and hone their new ideas. ....
