A ground-truth fuzzing benchmark suite based on real programs with real bugs.
In addition to being a collection of real targets with ground-truth bugs, Magma also provides an infrastructure of build scripts and toolsets that facilitate the usage of the benchmark and make it usable out-of-the-box. To achieve that, Magma relies on the Docker build system and provides a Dockerfile that specifies the build process.
To enable the quick launching of experiments and identical campaigns, Magma relies on Docker images that set up the fuzzing environment and allow running multiple identical and independent campaigns concurrently.
A fuzzing environment is mainly identified by the fuzzer and the target,
alongside the program and the args used when launching the program. The
other parameters listed in captainrc
also factor into the fuzzing environment,
but variations of them cannot co-exist in the same fuzzer/target
configuration.
As such, every built image is identified by the tag magma/fuzzer/target
. Each
images contains all the support files needed for compiling and running the
fuzzer and instrumenting the target (if needed) using the fuzzer’s own compiler.
Launching a campaign then simply becomes a matter of specifying which program to launch when running a container based on that image, as well as what command-line args to use. Targets (libraries) can have multiple programs that execute different parts of the code base, and allowing the flexibility to fuzz different programs is key to Magma’s operation.
Magma provides multiple programs as execution drivers for fuzzing. However, in a similar manner to seed files, a fuzzer could the provided execution drivers as a “hint” for how to fuzz the target, possibly using generated execution drivers (e.g., FuzzGen).
The Magma Dockerfile defines the build process for the Docker images, and its instructions are ordered in a manner that increases cache re-usability across builds.
The Dockerfile accepts the following build arguments:
Build Arg. | Description |
---|---|
fuzzer_name |
the name of the fuzzer, as listed in magma/fuzzers |
target_name |
the name of the target, as listed in magma/targets |
USER_ID , GROUP_ID |
the user and group ID of the magma user inside the image (should match external uid/gid when using shared volumes) |
canaries |
if set, specifies that canaries must be enabled |
fixes |
if set, specifies that fixes must be enabled (bugs disabled) |
isan |
if set, specifies that the IdealSanitization mode is used |
harden |
if set, specifies that instrumentation storage must be mprotect() ‘d |
Beyond that, the Dockerfile specifies the order of building the fuzzing environment:
{magma, fuzzer, target}/preinstall.sh
: installs the dependencies (runs as
root
)magma/prebuild.sh
: builds the Magma monitoring utility with an
un-instrumented compiler (gcc
/g++
)fuzzer/{fetch, build}.sh
: retrieves the fuzzer code and builds ittarget/fetch.sh
: retrieves the target codemagma/apply_patches.sh
: instruments the target with in-line canariesfuzzer/instrument.sh
: builds the target as defined by the fuzzer’s build
configuration (e.g., custom compiler, multiple binaries, etc…)Finally, the entry point is specified to be Magma’s magma/run.sh
script which
launches the monitoring utility in parallel with the fuzzer.
The Dockerfile creates the following directories and assigns their values to environment variables:
Env. Var. | Value | Description |
---|---|---|
$MAGMA_R |
/magma/ |
The directory which stores the needed configuration files (magma , fuzzers/my_fuzzer , targets/my_target ). |
$SHARED |
/magma_shared |
The path where the shared volume with the host is mounted. It contains fuzzer findings, logs, and monitor outputs. |
$OUT |
/magma_out |
The directory where compiled programs are stored. Every target build script must copy its output binaries here. |
Additionally, the Dockerfile also exports the following environment variables:
Env. Var. | Value | Description |
---|---|---|
$MAGMA |
/magma/magma |
The path to the directory with Magma’s support files inside the container. |
$FUZZER |
/magma/my_fuzzer |
The path to the fuzzer’s configuration directory inside the container. |
$TARGET |
/magma/my_target |
The path to the target’s configuration directory inside the container. |
${C,CXX,LD}FLAGS , $LIBS |
Check Dockerfile | Initial compiler and linker flags needed to support Magma’s instrumentation. |
Finally, the following environment variables must be assigned when running a Magma Docker container:
Env. Var. | Description |
---|---|
PROGRAM |
the name of the compiled program executable |
ARGS |
an AFL-style string of command-line arguments |
POLL |
the poll duration, in seconds, as defined in captainrc |
TIMEOUT |
the campaign duration, as defined in captainrc |
AFFINITY |
a comma-separated list of logical CPU cores which are allocated for this campaign |
Those environment variables can be provided to the container through the
command-line argument -e NAME=VALUE
when using docker-run
or
docker-create
.
Inside the container, the $FUZZER
directory stores the scripts needed to build
the fuzzer and compile the targets.
File Name | Description |
---|---|
preinstall.sh |
Installs the fuzzer’s dependencies. Must be run as root . |
fetch.sh |
Retrieves the source code for building the fuzzer. Typically stores it in $FUZZER/repo . |
build.sh |
Compiles the fuzzer binaries and support files (e.g., the afl_driver.cpp used to wrap libFuzzer stubs). |
instrument.sh |
Compiles the target, performing any required pre-processing or adding fuzzer-specific instrumentation. |
runonce.sh |
Runs the $OUT/$PROGRAM executable against the first argument ($1 ), simulating the conditions of the fuzzer’s runtime environment (e.g., memory limit, timeout, etc…). |
run.sh |
Launches the fuzzing campaign, reading seed files from $TARGET/corpus/$PROGRAM/ and writing results to $SHARED/findings . |
findings.sh |
Outputs a newline-separated list of pathnames to the fuzzer-generated test-cases. |
Inside the container, the $TARGET
directory stores the scripts needed to build
the target.
File Name | Description |
---|---|
preinstall.sh |
Installs the target’s dependencies. Must be run as root . |
fetch.sh |
Retrieves the source code for building the target. It must store it in $FUZZER/repo . |
build.sh |
Compiles the target binaries, making use of ${C,CXX,LD}FLAGS and $LIBS , and stores the program binaries in $OUT . |
In addition, the configrc
file in this directory specifies the valid
$PROGRAM
and $ARGS
values. A sample configrc
is listed here:
PROGRAMS=(pdf_fuzzer pdfimages pdftoppm)
pdfimages_ARGS="@@ /tmp/out"
pdftoppm_ARGS="-mono -cropbox @@"
This directory must also include the following subdirectories:
corpus/$PROGRAM/
: for every $PROGRAM
, a corpus of seed files must be
available at runtime (either provided as part of the configuration, or
generated in the target build.sh
script at compile-time).patches/setup/
: contains a set of patch files that prepare the target to be
compiled with Magma.patches/bugs/
: contains a set of patch files that forward-port bugs and
instrument the program with ground-truth canaries. Other subdirectories, like
patches/graveyard/
, can be added, but will only be used for book-keeping.Inside the container, the $MAGMA
directory stores the scripts needed to build
Magma and prepare its targets.
Just under the $MAGMA
directory, the following scripts are stored:
File Name | Description |
---|---|
preinstall.sh |
Installs Magma’s dependencies. Must be run as root . |
prebuild.sh |
Builds the Magma monitor using the default un-instrumented compiler (vanilla GCC/Clang). |
build.sh |
Compiles the Magma runtime library, $OUT/magma.o , needed for compiling and running the instrumentation. |
apply_patches.sh |
Applies the patch files stored in $TARGET/patches/bugs/ to the code base in $TARGET/repo/ . |
runonce.sh |
Runs the $OUT/$PROGRAM executable against the first argument ($1 ). Returns zero if no bugs or crashes are encountered, and non-zero otherwise. |
run.sh |
Prunes the $TARGET/corpus/$PROGRAM/ directory for faulty seeds, starts the monitor, and launches the campaign. |
The $MAGMA/src
directory contains the support files needed to compile the
Magma runtime, mainly support for canaries.
The canary.c
file defines the three main functions used in Magma:
Function Name | Description |
---|---|
magma_init |
Initializes the mmap() ‘d region used as storage for the global instrumentation statistics. |
magma_protect |
Toggles the storage page’s permissions between read-only and read-writable to protect the region from rogue overwrites. |
magma_log |
Logs the call to the canary. After the first occurrence of condition == true , magma_faulty is set, and no future canaries are logged. |
magma_protect()
is only called when MAGMA_HARDEN_CANARIES
is defined at
compile-time, by setting the HARDEN
parameter in captainrc
.
The MODE=1
parameter in captainrc
defines the compile-time macro
MAGMA_ENABLE_CANARIES
, which inserts calls to magma_log()
at bug locations.
However, some custom fuzzer configuration may require instrumentation to be
disabled (e.g., a fuzzer that compiles two versions of the target, one for
fuzzing and another for analysis). In that case, the fuzzer can define the
MAGMA_DISABLE_CANARIES
macro, which maintains the calls to magma_log()
, but
makes it behave like a NOP
.
When calling magma_log()
, the condition
must be constructed using the
MAGMA_AND
and MAGMA_OR
macros (which are aliases for the magma_and
and
magma_or
functions). These functions are written in such a way to minimize
information leakage through some fuzzer-specific coverage instrumentation. The
arch/
subdirectory contains the different implementations of these functions
depending on the build architecture. Custom implementations could be added as a
separate architecture, and then enabled by #include
-ing the definition file in
canary.h
.
The instrumentation implements a simple one-slot single-producer single-consumer
storage component that allows the monitor
to constantly pull the latest
results from the storage region without risk of data races, and without the
overhead and complexity of synchronization.
If MAGMA_STORAGE
is a defined environment variable at runtime, its value is
used as the path to file to be used as the storage object. Otherwise,
$SHARED/canaries.raw
is used, as defined in $MAGMA/build.sh
at compile-time.
The storage component is defined in storage.c
but is orthogonal to the core
operation of Magma.
The $MAGMA/src
directory also contains the code for the monitor
utility,
which is responsible for dumping runtime instrumentation statistics in a
parse-able format.
The utility is invoked as such:
monitor [--dump {raw|row|human}] [--fetch {file FILE|watch COMMAND}]
The --dump
argument can take the following values:
Value | Description |
---|---|
raw |
The raw memory bytes of the storage region. This is the default mode. |
row |
A CSV-style representation, where the first row defines header names, and the second row records counter values. BUG_R and BUG_T represent the counters for BUG being reached and triggered, respectively. |
human |
Each bug is printed on a separate line with the format: BUG reached REACHED_COUNT triggered TRIGGERED_COUNT |
The --fetch
argument can take the following values:
Value | Description |
---|---|
file |
Reads the FILE as a storage object. This is the default mode. |
watch |
Exports the path to a temporary file as a MAGMA_STORAGE environment variable and runs the COMMAND to collect instrumentation statistics for a single execution. |
The set of scripts in the captain
toolset is provided to allow automated
interaction with the Docker build system. Each script defines a set of
pre-requirements, which are essentially the launch arguments when running the
script, in the first few lines of comments.
File Name | Description |
---|---|
build.sh |
Builds the Docker image based for the $FUZZER/$TARGET configuration. |
start.sh |
Launches a single fuzzing campaign (useful for testing). |
extract.sh |
A post-processing script to extract PoC files from fuzzer-generated test-cases. |
run.sh |
Launches and manages multiple concurrent campaigns, building any missing images in the process. Requires captainrc . |
post_extract.sh |
Launches the extraction script, in case extraction was not done with run.sh (i.e., POC_EXTRACT was not set). Requires captainrc . |
common.sh |
Initializes a common set of functions and variables. Not to be executed out of context. |
This toolset is a work-in-progress and its documentation is not entirely stable.
Generates the experiment summary and outputs it as a JSON file.
Requirements:
pandas >= 1.1.0
Usage:
usage: exp2json.py [-h] [--workers WORKERS] workdir outfile
positional arguments:
workdir The path to the Captain tool output workdir.
outfile The file to which the output will be written, or - for
stdout.
optional arguments:
-h, --help show this help message and exit
--workers WORKERS The number of concurrent processes to launch.
The JSON object has the following format:
{
"FUZZER": {
"TARGET": {
"PROGRAM": {
"RUN": {
"reached": {
"BUGID": int(TIME_TO_REACH_BUG)
},
"triggered": {
"BUGID": int(TIME_TO_TRIGGER_BUG)
}
},
"RUN": ...
},
"PROGRAM": ...
},
"TARGET": ...
},
"FUZZER": ...
}
TIME_TO_REACH_BUG
and TIME_TO_TRIGGER_BUG
are represented in seconds, with
a resolution of POLL
seconds, since the beginning of the experiment.
This toolset is a work-in-progress and its documentation will be released when the scripts are ready to be publicized.