A ground-truth fuzzing benchmark suite based on real programs with real bugs.
We’ve compiled a list of the most frequently asked questions:
This may be due to an "under-fit" trigger condition. The fixes written by the library developers are often more restrictive than what is strictly required to cause a crash. Magma extracts the bug condition from those fixes, instead of using the actual "crash" condition.
It is possible that the fix provided by the developers under-fits the original bug. In that sense, the trigger condition could cover cases that do not crash, but that still violate the criteria set by the developers to fix the bug. A bug can then be triggered without causing a crash.
Another reason could be that the code base has undergone changes that prevent the crash from happening later on, after the trigger condition is evaluated. In that case, reproducing the bug may not yield the same result (a crash), but would still require satisfying complex conditions.
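To illustrate, consider the following self-contained sketch (a made-up bug, not one of Magma's, with a local stub standing in for the canary's reporting call): the trigger condition is lifted from the developers' fix, which rejects more inputs than the overflow strictly requires, so some inputs trigger the oracle without ever corrupting memory.

#include <stdio.h>
#include <string.h>

/* stand-in for the canary's reporting call into Magma's runtime library */
static void magma_report(const char *bug_id, int triggered)
{
    fprintf(stderr, "%s %s\n", bug_id, triggered ? "triggered" : "reached");
}

/* Hypothetical bug: the copy overflows `buf` only when len > 64, but the
 * developers' fix rejects any len > 56, so the injected trigger condition
 * (taken from the fix) also fires for 57 <= len <= 64, where no crash occurs. */
static void parse_chunk(const unsigned char *data, size_t len)
{
    unsigned char buf[64];
    magma_report("XXX000", len > 56);   /* trigger condition derived from the fix */
    memcpy(buf, data, len);             /* actual fault only when len > sizeof(buf) */
}

int main(void)
{
    unsigned char input[60] = {0};
    parse_chunk(input, sizeof(input));  /* oracle triggered (60 > 56), yet the program does not crash */
    return 0;
}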
In contrast to the process of back-porting fixes from newer software versions to previous releases, we coin the term forward-porting for re-purposing bugs from previous releases and injecting them into later versions of the code.
An alternative approach to forward-porting bugs would be back-porting canaries. Given a library libfoo with a previous release A and the latest stable release B, the history of bugs fixed from A to B can be used to identify the bugs present in A, formulate oracles, and inject canaries in an old version of libfoo. However, when we use A as the code base for our target, we could potentially miss some bugs in the back-porting process. This increases the possibility that the instrumented version of A has bugs for which no ground-truth is being collected. In contrast, when we follow the forward-porting approach, B is used as a code base, ensuring that all known bugs are fixed, and the bugs we re-introduce will have ground-truth oracles. It is still possible that with the new fixes and features added to B, more bugs could have been re-introduced, but the forward-porting approach allows the benchmark to constantly evolve with each published bug fix.
An automated approach to injecting bugs would have resulted in an incomplete and error-prone technique, ultimately yielding fewer, lower-quality bugs. Instead, dedicating human effort to this task maximizes the chance that a chosen bug is ported correctly into Magma. Thus, we opted for the manual approach to maximize Magma's utility.
During a semester project, two Bachelor students added over 60 bugs to Magma across three new targets. On average, the students took less than 1.8 hours per bug to analyze the bug location, infer the trigger constraints, create the canary, and encode the necessary metadata. Injecting bugs into Magma requires no expert knowledge (the students are 3rd-year Bachelor students without security experience). Several external collaborators have also added new targets and fuzzer configurations to Magma, and integrated Magma into their research workflows.
The main critique raised in previous reviews is that bug insertion and forward-porting of bugs are not automatic. We argue that an automatic bug insertion technique would be strictly inferior to manual porting due to the difficulty and complexity of automatic forward-porting. Creating an automatic technique would be extremely challenging because of the wide variation in bugs: bug types, bug locations, the constraints required to trigger a bug, the code modifications needed to encode these constraints, and the need to carefully pinpoint the bug location from only a bug report and a patch.
All of these challenges for an automatic technique are unsolved and would require a large amount of time to solve, likely on the order of 5-10 person-years (1-2 PhDs). Additionally, any automatic solution would likely require careful checking of the forward-ported bugs, which would take roughly the same amount of time as manually porting them.
While these problems are interesting (and should be looked at), we argue that they are orthogonal to Magma---a benchmark suite to test fuzzers. Compared to a manual approach that costs less than 2 person-hours per bug, an automatic approach will never be cost-effective (or, arguably, ever be useful). Given these downsides, we strongly favor the manual bug-porting approach.
An oracle evaluates the current program state and determines if it is faulty (i.e., if the bug has been triggered). A canary is responsible for reporting and exporting that knowledge, through Magma's runtime library, to be used by the monitor.
Throughout the code, the documentation, and the paper, the distinction between these terms is not strongly emphasized, and they are often used interchangeably when the distinction is not critical.
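As a rough sketch of the two roles (the reporting function below is a local stub defined for this example, not Magma's actual runtime API): the oracle is the expression that inspects the program state, while the canary is the injected code that exports the verdict for the monitor to collect.

#include <stdio.h>

/* stand-in for the runtime-library call that exports results to the monitor */
static void canary_report(const char *bug_id, int reached, int triggered)
{
    fprintf(stderr, "%s reached=%d triggered=%d\n", bug_id, reached, triggered);
}

static void handle_record(size_t length, size_t capacity)
{
    /* oracle: evaluates the current program state and decides whether it is faulty */
    int faulty = (length > capacity);
    /* canary: reports that knowledge so the monitor can tally it */
    canary_report("XXX000", /*reached=*/1, /*triggered=*/faulty);
}

int main(void)
{
    handle_record(16, 64);   /* reached, not triggered */
    handle_record(96, 64);   /* reached and triggered */
    return 0;
}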
Magma's initial set of targets was chosen to cover different computational domains of applications and standards, to provide an all-around benchmark representative of a major portion of in-the-wild fuzzing targets.
Although the concept behind Magma is not restricted to one class of targets, Magma's implementation does limit the scope of real targets that can be added. Magma currently relies on inline instrumentation written in C, which requires the targets to be written in C/C++. Moreover, Magma currently does not support multi-threaded targets, since its runtime library is not thread-safe.
No specific set of criteria was imposed on the bug selection process. However, throughout our porting efforts, we often prioritized more recent bug reports, since they correspond most closely to the latest code base and are thus more likely to remain valid. Reports marked "critical" were also given higher priority than others.
That said, there are no constraints on bug types. Bugs in Magma can be anything from typical memory safety violations to semantic bugs, allowing for a broad range of possible sanitization and fault detection techniques.
A reached bug refers to a bug whose oracle was called, implying that the executed path reaches the context of the bug, without necessarily triggering a fault. A triggered bug, on the other hand, refers to a bug that was reached, and whose triggering condition was satisfied, indicating that a fault occurred. Whereas triggering a bug implies that the program has transitioned into a faulty state, the symptoms of the fault may not be directly observable at the oracle injection site. When a bug is triggered, the oracle only indicates that the conditions for a fault have been satisfied, but this does not imply that the fault was encountered or detected by the fuzzer.
Another distinction is the difference between triggering and detecting a bug. Whereas most security-critical bugs manifest as a low-level security policy violation for which state-of-the-art sanitizers are well-suited — e.g., memory corruption, data races, invalid arithmetic — some classes of bugs are not easily observable. Resource exhaustion bugs are often detected after the fault has manifested, either through a timeout or an out-of-memory indication. Even more obscure are semantic bugs whose malfunctions cannot be observed without some specification or reference. Different fuzzing techniques have been developed to target such evasive bugs, such as SlowFuzz and NEZHA. Such advancements in fuzzer technologies could benefit from an evaluation which accounts for detection rate as another dimension for comparison.
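The following self-contained sketch (a made-up bug, with a simplified stand-in for a canary) shows how the reached and triggered counters evolve, and why a triggered bug is not necessarily a detected one:

#include <stdio.h>
#include <stdlib.h>

static unsigned long bug_reached, bug_triggered;   /* per-bug counters, akin to the _R/_T columns */

/* simplified stand-in for a canary: bump "reached" unconditionally,
 * and "triggered" only when the bug condition holds */
static void canary(int condition)
{
    bug_reached++;
    if (condition)
        bug_triggered++;
}

/* hypothetical bug: the early-return path leaks `obj`; the fault (a memory
 * leak) is triggered yet produces no crash, so a crash-driven fuzzer would
 * never detect it without a suitable sanitizer */
static int parse(const unsigned char *data, size_t len)
{
    unsigned char *obj = malloc(64);
    canary(len > 0 && data[0] == 0xFF);
    if (len > 0 && data[0] == 0xFF)
        return -1;              /* obj leaked: triggered, not detected */
    free(obj);
    return 0;
}

int main(void)
{
    parse((const unsigned char *)"\x00", 1);   /* reached, not triggered */
    parse((const unsigned char *)"\xff", 1);   /* reached and triggered, but not detected */
    printf("reached=%lu triggered=%lu\n", bug_reached, bug_triggered);
    return 0;
}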
What does the POLL parameter affect? And why does the Magma monitor need to poll?
The instrumented target writes canary results to a file which the monitor can access. To avoid the overhead and complexity of synchronization, the Magma monitor does not synchronously read the results. Instead, it polls the file, meaning that it reads its contents every POLL seconds.
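For illustration only (this is not Magma's actual monitor code, and the file name is made up), the polling loop amounts to re-reading the shared file at a fixed interval:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const unsigned poll = 5;    /* the POLL parameter, in seconds */
    for (;;) {
        FILE *f = fopen("shared/canaries.raw", "rb");   /* hypothetical shared file */
        if (f) {
            /* read the per-bug reached/triggered counters and write a
             * timestamped snapshot, then close the file until the next poll */
            fclose(f);
        }
        sleep(poll);            /* a larger POLL lowers overhead but coarsens timestamps */
    }
}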
ISan is an early alarm system that crashes the program (with a SIGSEGV signal) when the bug trigger conditions are satisfied. It can be used when the detection capabilities of the fuzzer are of no interest to the evaluator, or when they can be evaluated separately in a post-processing step.
One such example is AFL. When using AFL with AddressSanitizer, it is possible to first run the Magma benchmark with ISan and collect crashing test cases for all bugs, then re-compile the target without ISan (but with ASan) and re-run it against the collected test cases, filtering out the bugs which could not be detected by ASan. It is also important to take into account AFL's other fault detection techniques, such as out-of-memory thresholds and execution timeouts. The [fuzzer]/run_once.sh scripts in Magma are intended to emulate the fuzzer's execution environment to detect faults.
In the captain/run.sh script, how should I select the value for WORKERS?
WORKERS specifies the number of logical cores (from 0 up to WORKERS-1) you wish to allocate for running the benchmark. Magma will utilize these cores to run multiple campaigns in parallel. When all allocated cores are busy, Magma queues up the remaining campaigns and dispatches them to the next core that frees up.
In order to obtain an interactive shell session in a Docker container, you must execute the bash program inside a running container.
In the case that you already have a running container (e.g., a campaign that has not finished), first find its container ID:
docker ps
Then, run bash in a foreground TTY terminal:
docker exec -it <CONTAINER_ID> /bin/bash
Alternatively, if you want to launch a bash shell inside a new container, you can use the captain/start.sh script as follows:
cd captain
FUZZER=afl TARGET=php PROGRAM=exif ENTRYPOINT=/bin/bash ./start.sh
How do I get sudo access inside the container?
The magma user inside the image is added to the sudo users group. The default password is amgam.
The captain toolset currently does not provide a means to manually terminate containers. Its scripts have exit handlers which attempt a clean-up before the script exits. However, in case of a malfunction, you can still manually check the status of currently active containers and kill/remove them:
docker ps
docker kill <CONTAINER_ID>
docker rm -f <CONTAINER_ID>
docker ps
docker rm -f `docker ps | grep magma | awk '{print $1}'`
docker rmi -f `docker image ls | grep magma | awk '{print $3}'`
Most programs that we include in Magma targets are derived from Google's OSS-Fuzz project, where the target developers write their own libFuzzer stubs. In Magma, we include a wrapper for those stubs to allow them to be fuzzed by AFL and its likes. This wrapper can either be launched with a file-name argument, which will be read and fed into the libFuzzer stub, or it can be launched without arguments, in which case it would be used by AFL for persistent fuzzing.
We also occasionally include other programs in Magma, including tools (e.g., tiffcp, pdfimages, ...) which require command-line arguments to properly consume the input. In that case, we provide the AFL-style arguments in the configuration, where @@ is replaced by the path to the fuzzer-generated test case.
Each target configuration directory (targets/*) includes a configrc file which specifies the list of programs to fuzz, and the AFL-style arguments to pass to each program.
What does the REPEAT parameter signify? Why do I need multiple repetitions?
REPEAT specifies how many identical campaigns to launch for each fuzzer/target/program combination. Fuzzing is an inherently stochastic process, so a single campaign is not representative; repeating each campaign allows results to be compared with statistical confidence.
Only the patches in the patches/bugs and patches/setup directories (in the target configuration directory) are applied; all files in other subdirectories are ignored. So, to select which bugs to apply, simply make sure that only your chosen bugs are inside those directories, and move the undesired bugs somewhere else. Then, rebuild the image.
This requires creating a new fuzzer configuration that resumes from an existing workdir. For this purpose, we've added an example config, afl_resume, which is a copy of afl where the run.sh script was modified to use the -i - flag when running AFL, instead of using the seed corpus as input.
To resume work from a previous campaign, build the new configuration:
FUZZER=afl_resume TARGET=libtiff ./build.sh
Then, launch the campaign manually, specifying the old workdir (without emptying it):
FUZZER=afl_resume TARGET=libtiff PROGRAM=tiffcp ARGS="-M @@ tmp.out" SHARED=./workdir POLL=5 TIMEOUT=24h ./start.sh
The captain/run.sh script has terminated, but campaigns are still running. What's wrong?
It is likely that the script encountered an error while building the benchmark or processing parameters, and terminated prematurely.
To kill all campaigns:
pkill -SIGTERM 'start\.sh'
Bugs injected into Magma are not verified up front, since the process of manually crafting proof-of-vulnerability (PoV) inputs is arduous and requires domain-specific knowledge about both the input format and the program or library, potentially bringing the bug-injection process to a grinding halt.
Instead, we inject bugs into the targets without first supplying PoVs, then we collect PoVs from the results of the campaigns. When available, we also extract PoVs from public bug reports.
This approach does not guarantee that all injected bugs can be used for evaluation, but it does make the development of the benchmark and contribution to it more streamlined and efficient, leaving it to the fuzzers to do all the heavy lifting.
What do the monitor output logs contain? Does every log file correspond to a new bug discovered?
Throughout the lifetime of the campaign, the monitor keeps track of the cumulative count of all bugs encountered. These logs are not related to one specific crash or run; they are the accumulation of the entire fuzzing process up to the timestamped point.
The monitor folder contains files whose names are timestamps and whose contents are counters. The timestamps are in seconds since the beginning of the campaign. The counters are the number of times each bug has been reached/triggered since the beginning of the campaign.
Consider a timestamped log monitor/24100 which contains the following:
ABC123_R, ABC123_T, XYZ001_R, XYZ001_T
63453, 29060, 23, 3
This means that, up to this timestamp, the fuzzer had generated 63453 inputs that reach ABC123, of which 29060 trigger it. The fuzzer may not have saved all of these inputs, since it deduplicates some crashes.
Even if a triggered bug does not crash the program, the monitor will log it, thanks to the canaries.
Compiling with ISan is not necessary to see whether an input triggers a bug; you can do that with monitor --fetch watch. However, to obtain the test cases that do not crash, you will need ISan, because a typical fuzzer would not have saved non-crashing test cases.
When a canary is inserted at some line of code, it assumes that the input satisfied the conditions to reach that line of code in the original program. Then, to consider the bug triggered, an additional bug condition is supplied to the canary. Thus, the condition for triggering a bug is the AND of the reach condition and the bug condition: trigger = reach AND bug.
However, when the program is modified, the reach condition could have been violated, and the assumption made by the canary would be broken. The canary only evaluates the bug condition, and uses that as a trigger condition. Thus, if T-Fuzz reaches a bug by transforming the program, then triggers it through random mutation, the canary may record a false positive.
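The following hypothetical sketch (the check, the bug, and the reporting stub are all made up) shows the kind of reach condition the canary silently relies on, and which a transformation-based fuzzer may negate:

#include <stdio.h>
#include <string.h>

/* stand-in for the canary's reporting call */
static void magma_report(const char *bug_id, int triggered)
{
    fprintf(stderr, "%s %s\n", bug_id, triggered ? "triggered" : "reached");
}

static void handle_packet(const unsigned char *data, size_t len)
{
    unsigned char header[8];
    /* reach condition: in the original program, short packets never get past here.
     * If a transformation-based fuzzer such as T-Fuzz negates this branch, the
     * assumption baked into the canary below no longer holds. */
    if (len < sizeof(header))
        return;
    memcpy(header, data, sizeof(header));
    /* canary: only the bug condition is evaluated; reaching this line is assumed
     * to imply len >= 8, so flipping the branch above can make the canary report
     * a trigger that is impossible in the original program */
    magma_report("XXX000", header[0] == 0xFF && header[1] == 0xFF);
}

int main(void)
{
    const unsigned char pkt[8] = {0xFF, 0xFF, 0, 0, 0, 0, 0, 0};
    handle_packet(pkt, sizeof(pkt));
    return 0;
}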
One way we suggest to address this would be the following: compile the transformed program such that triggered canaries still crash it (as with ISan), thus informing T-Fuzz to perform its post-analysis. After post-analysis, the fuzzer would take the adapted crashing input and feed it to the original program (with canaries enabled). Bug triggers would then only be recorded in the original program, after T-Fuzz creates/synthesizes valid crashing test cases for it.