HexHive PhD, MSc, BSc projects
This is a list of possible open, unassigned BSc or MSc research projects in the HexHive group for EPFL students.
Check out our list of completed projects to get an idea of past projects.
The projects are designed to scale to a BSc, MSc, or short PhD research project, depending on the depth of the evaluation and exploration. For all projects we expect an open mind, good coding skills (especially in C/C++), and the willingness to learn. Previous experience with compilers (LLVM), build systems, and reverse engineering helps but is not required.
If you are interested in any of these topics then apply through our application form. All project applications have to go through this form and we will internally discuss all applications and then invite students for interviews. The first application deadline for projects in fall '24 is July 04 and the second deadline is August 31, 2024.
In the HexHive we welcome independent ideas from students as well, as long as they focus on the topics of system and software security, especially (but not exclusively) in sanitization of unsafe code, interactions between different components, mitigations, compiler-based security analyses, or fuzzing. So if you have an idea for your own project, let us know and we can discuss! Reach out to the people closest to your ideas and we'll let the ideas bubble up.
Android acropalypse
- Point of contact: Luca Di Bartolomeo
- Suitable for: MSc Semester Project / Thesis
- Keywords: Reverse Engineering, Static Analysis
You might have heard about the recent security disaster that is aCropalypse. Well, it turns out that the bug stems from Google silently changing an Android API for opening files, so that files are no longer truncated when they are opened.
This is pretty wild and we think that there might be many more instances of aCropalypse, not just cropped screenshots. This project is about writing tooling to automatically analyze Android apks and search for potential alternative data leaks.
A candidate should be interested in:
- Android application reverse engineering
- Static analysis tooling for Android apks
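The missing-truncation pattern behind this class of leak can be reproduced on any POSIX-like system. The following is a minimal sketch, not the Android code path: the helper names `overwrite` and `demo` and the sample byte strings are made up for illustration. It overwrites a longer file with shorter content, once without and once with `O_TRUNC`.

```python
import os
import tempfile


def overwrite(path, data, truncate):
    """Overwrite `path` with `data` and return the resulting file content."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC  # discard the old content before writing
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Return (content without truncation, content with truncation)."""
    original = b"FULL-SCREENSHOT-WITH-SECRET"  # hypothetical original file
    cropped = b"CROPPED"                        # hypothetical shorter rewrite
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.bin")
        overwrite(path, original, truncate=True)
        leaky = overwrite(path, cropped, truncate=False)
        overwrite(path, original, truncate=True)
        safe = overwrite(path, cropped, truncate=True)
    return leaky, safe
```

Without truncation, the tail of the original content survives after the new data. A static analysis for this project would presumably look for file-open call sites in apks that can reach the non-truncating behavior.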
ARM64 Kernel Driver Retrowriting
- Point of contact: Luca Di Bartolomeo
- Keywords: Retrowrite, binary rewriting, mobile reverse engineering
Proprietary binary blobs are a common feature of the Android ecosystem. Vendors may not update these and may not compile them with the latest exploit mitigations. Kernel modules are a particular cause for concern given their privileged access.
HexHive’s Retrowrite project is a state-of-the-art binary rewriting tool that can retrofit mitigations to legacy binaries without the need for source code. This currently works on ARM64 and x86-64 platforms, and x86-64 in kernel mode. The goal of this project would be to target ARM64 kernel modules, with the ability to add, for example, kASAN. We would aim to:
- Identify kernel modules of particular interest, including open source modules to act as ground truth.
- Produce a framework to evaluate the effectiveness of binary rewriting these modules by exercising their functionality, using fuzzing where appropriate.
- Modify Retrowrite to support ARM64 kernel modules.
- Evaluate the implementation against ground truth targets and against targets of interest. Evaluate the cost of instrumentation passes.
Students should have a basic understanding of how Linux kernel modules are built and loaded, and a good grasp of Linux internals. Ambitious students may also have Android Internals knowledge and be interested in testing their work on Android hardware.
Benchmarking Fuzzers for Structured Text Input Software
- Point of contact: Chibin Zhang
- Suitable for: Master Thesis Project
- Keywords: fuzzing, benchmark, compilers, data analysis
Fuzzing is an effective technique for finding bugs in software. Prior works have created benchmarks to assess the performance of fuzzers. However, these benchmarks are biased towards targets that accept binary inputs and towards fuzzers that mutate at the byte level. Additionally, they suffer from saturation, meaning the performance differences between top fuzzers are often insignificant. It is a known issue that existing byte-level fuzzers do not perform well on targets accepting structured text inputs. Current fuzzing benchmarks do not include state-of-the-art structure-aware fuzzers, such as grammar fuzzers, in their baselines. This is due to the fact that these fuzzers typically require additional grammars, dictionaries, or large seed corpora. Furthermore, existing structure-aware fuzzers have been evaluated on a limited set of disparate targets, run with different specifications, making it challenging to compare their performance quantitatively or even qualitatively.
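The gap between byte-level and structure-aware input generation can be illustrated with a toy example. The sketch below is not taken from any of the fuzzers discussed here: the grammar `GRAMMAR` is a hand-written, heavily simplified JSON subset, and `generate`, `mutate_bytes`, and `parses` are hypothetical helper names. It uses Python's `json` module as a stand-in target.

```python
import json
import random

# Hypothetical, heavily simplified JSON grammar: nonterminal -> list of rules.
GRAMMAR = {
    "value": [["object"], ["array"], ["number"], ["string"]],
    "object": [["{", "}"], ["{", "string", ":", "value", "}"]],
    "array": [["[", "]"], ["[", "value", "]"], ["[", "value", ",", "value", "]"]],
    "number": [["0"], ["42"], ["-7"]],
    "string": [['"a"'], ['"key"']],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a string by picking random grammar rules."""
    if symbol not in GRAMMAR:
        return symbol  # terminal token, emitted as-is
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        rules = [min(rules, key=len)]  # shortest rule forces termination
    rule = rng.choice(rules)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in rule)


def mutate_bytes(seed, rng):
    """Byte-level mutation: overwrite one random byte of the seed."""
    data = bytearray(seed)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def parses(text):
    """Return True if the stand-in target accepts the input."""
    try:
        json.loads(text)
        return True
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
```

Every string produced by `generate("value", rng)` is accepted by the parser by construction, whereas inputs from `mutate_bytes` may be rejected early and never reach the deeper logic of a target.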
In this project, you will create an extensive benchmark for targets that accept structured text inputs. You are expected to integrate at least 8 structure/syntax-aware fuzzers and 16 new targets (latest version), along with the required grammars, dictionaries, and corpora. It is suggested to use the Nix build system, as its build configurations are written declaratively and build artifacts are deterministic. This choice is anticipated to streamline the benchmarking process and ensure reproducibility. You will then conduct fuzzing campaigns and analyze the results quantitatively. A potential focus could be assessing the impact of the provided grammars, dictionaries, and corpora on the performance of the fuzzers. The build, run, and analysis scripts will be open-sourced to facilitate future research.
Examples of interesting fuzzers and targets for integration:
- Fuzzers: AFL++ with cmplog and autodict, Token-level AFL, Gramatron, Nautilus, Grimoire, Superion, Polyglot, CSmith.
- Targets:
- All targets included in fuzzbench.
- Compilers/Interpreters/Assemblers accepting code inputs: clang, hotspot, python, php, ruby, v8, JavaScriptCore, SpiderMonkey.
- Document formats: html, postscript, word, rtf, roff, markdown.
- Data (interchange) formats and their processors: json, yaml, toml, xml, csv, tsv, jq, yq, sqlite.
Recommended Background:
- Completion of compiler and software security-related courses.
- Familiarity with NixOS and Nix-based build tools.
- Experience with fuzzing and triaging compiler/interpreter bugs.
Exploring Proprietary Android System Services
- Point of contact: Philipp Mao
- Suitable for: MSc project, MSc semester project
- Keywords: Android, reverse engineering, frida
Android system services (high-privileged userspace processes) are an important piece of Android’s security architecture. Smartphone vendors usually modify Android and add their own features, which often include additional system services. These system services are an interesting target for malicious apps, since the services’ API is usually accessible to an app. While past research has predominantly focused on native (C++) system services, this project aims to investigate the system services that appear to be implemented in Java.
The objective of this project is to develop tools for analyzing proprietary Java system services. We aim to understand how these services operate, their privilege levels, and any potential vulnerabilities they may have. An ideal outcome of the project is a tool that can be deployed against a phone to then automatically analyze all system services.
Project tasks (in no particular order):
- Writing frida hooks to dynamically analyze running proprietary Java services
- Reverse-engineering proprietary Java services
Students interested in this project should have written at least one frida script to hook an Android app.
Investigating the RP2350 bootrom
- Point of contact: Florian Hofhammer
- Suitable for: MSc semester project
- Keywords: Emulation, dynamic binary instrumentation, re-hosting
The new RP2350 microcontroller has a wide range of capabilities, including Arm TrustZone-M for its Cortex-M33 cores, dual-architecture (RISC-V/Arm) support, special RPi-only peripherals (HSTX, PIO), etc. Furthermore, it supports secure boot and booting from encrypted flash. These security features are new additions in comparison to the features of the (older and less capable) RP2040 microcontroller.
In order to properly support all those features, not only the hardware but also the bootrom needed to be redesigned. Raspberry Pi luckily open-sourced the code for the bootrom, which allows for static analysis of the code. However, static analysis only gets us so far; dynamic analysis provides us with even more insights into its functionality and run-time values for variables, peripheral state, etc.
In this project, we aim to re-host the RP2350 bootrom into a virtual environment, which allows us to step through the bootrom’s code at runtime and interact with its runtime state.
An interested student ideally has experience with embedded development for microcontrollers and is comfortable with reading/understanding Arm and/or RISC-V assembly. Familiarity with emulation frameworks such as Unicorn is a plus.
Creating a cycle-accurate multi-architecture simulator
- Point of contact: Florian Hofhammer
- Suitable for: MSc semester project
- Keywords: Microarchitecture, simulation
r2wars is a game in which participants write small assembly bots that execute in the same address space. Whichever bot crashes first (e.g., by being overwritten by the competing bot) loses. This game is an adaptation of the original Corewars idea but with a twist: instead of being based on an assembly-like programming language designed specifically for this kind of game, r2wars builds on top of the radare2 reverse engineering tooling and allows bots to be written in real-world architecture assembly (e.g., x86, Arm, RISC-V, MIPS, …). However, while the supported ISAs are taken from the real world, the execution model is far from reality. Most instructions take the same amount of time, no matter the ISA or their complexity. In a real system, more complex instructions would execute in more cycles, and microarchitectural state such as pipeline state, caches, etc. would affect performance.
In this project, we aim to model r2wars “closer to the real world”, i.e., execute instructions in a cycle-accurate simulation. For this purpose, we can leverage a cycle-accurate simulator such as gem5.
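To make the difference between the two execution models concrete, here is a toy sketch. It is neither r2wars nor gem5 code: the latency table `CYCLE_COST` holds made-up numbers, and the function names are hypothetical. It compares how many instructions a bot retires within a fixed budget under a uniform-cost model and under a per-instruction cost model.

```python
# Hypothetical per-instruction latencies in cycles (illustrative numbers only).
CYCLE_COST = {"nop": 1, "add": 1, "mul": 3, "div": 20, "store": 4}


def retired_uniform(program, budget):
    """Simplified model: every instruction costs exactly one tick."""
    return min(len(program), budget)


def retired_cycle_aware(program, budget, cost=CYCLE_COST):
    """Cost-aware model: an instruction retires only if its cycles fit."""
    retired = 0
    for insn in program:
        budget -= cost[insn]
        if budget < 0:
            break  # out of cycles before this instruction completes
        retired += 1
    return retired
```

With a budget of 24 cycles, a bot running `["div", "store", "add"]` retires all three instructions under the uniform model but only two under the cost-aware one. A real cycle-accurate simulation would additionally account for pipeline and cache state, which this table-based sketch ignores.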
The project requires familiarity with assembly code (at least reading and understanding) as well as basic knowledge about microarchitectural state.
SECCOMP implementation for double fetch protection
- Point of contact: Luca Di Bartolomeo
- Suitable for: MSc thesis
- Keywords: kernel security, data race protection, security policy
System call filtering is a crucial part of protection policies ubiquitous in cloud, desktop and mobile environments (Android, Docker, etc.). The existing SECCOMP filter system is unable to inspect arguments passed by reference since the user can modify the values in memory, resulting in a TOCTTOU exploit.
Midas is a novel mitigation for TOCTTOU bugs in the kernel, exploiting the user memory access API to provide double fetch protection. In this project, you will implement and evaluate SECCOMP filtering for system call arguments passed by reference, leveraging Midas to protect the kernel from the double fetch introduced in the process.
This project requires:
- Expert experience in C development
- Experience with standard C/GNU build, development and debug tools (gdb, Makefiles)
- Understanding of OS principles
- Basic experience of OS coding/course project
- Understanding of the x86 architecture and assembly coding/debugging
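The double fetch problem that arises when a filter inspects arguments passed by reference can be sketched as a deterministic simulation. This is a simplified model, not Midas or SECCOMP code: `UserMemory` stands in for user memory that a racing thread rewrites after the first read, and the allowlist, paths, and function names are hypothetical.

```python
class UserMemory:
    """Stand-in for a user buffer that a concurrent thread may rewrite."""

    def __init__(self, value, after_first_read=None):
        self._value = value
        self._swap = after_first_read
        self.reads = 0

    def read(self):
        self.reads += 1
        current = self._value
        if self.reads == 1 and self._swap is not None:
            self._value = self._swap  # simulated racing write by the user
        return current


ALLOWED = {"/tmp/ok"}  # hypothetical filter policy


def open_double_fetch(mem):
    """Vulnerable: the filter and the syscall each fetch the argument."""
    if mem.read() not in ALLOWED:       # time of check
        return "blocked"
    return "opened " + mem.read()       # time of use, second fetch


def open_single_snapshot(mem):
    """Check and use operate on the same snapshot of the argument."""
    path = mem.read()
    if path not in ALLOWED:
        return "blocked"
    return "opened " + path
```

In the vulnerable variant, the filter approves `/tmp/ok` but the call then operates on whatever the user wrote in between. The snapshot variant only illustrates the property that check and use must observe the same value; how Midas enforces this inside the kernel is not modeled here.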
Leveraging Static Analysis on Binaries to Uncover Time-of-Check-Time-of-Use Bugs
- Point of contact: Marcel Busch
- Suitable for: MSc semester project
- Keywords: software engineering, reverse engineering, binary analysis, static analysis
TOCTOU bugs can lead to severe memory corruptions. These memory corruptions might allow adversaries to compromise and take full control of the affected system. In this project, we want to port and adapt an existing binary static analysis to uncover TOCTOU bugs in proprietary real-world software.
A candidate should be interested in (and ideally already be familiar with):
- Python
- Ghidra/Ghidrathon and/or angr
- ARM assembly
- Static analysis (e.g., RDA)
Evaluation on Syscall Filtering Techniques
- Point of contact: Zhiyao Feng
- Keywords: syscall filtering, Linux, Android, 0-day exploit
OS kernels (e.g., on Android devices) often go without security updates for years because updates are labor-intensive and the software support lifespan is limited. This leaves those kernels vulnerable to exploits. To protect them, we can use syscall filtering techniques to block potentially malicious syscall sequences before they cause any damage, so the kernels remain safe even if they are not updated.
There are some existing techniques or features that can be used for this purpose, like Seccomp, Seccomp-cBPF, and Seccomp Notify provided by the Linux kernel, along with some methods from research papers. They offer various capabilities and trade-offs in filtering syscalls for certain vulnerabilities.
In this project, we will evaluate these syscall filtering techniques, by reproducing some known 0-day exploits, applying the syscall filtering techniques, and checking if the exploits can be successfully blocked.
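The evaluation loop can be pictured with a toy model. The sketch below does not use the Seccomp API: a policy is modeled simply as a set of denied syscall names, and the traces, policy, and function names are hypothetical. It checks whether a policy blocks an exploit trace and whether it also breaks a benign workload.

```python
def first_blocked(trace, denied):
    """Return the index of the first denied syscall in `trace`, or None."""
    for index, (name, _args) in enumerate(trace):
        if name in denied:
            return index
    return None


def evaluate(denied, exploit_trace, benign_trace):
    """Summarize one policy against one exploit and one benign workload."""
    return {
        "blocks_exploit": first_blocked(exploit_trace, denied) is not None,
        "breaks_benign": first_blocked(benign_trace, denied) is not None,
    }


# Hypothetical traces as (syscall name, arguments) pairs.
EXPLOIT = [("openat", ("/dev/null",)), ("userfaultfd", (0,)), ("write", (3, b"x"))]
BENIGN = [("openat", ("/tmp/a",)), ("read", (3, 16)), ("close", (3,))]
```

For example, `evaluate({"userfaultfd"}, EXPLOIT, BENIGN)` reports that the exploit is blocked while the benign trace is unaffected, whereas denying `openat` would stop both. The real techniques differ in what they can inspect (e.g., arguments or sequences), which this name-only model does not capture.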
A candidate should be proficient in C programming and have a good grasp of Linux internals.
Hyper-Cube2 for 64-bit Hypervisors
- Point of contact: Qiang Liu
- Suitable for: Master Semester Project
- Keywords: Blackbox, Virtual Device, Fuzzing
Virtual devices remain the main attack surface to hypervisors. Vulnerabilities in virtual devices lead to denial of service, data breaches, execution hijacking, and other security problems. Hyper-Cube was proposed to fuzz virtual devices. It is a blackbox fuzzer and has very high throughput. Hyper-Cube has great usability but suffers from the following two problems:
- Hyper-Cube is only compatible with hypervisors that support a x86 32-bit OS.
In this project, we aim to achieve three specific goals.
- Adjust Hyper-Cube OS to be 64-bit compatible.
- Optimize the implementation to achieve higher throughput.
- Analyze found bugs, report them, and help fix them.
We are building an extensive blackbox virtual device fuzzer based on Hyper-Cube. A candidate is required to have experience with programming in C, compiling with Clang, program profiling (e.g., FlameGraph), and algorithm optimization. It is preferable but not necessary to have experience with fuzzing, the development of operating systems and virtual devices, ARM, Hyper-V, VMware ESXi, and macOS.
Maintaining Magma: A Ground-Truth Fuzzing Benchmark
- Point of contact: Qiang Liu
- Suitable for: BSc/MSc semester project
- Keywords: Fuzzing, Evaluation, Benchmark
Magma is a fuzzer evaluation framework that enables accurate performance measurements by leveraging ground-truth information on bugs in real software. Magma includes a library of real targets (e.g., libpng, libtiff, openssl) with real bugs that have been re-introduced into those targets based on previous bug reports and fix commits. By reverse-engineering the commit which fixed a certain bug, we can identify what the root cause of the bug was, reintroduce it, and add a check (a canary) to determine when that bug is triggered, based on program state information available at runtime (i.e., variable values).
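The idea of a reintroduced bug guarded by a canary can be sketched in a few lines. This is an illustration of the concept only, not Magma's actual instrumentation: the `Canary` class, its `reached` and `triggered` counters, and the toy parser `parse_record` are assumptions made for this example.

```python
class Canary:
    """Records how often a bug site is reached and how often it triggers."""

    def __init__(self):
        self.reached = 0
        self.triggered = 0

    def log(self, condition):
        self.reached += 1
        if condition:
            self.triggered += 1


def parse_record(data, canary):
    """Toy parser: the first byte declares the payload length."""
    if not data:
        return b""
    declared = data[0]
    payload = data[1:]
    # The fix that was removed to reintroduce the bug would clamp the length:
    #   declared = min(declared, len(payload))
    # The canary checks the faulty program state at runtime instead.
    canary.log(declared > len(payload))
    return payload[:declared]
```

Calling `parse_record(b"\x09ab", canary)` marks the bug as triggered even though Python's slicing does not crash. This mirrors the point of a ground-truth oracle: the bug is counted based on program state rather than on an observable fault.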
As fuzzers are tuned and improved on a regular basis, the benchmark upon which they’re evaluated must equally be upgraded, to keep up with the progress and avoid becoming outdated. To achieve this, new targets and bugs must be added frequently, and old targets and bugs must be checked again for relevance, in case some bugs become unreachable/untriggerable, or in case the target’s source code has changed enough to disallow the reintroduction of some bug without reintroducing old code functionality.
For this project, you are expected to:
- Add a few new fuzzers to Magma
- Port existing bug oracles to recent targets
- Develop CI/CD to handle third-party testing requests
Other projects
Several other projects are possible in the areas of software and system security. We are open to discussing possible projects around the development of security benchmarks, using machine learning to detect vulnerabilities, secure memory allocation, sanitizer-based coverage tracking, and others.