HexHive PhD, MSc, BSc projects

This is a list of possible open, unassigned BSc, MSc, or PhD semester research projects in the HexHive group. The projects are designed to be adjusted and scaled to the type of project (BSc, MSc, or PhD research project) by varying the depth of the evaluation and exploration. For all projects we expect an open mind, good coding skills (especially in C/C++), and the willingness to learn. Previous experience with compilers (LLVM), build systems, and reverse engineering helps but is not required.

If you are interested in any of these topics, then contact the HexHive group list or any of the doctoral students. Please start your email with "[(MSc|BSc) (Project|Thesis)]", e.g., "[MSc Project] LLVM-based code analysis". In the HexHive we welcome independent ideas from students as well, as long as they focus on the topics of system and software security, especially (but not exclusively) sanitization of unsafe code, interactions between different components, mitigations, compiler-based security analyses, or fuzzing. So if you have an idea for your own project, let us know and we can discuss it!

Bluetooth Security
  • Point of contact: Daniele Antonioli
  • Keywords: Bluetooth, BR/EDR, BLE, Attacks, Defenses, Testing, RE, IoT

Bluetooth is a pervasive technology for low-power wireless communication, used every day by billions of devices, including smartphones, laptops, wearables, cars, medical devices, and industrial devices. Bluetooth is specified in an open standard and provides two main stacks: Bluetooth BR/EDR (aka Classic) and Bluetooth Low Energy (BLE). A single vulnerability in the Bluetooth standard can result in billions of exploitable devices (e.g., the KNOB attack).

We are interested in all security aspects of Bluetooth BR/EDR and BLE, including attacks, defenses, reverse engineering of closed-source (proprietary) software and hardware, static and dynamic analysis, and testing/fuzzing. A project will likely cover one of the listed aspects, but depending on the student's interests and progress we can cover more.

A candidate student should be familiar with:

  • Bluetooth Host Controller Interface (HCI), e.g., HCI commands, HCI events, and the HCI protocol
  • Bluetooth security mechanisms, e.g., pairing, authentication, session establishment, and association
  • Network analysis tools, e.g., ifconfig and Wireshark
  • Bluetooth tools such as hciconfig and hcitool

And ideally have some knowledge about:

  • Bluetooth host stack codebase and API, e.g., BlueZ
  • Bluetooth controller stack, e.g., LM and LMP
  • Internalblue RE toolkit
  • Bluetooth SoC and development boards

Recommended reading:

  • The KNOB is Broken: Exploiting Low Entropy in the Encryption Key (Daniele Antonioli et al.)
  • Low Entropy Key Negotiation Attacks on Bluetooth and Bluetooth Low Energy (Daniele Antonioli et al.)
  • Guide to Bluetooth Security (NIST)

Reverse-engineering CPU microarchitecture
  • Point of contact: Atri Bhattacharyya
  • Keywords: microarchitectural attacks, side channels

A number of microarchitectural side-channel attacks have emerged since 2018, starting with Spectre and Meltdown in January of that year. A critical step in all of these attacks is a reverse-engineering effort to determine the exact behavior of components of the microarchitecture: the cache organization and replacement policy, and the branch prediction algorithm and its structures.

We need to characterize the behavior of these components by designing experiments that target specific behavior and can leak the structure of internal components. An example is the associativity of caches: one experiment that leaks this information measures the size of eviction sets. By accessing a set of memory addresses that we grow until we observe cache evictions, we can infer the associativity. By modifying the access patterns, we can also infer the cache's eviction policy. A great example of reverse engineering branch predictors on older Intel processors is this blog post series.
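
The eviction-set experiment can be sketched in simulation. Below is a minimal C sketch assuming a single cache set with LRU replacement, where WAYS stands in for the unknown hardware parameter; on real hardware the hit/miss signal would come from timing the accesses instead of a simulated model.

```c
#include <string.h>

/* Simulated model of one cache set with LRU replacement.
   WAYS is the secret parameter we pretend not to know. */
#define WAYS 8

static long set_tags[WAYS];   /* cached tags, index 0 = least recently used */
static int  set_fill;

static void cache_reset(void) { set_fill = 0; }

/* Returns 1 on hit, 0 on miss; implements LRU replacement. */
static int cache_access(long tag) {
    for (int i = 0; i < set_fill; i++) {
        if (set_tags[i] == tag) {
            /* hit: move the tag to the most-recently-used position */
            memmove(&set_tags[i], &set_tags[i + 1],
                    (size_t)(set_fill - 1 - i) * sizeof(long));
            set_tags[set_fill - 1] = tag;
            return 1;
        }
    }
    if (set_fill < WAYS) {
        set_tags[set_fill++] = tag;          /* fill an empty way */
    } else {
        memmove(&set_tags[0], &set_tags[1], (WAYS - 1) * sizeof(long));
        set_tags[WAYS - 1] = tag;            /* evict the LRU entry */
    }
    return 0;
}

/* Grow a candidate eviction set until the probe tag gets evicted.
   The smallest set size that evicts the probe equals the associativity. */
int infer_associativity(void) {
    for (int n = 1; n <= 64; n++) {
        cache_reset();
        cache_access(0);                     /* prime the probe address */
        for (long t = 1; t <= n; t++)
            cache_access(t);                 /* access the candidate set */
        if (!cache_access(0))                /* probe missed: it was evicted */
            return n;
    }
    return -1;
}
```

Varying the access order instead of the set size would, analogously, leak the eviction policy.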

We propose a semester project for masters students or dedicated bachelors students in which we tackle this challenge. An ideal candidate should have taken a course on computer architecture and be familiar with the relevant microarchitectural components. Experience with x86 or ARM assembly is very useful (but not essential).

API flow graph sanitization
  • Point of contact: Nicolas Badoux
  • Keywords: sanitizer, API graph, component analysis, compiler analysis

Software components (e.g., libraries) often expose a complex API. When using a component, sequences of API calls build up deep API state that is custom to the component. These API calls are often highly complex and dependent on each other. API call sequences for high-level languages (e.g., Java) have been analyzed in software engineering, but C/C++ libraries have so far been neglected due to a lack of type information.

In this project, we will build an API inference graph that, using a whole system analysis, collects all uses of a library (starting with a simple file-based search but potentially moving to a compiler-based analysis). This API graph encodes dependencies between individual API calls (e.g., read can only be called after open) and parameters for API calls (e.g., open may specify if the file can be written or not). The API graph encodes all possible API interactions and defines the parameter space for each method.

Based on the API graph we will then search for violations of the inferred values in real applications by implementing a sanitizer that tracks the API state and searches for violations of the valid state.
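
Conceptually, the sanitizer is a state machine over the inferred API states. The following C sketch uses a hypothetical open/read/close API and assumes the valid-state graph has already been inferred; function and state names are illustrative only.

```c
/* Inferred API rules for this sketch: read() is only valid after open(),
   and no call is valid on a closed handle. */
enum api_state { ST_CLOSED, ST_OPEN };

static enum api_state state = ST_CLOSED;
static int violations = 0;

/* Record a violation whenever the current state forbids the call. */
static void check(int ok) { if (!ok) violations++; }

void traced_open(void)  { check(state == ST_CLOSED); state = ST_OPEN; }
void traced_read(void)  { check(state == ST_OPEN); }
void traced_close(void) { check(state == ST_OPEN); state = ST_CLOSED; }

int violation_count(void) { return violations; }
```

In the real sanitizer, these checks would be inserted automatically at every call site of the instrumented library, and the transition table would come from the inferred API graph rather than being hard-coded.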

Transformational Fuzzing
  • Point of contact: Ahmad Hazimeh
  • Keywords: fuzzing, program transformation, exploitation

Fuzzing is experiencing a renaissance through coverage-guided greybox fuzzing. AFL and its descendants have been used to find thousands of security bugs. A fuzzing engine consists of an input generator and an execution engine. Coverage guided fuzzing collects coverage information (what parts of the program were executed how many times) from the execution engine to influence the input generator to increase coverage. Existing coverage-guided fuzzing engines often hit a so-called coverage wall where the required input changes are too complex and fuzzing no longer makes progress.
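
The coverage-guided feedback loop can be sketched as follows. This is a toy model in C with a hypothetical four-edge target and a single-byte mutator, not a real fuzzing engine; it only illustrates how coverage feedback steers input generation past nested checks.

```c
#include <string.h>
#include <stdlib.h>

/* Toy target: each nested check marks one coverage edge. */
#define EDGES 4
static unsigned char cov[EDGES];

static void target(const unsigned char in[4]) {
    cov[0] = 1;
    if (in[0] == 'F') { cov[1] = 1;
        if (in[1] == 'U') { cov[2] = 1;
            if (in[2] == 'Z') { cov[3] = 1; }
        }
    }
}

/* Coverage-guided loop: keep a mutated input only if it reaches new edges.
   Returns the number of edges discovered. */
int fuzz(unsigned seed) {
    unsigned char best[4] = {0}, seen[EDGES] = {0};
    srand(seed);
    for (int iter = 0; iter < 100000; iter++) {
        unsigned char cand[4];
        memcpy(cand, best, 4);
        cand[rand() % 4] = (unsigned char)(rand() % 256); /* mutate one byte */
        memset(cov, 0, sizeof cov);
        target(cand);
        int gained = 0;
        for (int e = 0; e < EDGES; e++)
            if (cov[e] && !seen[e]) { seen[e] = 1; gained = 1; }
        if (gained) memcpy(best, cand, 4);  /* new coverage: keep the input */
    }
    int total = 0;
    for (int e = 0; e < EDGES; e++) total += seen[e];
    return total;
}
```

A blind (non-guided) mutator would have to guess all three magic bytes at once; the feedback loop solves one check at a time, which is exactly the property that breaks down at the coverage wall when single-byte progress is no longer possible.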

Instead of only mutating the program input, we have recently proposed transformational fuzzing, which analyzes the program and mutates it alongside the input to reach new coverage. Crashes in the mutated program must then be mapped back to the original program to ensure they are real crashes and not false positives; we call this component the crash analyzer.

After publishing the initial results, we want to increase the precision of the mutation. The goal of this project is to develop a mutation engine and a policy for how to mutate the program to ensure the best possible coverage expansion. Potential directions are the inference of non-conditional functions that may be pruned, or co-dependent checks along an execution path that trigger violations. Orthogonally, transformational fuzzing should become context sensitive so that the program is mutated on the fly, depending on the execution context, with the help of a feedback-guided process.

LLVM-based T-Fuzzing
  • Point of contact: Ahmad Hazimeh
  • Keywords: fuzzing, compiler-based transformation, exploitation

Similar to the previous project, this project extends T-Fuzz (see the first two paragraphs of the previous project proposal for context). The existing T-Fuzz prototype relies on binary rewriting and is restricted to if conditions (i.e., it cannot support switch statements). The goal of this project is to implement a T-Fuzzing engine on top of LLVM to support a flexible and (runtime-)configurable way to readjust checks depending on a transformation policy. Each conditional program location would be extended with a switch that, when triggered, inverts the result.
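
One possible shape for such a runtime-configurable check is sketched below in C. The names (fuzz_cond, flip, set_flip) are hypothetical; in the actual project the rewriting would be emitted by the LLVM pass rather than written by hand.

```c
/* Each conditional gets a unique id; the flip[] table is the runtime
   transformation policy that decides whether a check is inverted. */
#define MAX_CHECKS 64
static int flip[MAX_CHECKS];

static int fuzz_cond(int id, int cond) {
    return flip[id] ? !cond : cond;   /* policy may invert this check */
}

void set_flip(int id, int on) { flip[id] = on; }

/* Example target with one hard-to-satisfy check (check id 0). */
int target(int magic) {
    if (fuzz_cond(0, magic == (int)0xdeadbeef))
        return 1;                     /* deep path reached */
    return 0;
}
```

Because the policy table is consulted at runtime, the fuzzer can toggle individual checks between executions without recompiling, which is the flexibility the binary-rewriting prototype lacks.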

Code pruning and debloating
  • Point of contact: Tesic Uros
  • Keywords: debloating, TCB reduction, code analysis, reflowing software

Current systems leverage shared libraries to facilitate code reuse across processes. If a program uses data structures or functions from a library, the runtime linker connects those data structures when the program is instantiated. The intention of shared libraries was to reduce memory pressure across processes. Nowadays, when the amount of loaded code is dwarfed by the amount of data, it is time to revisit this assumption.

In this project we will develop a process to statically link binaries, removing any dependencies to other libraries. Libraries will be compiled as LLVM bitcode and inlined during the LTO/thinLTO process. We will measure performance of the newly compiled binaries as well as code reduction and security metrics, e.g., the increased precision of Control-Flow Integrity due to the decreased amount of targets.

Evaluation targets are security (reducing the amount of code, the number of target locations, e.g., for CFI, and code pointers in general) and performance (relative accesses instead of GOT accesses, inlining, and page mapping/TLB optimizations).

Incremental fuzzing and sanitization

Dynamic software testing is currently an all-or-nothing approach: software is compiled with sanitization and fuzzing instrumentation and then evaluated as a whole. As software is developed in small steps, it would be advantageous to fuzz only the delta that was added. This project will focus on developing a software testing environment that keeps track of which areas were already fuzzed or tested and which test cases achieved what coverage. Whenever a part of the software is changed, testing can then focus on that new part.

  • Develop a mechanism to track coverage across input and sanitizer checks
  • Correlate the coverage to specific areas of software
  • Prioritize test cases that explored the changed area and evaluate whether testing can be focused on the changed part alone
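
The steps above can be sketched as a minimal coverage database in C. The region granularity and the bitmask representation are assumptions for illustration; a real implementation would track edge- or basic-block-level coverage per test case.

```c
/* Toy coverage database: one bitmask of covered code regions per test. */
#define MAX_TESTS 16
static unsigned coverage[MAX_TESTS];   /* bit i set: test covered region i */

void record_coverage(int test, unsigned regions) { coverage[test] = regions; }

/* Select the tests whose recorded coverage intersects the changed regions;
   only these need to be re-run after an incremental change. */
int select_tests(unsigned changed, int out[], int ntests) {
    int n = 0;
    for (int t = 0; t < ntests; t++)
        if (coverage[t] & changed)
            out[n++] = t;
    return n;
}
```

Given a code change, the intersection immediately yields the subset of the corpus worth re-fuzzing, instead of restarting the whole campaign.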

Evaluation targets are common software repositories for open source projects.

Evaluating control-flow hijacking defenses
  • Point of contact: Tesic Uros
  • Keywords: systematization of knowledge, control-flow hijacking, security benchmark

Buffer overflows, use-after-free bugs, and generally control-flow hijacking due to memory safety and type safety errors have been the cause of countless security vulnerabilities that lead to the exploitation of software. Over the past 20 years, a large number of mitigations against control-flow hijacking attacks have been proposed. The only solution that prohibits exploitation of this attack vector is memory safety; all other defenses may leave some avenue for attack.

We will evaluate different control-flow hijacking mitigations such as Control-Flow Integrity, executable-but-unreadable code, and other mechanisms, comparing them along different metrics. The goal of this project is to evaluate the power of the different mitigations and to rank them according to completeness, power, overhead, and other systematic weaknesses.

Deep trust chains may violate security expectations
  • Point of contact: Tesic Uros
  • Keywords: software ecosystem, components, dependency graph

Modern software ecosystems such as JavaScript/NodeJS/NPM, Docker, and Python/pip rely on a vast distribution of libraries with deep dependencies. Whenever you install a certain library or piece of functionality, it may pull in many other dependencies. The security of your application (and sometimes of your system) then depends on the security of all these additional libraries.

This project consists of two phases. In the first phase, you will compute the dependency graph across the system, marking individual dependency chains and evaluating common dependencies. In this phase we are interested in the number of dependencies and the different chains, and we will evaluate "hot" packages that are used across large numbers of installations.

In the second phase, we will evaluate the dependencies of common programs/libraries and study how these dependencies change over time. This allows us to assess the threat surface for given programs and evaluate given security risks.
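
The dependency counting in the first phase could be sketched as follows in C, over a toy package index; the bitmask representation of the dependency graph is an implementation assumption chosen for brevity.

```c
/* Toy package index: deps[i] is a bitmask of package i's direct deps. */
#define NPKG 8
static unsigned deps[NPKG];

void add_dep(int pkg, int dep) { deps[pkg] |= 1u << dep; }

/* Count all transitive dependencies pulled in when installing `pkg`
   (a fixed-point walk over the dependency graph). */
int transitive_deps(int pkg) {
    unsigned reach = deps[pkg], frontier = deps[pkg];
    while (frontier) {
        unsigned next = 0;
        for (int i = 0; i < NPKG; i++)
            if (frontier & (1u << i))
                next |= deps[i];
        frontier = next & ~reach;   /* only newly reached packages */
        reach |= next;
    }
    int n = 0;
    for (int i = 0; i < NPKG; i++) n += (reach >> i) & 1;
    return n;
}
```

Running this over a real package index would directly yield the "number of dependencies" metric; tracking which edges contribute to `reach` recovers the individual chains.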

Secure Memory Allocator
  • Point of contact: Ahmad Hazimeh
  • Keywords: memory allocation, memory corruption, heap protection

Memory allocators are primarily geared towards performance. Orthogonal to performance, however, memory allocators can harden a program against memory safety bugs. Use-after-free bugs and buffer overflows are common mistakes that compromise adjacent data structures on the heap. In this project we evaluate existing memory allocators and design a secure alternative. The project consists of the following steps:

  • Design a benchmark for memory allocators
  • Record allocations/deallocations and writes of benchmarks, allow replay
  • Evaluate size classes according to object sizes. For the analysis of existing programs, we will check the number of allocation sites, the number of allocation sizes, and the number of allocation sizes per site. As benchmarks we will leverage SPEC CPU2017 and parallel programs.
  • Make the allocator C++ aware! (according to type classes)
  • Use classes of allocations (depending on location/type). As part of the optimization, we will discuss how to store the metadata, e.g., per thread, per allocation size, or using a mixed data structure.

Discussion points for the memory allocator are: (i) whether metadata and malloc'd data are separated, (ii) (configurable) canaries between blocks, (iii) (configurable) overwrite on free, (iv) a (configurable) no-reuse policy, and (v) as a performance optimization, how to leverage guard pages and allocate them efficiently.
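
Two of these discussion points, canaries between blocks and overwrite on free, are sketched below in C. Note that this minimal sketch keeps the size header inline with the data for brevity, whereas discussion point (i) suggests separating metadata from user data.

```c
#include <stdlib.h>
#include <string.h>

/* Hardened-allocator sketch: a canary after each block detects linear
   heap overflows on free, and freed memory is overwritten. */
#define CANARY 0xC3A4A12FU

typedef struct { size_t size; } meta_t;

void *sec_malloc(size_t size) {
    /* layout: [meta][user data][canary] */
    unsigned char *raw = malloc(sizeof(meta_t) + size + sizeof(unsigned));
    if (!raw) return NULL;
    ((meta_t *)raw)->size = size;
    unsigned canary = CANARY;
    memcpy(raw + sizeof(meta_t) + size, &canary, sizeof canary);
    return raw + sizeof(meta_t);
}

/* Returns 0 on a clean free, -1 if the canary was corrupted (overflow). */
int sec_free(void *p) {
    unsigned char *raw = (unsigned char *)p - sizeof(meta_t);
    size_t size = ((meta_t *)raw)->size;
    unsigned canary;
    memcpy(&canary, raw + sizeof(meta_t) + size, sizeof canary);
    memset(raw, 0xDE, sizeof(meta_t) + size);  /* overwrite on free */
    free(raw);
    return canary == CANARY ? 0 : -1;
}
```

The benchmark from the first step would then measure the cost of exactly these extra checks and writes against a stock allocator.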

Leveraging Intel PT to enforce online dispatch safety

Intel PT is a technology that allows recording the control-flow decisions of a program and evaluating them on a different thread. The idea of this project is to implement a security monitor on top of Intel PT. By following certain jumps, we can encode messages into the trace and send them to the monitor, which keeps track of them.

  • Encode object allocations into indirect (but controlled) calls to a magic location so that they show up in the PT trace
  • Read out PT trace from second core to build up object graph
  • Translate virtual dispatch to include the object location that is dispatched on (i.e., including it in the PT trace)
  • Verify integrity on the second core; pause before I/O or state-changing system calls so that the second core can catch up

This approach allows us to keep a shadow object graph in a validator process and to check every virtual dispatch at low overhead. The policy is the same as CPI's, but at much lower overhead, as pointer writes no longer need to be checked inline in the original process.
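
The validator's core check can be sketched as follows, assuming allocation and dispatch events have already been decoded from the PT trace; the shadow-table layout and function names are hypothetical.

```c
/* Validator-side shadow object graph: for each live object we record the
   vtable it was allocated with, as decoded from the PT trace. */
#define MAX_OBJS 32
static struct { void *obj; void *vtable; } shadow[MAX_OBJS];
static int nshadow;

/* Called when the trace shows an encoded allocation event. */
void shadow_alloc(void *obj, void *vtable) {
    shadow[nshadow].obj = obj;
    shadow[nshadow].vtable = vtable;
    nshadow++;
}

/* Returns 1 if a dispatch matches the recorded object graph, 0 on a
   mismatch (e.g., a hijacked vtable pointer) or an unknown object. */
int validate_dispatch(void *obj, void *vtable) {
    for (int i = 0; i < nshadow; i++)
        if (shadow[i].obj == obj)
            return shadow[i].vtable == vtable;
    return 0;
}
```

Because this check runs on the second core, the traced process pays only for the encoded calls, not for the lookup itself.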

Binary vulnerability inference
  • Point of contact: Ahmad Hazimeh
  • Keywords: binary analysis, vulnerability study, pattern-based search, measurement study

Deployed programs may include third-party code in inlined form or as customized libraries. Keeping this external code up to date is challenging: ideally, the program would be updated immediately along with the underlying repository, but in practice these updates may be forgotten. In this project, we want to study the prevalence of such outdated and forgotten library versions.

  • First, we have to detect the library version and compilation settings by developing a binary analysis that leverages features of a binary to detect its version.
  • Classify vulnerabilities and vulnerable code according to code patterns by evaluating libraries and creating a ground truth data set of code patterns and vulnerabilities.
  • Search for the classified vulnerabilities and code patterns in binaries
  • Study on packaged/prepackaged libraries and track vulnerability provenance

Note that this project could also focus on Android applications.
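
The pattern-based search step could be sketched as a masked byte-signature scan, as shown below in C. The signature format (a pattern plus a mask with wildcard bytes) is an assumption for illustration; the real project would derive such signatures from the ground-truth data set of vulnerable code patterns.

```c
#include <stddef.h>

/* Scan a binary image for a code signature. A mask byte of 0xFF means the
   byte must match exactly; 0x00 marks a wildcard (e.g., a register field
   or an immediate that varies across compilations). */
long find_pattern(const unsigned char *bin, size_t len,
                  const unsigned char *pat, const unsigned char *mask,
                  size_t plen) {
    if (plen > len) return -1;
    for (size_t i = 0; i + plen <= len; i++) {
        size_t j = 0;
        while (j < plen && ((bin[i + j] ^ pat[j]) & mask[j]) == 0)
            j++;
        if (j == plen) return (long)i;   /* match found at offset i */
    }
    return -1;                           /* pattern not present */
}
```

Wildcards are what make such signatures robust against differing compilation settings, which is exactly why the first step (detecting library version and compiler flags) matters.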

Information Hiding: A New Hope
  • Point of contact: Ahmad Hazimeh
  • Keywords: information hiding, kernel, systems security

Many mitigations require some form of metadata to check the integrity of the process and to detect violations. This defines two privilege domains: the application data and the security data. Bugs in code that computes on application data must not be allowed to access or modify the security data. Information hiding provides low-overhead fault isolation by separating the two types of data and making it unlikely that the secure data can be located. As there is no actual separation, correctly guessing the address of the secure data would allow application code to overwrite it. Information leaks are therefore a problem: guessing oracles or allocation oracles can probe for the secret area.

  • Study existing attacks against information hiding and their weaknesses.
  • Create a kernel module and protocol that allow the allocation of safe metadata areas (the high bits on x86-64 are reserved for the kernel; in this vast address space we can hide some memory pages and use them to store the secure metadata). Allow allocation of memory areas in this high memory, extend the clone syscall to allocate a safe stack when creating a new thread, add a syscall to allocate a new hidden memory area, and add a syscall to disable this feature (on a per-thread basis).
  • Develop a compiler pass to move meta data to this memory area.
  • Evaluate: safe stack, OpenSSL to store keys, CFIXX, or type safety

Key idea: allocate the hidden memory areas in a safe space. The high memory range reserved for the kernel is the perfect place: most of it is unmapped and traps on access, so the attacker cannot probe for this area.

Other projects

Several other projects are possible in the areas of software and system security. We are open to discuss possible projects around the development of security benchmarks, using machine learning to detect vulnerabilities, secure memory allocation, sanitizer-based coverage tracking, and others.