Fetch Skips Hardening

This repository houses the artifact for a CC'24 paper titled “From low-level fault modeling (of a pipeline attack) to a proven hardening scheme”.


Principle and what this is

“Fetch skips” is a fault model coined by Alshaer et al. [2023] which describes one common way microprocessors react to a glitch in their clock input. The typical model for such glitches is “instruction skip”, i.e. an instruction is simply skipped during the execution of a program. Fetch skips are more precise: they involve skipping or repeating 4 bytes of code, which can produce more complex effects for unaligned and variable-sized instructions. This is of course a major problem for security, as basically any incorrect execution can lead to abuse.
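To make the "more complex effects" concrete, here is a toy Python sketch (not part of this repository). It models a skip as the removal of one aligned 4-byte window from a byte stream and then re-decodes instruction boundaries with the standard RISC-V length rule (low two bits `0b11` means 32-bit, otherwise 16-bit compressed). The function names and the "remove bytes from the stream" view are assumptions made for illustration; the paper's model is the reference.

```python
def insn_length(first_halfword: int) -> int:
    """RISC-V rule for 16/32-bit encodings: low two bits 0b11 means 32-bit."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def decode_boundaries(code: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each instruction decoded linearly."""
    out = []
    pc = 0
    while pc + 2 <= len(code):
        half = int.from_bytes(code[pc:pc + 2], "little")
        n = insn_length(half)
        if pc + n > len(code):
            break  # truncated instruction at the end of the buffer
        out.append((pc, n))
        pc += n
    return out


def skip_fetch(code: bytes, window: int) -> bytes:
    """Toy fetch skip (assumption): drop the aligned 4-byte window at `window`."""
    assert window % 4 == 0, "fetch windows are 4-byte aligned in this toy model"
    return code[:window] + code[window + 4:]


# c.nop ; addi a0,a0,1 ; c.nop ; addi a0,a0,1  (little-endian encodings)
C_NOP = bytes.fromhex("0100")
ADDI = bytes.fromhex("13051500")
code = C_NOP + ADDI + C_NOP + ADDI

faulted = skip_fetch(code, 4)
print(decode_boundaries(code))     # boundaries of the original program
print(decode_boundaries(faulted))  # boundaries after skipping bytes 4..7
print(faulted[2:6].hex())          # the hybrid 32-bit word now decoded at offset 2
```

Because the 32-bit `addi` at offset 2 straddles two fetch windows, skipping the second window does not remove a whole instruction: the first half of one `addi` is glued to the first half of the next, and the leftover upper half is then decoded as a 16-bit instruction.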

This repository is a research project on protecting against fetch skips. It contains a modified compiler (LLVM/Clang 12), linker (GNU ld 2.40), emulator (QEMU 8.0) and processor simulator (Gem5 22.1) which implement a combined software/hardware countermeasure. The main result of the paper is a proof that running a program protected by these tools on a minimally-extended processor prevents exploitation of fetch skips by ensuring that every attack causes the program to stop or crash within a few instructions.

In addition to the compiler/linker for generating protected programs, we use the emulator to simulate attacks and experimentally check the security claims, and the simulator to evaluate performance impact. A subset of programs from the MiBench benchmark suite is used.

How to reproduce results from the paper

To get straight to reproduced results on an x86_64 machine, no questions asked, download the compressed Docker image and run the following commands. For details see below.

TODO: Link to Zenodo.

% xz -d --stdout cc24-fetch-skips-hardening.tar.xz | sudo docker load
% sudo docker run -it localhost/cc24-fetch-skips-hardening
root@(container):~# make all_REF all_FSH run_REF run_FSH
root@(container):~# make -j$(nproc) campaigns
root@(container):~# TODO

The Docker image is just a build of this repository on Ubuntu 22.04; see Detailed description for an explanation of the contents. To build natively without using Docker, please check the Manual build instructions and the Dockerfile as a reference.

The first step is to build a reference version of the benchmark programs (make all_REF) without enabling Fetch Skips Hardening, and then protected versions (make all_FSH) using this project's compiler and linker passes. To verify that the protected programs still work as intended, we run both versions (make run_REF run_FSH) and check that the outputs are identical.

The second step is to run fault injection campaigns (make -jN campaigns). This uses a modified QEMU to emulate the effect of the fault and check that programs correctly stop or crash before the end of the attacked block. This fact is proven in the paper for single-fault injections (and proven up to the absence of checksum collisions for multi-fault injections) so the expected result is 100% fault resistance. See Technical notes for an explanation of how to read the outputs if you're interested.

The same command also runs injection campaigns on the reference (non-protected) programs to collect statistics about the percentage of attacks that result in a crash within the attacked block, as a baseline comparison. Predictably, these campaigns result in a lot of security "bypasses" since the countermeasure isn't active.

TODO: Performance simulations

TODO: Generating figures

Detailed description

This repository contains the following tools as submodules:

  • llvm-property-preserving: A Clang/LLVM mod by Son Tuan Vu [2021]. We ended up not using the mod here, so think of this as LLVM 12. We added the Xccs extension and a hardening pass to the RISC-V back-end and emitter.
  • binutils-gdb: The usual GNU toolchain. We added a new relocation type to precompute checksums of regions of code once they have been relocated.
  • qemu: We extended the emulator to support Xccs instructions/exceptions, and to simulate fetch errors by substituting bits during translation. We use it to validate security.
  • gem5: We extended the simulator to recognize Xccs instructions (in a non-faulty situation). We use it to validate performance. I also hacked it to replace 64-bit RISC-V instructions with their 32-bit counterparts.

Other files used in the build process include:

  • elf32lriscv_ccs.x: A linker script for hardened programs. All it does is separate hardened code (all .o files except the runtime) from other code (the runtime and libraries) so that hardened code can be loaded at 0x40000 instead of the usual 0x10000.
  • elf32lriscv_ref.x: A linker script for reference (non-hardened) programs. It does even less, just separating user and library code within .text so that the campaign injection script is able to attack user code only. This makes campaigns much shorter and more comparable to the campaigns performed against hardened programs.
  • riscv_cc_REF, riscv_cc_FSH: Wrappers around reference (non-hardened) and fetch-skips-hardened compilers.

Both linker scripts can be diffed against the original, which can be found at ./riscv-custom/riscv32-unknown-elf/lib/ldscripts/elf32lriscv.x where it is placed when the custom binutils in the binutils-gdb folder is installed.

Other files used in the testing process include:

  • mibench: Programs from the MiBench benchmark suite. We target the Industrial, Network and Security applications. The source files are original but the Makefiles are basically new.
  • riscv_qemu_REF, riscv_qemu_FSH: Wrappers around QEMU and QEMU-with-FSH-support.
  • fault.py: Script for running fault injection campaigns (details inside).
  • fault_summary.py: TODO.
  • TODO: Generating figures.

The Makefile just contains a few top-level commands for using the project.

Technical notes

Reading the output of the fault injection script

Below is an excerpt from the fault campaign script's output (running in parallel).

[patricia 44.6%] 0x40770:s32,1... CCS_VIOLATION
[basicmath 48.5%] 0x41358:s32,1... NOT_REACHED
[patricia 44.7%] 0x40774:s32,1... CCS_VIOLATION
[patricia 44.8%] 0x40778:s32,1... CCS_VIOLATION
[patricia 44.9%] 0x4077c:s32,1... SIGILL
[susan 48.6%] 0x43ef8:s32,1... NOT_REACHED (predicted)
[patricia 44.9%] 0x40780:s32,1... CCS_VIOLATION

Each line corresponds to a faulted execution. The bracketed section indicates the program being run and the campaign's progress. The fault description follows; 0x40770:s32,1 for instance indicates injecting a single 32-bit skip fault at PC 0x40770. Then comes the exit status, which is usually NOT_REACHED (if the attacked PC is not reached during the entire execution), CCS_VIOLATION (attack detected by the countermeasure), or a crash signal. Green exit statuses mean no security vulnerability; red statuses mean a security bypass.
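If you want to post-process this output yourself, the line format can be parsed with a regular expression. The sketch below is not part of fault.py; the regex is inferred from the excerpt above only, the names `FaultResult` and `parse_line` are made up for illustration, and it assumes terminal color codes have already been stripped.

```python
import re
from typing import NamedTuple, Optional

# Inferred from the excerpt: "[program NN.N%] 0xPC:fault... STATUS [(predicted)]"
LINE_RE = re.compile(
    r"^\[(?P<program>\S+)\s+(?P<progress>[\d.]+)%\]\s+"
    r"(?P<pc>0x[0-9a-fA-F]+):(?P<fault>\S+?)\.\.\.\s+"
    r"(?P<status>[A-Z0-9_]+)(?P<predicted>\s+\(predicted\))?\s*$"
)


class FaultResult(NamedTuple):
    program: str
    progress: float   # campaign progress in percent
    pc: int           # attacked program counter
    fault: str        # fault description, e.g. "s32,1"
    status: str       # e.g. "CCS_VIOLATION", "NOT_REACHED", "SIGILL"
    predicted: bool   # True if the line ends with "(predicted)"


def parse_line(line: str) -> Optional[FaultResult]:
    """Parse one campaign output line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return FaultResult(
        program=m["program"],
        progress=float(m["progress"]),
        pc=int(m["pc"], 16),
        fault=m["fault"],
        status=m["status"],
        predicted=m["predicted"] is not None,
    )


r1 = parse_line("[patricia 44.6%] 0x40770:s32,1... CCS_VIOLATION")
r2 = parse_line("[susan 48.6%] 0x43ef8:s32,1... NOT_REACHED (predicted)")
print(r1)
print(r2)
```

Lines that do not match (for instance, interleaved log messages) simply yield `None`, so the parser can be run over a raw capture of the parallel output.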

Executions where the targeted PC is not reached take the longest, because there is no early exit/crash. In addition, a second execution is needed to check whether the PC was actually reached or not (by injecting an illegal instruction at that address). Attacks that are not reached are also mostly uninteresting. Two mechanisms are in place to accelerate simulations by avoiding these unneeded executions:

  1. Prediction: when the script believes the targeted PC is likely not reached, it will try the illegal instruction first to save one execution. If that guess is correct the script will print "(predicted)".
  2. Not-reached output file: the script will produce *-notreached.txt files in the output folder where it records PC values that are not reached. This way, only the first campaign deals with them. This is why s32,2 and s&r32 are so much faster than s32,1.

The results are summarized in out/ in files such as out/basicmath-campaign-fsh-ex-s32-1.txt:

= 272364
setting,EXITED,CCS_VIOLATION,CCS_BYPASSED,NOT_REACHED,SILENT_REPLACE,SIGSEGV,SIGILL,SIGTRAP,OTHER
fsh-ex-s32-1,0,1543,0,833,0,115,60,3,1
# OTHER for (266136, 's32,1'):
# summary of faults to be injected:
#   00040f98: s32 (k=1)
# /root/riscv_qemu_FSH: line 5: 50623 Bus error               "${ROOT}"/prefix/bin/qemu-riscv32 -cpu rv32-fsh "$@"

The first line indicates the campaign's progress and is used for resuming gracefully if the script is ever interrupted. The next two lines summarize the results; the important point is the absence of EXITED and CCS_BYPASSED outcomes (the red ones). Finally, any non-conventional result is reported in a comment, which here includes one crash by SIGBUS.
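As a quick sanity check, a summary file can be read back and tested for bypasses with a few lines of Python. This is a sketch based only on the excerpt above (a `=` progress line, one CSV header, one CSV row, and `#` comments); the helper name `read_summary` is hypothetical and the real files may contain variations this does not handle.

```python
import csv


def read_summary(text: str) -> tuple[int, str, dict[str, int]]:
    """Return (progress marker, setting name, outcome counts) from a summary file."""
    progress = -1
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments hold details about unusual results
        if line.startswith("="):
            progress = int(line[1:].strip())
        else:
            rows.append(line)
    reader = csv.reader(rows)
    header = next(reader)   # setting,EXITED,CCS_VIOLATION,...
    values = next(reader)   # fsh-ex-s32-1,0,1543,...
    counts = {name: int(v) for name, v in zip(header[1:], values[1:])}
    return progress, values[0], counts


sample = """\
= 272364
setting,EXITED,CCS_VIOLATION,CCS_BYPASSED,NOT_REACHED,SILENT_REPLACE,SIGSEGV,SIGILL,SIGTRAP,OTHER
fsh-ex-s32-1,0,1543,0,833,0,115,60,3,1
# OTHER for (266136, 's32,1'):
"""

progress, setting, counts = read_summary(sample)
# EXITED and CCS_BYPASSED are the outcomes that count as security bypasses.
bypasses = counts["EXITED"] + counts["CCS_BYPASSED"]
print(setting, "bypasses:", bypasses)
```

For a hardened campaign the expected value of `bypasses` is zero; for the reference campaigns it is expected to be large.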

TODO: Explain aggregate CSV file

Reading the output of performance simulations

TODO: Explain output of Gem5 simulations

False-positive QEMU “bugs”

The fault injection campaign script prints a result for each execution, such as CCS_VIOLATION or NOT_REACHED. When it doesn't recognize a result, it prints OTHER and logs the parameters along with the stdout/stderr of the QEMU invocation to the log file. On some machines there are many of these and they appear to be segfaults or assertion errors within QEMU itself, but this is mostly a red herring. The TL;DR is that QEMU is sometimes unable to catch exceptions from the emulated programs and crashes itself instead.

QEMU's control flow during execution is rather complicated due to its use of long jumps and the sort-of-concurrent nature of signal handling. The main mechanism can be summarized like this:

  1. When QEMU starts running a fragment (block) of emulated code it calls cpu_exec(), which calls cpu_exec_setjmp() to set up a long jump buffer.
  2. If emulated code raises an exception or invokes a syscall, the long jump buffer is used to unwind back to cpu_exec_setjmp() and make the fragment return an appropriate result code. Note how this means that the SIGSEGV handler (like others) is instructed to go find the jump buffer and use it, and it would be a shame if the associated stack frame was gone by then.
  3. Once the fragment finishes, the result code (success, interrupted/killed by signal, syscall...) is checked and appropriate handling is performed; this includes running syscalls and handling exceptions. The exception-handling part is why programs that segfault when emulated have a QEMU error report and not the kernel's default "Segmentation fault" message.
  4. Go back to 1 to execute the next fragment.

The problem is the following. Syscalls are emulated after the block ends, so if a syscall invocation crashes, the signal handler goes to fetch the jump buffer from cpu_exec_setjmp() which doesn't exist anymore because the fragment is done executing. Usually this results in QEMU failing its cpu == current_cpu assertion. Sometimes this results in a crash of the QEMU process itself.
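The lifetime issue can be illustrated with a toy Python model. This is emphatically not QEMU code: Python has no setjmp/longjmp, so a flag stands in for "the jump buffer's stack frame still exists" and exceptions stand in for long jumps. All class and function names below are invented for the illustration, and the model only follows the description given in this section.

```python
class GuestException(Exception):
    """Stands in for a successful long jump back into the fragment's frame."""


class StaleJumpBuffer(Exception):
    """Stands in for QEMU failing an assertion or crashing."""


class ToyCpu:
    def __init__(self):
        self.jmp_buf_valid = False  # True only while a fragment is running

    def signal_handler(self, what: str):
        # The handler always tries to use the jump buffer.
        if not self.jmp_buf_valid:
            raise StaleJumpBuffer(what)
        raise GuestException(what)

    def exec_fragment(self, fragment):
        self.jmp_buf_valid = True           # "setjmp": buffer is set up
        try:
            return fragment(self)
        except GuestException as e:
            return ("exception", str(e))    # unwound cleanly, result code returned
        finally:
            self.jmp_buf_valid = False      # fragment done: frame is gone

    def run(self, fragment):
        kind, payload = self.exec_fragment(fragment)
        if kind == "syscall":
            payload(self)                   # syscall emulated AFTER the fragment ends
        return (kind, payload)


def guest_segfault(cpu):
    # Fault raised while the fragment is still running: handled cleanly.
    cpu.signal_handler("SIGSEGV")


def crashing_syscall(cpu):
    # The fragment only requests a syscall; the fault happens during emulation.
    return ("syscall", lambda c: c.signal_handler("SIGSEGV in syscall"))


cpu = ToyCpu()
print(cpu.run(guest_segfault))
try:
    cpu.run(crashing_syscall)
    outcome = "no error"
except StaleJumpBuffer as e:
    outcome = f"stale jump buffer: {e}"
print(outcome)
```

The first run is reported as an ordinary guest exception, while the second ends up in the "stale buffer" path because the fault occurs after the fragment has returned.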

At least 3 bugs I investigated led back to this:

  • brk() failing to add memory because the heap starts after .data and I had placed .text_css (which is read-only) somewhere after .data, leading to a privilege segfault. This caused a long jump to the expired jump buffer, which later failed the cpu == current_cpu assertion.
  • open() failing to open files due to my glibc using different syscall numbers and different values for open flags than QEMU expected, with the same outcome.
  • A faulted program trying to brk((void *)3) leading to a segfault in the syscall emulation code and then failing that same assertion.

Manual build

Property-preserving LLVM

The compiler transforms the program into a protected form and is the core of the countermeasure. Pull the llvm-property-preserving submodule and build it with CMake. We configure to install in the prefix/ folder of this repo.

% git submodule update --init llvm-property-preserving
% cd llvm-property-preserving
% mkdir build && cd build
% cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLVM_TARGETS_TO_BUILD="RISCV" -DCMAKE_INSTALL_PREFIX=../prefix -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_LINKER=lld -DBUILD_SHARED_LIBS=ON -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
% ninja install

RISC-V GNU toolchain

In order to compile and link useful C programs, we need the standard library headers, the standard library itself, and the C runtime for the RISC-V target. Grab the 32-bit RISC-V toolchain from riscv-collab/riscv-gnu-toolchain, e.g. riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz. Extract it and rename the riscv folder to riscv-custom (we're going to replace the linker).

% wget "https://github.com/riscv-collab/riscv-gnu-toolchain/releases/download/2023.01.31/riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"
% tar -xzf "riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"
% mv riscv riscv-custom
% rm "riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"

Custom linker

The countermeasure relies on computing checksums of fragments of code, which is only possible after relocation in the linker. So we use a slightly-modified linker. Pull the binutils-gdb submodule and build it.

% git submodule update --init binutils-gdb
% cd binutils-gdb
% mkdir build && cd build
% ../configure --prefix="$(realpath ../../riscv-custom)" --target="riscv32-unknown-elf"
% make -j4
% make install

Custom QEMU

We use QEMU to emulate the hardware support of the countermeasure and the injection of fetch skip attacks. Pull the qemu submodule and build it.

% git submodule update --init qemu
% cd qemu
% mkdir build && cd build
% ../configure --target-list=riscv32-linux-user
% ninja install

gem5 simulator

We can simulate the performance impact of the countermeasure using a processor simulator. Pull the gem5 submodule.

% git submodule update --init gem5
% cd gem5
% pip install --user -r requirements.txt
% scons build/RISCV/gem5.opt -j$(nproc)

Note: I was unsuccessful in getting a clean build on Arch; Ubuntu seems to be the most reasonable target. If you have more recent tools than Ubuntu make sure to use the develop branch. I suggest using the Docker setup (official instructions) as a fallback.

Generating the Docker image

The Docker image for this project is generated from the source files in this repository (including unstaged changes). Make sure all submodules are pulled. QEMU only builds out-of-git when using a release tarball, so we generate that first. We also clean any generated files from the mibench folder, which will get copied.

% (cd qemu && scripts/archive-source.sh ../qemu.tar)
% make distclean
% podman build -t cc24-fetch-skips-hardening .

One way to export the image is then to save it and compress it.

% podman save cc24-fetch-skips-hardening:latest > cc24-fetch-skips-hardening.tar
% xz -vk -T0 cc24-fetch-skips-hardening.tar

After running the tests in a container, get reference results like so.