# Fetch Skips Hardening

_This repository houses the artifact for a [CC'24](https://conf.researchr.org/home/CC-2024) paper titled “From low-level fault modeling (of a pipeline attack) to a proven hardening scheme”._

- [Principle and what this is](#principle-and-what-this-is)
- [How to reproduce results from the paper](#how-to-reproduce-results-from-the-paper)
- [Detailed description](#detailed-description)
- [Technical notes](#technical-notes)
- [Manual build](#manual-build)
- [Generating the Docker image](#generating-the-docker-image)

---

## Principle and what this is

“Fetch skips” is a fault model coined by Alshaer et al. [[2023](https://hal.science/hal-04273995v1)] which describes one common way microprocessors react to a glitch in their clock input. A typical model for this would be “instruction skip”, i.e. just skipping an instruction in the execution of a program. Fetch skips are more precise and involve skipping or repeating 4 bytes of code, which can produce more complex effects for unaligned and variable-sized instructions. This is of course a major problem for security, as basically any incorrect execution can lead to abuse.
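
To illustrate the difference, here is a toy model (in Python, with made-up byte values) of how removing one 4-byte fetch word can desynchronize a variable-length instruction decoder, so that everything after the fault is re-interpreted:

```python
def boundaries(code: bytes) -> list:
    """Offsets at which a RISC-V-style decoder sees instructions begin:
    low two bits of the first byte == 0b11 means a 4-byte instruction,
    anything else a 2-byte compressed one."""
    out, pc = [], 0
    while pc < len(code):
        out.append(pc)
        pc += 4 if code[pc] & 0b11 == 0b11 else 2
    return out

# Made-up stream of 2-, 4-, 2-, 4-, 2-, 2-byte instructions (16 bytes).
code = bytes([0x01, 0x00,              # 2-byte
              0x93, 0x00, 0x00, 0x00,  # 4-byte (unaligned: offsets 2-5)
              0x01, 0x00,              # 2-byte
              0x93, 0x00, 0x00, 0x00,  # 4-byte
              0x01, 0x00,              # 2-byte
              0x01, 0x00])             # 2-byte

print(boundaries(code))     # [0, 2, 6, 8, 12, 14]

# A fetch skip drops the aligned 4-byte word at offsets 4-7, which
# straddles two instructions: the decoder then glues unrelated halves
# together and every later boundary shifts.
faulted = code[:4] + code[8:]
print(boundaries(faulted))  # [0, 2, 6, 8, 10]
```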

This repository is a research project on protecting against fetch skips. It contains a modified compiler (LLVM/Clang 12), linker (GNU ld 2.40), emulator (QEMU 8.0) and processor simulator (Gem5 22.1) which implement a combined software/hardware countermeasure. The main result of the paper is a proof that running a program protected by these tools on a minimally-extended processor prevents exploitation of fetch skips by ensuring that every attack causes the program to stop or crash within a few instructions.

In addition to the compiler/linker for generating protected programs, we use the emulator to simulate attacks and experimentally check the security claims, and the simulator to evaluate performance impact. A subset of programs from the [MiBench benchmark suite](https://vhosts.eecs.umich.edu/mibench/) is used.

## How to reproduce results from the paper

To get straight to reproduced results on an x86\_64 machine, no questions asked, download the compressed Docker image from Zenodo at https://zenodo.org/records/10440364 and run the following commands. For details, see below.

```bash
% xz -d --stdout cc24-fetch-skips-hardening.tar.xz | sudo docker load
% sudo docker run -it localhost/cc24-fetch-skips-hardening
root@(container):~# make all_REF all_FSH run_REF run_FSH
root@(container):~# make -j$(nproc) campaigns
root@(container):~# make -j$(nproc) simulations
root@(container):~# make plots
% sudo docker cp (container):/root/out/campaigns.png .
% sudo docker cp (container):/root/out/perf.png .
```
You can then compare the products in `out/` with the reference products provided in `out-reference/`, or view the extracted images (`out/campaigns.png` for Figure 9 and `out/perf.png` for Figure 10) and compare them with the originals. Below is the expected result.

_The performance metrics differ from the original Figure 10: see [Technical notes](#technical-notes) below._

![](out-reference/campaigns.png)

![](out-reference/perf.png)

The Docker image is just a build of this repository on Ubuntu 22.04; see [Detailed description](#detailed-description) for an explanation of the contents. To build natively without using Docker, please check the [Manual build](#manual-build) instructions and the [Dockerfile](Dockerfile) as a reference.

The first step is to build a reference version of the benchmark programs (`make all_REF`) without enabling Fetch Skips Hardening, and then protected versions (`make all_FSH`) using this project's compiler and linker passes. To verify that the protected programs still work as intended, we run both versions (`make run_REF run_FSH`) and check that the outputs are identical.

The second step is to run fault injection campaigns (`make -jN campaigns`). This uses a modified QEMU to emulate the effect of the fault and check that programs correctly stop or crash before the end of the attacked block. This fact is proven in the paper for single-fault injections (and proven up to the absence of checksum collisions for multi-fault injections), so the expected result is 100% fault resistance. See [Technical notes](#technical-notes) for an explanation of how to read the outputs if you're interested.

The same command also runs injection campaigns on the reference (non-protected) programs to collect statistics about the percentage of attacks that result in a crash within the attacked block, as a baseline comparison. Predictably, these campaigns result in a lot of security "bypasses" since the countermeasure isn't active.

The third and last major step is to run performance simulations in Gem5 to compare the runtime of original and protected programs (`make -jN simulations`). We do this in a scenario where no fault is injected, since in an attack scenario performance cannot be measured due to the absence of a recovery mechanism in the countermeasure.

Finally, `make plots` runs 3 scripts: `summary.py` generates CSV files in `out/` that aggregate test and simulation results, and two plot scripts generate `out/campaigns.png` and `out/perf.png`, which are used in the paper (except that rendering will not use the LaTeX backend if LaTeX is not installed, as in the Docker image).

## Detailed description

This repository contains the following tools as submodules:

- [`llvm-property-preserving`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/llvm-property-preserving): A Clang/LLVM mod by Son Tuan Vu [[2021](https://theses.hal.science/tel-03722753v1/)]. We ended up not using the mod here, so think of this as LLVM 12. We added the Xccs extension and a hardening pass to the RISC-V back-end and emitter.
- [`binutils-gdb`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/binutils-gdb): The usual GNU toolchain. We added a new relocation type to precompute checksums of regions of code once they have been relocated.
- [`qemu`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/qemu): We extended the emulator to support Xccs instructions/exceptions, and to simulate fetch errors by substituting bits during translation. We use it to validate security.
- [`gem5`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/gem5): We extended the simulator to recognize Xccs instructions (in a non-faulty situation). We use it to validate performance. I also hacked it to replace 64-bit RISC-V instructions with their 32-bit counterparts.

Other files used in the build process include:

- `elf32lriscv_ccs.x`: A linker script for hardened programs. All it does is separate hardened code (all `.o` files except the runtime) from other code (the runtime and libraries) so that hardened code can be loaded at `0x40000` instead of the usual `0x10000`.
- `elf32lriscv_ref.x`: A linker script for reference (non-hardened) programs. It does even less, just separating user and library code within `.text` so that the campaign injection script is able to attack user code only. This makes campaigns much shorter and more comparable to the campaigns performed against hardened programs.
- `riscv_cc_REF`, `riscv_cc_FSH`: Wrappers around reference (non-hardened) and fetch-skips-hardened compilers.

Both linker scripts can be diffed against the original, which can be found at `./riscv-custom/riscv32-unknown-elf/lib/ldscripts/elf32lriscv.x` where it is placed when the custom binutils in the `binutils-gdb` folder is installed.

Other files used in the testing process include:

- `mibench`: Programs from the [MiBench benchmark suite](https://vhosts.eecs.umich.edu/mibench/index.html). We target the Industrial, Network and Security applications. The source files are original but the Makefiles are basically new.
- `riscv_qemu_REF`, `riscv_qemu_FSH`: Wrappers around QEMU and QEMU-with-FSH-support.
- `fault.py`: Script for running fault injection campaigns (details inside).
- `summary.py`: Script for aggregating security and performance test results.
- `plot_campaigns.py`, `plot_performance.py`: Scripts for generating figures with matplotlib based on aggregated results.

The Makefile just contains a few top-level commands for using the project.

## Technical notes

**Differences with paper's results in performance simulations**

Gem5's performance results differ between the paper's Figure 10 and this repository due to inconsistent memory behavior with reference binaries. The TL;DR is that the reference programs built here run slower because of increased memory latency in the simulator, while the hardened programs do not, which results in smaller relative overheads.

We could not easily root cause this difference. However, we believe this does not undermine the general performance claims made in the paper because (1) these differences are more favorable for our countermeasure, and (2) the performance overhead for similar countermeasures is in the x3-x5 range.

Here are a few details.

- The most apparent difference is the amount of code read from memory. Take for instance `dijkstra` with cache: the original paper version reads a total of 36 kB of code, while the reference binary provided here reads 107 kB. The latter is closer to the behavior of the hardened program, which reads 126 kB (the expected increase being one 8-byte CCS/checksum pair for every basic block).
- We confirmed that the reference linker script is not responsible (splitting `.text` in 2 or even putting user code at `0x30000` yields a 0.2% slowdown, but the full difference is 12.2%).
- `susan` is a clear outlier but appears to suffer from the same symptoms (i.e. increased memory stalls).

The paper version of `dijkstra_small_REF` is included in `out-reference/`. It can be tested by symlinking to it from `mibench/network/dijkstra`. As this binary is nearly identical to all other versions we tested (with multiple compiler commits, linker scripts, etc.) we theorize that a subtlety in Gem5's memory timing model is playing tricks on us.

**Reading the output of the fault injection script**

Below is an excerpt from the fault campaign script's output (running in parallel).

```
[patricia 44.6%] 0x40770:s32,1... CCS_VIOLATION
[basicmath 48.5%] 0x41358:s32,1... NOT_REACHED
[patricia 44.7%] 0x40774:s32,1... CCS_VIOLATION
[patricia 44.8%] 0x40778:s32,1... CCS_VIOLATION
[patricia 44.9%] 0x4077c:s32,1... SIGILL
[susan 48.6%] 0x43ef8:s32,1... NOT_REACHED (predicted)
[patricia 44.9%] 0x40780:s32,1... CCS_VIOLATION
```

Each line corresponds to a faulted execution. The bracketed section indicates the program being run and the campaign's progress. The fault description follows; `0x40770:s32,1` for instance indicates injecting a single 32-bit skip fault at PC 0x40770. Then comes the exit status, which is usually `NOT_REACHED` (if the attacked PC is not reached during the entire execution), `CCS_VIOLATION` (attack detected by the countermeasure), or a crash signal. Green exit statuses mean no security vulnerability; red statuses indicate a security bypass.
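
For scripted post-processing, a line of this format can be parsed with a regular expression. This is just an illustrative sketch of the line structure shown above, not the actual parsing code of `fault.py`:

```python
import re

# Fields: [program progress%] pc:fault... STATUS [(predicted)]
LINE = re.compile(
    r"\[(?P<prog>\S+) (?P<progress>[\d.]+)%\] "       # program + progress
    r"(?P<pc>0x[0-9a-f]+):(?P<fault>[^.]+)\.\.\. "    # injection point + fault kind
    r"(?P<status>\w+)(?P<predicted> \(predicted\))?"  # exit status
)

m = LINE.match("[patricia 44.6%] 0x40770:s32,1... CCS_VIOLATION")
print(m.group("prog"), m.group("pc"), m.group("fault"), m.group("status"))
# patricia 0x40770 s32,1 CCS_VIOLATION
```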

Executions where the targeted PC is not reached take the longest, because there is no early exit/crash. In addition, a second execution is needed to check whether PC was actually reached or not (by injecting an illegal instruction at that address). Attacks that are not reached are also mostly uninteresting. Two mechanisms are in place to accelerate simulations by avoiding these unneeded executions:

1. Prediction: when the script believes the targeted PC is likely not reached it will try the illegal instruction first to save one execution. If that guess is correct the script will print "(predicted)".
2. Not-reached output file: the script will produce `*-notreached.txt` files in the output folder where it records PC values that are not reached. This way, only the first campaign deals with them. This is why `s32,2` and `s&r32` are so much faster than `s32,1`.
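
The second mechanism can be sketched as follows, assuming a hypothetical one-PC-per-line file format for `*-notreached.txt` (the real logic in `fault.py` may differ):

```python
def parse_not_reached(text: str) -> set:
    """Parse the body of a *-notreached.txt file: one PC per line (assumed format)."""
    return {int(line, 16) for line in text.splitlines() if line.strip()}

def should_inject(pc: int, not_reached: set) -> bool:
    # Later campaigns skip PCs the first campaign already found unreachable.
    return pc not in not_reached

known = parse_not_reached("0x40f98\n0x41000\n")
print(should_inject(0x40770, known))  # True: this PC still needs a faulted run
print(should_inject(0x40f98, known))  # False: skip, it was never reached
```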

The results are summarized in `out/` in files such as `out/campaigns/basicmath-campaign-fsh-ex-s32-1.txt`:

```
= 272364
setting,EXITED,CCS_VIOLATION,CCS_BYPASSED,NOT_REACHED,SILENT_REPLACE,SIGSEGV,SIGILL,SIGTRAP,OTHER
fsh-ex-s32-1,0,1543,0,833,0,115,60,3,1
# OTHER for (266136, 's32,1'):
# summary of faults to be injected:
#   00040f98: s32 (k=1)
# /root/riscv_qemu_FSH: line 5: 50623 Bus error               "${ROOT}"/prefix/bin/qemu-riscv32 -cpu rv32-fsh "$@"
```

The first line records the campaign's progress and is used to resume gracefully if the script is ever interrupted. The next two lines summarize the results; importantly, the `EXITED` and `CCS_BYPASSED` counts (the red outcomes) are zero. Any non-conventional result is finally reported as a comment, which here includes a case of a crash by `SIGBUS`.
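
Checking that such a file reports no red outcomes can be sketched as follows (assuming the format shown above; this helper is hypothetical, not part of `summary.py`):

```python
import csv, io

def campaign_is_clean(text: str) -> bool:
    """True when the campaign reports zero EXITED and CCS_BYPASSED outcomes.
    Skips the "= N" progress line and "#" diagnostic comments."""
    lines = [l for l in text.splitlines()
             if l and not l.startswith(("=", "#"))]
    row = next(csv.DictReader(io.StringIO("\n".join(lines))))
    return int(row["EXITED"]) == 0 and int(row["CCS_BYPASSED"]) == 0

sample = """\
= 272364
setting,EXITED,CCS_VIOLATION,CCS_BYPASSED,NOT_REACHED,SILENT_REPLACE,SIGSEGV,SIGILL,SIGTRAP,OTHER
fsh-ex-s32-1,0,1543,0,833,0,115,60,3,1
"""
print(campaign_is_clean(sample))  # True: no attack bypassed the countermeasure
```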

The aggregate file `out/campaigns.csv` collects this information in a straightforward format.

**Reading the output of performance simulations**

Gem5 produces results for each simulation in a folder. Here, these are named `out/m5out/<program>_<cache>_<type>`, where `<cache>` indicates whether the instruction cache was enabled and `<type>` whether the reference (REF) or hardened (FSH) binary was executed. We use the simplest metric: the total execution time, reported in `stats.txt` as the `finalTick` value on line 3.
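
Extracting that value can be sketched like this, assuming the usual `name value # description` layout of a Gem5 `stats.txt` (the helper name is ours, not part of the artifact's scripts):

```python
def final_tick(stats_text: str) -> int:
    """Return the finalTick value from a Gem5 stats.txt body."""
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "finalTick":
            return int(fields[1])
    raise ValueError("finalTick not found")

# Abridged, made-up stats.txt contents for illustration.
sample = """\
---------- Begin Simulation Statistics ----------
simSeconds 0.000123 # Number of seconds simulated
finalTick 123456789 # Number of ticks from beginning of simulation
"""
print(final_tick(sample))  # 123456789
```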

The aggregate file `out/perf.csv` collects the `finalTick` values for each program and cache/type configuration in a single table.
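
From such a table, the runtime overhead is the ratio of FSH to REF ticks. A sketch with made-up numbers and an assumed column layout (the real `out/perf.csv` may be organized differently):

```python
import csv, io

# Hypothetical rows in the style of out/perf.csv (exact layout may differ).
sample = """\
program,cache,REF,FSH
dijkstra,icache,1000000,1450000
"""
for row in csv.DictReader(io.StringIO(sample)):
    overhead = int(row["FSH"]) / int(row["REF"])
    print(f"{row['program']}: x{overhead:.2f}")  # dijkstra: x1.45
```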

A related performance file (but generated by `summary.py`, not Gem5) is `out/size.csv`, which lists the size of the program's code in the reference and hardened binaries.

**False-positive QEMU “bugs”**

The fault injection campaign script prints a result for each execution, such as `CCS_VIOLATION` or `NOT_REACHED`. When it doesn't recognize a result, it prints `OTHER` and logs the parameters along with the stdout/stderr of the QEMU invocation to the log file. On some machines there are many of these, and they appear to be segfaults or assertion errors _within QEMU itself_, but this is mostly a red herring. The TL;DR is that QEMU is sometimes unable to catch exceptions from the emulated programs and crashes itself instead.

QEMU's control flow during execution is rather complicated due to its use of long jumps and the sort-of-concurrent nature of signal handling. The main mechanism can be summarized like this:

1. When QEMU starts running a fragment (block) of emulated code, it calls `cpu_exec()`, which calls `cpu_exec_setjmp()` to set up a long jump buffer.
2. If emulated code raises an exception or invokes a syscall, the long jump buffer is used to unwind back to `cpu_exec_setjmp()` and make the fragment return an appropriate result code. Note how this means that the SIGSEGV handler (like others) is instructed to go find the jump buffer and use it, and it would be a _shame_ if the associated stack frame was gone by then.
3. Once the fragment finishes, the result code (success, interrupted/killed by signal, syscall...) is checked and appropriate handling is performed; this includes running syscalls and handling exceptions. The handling exceptions part is why programs that segfault when emulated have a QEMU error report and not the kernel's default "Segmentation fault" message.
4. Go back to 1 to execute the next fragment.

The problem is the following. Syscalls are emulated _after_ the block ends, so if a syscall invocation crashes, the signal handler goes to fetch the jump buffer from `cpu_exec_setjmp()` _which doesn't exist anymore because the fragment is done executing_. Usually this results in QEMU failing its `cpu == current_cpu` assertion. Sometimes this results in a crash of the QEMU process itself.

At least 3 bugs I investigated led back to this:
- `brk()` failing to add memory because the heap starts after `.data` and I had placed `.text_css` (which is read-only) somewhere after `.data`, leading to a privilege segfault. This caused a long jump to the expired jump buffer and a later failure of the `cpu == current_cpu` assertion.
- `open()` failing to open files due to my glibc using different syscall numbers and different values for open flags than QEMU expected, with the same outcome.
- A faulted program trying to `brk((void *)3)` leading to a segfault in the syscall emulation code and then failing that same assertion.

## Manual build

**Custom compiler**

The compiler transforms the program into a protected form and is the core of the countermeasure. Pull the [`llvm-property-preserving`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/llvm-property-preserving) submodule and build it with CMake. We configure it to install in the `prefix/` folder of this repo.
```bash
% git submodule update --init llvm-property-preserving
% cd llvm-property-preserving
% mkdir build && cd build
% cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLVM_TARGETS_TO_BUILD="RISCV" -DCMAKE_INSTALL_PREFIX=../prefix -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_LINKER=lld -DBUILD_SHARED_LIBS=ON -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
% ninja install
```

**RISC-V toolchain**

In order to compile and link useful C programs, we need the standard library headers, the standard library itself, and the C runtime for the RISC-V target. Grab the 32-bit RISC-V toolchain from [`riscv-collab/riscv-gnu-toolchain`](https://github.com/riscv-collab/riscv-gnu-toolchain/releases), e.g. `riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz`. Extract it and rename the `riscv` folder to `riscv-custom` (we're going to replace the linker).

```bash
% wget "https://github.com/riscv-collab/riscv-gnu-toolchain/releases/download/2023.01.31/riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"
% tar -xzf "riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"
% mv riscv riscv-custom
% rm "riscv32-elf-ubuntu-22.04-nightly-2023.01.31-nightly.tar.gz"
```

**Custom linker**

The countermeasure relies on computing checksums of fragments of code, which is only possible after relocation in the linker. So we use a slightly-modified linker. Pull the [`binutils-gdb`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/binutils-gdb) submodule and build it.

```bash
% git submodule update --init binutils-gdb
% cd binutils-gdb
% mkdir build && cd build
% ../configure --prefix="$(realpath ../../riscv-custom)" --target="riscv32-unknown-elf"
% make -j4
```

**Custom QEMU**

We use QEMU to emulate the hardware support of the countermeasure and the injection of fetch skip attacks. Pull the [`qemu`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/qemu) submodule and build it.

```bash
% git submodule update --init qemu
% cd qemu
% mkdir build && cd build
% ../configure --target-list=riscv32-linux-user
% make -j$(nproc)
```

**gem5 simulator**

We can simulate the performance impact of the countermeasure using a processor simulator. Pull the [`gem5`](https://gricad-gitlab.univ-grenoble-alpes.fr/michelse/gem5) submodule.

```bash
% git submodule update --init gem5
% cd gem5
% pip install --user -r requirements.txt
% scons build/RISCV/gem5.opt -j$(nproc)
% ln -s ../../gem5/build/RISCV/gem5.opt ../prefix/bin
```

Note: I was unsuccessful in getting a clean build on Arch; Ubuntu seems to be the most reasonable target. If your tools are more recent than Ubuntu's, make sure to use the `develop` branch. As a fallback, I suggest using the Docker setup ([official instructions](https://www.gem5.org/documentation/general_docs/building)).

## Generating the Docker image

The Docker image for this project is generated from the source files in this repository (including unstaged changes). Make sure all submodules are pulled. QEMU only builds out-of-git from a release tarball, so we generate that first. We also clean any generated files from the `mibench` folder, which will get copied.

```bash
% (cd qemu && scripts/archive-source.sh ../qemu.tar)
% podman build -t cc24-fetch-skips-hardening .
```

One way to export the image is then to save it and compress it.

```bash
% podman save cc24-fetch-skips-hardening:latest > cc24-fetch-skips-hardening.tar
% xz -vk -T0 cc24-fetch-skips-hardening.tar
```

After running the tests in a container, get reference results like so.

```bash
% podman cp $containerID:/root/out out-reference
```