This is Part 6 of my Cortex‑M7 without hardware series.

Parts 1–5 used Renode to emulate an STM32F746 and GDB to poke around the running firmware. Renode is excellent — full peripheral models, scripted test benches, a rich ecosystem — but it is not small. The installer pulls in Mono, a pile of .NET assemblies, and a handful of native libraries. If all you want is “does my firmware boot and produce the right output?”, that is a lot to install.

QEMU covers that narrower use case with a single binary that is already in most distribution package managers. This post shows what changes when you swap Renode for QEMU, focusing on the three things that genuinely differ: the target board (and therefore the linker script), ARM semihosting, and the launch command.

TL;DR: Swap the STM32F746 target for mps2-an500, move flash to 0x00000000 in the linker script, enable semihosting, and run: qemu-system-arm -machine mps2-an500 -cpu cortex-m7 -nographic -semihosting-config enable=on,target=native -kernel your.elf

Why QEMU

QEMU (qemu-system-arm) is available as a single package on practically every Linux distribution:

# Debian / Ubuntu
sudo apt install qemu-system-arm

# Fedora
sudo dnf install qemu-system-arm

# Arch
sudo pacman -S qemu-system-arm

QEMU is likewise a single install on macOS (Homebrew) and Windows (MSYS2 or Chocolatey).

No extra runtimes, no download scripts, no version pinning ceremony. For CI or a minimal dev container this matters a lot.

The trade-off is that QEMU has far fewer ARM Cortex-M peripheral models than Renode. There is no STM32F746 machine in QEMU. What it does have is the MPS2-AN500 board — a Cortex-M7 FPGA image from Arm’s MPS2+ development platform. 1 It has enough RAM and flash to run bare-metal firmware and, crucially, it supports semihosting (like Renode), which is the mechanism this project uses for output.

Target board: MPS2-AN500

The Renode series targeted the STM32F746. That choice was driven by Renode’s detailed STM32 peripheral models and GDB integration. For QEMU the closest equivalent Cortex-M7 machine is mps2-an500.

The memory map is different:

| Region | STM32F746 (Renode) | MPS2-AN500 (QEMU) |
|--------|--------------------|-------------------|
| FLASH  | 0x08000000, 1 MB   | 0x00000000, 4 MB  |
| RAM    | 0x20000000, 320 KB | 0x20000000, 4 MB  |

The RAM base address stays the same, but flash moves to 0x00000000. This is the single most important change from the linker script perspective.
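To see why an image linked for the STM32 layout cannot simply be loaded on the new board, here is a small host-side sketch that encodes the regions from the table and checks where the old flash base lands. The Region type and the contains() helper are made up for illustration and are not part of the project; the sizes are the ones listed in the table, nothing more.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not part of the project.
struct Region {
  std::uint32_t origin;
  std::uint32_t length;
};

// True when addr lies inside [origin, origin + length).
constexpr bool contains(Region r, std::uint32_t addr)
{
  return addr >= r.origin && (addr - r.origin) < r.length;
}

// Values copied from the memory map table in this post.
constexpr Region stm32_flash{0x08000000u, 1u * 1024u * 1024u};
constexpr Region mps2_flash{0x00000000u, 4u * 1024u * 1024u};
constexpr Region mps2_ram{0x20000000u, 4u * 1024u * 1024u};

// A vector table linked at the STM32 flash base falls outside both
// MPS2-AN500 regions listed above, so the linker script has to change.
static_assert(contains(stm32_flash, 0x08000000u), "inside STM32 flash");
static_assert(!contains(mps2_flash, 0x08000000u), "outside MPS2 flash");
static_assert(!contains(mps2_ram, 0x08000000u), "outside MPS2 RAM");
```

Because everything is constexpr, the checks run at compile time on any host compiler; no ARM toolchain is needed.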

Changes to the linker script

The Renode project’s linker.ld had flash at 0x08000000. For MPS2-AN500 it becomes:

/* Memory layout for MPS2-AN500 */
MEMORY
{
    FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 4M
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 4M
}

ENTRY(Reset_Handler)

/* Export a symbol for the initial MSP value (top of RAM). */
_estack = ORIGIN(RAM) + LENGTH(RAM);

SECTIONS
{
    .isr_vector :
    {
        . = ALIGN(8);
        KEEP(*(.isr_vector))
        . = ALIGN(8);
    } > FLASH

    .text : {
        *(.text*)               /* Code */
        *(.rodata*)             /* Read-only data */
    } > FLASH
}

Two things worth noting beyond the address change:

  1. _estack is now exported as a linker symbol (_estack = ORIGIN(RAM) + LENGTH(RAM)). The Renode project hardcoded the stack top directly in startup.c as 0x20000000 + 320 * 1024. Exporting it from the linker script is cleaner — the startup code just references the symbol and the linker keeps the memory layout in one place.

  2. No .data or .bss sections. The new main.cpp has no globals that need copying or zeroing, so those sections are simply not needed. The linker script stays minimal.
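The arithmetic behind _estack is easy to check on the host. The sketch below repeats the ORIGIN(RAM) + LENGTH(RAM) computation for both boards; the constant and function names are invented for this example, and the 8-byte alignment check reflects the usual ARM procedure call standard requirement for the stack pointer rather than anything the project enforces.

```cpp
#include <cstdint>

// RAM base shared by both boards, per the memory map in this post.
constexpr std::uint32_t kRamOrigin = 0x20000000u;

// Same expression the linker script uses: ORIGIN(RAM) + LENGTH(RAM).
constexpr std::uint32_t stack_top(std::uint32_t origin, std::uint32_t length)
{
  return origin + length;
}

// Old value, previously hardcoded in startup.c for the STM32F746 (320 KB).
constexpr std::uint32_t kOldStackTop = stack_top(kRamOrigin, 320u * 1024u);
// New value, derived from the 4 MB RAM region of MPS2-AN500.
constexpr std::uint32_t kNewStackTop = stack_top(kRamOrigin, 4u * 1024u * 1024u);

static_assert(kOldStackTop == 0x20050000u, "STM32F746 stack top");
static_assert(kNewStackTop == 0x20400000u, "MPS2-AN500 stack top");
static_assert(kNewStackTop % 8u == 0u, "stack top should be 8-byte aligned");
```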

Changes to startup.c

The only change from the Part 2 startup code is how the initial stack pointer entry is computed: instead of a magic number, it uses the _estack symbol defined by the linker script. The declarations are also grouped under comments.

#include <stdint.h>

/* Linker symbols defined in linker.ld */
// top of RAM
extern uintptr_t                                        _estack;

/* C/C++ entrypoints */
// Main
extern int main(void);

void                                                    Reset_Handler(void);
void                                                    Default_Handler(void);

__attribute__((section(".isr_vector"))) const uintptr_t vector_table[] = {
  (uintptr_t) &_estack,        // Initial stack pointer
  (uintptr_t) Reset_Handler,   // Reset handler
  (uintptr_t) Default_Handler, // NMI
  (uintptr_t) Default_Handler, // HardFault
};

void Reset_Handler(void)
{
  main();
}

void Default_Handler(void)
{
  while (1)
    ;
}

Everything else is identical to Part 2.
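To make the role of the first two vector table entries concrete, here is a host-side sketch of what a Cortex-M core does with them at reset: word 0 becomes the initial stack pointer, and word 1, with its Thumb bit cleared, becomes the first program counter. ResetState, read_reset_state and the example handler addresses are made up for illustration; real addresses come from the linker.

```cpp
#include <cstdint>

// Hypothetical model of the two values the core latches at reset.
struct ResetState {
  std::uint32_t sp; // initial main stack pointer (vector table word 0)
  std::uint32_t pc; // first instruction address (word 1, Thumb bit cleared)
};

constexpr ResetState read_reset_state(const std::uint32_t* table)
{
  // Bit 0 of a Cortex-M vector marks Thumb state; it is not part of the
  // instruction address itself.
  return ResetState{table[0], table[1] & ~1u};
}

// Example table with invented handler addresses (assumption: code starts
// right after a four-entry table, at 0x10).
constexpr std::uint32_t kExampleTable[] = {
  0x20400000u, // initial stack pointer (_estack for MPS2-AN500)
  0x00000011u, // Reset_Handler at 0x10, Thumb bit set
  0x00000015u, // NMI
  0x00000015u, // HardFault
};

static_assert(read_reset_state(kExampleTable).sp == 0x20400000u, "initial SP");
static_assert(read_reset_state(kExampleTable).pc == 0x00000010u, "initial PC");
```

This is also why the order of entries in vector_table matters: the core indexes the table by position, not by name.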

Semihosting

The Renode series demonstrated output by attaching GDB and inspecting memory. That works, but it requires a running GDB session. Semihosting gives you a simpler path: the firmware itself can print strings and signal a clean exit, and QEMU handles both without any debugger attached.

How it works

ARM semihosting is a debug protocol integrated into many ARM architectures, including the Cortex-M, that allows communication between an embedded target and a host computer. 2 When the CPU executes BKPT #0xAB, QEMU (or a hardware debugger) intercepts the trap. At that point:

  • r0 holds an operation number (what to do)
  • r1 holds a pointer to the arguments

QEMU performs the operation on the host side and returns the result in r0. The firmware never touches a UART register. The full low-level call looks like this:

[[nodiscard]] static int32_t semihosting_call(uint32_t op, const void* arg)
{
  int32_t result;
  asm volatile("mov r0, %[op]  \n" // r0 = operation number
               "mov r1, %[arg] \n" // r1 = pointer to arguments
               "bkpt #0xAB     \n" // trap into QEMU / debugger
               "mov %[res], r0 \n" // r0 = return value
               : [res] "=r"(result)
               : [op] "r"(op), [arg] "r"(arg)
               : "r0", "r1", "memory");
  return result;
}

Because this function sits underneath every semihosting operation, I have marked it with the C++17 [[nodiscard]] attribute, so the compiler warns when the return value is ignored. The value can still be discarded deliberately with a cast to (void), as seen later. 3

The "memory" clobber tells the compiler that the inline assembly may read or write memory arbitrarily, preventing it from reordering or caching values across the call. 4
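The register convention is easier to follow with a model of the other side. Below is a host-side sketch of a dispatcher that receives the operation number and the argument pointer and acts on them, covering only the two operations used in this post. It is an illustration of the protocol, not QEMU's implementation; MockHost, mock_semihost and the return values are all assumptions made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical host-side state, for illustration only.
struct MockHost {
  std::string   output;        // everything "printed" by the firmware
  bool          exited = false;
  std::uint32_t reason = 0;
  std::uint32_t exit_code = 0;
};

// op plays the role of r0, arg the role of r1; the returned value is what
// the firmware would read back from r0 (arbitrary in this sketch).
inline std::int32_t mock_semihost(MockHost& host, std::uint32_t op, const void* arg)
{
  switch (op) {
  case 0x04u: // SYS_WRITE0: arg points at a NUL-terminated string
    host.output += static_cast<const char*>(arg);
    return 0;
  case 0x20u: { // SYS_EXIT_EXTENDED: arg points at {reason, exit_code}
    const auto* block = static_cast<const std::uint32_t*>(arg);
    host.exited = true;
    host.reason = block[0];
    host.exit_code = block[1];
    return 0;
  }
  default:
    return -1; // operation not modelled in this sketch
  }
}
```

The point to take away is that the firmware only passes a number and a pointer; everything else, including reading the string out of target memory, happens on the host side.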

SYS_WRITE0 (op 0x04)

Prints a null-terminated string. r1 points directly to the string — no length needed, QEMU walks to the null terminator itself. 2

static void sh_print(const char* msg)
{
  (void) semihosting_call(0x04, msg); // Discard the value with (void)
}

SYS_EXIT_EXTENDED (op 0x20)

Signals QEMU to shut down. r1 points to a two-element array: {reason, exit_code}. The reason 0x20026 is ADP_Stopped_ApplicationExit, the standard “application finished normally” code. 2

[[noreturn]] static void sh_exit()
{
  // The exit call takes a struct: {reason, exit_code}.
  // 0x20026 = ADP_Stopped_ApplicationExit (normal termination).
  // 0 = Success
  const uint32_t params[2] = {0x20026, 0};
  (void) semihosting_call(0x20u, params); // Discard the value with (void)
  __builtin_unreachable();
}

Without sh_exit(), QEMU would keep running after main returns: Reset_Handler has no caller to return to, so the core would most likely fault into Default_Handler and spin there until the process is killed. Calling sh_exit() makes the process terminate cleanly with a zero exit code, which is what you want in CI.

[[noreturn]] is a C++11 attribute that tells the compiler this function never returns, letting it skip generating any return path after the call. 5

__builtin_unreachable() after the semihosting call handles the case where the compiler cannot statically prove the asm never returns — without it, the compiler would warn about a [[noreturn]] function that might fall through. 6
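For CI it can be useful to report failure as well as success. The sketch below builds the same two-word parameter block with a caller-supplied exit code; make_exit_block and the constant name are hypothetical and not part of the project. My understanding is that QEMU uses the second word as its process exit status when the reason is ADP_Stopped_ApplicationExit, but that is worth verifying against the QEMU version you run.

```cpp
#include <array>
#include <cstdint>

// Reason code named in this post: normal application exit.
constexpr std::uint32_t kAdpStoppedApplicationExit = 0x20026u;

// Hypothetical helper: builds the {reason, exit_code} block that
// SYS_EXIT_EXTENDED expects r1 to point at.
constexpr std::array<std::uint32_t, 2> make_exit_block(std::uint32_t exit_code)
{
  return {{kAdpStoppedApplicationExit, exit_code}};
}

// On target this might be used as follows (a sketch, assuming the
// semihosting_call() helper shown earlier in this post):
//   const auto params = make_exit_block(1); // non-zero = failure
//   (void) semihosting_call(0x20u, params.data());
```

The block must stay alive until the BKPT executes, which is why the usage sketch keeps it in a named local rather than passing a temporary.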

The new main.cpp

Putting it all together:

int main()
{
  sh_print("Hello from Cortex-M7 on QEMU!\n");
  sh_print("Exiting...\n");
  sh_exit();
  sh_print("Will not print!\n");
}

Print, exit, done. The last sh_print is unreachable (and will most likely be optimized away), but it serves as a source-level sanity check: if it ever prints, sh_exit() did not terminate the program.

Building and running

Clone the project and configure it:

git clone \
  --depth 1 \
  --branch blog-minimal-0.1.2 --single-branch \
  https://gitlab.com/sorhanp/cortex-m7-qemu.git \
  cortex-m7-qemu-blog-minimal

cd cortex-m7-qemu-blog-minimal

cmake --preset arm-none-eabi-debug

Build and launch QEMU in one step via the CMake utility target:

cmake --build --preset arm-gcc-debug-build --target run-qemu

Expected output:

Hello from Cortex-M7 on QEMU!
Exiting...

And then QEMU exits. No window, no interactive session — it just runs to completion.

What the QEMU command does

The run-qemu CMake target expands to:

qemu-system-arm \
  -machine mps2-an500 -cpu cortex-m7 \
  -nographic -monitor none \
  -semihosting-config enable=on,target=native \
  -kernel cortex-m7-qemu.elf

Flag by flag:

  • -machine mps2-an500 -cpu cortex-m7 — selects the board and core
  • -nographic -monitor none — suppresses the GUI window and the QEMU monitor prompt; all I/O goes to the terminal
  • -semihosting-config enable=on,target=native — enables semihosting and routes output to the host process (native), so sh_print writes to stdout 7
  • -kernel — loads the ELF, sets up the entry point, and starts execution

Renode vs QEMU — quick comparison

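|                         | Renode                  | QEMU                     |
|-------------------------|-------------------------|--------------------------|
| Installation            | Mono + dependencies     | Single package           |
| STM32 peripheral models | Yes                     | No                       |
| Board used here         | STM32F746               | MPS2-AN500               |
| GDB support             | Yes                     | Yes (-gdb tcp::3333)     |
| Semihosting             | Yes                     | Yes                      |
| Scripted test benches   | Yes (.resc)             | Limited                  |
| Good for                | Deep peripheral testing | Quick boot/output checks |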

Both tools support GDB, so the debugging session from Part 5 would work on QEMU too: add the -gdb tcp::3333 and -S flags to make QEMU listen for a debugger and hold the CPU at reset until one attaches and resumes execution.

Next

In Part 7 I will start hardening the system by adding pieces that are still missing, such as the .data and .bss sections in the linker script. Stay tuned!

References