Chapter 12: The Minimal Machine Model
Every x86_64 virtual machine carries baggage from 1981. The IBM PC used an Intel 8253 timer, an Intel 8259A interrupt controller, and a National Semiconductor 8250 UART. The PC/AT cascaded a second 8259A for more IRQ lines and added an Intel 8042 to manage the keyboard; later machines replaced the 8250 with the register-compatible 16550A. These chips persisted through the ISA bus, through PCI, through ACPI, and into the UEFI era. Modern Linux kernels probe for them on every x86_64 boot. They are not optional: the kernel's TSC calibration path gates on the presence of a legacy PIC, and the early console driver writes to the UART before any device tree or ACPI table has been parsed.
So the question for a VMM trying to offer the smallest safe attack surface is
not "which devices do we want?" but "which devices can we remove without
breaking the kernel?" For Firecracker the answer is: no PCI bus, no ACPI
power management, no BIOS or firmware — but yes to a 16550A UART at COM1,
yes to an i8042 stub that can reset the CPU, and yes to the three interrupt
controllers KVM provides in-kernel for free. This chapter traces why those
choices are forced, how the VMM routes I/O to the right emulator, and where
QEMU's microvm machine type arrives at the same destination via a different
road.
The Devices the Kernel Still Expects
16550A UART: The Console That Runs Before the Console
Linux's earlycon driver is the first piece of kernel code that can write to
a human-readable output. It runs before the page allocator, before printk's
ring buffer, and before any PCI or USB enumeration. The only hardware it can
target is a memory-mapped or I/O-port UART at a fixed, known address. For
x86_64, that address is 0x3F8 — COM1, as it has been since the IBM PC.
The kernel boot parameter earlycon=uart8250,io,0x3f8 tells the early console
driver to use PIO mode at that address. The kernel's serial console, activated
later by console=ttyS0,115200, uses the same register layout. Neither path
negotiates the UART's location: they assume COM1, assume GSI 4, and start
writing to the THR register at offset 0 from the base address.
Firecracker satisfies this expectation with the vm-superio crate
(rust-vmm/vm-superio), which emulates the 16550A register file. The
constants are in src/vmm/src/device_manager/legacy.rs:
SERIAL_PORT_ADDRESS = 0x3f8, SERIAL_PORT_SIZE = 0x8, and COM1_GSI = 4.
The emulated UART presents itself as a 16550A by setting bits 7:6 of the
Interrupt Identification Register (IIR) to 0b11 on every IIR read
(IIR_FIFO_BITS = 0b1100_0000 in vm-superio/src/serial.rs). That is
exactly the check serial8250_config_port() in 8250_port.c performs — it
tests IIR bits 7:6 for 0b11 to classify the device as 16550A-compatible. The internal
FIFO is 64 bytes (FIFO_SIZE = 0x40). The default baud divisor is
DEFAULT_BAUD_DIVISOR_LOW = 0x0C with DEFAULT_BAUD_DIVISOR_HIGH = 0x00:
divisor 12 at the 1.8432 MHz base clock gives 9600 bps (1,843,200 / (16 × 12)).
The value is cosmetic: the emulator does not pace output by the programmed baud
rate, so whatever divisor the guest kernel writes has no effect on throughput.
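The divisor arithmetic and the IIR check can be sketched in a few lines of Rust. This is an illustration only, not code from vm-superio or the kernel: the function names `baud_from_divisor` and `looks_like_16550a` are invented for the example, and the divide-by-16 prescaler is the conventional 8250-family behavior assumed here.

```rust
/// UART reference clock from the text: 1.8432 MHz.
const UART_CLOCK_HZ: u32 = 1_843_200;
/// Same bit pattern as IIR_FIFO_BITS in the text.
const IIR_FIFO_BITS: u8 = 0b1100_0000;

/// Baud rate for a 16-bit divisor latch (hypothetical helper).
/// Assumes the usual 8250-family rule: baud = clock / (16 * divisor).
fn baud_from_divisor(low: u8, high: u8) -> Option<u32> {
    let divisor = u32::from(u16::from_le_bytes([low, high]));
    if divisor == 0 {
        return None; // a zero divisor has no meaningful baud rate
    }
    Some(UART_CLOCK_HZ / (16 * divisor))
}

/// True when IIR bits 7:6 are both set, the pattern the text describes
/// as the 16550A signature (hypothetical helper).
fn looks_like_16550a(iir: u8) -> bool {
    (iir & IIR_FIFO_BITS) == IIR_FIFO_BITS
}
```

With the defaults quoted above, `baud_from_divisor(0x0C, 0x00)` yields 9600, and a divisor of 1 yields 115200, the rate requested by console=ttyS0,115200.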
The register map is eight bytes wide:
| Offset | Name | Direction | Notes |
|---|---|---|---|
| 0 | RBR / THR | R / W | Receive buffer / transmit holding; DLAB_LOW when LCR bit 7 set |
| 1 | IER / DLAB_HIGH | R / W | Interrupt Enable (bits 3:0 valid); divisor high byte when DLAB set |
| 2 | IIR / FCR | R / W | IIR on read: bits 7:6 = 11 signals 16550A |
| 3 | LCR | R / W | Line Control; bit 7 = DLAB (Divisor Latch Access Bit) |
| 4 | MCR | R / W | Modem Control (5 bits) |
| 5 | LSR | R | Line Status; bit 5 = THR empty (earlycon polls this before each byte) |
| 6 | MSR | R | Modem Status: CTS=0x10, DSR=0x20, RI=0x40, DCD=0x80 |
| 7 | SCR | R / W | Scratch Register |
The earlycon path never enables interrupts. It polls LSR bit 5 ("THR empty")
before writing each byte to offset 0. The interrupt-driven path — used once the
full serial driver loads — sets IER_RDA_BIT = 0b0000_0001 to be notified of
received data and IER_THR_EMPTY_BIT = 0b0000_0010 to be notified when the
transmit holding register drains. Interrupts are delivered via a Trigger
trait backed by a Linux eventfd; KVM translates the eventfd signal into an
IRQ injection on GSI 4.
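The polled transmit path is simple enough to model without hardware. The sketch below is a toy, assuming a hypothetical `PortIo` trait and a `ToyUart` that always reports the transmit holding register as empty; neither type exists in the kernel or in Firecracker.

```rust
const THR_OFFSET: u16 = 0; // transmit holding register
const LSR_OFFSET: u16 = 5; // line status register
const LSR_THR_EMPTY: u8 = 1 << 5; // LSR bit 5

/// Hypothetical port-I/O abstraction so the loop can run in a test.
trait PortIo {
    fn inb(&mut self, port: u16) -> u8;
    fn outb(&mut self, port: u16, value: u8);
}

/// Toy UART: always ready, records transmitted bytes and LSR polls.
struct ToyUart {
    base: u16,
    sent: Vec<u8>,
    lsr_reads: usize,
}

impl PortIo for ToyUart {
    fn inb(&mut self, port: u16) -> u8 {
        if port == self.base + LSR_OFFSET {
            self.lsr_reads += 1;
            LSR_THR_EMPTY // report "ready to transmit" on every poll
        } else {
            0
        }
    }

    fn outb(&mut self, port: u16, value: u8) {
        if port == self.base + THR_OFFSET {
            self.sent.push(value);
        }
    }
}

/// Earlycon-style output: poll LSR bit 5, then write one byte to THR.
fn polled_write<P: PortIo>(io: &mut P, base: u16, msg: &[u8]) {
    for &byte in msg {
        while (io.inb(base + LSR_OFFSET) & LSR_THR_EMPTY) == 0 {
            std::hint::spin_loop();
        }
        io.outb(base + THR_OFFSET, byte);
    }
}
```

No interrupt enable bits are touched anywhere in this path, which is why it works before any interrupt controller has been configured.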
One production wrinkle: Firecracker's design document notes that the serial
console is disabled in production builds because it may expose guest data through
timing side channels. The emulated register file remains, but guest-side output
is not forwarded to the host. Development and debugging images re-enable it via
the boot configuration. From Firecracker v1.8.0 onward, the UART is also
described in the ACPI DSDT as _SB_.COM1 with EISA HID PNP0501, so an
ACPI-aware guest can discover it without a kernel command-line hint.
i8042: A Controller Present Only to Reboot
The Intel 8042 was the keyboard controller in every PC/AT. It relayed scan
codes from the keyboard, raised IRQ 1 (GSI 1) when a byte was ready, and in
later PS/2 designs also served the mouse on IRQ 12. In a microVM, none of that
matters. No human is typing into a Firecracker instance. The i8042 is emulated
for one reason: the Linux kernel, when asked to reboot with the kernel
parameter reboot=k, resets the CPU by writing command 0xFE to the i8042
command port at 0x64.
Firecracker's i8042 implementation is in
src/vmm/src/devices/legacy/i8042.rs. It occupies five bytes starting at
I8042_KDB_DATA_REGISTER_ADDRESS = 0x060: the data register at 0x60 and
the status/command register at offset 4 (OFS_STATUS = 4), which maps to
0x64. IRQ GSI 1 (KBD_EVT_GSI = 1) is wired for the keyboard port; the
PS/2 mouse (IRQ 12) is not emulated.
The command set is minimal:
| Constant | Value | Effect |
|---|---|---|
| CMD_READ_CTR | 0x20 | Read the control register |
| CMD_WRITE_CTR | 0x60 | Write the control register |
| CMD_READ_OUTP | 0xD0 | Read the output port |
| CMD_WRITE_OUTP | 0xD1 | Write the output port |
| CMD_RESET_CPU | 0xFE | Signal CPU reset; triggers the reset_evt eventfd |
When CMD_RESET_CPU arrives, Firecracker signals a reset_evt eventfd.
The VMM's event loop receives that signal and shuts down the VM process
gracefully. From the guest's perspective, the CPU stops; from the host's
perspective, the firecracker binary exits cleanly. The control register
keeps two bits set permanently: CB_POST_OK = 0x04 (Power-On Self Test
passed) and CB_KBD_INT = 0x01 (keyboard interrupt enabled), which is
sufficient to prevent the Linux keyboard driver from looping waiting for
POST completion.
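A stripped-down model shows how little logic the reset path needs. This is a sketch, not Firecracker's implementation: `ToyI8042` is a hypothetical type, a plain boolean stands in for the reset_evt eventfd, and only two of the five commands are handled.

```rust
const OFS_DATA: u64 = 0; // 0x60
const OFS_STATUS: u64 = 4; // 0x64, status on read and command on write
const CMD_READ_CTR: u8 = 0x20;
const CMD_RESET_CPU: u8 = 0xFE;
const CB_KBD_INT: u8 = 0x01;
const CB_POST_OK: u8 = 0x04;

/// Hypothetical, minimal i8042 model covering only the reboot path.
struct ToyI8042 {
    control: u8,
    out_byte: Option<u8>,
    reset_requested: bool, // stands in for signalling reset_evt
}

impl ToyI8042 {
    fn new() -> Self {
        ToyI8042 {
            control: CB_POST_OK | CB_KBD_INT,
            out_byte: None,
            reset_requested: false,
        }
    }

    /// Guest OUT to base + offset.
    fn write(&mut self, offset: u64, value: u8) {
        match (offset, value) {
            (OFS_STATUS, CMD_READ_CTR) => self.out_byte = Some(self.control),
            (OFS_STATUS, CMD_RESET_CPU) => self.reset_requested = true,
            _ => {} // everything else is ignored in this sketch
        }
    }

    /// Guest IN from base + offset.
    fn read(&mut self, offset: u64) -> u8 {
        if offset == OFS_DATA {
            self.out_byte.take().unwrap_or(0)
        } else {
            0
        }
    }
}
```

In a real VMM the boolean would be an eventfd write that the event loop observes; the guest-visible behavior (write 0xFE to 0x64, machine goes away) is the same.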
The only scan codes the emulator produces are for Ctrl+Alt+Del:
KEY_CTRL = 0x0014, KEY_ALT = 0x0011, KEY_DEL = 0xE071. The emulator
is not a general PS/2 keyboard; it cannot type. From v1.8.0 onward, it appears
in the ACPI DSDT as _SB_.PS2_ with HID PNP0303, with I/O resources at
0x0060 (size 1) and 0x0064 (size 1, the latter described in the source as
"Fake a command port so Linux stops complaining").
PICs, PIT, and LAPIC: KVM's In-Kernel Devices
Three more legacy devices are present in every Firecracker VM, but Firecracker does not emulate them itself — KVM does, in the kernel, before any userspace instruction executes. Firecracker's own design document describes them this way: "In addition to the Firecracker provided device models, guests also see the Programmable Interrupt Controllers (PICs), the I/O Advanced Programmable Interrupt Controller (IOAPIC), and the Programmable Interval Timer (PIT) that KVM supports."
Creating the interrupt fabric. A single ioctl,
KVM_CREATE_IRQCHIP (_IO(0xAE, 0x60)), instantiates two cascaded i8259A
PICs — master at ports 0x20/0x21, slave at ports 0xA0/0xA1 — and an
IOAPIC. It also arranges for every subsequently-created vCPU to have a Local
APIC. After this call, GSIs 0–15 are routed to both the PIC and the IOAPIC;
GSIs 16–23 go to the IOAPIC only. Critically, KVM_CREATE_IRQCHIP must be
called before KVM_CREATE_VCPU on x86_64; the ordering is enforced by KVM
and documented in the KVM API under §4.24.
Firecracker checks for the required KVM capabilities — KVM_CAP_PIT2 and
KVM_CAP_PIT_STATE2 — during startup in src/vmm/src/arch/x86_64/kvm.rs.
Missing either capability aborts VM creation. Then setup_irqchip() in
src/vmm/src/arch/x86_64/vm.rs calls create_irq_chip() followed immediately
by create_pit2(kvm_pit_config { flags: KVM_PIT_SPEAKER_DUMMY, ..Default::default() }).
The PIT. KVM_CREATE_PIT2 (_IOW(0xAE, 0x77, struct kvm_pit_config))
creates the i8254 Programmable Interval Timer. Counter 0 at port 0x40 drives
system timer IRQ 0; counter 1 at 0x41 is vestigial (originally DRAM refresh);
counter 2 at 0x42 gates the PC speaker via port 0x61. The tick rate is
PIT_TICK_RATE = 1193182 Hz (defined in include/linux/timex.h). The
KVM_PIT_SPEAKER_DUMMY flag in the flags field tells KVM to emulate the
speaker port at 0x61 in-kernel, avoiding a userspace VM exit every time
Linux probes it.
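The `_IO` and `_IOW` notation used for these ioctls expands to a plain 32-bit request number. The sketch below reproduces the generic Linux encoding used on x86_64 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31). The helper names `ioc`, `io`, and `iow` are made up for the example, and the 64-byte size assumes `struct kvm_pit_config` is a `__u32 flags` followed by `__u32 pad[15]`.

```rust
const IOC_NONE: u32 = 0;
const IOC_WRITE: u32 = 1;
/// The KVM ioctl type byte.
const KVMIO: u32 = 0xAE;

/// Generic Linux ioctl number layout (as on x86_64).
const fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << 30) | (size << 16) | (ty << 8) | nr
}

/// _IO(KVMIO, nr): no argument payload.
const fn io(nr: u32) -> u32 {
    ioc(IOC_NONE, KVMIO, nr, 0)
}

/// _IOW(KVMIO, nr, T), where `size` is sizeof(T).
const fn iow(nr: u32, size: u32) -> u32 {
    ioc(IOC_WRITE, KVMIO, nr, size)
}

const KVM_CREATE_IRQCHIP: u32 = io(0x60);
const KVM_RUN: u32 = io(0x80);
/// Assumes sizeof(struct kvm_pit_config) == 64.
const KVM_CREATE_PIT2: u32 = iow(0x77, 64);
```

Under those assumptions the three constants evaluate to 0xAE60, 0xAE80, and 0x4040AE77 respectively.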
Why the PIT matters for TSC calibration. The PIT is not just a timer; it
is the ruler against which the kernel measures the TSC's frequency.
arch/x86/kernel/tsc.c contains two calibration functions,
pit_calibrate_tsc() and quick_pit_calibrate(). Both gate counter 2 via
port 0x61, program the measurement latch via port 0x43, then poll the PIT
MSB in a tight loop (up to CAL_PIT_LOOPS = 1000 or CAL2_PIT_LOOPS = 5000
iterations) measuring elapsed TSC ticks against a 10 ms window
(CAL_MS = 10). The formula is:
kHz = ((t2 - t1) * PIT_TICK_RATE) / (latch * 1000)
where CAL_LATCH = PIT_TICK_RATE / (1000 / CAL_MS).
Both functions guard on has_legacy_pic(). quick_pit_calibrate() returns 0
immediately when no legacy PIC is present. pit_calibrate_tsc() instead falls
back to a udelay-based wait but still skips the PIT measurement loop. Either
way, removing the PIT pushes the kernel onto CPUID-based or udelay-based
frequency estimation — slower and less precise. Preserving the PIT preserves
the fast, accurate TSC calibration path.
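The calibration formula is easy to check numerically. The sketch below is a direct transcription of the expression above using integer arithmetic; `tsc_khz` is a hypothetical name, and the real kernel functions differ in how they obtain t1, t2, and the number of elapsed PIT ticks.

```rust
const PIT_TICK_RATE: u64 = 1_193_182; // Hz
const CAL_MS: u64 = 10;
/// PIT ticks in a CAL_MS window: 1_193_182 / 100 = 11_931.
const CAL_LATCH: u64 = PIT_TICK_RATE / (1000 / CAL_MS);

/// kHz = ((t2 - t1) * PIT_TICK_RATE) / (latch * 1000)
/// t1 and t2 are TSC readings taken `latch` PIT ticks apart.
fn tsc_khz(t1: u64, t2: u64, latch: u64) -> u64 {
    ((t2 - t1) * PIT_TICK_RATE) / (latch * 1000)
}
```

For a TSC running at 3 GHz, about 29,997,938 TSC ticks elapse during the 11,931 PIT ticks of the window, and the function returns a value within a few kHz of 3,000,000; the small shortfall comes from integer truncation.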
Snapshot and restore. Because the interrupt controllers are in-kernel,
saving their state requires ioctls. Firecracker's save_state() calls
get_irqchip() with KVM_GET_IRQCHIP three times — once each for
KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE, and
KVM_IRQCHIP_IOAPIC — writing into a struct kvm_irqchip { chip_id; chip }.
The restore path calls set_irqchip() with KVM_SET_IRQCHIP three times in
the same order. This symmetric protocol is one reason Firecracker's snapshotting
is fast: no in-process state machine needs serializing, only three kernel
structures.
Fixed MMIO addresses. The IOAPIC sits at IOAPIC_ADDR = 0xFEC0_0000 and
the LAPIC at APIC_ADDR = 0xFEE0_0000 (from src/vmm/src/arch/x86_64/layout.rs).
KVM also needs a protected TSS at 0xFFFB_D000 (the KVM TSS region), which is
placed outside any guest-accessible memslot.
GSI allocation. With the full interrupt fabric in place, Firecracker
allocates GSIs as follows (from layout.rs):
| Range | Owner |
|---|---|
| GSI 0–4 | Reserved; COM1 = GSI 4, i8042 keyboard = GSI 1, timer IRQ 0 = GSI 0 |
| GSI 5–23 | virtio-mmio device slots |
| GSI 24–4095 | MSI (PCIe, since v1.13.0) |
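The allocation table translates directly into a range match. This is only a restatement of the table in code; the `GsiOwner` enum and `gsi_owner` function are illustrative names, not identifiers from layout.rs.

```rust
/// Who owns a given GSI, following the table above (hypothetical type).
#[derive(Debug, PartialEq)]
enum GsiOwner {
    Legacy,     // GSI 0-4: timer, keyboard, COM1, reserved
    VirtioMmio, // GSI 5-23: virtio-mmio device slots
    Msi,        // GSI 24-4095: MSI
    OutOfRange,
}

fn gsi_owner(gsi: u32) -> GsiOwner {
    match gsi {
        0..=4 => GsiOwner::Legacy,
        5..=23 => GsiOwner::VirtioMmio,
        24..=4095 => GsiOwner::Msi,
        _ => GsiOwner::OutOfRange,
    }
}
```

The split at 24 matches the IOAPIC's 24 pins: everything the IOAPIC can deliver is below it, and everything above must arrive as a message-signalled interrupt.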
The MMIO Bus and Device Routing
Having enumerated the legacy devices, the next question is mechanical: when the
guest executes IN 0x3F8, AL or writes to a virtio queue-notify register, how
does the instruction reach its emulator?
Exits to Userspace
Under VMX, two classes of instruction produce exits that land in the VMM.
An IN or OUT instruction generates exit reason 30
(EXIT_REASON_IO_INSTRUCTION). A guest memory access to an address with no
backing EPT mapping generates an EPT violation (exit reason 48) or EPT
misconfiguration (exit reason 49); if no in-kernel handler claims the fault,
KVM promotes it to a KVM_EXIT_MMIO and returns to userspace.
Both classes surface through KVM_RUN (_IO(0xAE, 0x80)). After KVM_RUN
returns, the VMM reads kvm_run.exit_reason. For I/O port accesses it is
KVM_EXIT_IO = 2; for MMIO accesses it is KVM_EXIT_MMIO = 6. The sub-structs
in the kvm_run page describe what happened:
```c
/* Excerpt: two members of the exit union inside struct kvm_run. */

/* KVM_EXIT_IO */
struct {
    __u8  direction;   /* KVM_EXIT_IO_IN=0, KVM_EXIT_IO_OUT=1 */
    __u8  size;        /* operand size: 1, 2, or 4 bytes */
    __u16 port;        /* I/O port number */
    __u32 count;       /* repetition count for INS/OUTS */
    __u64 data_offset; /* byte offset into kvm_run mapping to the data buffer */
} io;

/* KVM_EXIT_MMIO */
struct {
    __u64 phys_addr;   /* guest physical address */
    __u8  data[8];     /* up to 8 bytes of data */
    __u32 len;         /* access width in bytes */
    __u8  is_write;    /* 1 = write, 0 = read */
} mmio;
```
Source: include/uapi/linux/kvm.h in torvalds/linux.
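The data_offset field is the least obvious part of the I/O exit: the bytes being transferred live elsewhere in the mmap'ed kvm_run region, not inside the struct. A minimal sketch of locating them follows. `KvmRunIo` and `io_data` are hypothetical names, and the byte slice stands in for the real shared mapping.

```rust
/// Mirror of the KVM_EXIT_IO sub-struct shown above (illustrative only).
#[repr(C)]
struct KvmRunIo {
    direction: u8,
    size: u8,
    port: u16,
    count: u32,
    data_offset: u64,
}

/// Returns the `size * count` bytes that the exit refers to, or None if
/// the described range does not fit inside the mapped region.
fn io_data<'a>(run_region: &'a [u8], io: &KvmRunIo) -> Option<&'a [u8]> {
    // Casts assume a 64-bit host, where usize is as wide as u64.
    let start = io.data_offset as usize;
    let len = (io.size as usize).checked_mul(io.count as usize)?;
    let end = start.checked_add(len)?;
    run_region.get(start..end)
}
```

For an OUT exit the VMM reads from this slice and hands the bytes to the device; for an IN exit it would take a mutable view of the same range and fill it in before calling KVM_RUN again.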
MMIO space is not configured explicitly. Any guest physical address (GPA) range
not covered by a KVM_SET_USER_MEMORY_REGION memslot
(_IOW(0xAE, 0x46, struct kvm_userspace_memory_region)) produces
KVM_EXIT_MMIO on access. Firecracker calls KVM_SET_USER_MEMORY_REGION to
map RAM (and, from v1.8.0, the page holding the RSDP); everything else is MMIO
by omission.
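"MMIO by omission" amounts to a set-membership test against the registered memslots. The sketch below models that test with a hypothetical `Memslot` struct that keeps only two fields of the real kvm_userspace_memory_region; it says nothing about whether KVM then handles the access in-kernel or exits to userspace.

```rust
/// Two fields of a memslot, enough to test coverage (illustrative type).
struct Memslot {
    guest_phys_addr: u64,
    memory_size: u64,
}

/// True when no memslot covers `gpa`, i.e. an access there cannot be
/// satisfied from RAM and is treated as MMIO.
fn is_unbacked(slots: &[Memslot], gpa: u64) -> bool {
    !slots
        .iter()
        .any(|s| gpa >= s.guest_phys_addr && gpa - s.guest_phys_addr < s.memory_size)
}
```

With a single 3 GiB slot starting at 0, address 0x7000 is backed while 0xC000_0000, the first byte past the slot, is not.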
Firecracker's Software Bus
The exit reason alone does not route the access. The VMM needs a data structure
that maps port or address ranges to device emulators. Firecracker implements its
own Bus in src/vmm/src/vstate/bus.rs. It is a
RwLock<BTreeMap<BusRange, Weak<dyn BusDeviceSync>>>: a sorted tree mapping
half-open address ranges to device references. Lookup uses the B-tree's
predecessor operation — range(..= BusRange::new(addr, 1)).next_back() — and
then checks that the address falls within the range's end; the whole lookup is
O(log n) in the number of registered devices.
Each vCPU holds two buses: pio_bus: Option<Arc<Bus>> for I/O port accesses
(x86_64 only) and mmio_bus: Option<Arc<Bus>> for MMIO accesses. The vCPU
run loop in src/vmm/src/vstate/vcpu.rs dispatches on the exit type: IoIn and
IoOut exits are forwarded to pio_bus, MmioRead and MmioWrite exits to mmio_bus.
Devices are stored as Weak<dyn BusDeviceSync> — the bus does not keep devices
alive; the VMM's owner struct holds the Arc. The BusDevice trait exposes
read(&mut self, base: u64, offset: u64, data: &mut [u8]) and
write(&mut self, base: u64, offset: u64, data: &[u8]) -> Option<Arc<Barrier>>.
An unresolved address is logged as a warn! but returns
VcpuEmulation::Handled, so an out-of-range access does not crash the VM.
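The predecessor lookup and the weak-reference ownership model can be shown with the standard library alone. The following is a simplified sketch, not Firecracker's Bus: it keys the map by base address instead of a BusRange, wraps devices in a Mutex, drops the Barrier return value, and uses invented names (`SharedDevice`, `Scratch`, `resolve`).

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex, RwLock, Weak};

/// Simplified stand-in for the BusDevice trait described above.
trait BusDevice: Send {
    fn read(&mut self, base: u64, offset: u64, data: &mut [u8]);
    fn write(&mut self, base: u64, offset: u64, data: &[u8]);
}

type SharedDevice = Arc<Mutex<dyn BusDevice>>;

/// Map from range base to (range length, weak device reference).
struct Bus {
    devices: RwLock<BTreeMap<u64, (u64, Weak<Mutex<dyn BusDevice>>)>>,
}

impl Bus {
    fn new() -> Self {
        Bus { devices: RwLock::new(BTreeMap::new()) }
    }

    fn insert(&self, base: u64, len: u64, dev: &SharedDevice) {
        let mut map = self.devices.write().unwrap();
        map.insert(base, (len, Arc::downgrade(dev)));
    }

    /// Predecessor lookup: greatest base <= addr, then a bounds check.
    fn resolve(&self, addr: u64) -> Option<(u64, u64, SharedDevice)> {
        let map = self.devices.read().unwrap();
        let (&base, (len, dev)) = map.range(..=addr).next_back()?;
        if addr - base >= *len {
            return None; // addr lies past the end of the nearest range
        }
        let dev = dev.upgrade()?; // None if the owner dropped the device
        Some((base, addr - base, dev))
    }

    /// Returns false when nothing claims the address.
    fn read(&self, addr: u64, data: &mut [u8]) -> bool {
        match self.resolve(addr) {
            Some((base, offset, dev)) => {
                dev.lock().unwrap().read(base, offset, data);
                true
            }
            None => false,
        }
    }

    fn write(&self, addr: u64, data: &[u8]) -> bool {
        match self.resolve(addr) {
            Some((base, offset, dev)) => {
                dev.lock().unwrap().write(base, offset, data);
                true
            }
            None => false,
        }
    }
}

/// Toy device: eight byte-wide registers.
struct Scratch {
    regs: [u8; 8],
}

impl BusDevice for Scratch {
    fn read(&mut self, _base: u64, offset: u64, data: &mut [u8]) {
        if let (Some(&v), Some(slot)) = (self.regs.get(offset as usize), data.first_mut()) {
            *slot = v;
        }
    }

    fn write(&mut self, _base: u64, offset: u64, data: &[u8]) {
        if let (Some(slot), Some(&v)) = (self.regs.get_mut(offset as usize), data.first()) {
            *slot = v;
        }
    }
}
```

Registering a `Scratch` at 0x3F8 with length 8 makes 0x3FF resolve to offset 7 while 0x400 misses. Once the owning Arc is dropped, the same address stops resolving, which is the practical consequence of storing Weak references.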
The relationship between the exit path, the bus, and the device emulators looks like this:
```mermaid
flowchart TB
    guest["Guest<br/>(IN/OUT or memory access)"]
    subgraph kernel_trap["KVM in-kernel: no exit to userspace"]
        pic["i8259 PIC<br/>ports 0x20/0x21, 0xA0/0xA1"]
        pit["i8254 PIT<br/>ports 0x40–0x43"]
        ioapic["IOAPIC<br/>0xFEC0_0000"]
        apic["LAPIC<br/>0xFEE0_0000"]
    end
    kvm["KVM_RUN returns<br/>exit_reason = KVM_EXIT_IO or KVM_EXIT_MMIO"]
    vcpu["vCPU run loop<br/>in vcpu.rs"]
    subgraph pio_bus["pio_bus (PIO): userspace emulation"]
        uart["16550A UART<br/>0x3F8–0x3FF"]
        i8042["i8042 stub<br/>0x60–0x64"]
    end
    subgraph mmio_bus["mmio_bus (MMIO): userspace emulation"]
        vtmmio["virtio-mmio slots<br/>0xC000_0000+"]
    end
    guest -- "PIC/PIT ports, APIC MMIO" --> kernel_trap
    guest -- "other IN/OUT or MMIO" --> kvm
    kvm --> vcpu
    vcpu -- "IoIn / IoOut" --> pio_bus
    vcpu -- "MmioRead / MmioWrite" --> mmio_bus
```
The Fast Path: KVM_IOEVENTFD
Virtio queue kick notifications would generate a KVM_EXIT_MMIO on every
VIRTIO_MMIO_QUEUE_NOTIFY write (register offset 0x050 from the device's
base address). At high I/O rates, round-tripping through KVM_RUN for each
kick is expensive. KVM_IOEVENTFD (_IOW(0xAE, 0x79, struct kvm_ioeventfd))
bypasses this. It registers an eventfd with KVM for a specific address range;
when the guest writes to that range, KVM signals the eventfd in-kernel and
immediately re-enters the guest. The device backend thread wakes from a
read(eventfd_fd) and processes the queue without the VMM loop ever seeing the
exit. This is how Firecracker wires virtio queue kicks: driver writes
VIRTIO_MMIO_QUEUE_NOTIFY → KVM signals eventfd → backend thread drains the
queue → no userspace round-trip.
The Physical Address Map
With RAM, MMIO, and the fixed IOAPIC and LAPIC addresses laid out, the guest physical address space for a typical Firecracker VM looks like this:
```mermaid
flowchart TB
    subgraph gpa["Guest Physical Address Space (x86_64)"]
        lo["0x0000_0000 – RAM<br/>(up to ~3 GiB)"]
        mmio32["0xC000_0000 – 32-bit MMIO window (1 GiB)<br/>virtio-mmio slots, 4 KiB each"]
        ioapic_block["0xFEC0_0000 – IOAPIC"]
        lapic_block["0xFEE0_0000 – LAPIC"]
        hi_ram["0x1_0000_0000 – High RAM<br/>(if guest > 3 GiB)"]
        mmio64["256 GiB – 64-bit MMIO window (256 GiB)"]
    end
```
Each virtio-mmio slot is 4 KiB (MMIO_LEN = 0x1000), starting at
BOOT_DEVICE_MEM_START = 0xC000_0000 for the first (boot) device and
MEM_32BIT_DEVICES_START = 0xC000_1000 for subsequent ones. There is no PCI
configuration space, no PCIe ECAM window, and no Option ROM area — just RAM,
virtio slots, and the two APIC regions.
What Firecracker Drops
No BIOS
Classical x86 boot requires the CPU to start in 16-bit real mode, read a
boot sector from disk, hand control to a bootloader, and eventually transition
to 64-bit long mode, all mediated by firmware that QEMU and other VMMs
typically provide as bios.bin or OVMF.fd. Firecracker skips the entire
stack.
Instead, Firecracker constructs a struct boot_params (the "zero page" of the
Linux x86 boot protocol, currently version 2.15 as of kernel 5.5) and places it
at guest physical address 0x7000. It sets the %rsi register to point to
that address and jumps directly to the 64-bit kernel entry point at the load
address plus 0x200 — the standard offset for a bzImage's 64-bit entry stub.
No real-mode code executes, no BIOS ROM is mapped, and the boot_params fields
for legacy I/O devices are not used; the serial console, for instance, is
configured entirely via kernel command line, not via setup_header.
Alternatively, Firecracker can use the Xen PVH Direct Boot ABI (added in
Firecracker v1.12.0). In the PVH path, Firecracker writes an hvm_start_info
structure at PVH_INFO_START = 0x6000, with its magic field set to
XEN_HVM_START_MAGIC_VALUE = 0x336e_c578, and passes the structure's address in
%rbx. The kernel must be compiled with CONFIG_PVH=y, available since Linux 5.0;
the ELF binary then contains a PVH entry point in a PT_NOTE segment. Both paths
eliminate firmware entirely; they differ only in the handshake structure the
kernel expects to find before its first instruction.
No PCI Bus
Firecracker's virtio devices use the MMIO transport defined in virtio
specification §4.2 (OASIS virtio 1.2, Committee Specification 01), not the
PCI transport. There is no PCI host bridge, no PCI configuration space
mechanism (neither CF8/CFC port-IO nor PCIe ECAM MMIO), and no PCI enumeration.
The MMIO transport has no self-describing enumeration: a device at a given
address does not announce its type or existence on the bus. Discovery happens
through side channels. Before Firecracker v1.8.0, this meant kernel
command-line slugs of the form virtio_mmio.device=4K@0xC0001000:6 (size, base
address, and IRQ), one per device, injected into the kernel command line by the
VMM at boot. From v1.8.0 onward, an ACPI DSDT table enumerates virtio devices
with their MMIO addresses and assigned GSIs, so the guest kernel does not need
the command-line hint.
PCI was added as an opt-in in Firecracker v1.13.0 via --enable-pci. When
enabled, VirtIO devices use a PCI VirtIO transport instead; MMIO remains the
default. Skipping the PCI host bridge and configuration space mechanism removes
a substantial slice of emulated attack surface and eliminates the enumeration
overhead that PCI scanning adds to early boot.
No ACPI Power Management
Firecracker added basic ACPI table support in v1.8.0: an FADT, XSDT, MADT (for
the LAPIC and IOAPIC), and DSDT describing virtio and legacy devices. The RSDP
sits at RSDP_ADDR = 0x000E_0000. But the FAQ states the boundary
explicitly: "Firecracker does not virtualize power management (e.g. there is no
ACPI PM support)."
ACPI S3 (suspend to RAM), S4 (hibernate), and S5 (soft-off) are not available.
Reboot is handled by the i8042 CMD_RESET_CPU = 0xFE path described above.
Shutdown initiated from inside the guest — for instance, poweroff — does not
trigger a clean power-off sequence. The Firecracker process continues running
until an external caller sends the SendCtrlAltDel API event, which injects a
Ctrl+Alt+Del scan code sequence into the guest, ultimately causing the kernel
to reboot via the i8042 path. Before v1.8.0, device enumeration used an
MPTable; from v1.8.0 that path is deprecated, with removal planned for v2.0.
The attack surface that remains is deliberately auditable: the UART and i8042
emulators each fit in a few hundred lines of Rust, and the PICs, IOAPIC, and PIT
are KVM's in-kernel models. Chapter 13 covers the jailer process, which further
constrains what the VMM can reach even if one of the userspace emulators is
compromised.
QEMU microvm: The Same Destination, Different Road
QEMU's microvm machine type (-machine microvm) was introduced in QEMU 4.2
in late 2019. The QEMU documentation describes it as "a machine type inspired
by Firecracker and constructed after its machine model," and the structural
similarity is clear: a single ISA bus, no PCI by default, no ACPI in the
original release, legacy devices kept only where necessary. The differences
are mostly of degree, and understanding them sharpens what is genuinely
necessary in any minimal machine model versus what is a Firecracker-specific
choice.
The Bus Fabric
QEMU microvm's only bus is a single ISA bus. Onto that bus, a set of legacy
devices can be optionally attached: the i8259 PIC pair, the i8254 PIT, an
MC146818 RTC, and one ISA serial port. The LAPIC and IOAPIC are always present
when KVM is in use; kernel-irqchip=split is the default KVM irqchip mode
for microvm, meaning the LAPIC lives in KVM's kernel module while the IOAPIC
is handled by QEMU userspace.
virtio-mmio Slots
QEMU microvm provides 8 virtio-mmio transport slots by default
(mms->virtio_num_transports = 8 in hw/i386/microvm.c). Each slot is 512
bytes wide (smaller than Firecracker's 4 KiB per slot) at a base address of
VIRTIO_MMIO_BASE = 0xfeb00000. Slot i sits at 0xfeb00000 + i * 512. The
default IRQ base is mms->virtio_irq_base = 5, so slots 0–7 use GSIs 5–12.
With a secondary IOAPIC (ioapic2), the virtio IRQ base moves to
IO_APIC_SECONDARY_IRQBASE (24), the slot count grows to IOAPIC_NUM_PINS
(24), and PCIe (when enabled) takes IRQs 12–15. Other fixed MMIO addresses
from include/hw/i386/microvm.h: the ACPI Generic Event Device (GED) at
0xfea00000 on IRQ 9, optional xHCI USB at 0xfe900000 on IRQ 10, and the
PCIe ECAM window at 0xe0000000 (size 256 MiB) with a MMIO window at
0xc0000000 (size 512 MiB) when PCIe is on.
Firmware
This is where QEMU microvm and Firecracker diverge most sharply. QEMU microvm
supports direct kernel loading via -kernel — the QEMU documentation
describes it as a machine type that "needs to be run using a host-side kernel
and, optionally, an initrd image." But the firmware stub still executes.
In hw/i386/microvm.c, x86_bios_rom_init() is called unconditionally as
long as IGVM mode is not active: with ACPI disabled it maps qboot.rom, with
ACPI enabled it maps bios-microvm.bin. The -kernel flag tells QEMU where
to load the kernel image, but it does not suppress the ROM. The guest CPU
starts in the firmware stub, which then hands off to the kernel.
Firecracker takes the opposite approach: the VMM constructs boot_params
directly, sets %rsi to point to it, and jumps to the 64-bit kernel entry
point. No ROM is mapped, no firmware code executes, and the guest CPU's first
instruction is the kernel's own. qboot is purpose-built for speed — it
typically adds only a few tens of milliseconds — but it still represents
firmware-controlled code running in the guest before the kernel. Firecracker
eliminates that phase entirely.
ACPI
QEMU 4.2 shipped microvm without ACPI. QEMU 5.2 added it. The tables are
compact: APIC at 78 bytes, DSDT at 482 bytes, FACP at 268 bytes — under 1 KiB
total, growing to roughly 3,130 bytes for the DSDT when PCIe is enabled (per
Gerd Hoffmann's 2020 blog post on kraxel.org). The DSDT declares each active
virtio-mmio slot with its MMIO address and GSI, so no command-line slugs are
needed.
When ACPI is disabled, QEMU handles device discovery by patching the guest
kernel command line automatically: microvm_get_mmio_cmdline() in
hw/i386/microvm.c appends virtio_mmio.device=512@0x<addr>:<irq> for each
active slot. This behavior is controlled by the machine option
auto-kernel-cmdline (on by default).
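The slot arithmetic and the command-line format combine into a one-line generator. This sketch is written in Rust for consistency with the rest of the chapter and is not a port of QEMU's C function: `mmio_cmdline` is a hypothetical name, and it assumes the default configuration described above (base 0xfeb00000, 512-byte slots, IRQ base 5, slot i on IRQ base + i).

```rust
const VIRTIO_MMIO_BASE: u64 = 0xfeb0_0000;
const SLOT_SIZE: u64 = 512;
const VIRTIO_IRQ_BASE: u32 = 5;

/// Builds the text appended to the kernel command line for the given
/// active slot indices (0..=7 in the default configuration).
fn mmio_cmdline(active_slots: &[u32]) -> String {
    active_slots
        .iter()
        .map(|&i| {
            format!(
                " virtio_mmio.device={}@0x{:x}:{}",
                SLOT_SIZE,
                VIRTIO_MMIO_BASE + u64::from(i) * SLOT_SIZE,
                VIRTIO_IRQ_BASE + i
            )
        })
        .collect()
}
```

For slots 0 and 7 this produces ` virtio_mmio.device=512@0xfeb00000:5 virtio_mmio.device=512@0xfeb00e00:12`, which is the information an ACPI DSDT would otherwise carry.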
Shutdown
Without ACPI PM and without a PS/2 keyboard (both of which are optional in
microvm), there is no standard shutdown path. QEMU microvm's recommended
approach is a CPU triple-fault, which QEMU treats as a reboot or shutdown
trigger. The kernel parameter reboot=t prioritizes the triple-fault path.
This is the mirror image of Firecracker's reboot=k strategy: both avoid
ACPI PM, but Firecracker routes through the i8042 while QEMU microvm, when the
i8042 is absent, routes through a deliberate fault.
Side by Side
| Property | QEMU microvm | Firecracker |
|---|---|---|
| PCI bus | None by default; optional PCIe (ECAM at 0xe0000000) | None by default; optional PCIe (v1.13.0+) |
| ACPI | Added in QEMU 5.2; includes PM framework | Added in v1.8.0; no PM |
| Firmware | qboot.rom (no ACPI) or bios-microvm.bin (with ACPI) | None; kernel loaded directly |
| virtio transport | virtio-mmio; 8 slots at 0xfeb00000, 512 B each | virtio-mmio; 4 KiB slots from 0xC000_0000 |
| ISA serial | Optional; always firmware-visible | Present; disabled in production builds |
| Shutdown | CPU triple-fault (reboot=t) | i8042 CMD_RESET_CPU=0xFE (reboot=k) |
| Device enumeration | Command-line injection or ACPI DSDT | Command-line injection, MPTable (deprecated), or ACPI DSDT (v1.8.0+) |
| Introduced | QEMU 4.2, late 2019 | Open-sourced November 2018 (v0.11.0) |
The structural gap is the firmware stub — a few tens of milliseconds of vendor-controlled code that QEMU microvm runs before every kernel, and that Firecracker never maps at all. The next chapter examines what the jailer does with the attack surface that remains after the machine model has been stripped this far down.
Sources And Further Reading
- Firecracker legacy device manager (constants, ACPI AML for COM1 and PS/2): src/vmm/src/device_manager/legacy.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/device_manager/legacy.rs
- Firecracker i8042 emulator (command constants, status/control bits, scan codes): src/vmm/src/devices/legacy/i8042.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/devices/legacy/i8042.rs
- vm-superio 16550A emulation (FIFO_SIZE, IIR_FIFO_BITS, register map): vm-superio/src/serial.rs, https://github.com/rust-vmm/vm-superio/blob/main/vm-superio/src/serial.rs
- Firecracker x86_64 memory layout (MMIO base, slot size, GSI ranges, IOAPIC/LAPIC addresses): src/vmm/src/arch/x86_64/layout.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/arch/x86_64/layout.rs
- Firecracker irqchip and PIT setup (setup_irqchip, KVM_PIT_SPEAKER_DUMMY): src/vmm/src/arch/x86_64/vm.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/arch/x86_64/vm.rs
- Firecracker required KVM capabilities check: src/vmm/src/arch/x86_64/kvm.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/arch/x86_64/kvm.rs
- Firecracker Bus struct and BusDevice trait: src/vmm/src/vstate/bus.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/vstate/bus.rs
- Firecracker vCPU run loop and exit dispatch: src/vmm/src/vstate/vcpu.rs, https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/vstate/vcpu.rs
- Firecracker design document (device list, thread model, production serial disable): https://github.com/firecracker-microvm/firecracker/blob/main/docs/design.md
- Firecracker FAQ (reboot=k, "6 emulated devices," no ACPI PM, boot-time guarantee): https://github.com/firecracker-microvm/firecracker/blob/main/FAQ.md
- Firecracker v1.8.0 release notes (ACPI tables added; RSDP placement; MPTable deprecated): https://github.com/firecracker-microvm/firecracker/releases/tag/v1.8.0
- Firecracker v1.12.0 release notes (PVH boot mode added; CONFIG_PVH=y, Linux 5.0+): https://github.com/firecracker-microvm/firecracker/releases/tag/v1.12.0
- Firecracker v1.13.0 release notes (optional PCI transport via --enable-pci): https://github.com/firecracker-microvm/firecracker/releases/tag/v1.13.0
- KVM UAPI header (KVM_EXIT_IO, KVM_EXIT_MMIO, kvm_run sub-structs, KVM_CREATE_IRQCHIP, KVM_CREATE_PIT2, KVM_IOEVENTFD): https://github.com/torvalds/linux/blob/master/include/uapi/linux/kvm.h
- KVM API documentation (§4.24 KVM_CREATE_IRQCHIP, §4.71 KVM_CREATE_PIT2, §4.59 KVM_IOEVENTFD, §4.35 KVM_SET_USER_MEMORY_REGION): https://docs.kernel.org/virt/kvm/api.html
- Linux timex.h (PIT_TICK_RATE = 1193182 Hz): https://github.com/torvalds/linux/blob/master/include/linux/timex.h
- Linux tsc.c (pit_calibrate_tsc, quick_pit_calibrate, CAL_PIT_LOOPS, has_legacy_pic): https://github.com/torvalds/linux/blob/master/arch/x86/kernel/tsc.c
- Linux serial console documentation (earlycon, uart8250,io,0x3f8): https://docs.kernel.org/admin-guide/serial-console.html
- Linux x86 boot protocol v2.15: https://www.kernel.org/doc/html/v6.1/x86/boot.html
- OASIS virtio 1.2 specification (§4.2 MMIO transport, §4.2.1 discovery, §4.2.2 register map): https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html
- Linux virtio-mmio UAPI header (all register offsets including VIRTIO_MMIO_QUEUE_NOTIFY=0x050): https://raw.githubusercontent.com/torvalds/linux/master/include/uapi/linux/virtio_mmio.h
- QEMU microvm documentation: https://www.qemu.org/docs/master/system/i386/microvm.html
- QEMU hw/i386/microvm.c (slot count, base address, IRQ base, firmware selection, microvm_get_mmio_cmdline): https://github.com/qemu/qemu/blob/master/hw/i386/microvm.c
- QEMU include/hw/i386/microvm.h (fixed MMIO addresses: GED, xHCI, PCIe window): https://github.com/qemu/qemu/blob/master/include/hw/i386/microvm.h
- Gerd Hoffmann (QEMU maintainer), "QEMU microvm and ACPI" (ACPI table sizes, firmware selection, bios-microvm.bin): https://www.kraxel.org/blog/2020/10/qemu-microvm-acpi/
- Stefano Garzarella (QEMU developer), boot time measurement methodology for microvm and qboot (phase-by-phase tracing, virtme tooling): https://stefano-garzarella.github.io/posts/2019-08-24-qemu-linux-boot-time/
- rust-vmm vm-device bus abstractions (IoManager, DevicePio, DeviceMmio): https://github.com/rust-vmm/vm-device/blob/main/src/bus/mod.rs and https://github.com/rust-vmm/vm-device/blob/main/src/device_manager.rs