Chapter 17: MMDS — The MicroVM Metadata Service

A microVM boots knowing almost nothing about itself. The kernel image was baked ahead of time; the root filesystem is a generic snapshot; the init process has no way to ask the hypervisor which instance it is, what its role is, or what credentials it should use. Someone has to tell it. The question is how.

The obvious approach, packing the information into the boot environment as kernel command-line arguments or environment variables inside the initial RAM disk, breaks in any fleet where the same image runs across thousands of machines with different identities. An image cannot encode a given instance's hostname, role, or ephemeral credentials at build time. AWS learned this the hard way building EC2, and their solution, the Instance Metadata Service at 169.254.169.254, became the de facto standard for injecting per-instance configuration into a running virtual machine without rebuilding the image. When Firecracker was designed to run Lambda functions at scale, the Firecracker team needed the same capability, but without adding a privileged listener on the host network, without a sidecar process, and without allowing the guest to modify its own metadata.

What they built is the MicroVM Metadata Service, or MMDS. It is a JSON key-value store embedded in the VMM process, served to the guest over a purpose-built HTTP stack that lives entirely inside firecracker, and written exclusively by the host over the Unix-domain API socket. The guest reads; the host writes. That asymmetry is load-bearing: it is the entire security property of the design.

Where MMDS Lives

MMDS has three components, and all three run inside the single firecracker process — outside the KVM hardware boundary, in the VMM thread alongside the virtio-net, virtio-block, virtio-vsock, and virtio-balloon device emulators. There is no separate MMDS process, no sidecar, no additional thread.

The first component is the host-side API handler, which accepts PUT /mmds, PATCH /mmds, GET /mmds, and PUT /mmds/config from the operator over the Firecracker Unix-domain socket.

The second is the data store, a single serde_json::Value tree (data_store field on struct Mmds in src/vmm/src/mmds/data_store.rs) that holds whatever JSON the host has placed there. By default the store is capped at 51,200 bytes (50 KiB); the --mmds-size-limit CLI flag raises or lowers that ceiling, defaulting to the value of --http-api-max-payload-size when unset.

The third is Dumbo, a minimalist IPv4/TCP/HTTP network stack in src/vmm/src/dumbo/. Dumbo intercepts outbound Ethernet frames from the guest's virtio-net ring — before any of them reach the host TAP device — and synthesizes HTTP responses from the data store entirely in process.

flowchart TB
    subgraph host["Host (operator)"]
        api["firecracker API socket<br/>(Unix-domain)"]
    end
    subgraph fc["firecracker process (VMM thread)"]
        handler["Host API handler<br/>PUT/PATCH/GET /mmds"]
        store["data_store: serde_json::Value<br/>(≤ 51,200 bytes)"]
        dumbo["Dumbo<br/>ARP + IPv4 + TCP + HTTP<br/>src/vmm/src/dumbo/"]
        net["virtio-net emulator"]
        handler --> store
        dumbo --> store
        net --> dumbo
    end
    subgraph guest["Guest (VM)"]
        gnet["virtio-net driver"]
    end
    api --> handler
    gnet -->|"Ethernet frames"| net
    dumbo -->|"intercepted before TAP"| gnet
    net -->|"unmatched frames"| tap["TAP device → host network"]

The critical property visible in the diagram is the position of Dumbo: it sits between the guest's virtio-net ring and the TAP device, not between the TAP device and the host network. Frames that look like MMDS traffic never reach TAP. Frames that do not match pass through unmodified.

The Data Store

The data store is deliberately simple. It holds a single JSON tree — not a typed schema, not a relational model, just a serde_json::Value. The host populates it with PUT /mmds to replace the whole tree, or PATCH /mmds to apply a JSON Merge Patch (RFC 7396) against the existing contents. Under RFC 7396 semantics, a null value in the patch removes the corresponding key from the store; any non-null value replaces or creates it.
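The PATCH semantics are easiest to see in a small model. The following is a minimal Python sketch of the RFC 7396 merge algorithm, not Firecracker's Rust implementation. The function names and the sample metadata are hypothetical, and measuring the store size on the compact serialized form is an assumption made for illustration.

```python
import copy
import json

DEFAULT_LIMIT_BYTES = 51_200  # default cap quoted earlier in the chapter


def merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch and return the patched document."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def fits_limit(doc, limit=DEFAULT_LIMIT_BYTES):
    """Assumption: the size is measured on the compact serialized JSON."""
    encoded = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= limit


# Hypothetical store contents and patch.
store = {"latest": {"meta-data": {"hostname": "my-vm-001", "role": "worker"}}}
patch = {"latest": {"meta-data": {"role": None, "zone": "a"}}}
patched = merge_patch(store, patch)
```

Here the null removes role, the non-null value creates zone, and hostname is untouched because the patch never mentions it.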

Only JSON objects and strings are supported when a guest reads them. Arrays, numbers, and booleans may exist in the store, but if the guest requests a path that resolves to one, Dumbo returns 501 Not Implemented unconditionally — the UnsupportedValueType arm in mod.rs always maps to StatusCode::NotImplemented regardless of mode. When imds_compat is enabled, the response body is formatted as plain-text in IMDS style rather than a JSON envelope, but the 501 is not conditional on that flag.

The error variants on MmdsDatastoreError are: DataStoreLimitExceeded, NotFound, NotInitialized, TokenAuthority(TokenError), and UnsupportedValueType. NotInitialized is returned when the guest queries the store before the host has written anything — the store starts empty, and an empty store is different from a store that has been explicitly populated with an empty object.
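A short Python model can make the read-side distinctions concrete. This is a sketch under stated assumptions, not the Rust code: MmdsReadError and read_value are hypothetical names, an uninitialized store is modeled as None, the root path "/" is assumed to address the whole tree, and traversal through arrays is left out for brevity.

```python
class MmdsReadError(Exception):
    """Hypothetical error type; `kind` mirrors the variant names in the text."""

    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


def read_value(store, pointer):
    """Resolve a JSON pointer against the store (objects only, for brevity)."""
    if store is None:
        # The host has never written anything.
        raise MmdsReadError("NotInitialized")
    node = store
    tokens = [] if pointer in ("", "/") else pointer.lstrip("/").split("/")
    for token in tokens:
        # RFC 6901 escapes: "~1" is "/", "~0" is "~".
        key = token.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or key not in node:
            raise MmdsReadError("NotFound")
        node = node[key]
    if not isinstance(node, (dict, str)):
        # Arrays, numbers, and booleans are stored but not served.
        raise MmdsReadError("UnsupportedValueType")
    return node
```

With this model, read_value(None, "/") raises NotInitialized while read_value({}, "/") returns an empty object, which is the distinction the paragraph above draws.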

Configuring MMDS

Before the microVM boots, the operator sends a PUT /mmds/config request over the API socket to attach MMDS to one or more network interfaces and choose a version. The MmdsConfig struct (in src/vmm/src/vmm_config/mmds.rs, annotated #[serde(deny_unknown_fields)] — unknown fields return HTTP 400) has four fields:

| Field | Type | Default |
|---|---|---|
| version | V1 or V2 | V1 (deprecated) |
| network_interfaces | list of interface IDs | required |
| ipv4_address | IPv4 address | 169.254.169.254 |
| imds_compat | boolean | false |

PUT /mmds/config is pre-boot only. The network_interfaces list names which of the guest's virtio-net devices will have a Dumbo instance attached. A guest interface not listed in network_interfaces has no associated Dumbo; its frames addressed to 169.254.169.254 pass through to TAP unmodified, which is why the production host-setup guide mandates firewall rules on every TAP interface regardless of configuration (see the trust model section below).

Sending version: V1 — or omitting the field, since V1 is the default — causes the API server to append the deprecation warning "MmdsV1 is deprecated. Use V2 instead." to the response body. V1 has been deprecated since Firecracker v1.1.0. The design doc does not commit to a specific removal version beyond noting it will happen in a future major release.

When imds_compat is true, all guest responses are formatted as plain-text in EC2 IMDS style regardless of the Accept header. This allows existing EC2 metadata clients inside the guest — tools that expect 169.254.169.254 to behave like AWS — to work without modification.
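As an illustration of the request body, here is a minimal Python sketch that assembles a PUT /mmds/config payload from the four fields described above. The helper name build_mmds_config and the interface ID "eth0" are hypothetical, rejecting an empty interface list is an assumption, and defaulting to V2 is this sketch's choice rather than the API default.

```python
import ipaddress
import json


def build_mmds_config(network_interfaces, version="V2",
                      ipv4_address="169.254.169.254", imds_compat=False):
    """Return a dict suitable for the body of PUT /mmds/config."""
    if version not in ("V1", "V2"):
        raise ValueError("version must be 'V1' or 'V2'")
    if not network_interfaces:
        # Assumption: an empty list is treated like a missing required field.
        raise ValueError("network_interfaces is required")
    ipaddress.IPv4Address(ipv4_address)  # raises ValueError if malformed
    return {
        "version": version,
        "network_interfaces": list(network_interfaces),
        "ipv4_address": ipv4_address,
        "imds_compat": imds_compat,
    }


# "eth0" is a hypothetical interface ID defined elsewhere by the operator.
body = json.dumps(build_mmds_config(["eth0"]))
```

Only the four documented keys are emitted, which matters because the text above says unknown fields are rejected with HTTP 400.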

How Dumbo Intercepts Guest Traffic

Frame interception runs in src/vmm/src/mmds/ns.rs, in the function that processes each Ethernet frame the guest places on the virtio-net transmit queue (that is, guest-outbound traffic). Two lightweight speculative checks decide whether a frame belongs to MMDS:

For ARP frames, test_speculative_tpa() checks whether the Target Protocol Address matches the configured MMDS IPv4 (default [169, 254, 169, 254], constant DEFAULT_IPV4_ADDR in ns.rs).

For IPv4 frames, test_speculative_dst_addr() checks whether the destination IP matches.

Frames that match either check are consumed by Dumbo. All others are forwarded to TAP unchanged. Non-TCP frames that match the MMDS IP — ICMP pings, UDP datagrams — are silently absorbed and counted in the mmds.rx_accepted_unusual metric; they get no response.
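The two checks amount to reading four bytes at a fixed offset. The Python sketch below models that idea for untagged Ethernet II frames. The function name is_for_mmds is hypothetical, and the offsets come from the standard ARP and IPv4 header layouts rather than from the Firecracker source.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETH_HEADER_LEN = 14  # destination MAC, source MAC, EtherType
MMDS_IPV4 = bytes([169, 254, 169, 254])


def is_for_mmds(frame, mmds_ip=MMDS_IPV4):
    """Speculatively decide whether a guest frame targets the MMDS address."""
    if len(frame) < ETH_HEADER_LEN:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    payload = frame[ETH_HEADER_LEN:]
    if ethertype == ETHERTYPE_ARP:
        # Target Protocol Address: bytes 24..28 of the ARP payload.
        return payload[24:28] == mmds_ip
    if ethertype == ETHERTYPE_IPV4:
        # Destination address: bytes 16..20 of the IPv4 header.
        return payload[16:20] == mmds_ip
    return False
```

A truncated payload yields a short slice that cannot equal the four-byte address, so malformed frames fall through to the forwarding path instead of raising.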

ARP

When a guest ARP request for the MMDS IP arrives, detour_arp() records the sender's MAC and IP, then synthesizes an ARP reply announcing the fake MMDS MAC address 06:01:23:45:67:01 (constant DEFAULT_MAC_ADDR). The host's real TAP MAC is never exposed. ARP replies are queued ahead of pending TCP segments in Dumbo's write pipeline, so the first HTTP request never stalls waiting for an unresolved ARP entry. ARP frames follow the standard 28-byte IPv4-over-Ethernet layout (ETH_IPV4_FRAME_LEN = 28): hardware type 0x0001, operation 0x0002 for the reply.
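To make the 28-byte layout concrete, the following Python sketch builds the reply for a given request payload. It is an illustration of the standard ARP format, not a port of detour_arp(); the function name build_arp_reply is hypothetical, and the Ethernet header is left out.

```python
import struct

MMDS_MAC = bytes.fromhex("060123456701")  # 06:01:23:45:67:01
ARP_LEN = 28
ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa


def build_arp_reply(request):
    """Answer a 28-byte IPv4-over-Ethernet ARP request on behalf of MMDS."""
    if len(request) != ARP_LEN:
        raise ValueError("expected a 28-byte ARP payload")
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FORMAT, request)
    if (htype, ptype, hlen, plen, oper) != (0x0001, 0x0800, 6, 4, 0x0001):
        raise ValueError("not an IPv4-over-Ethernet ARP request")
    # In the reply, the requested address and the MMDS MAC become the sender;
    # the original requester becomes the target.
    return struct.pack(ARP_FORMAT, 0x0001, 0x0800, 6, 4, 0x0002,
                       MMDS_MAC, tpa, sha, spa)
```

The reply carries only the fixed MMDS MAC as its sender hardware address, which is why the guest never learns the TAP device's real MAC.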

TCP

Dumbo implements passive TCP only: it never opens a connection. The state machine is deliberately stripped of anything that is not needed for serving short HTTP requests to a cooperating guest.

Key constants from the source:

| Constant | Value | Source |
|---|---|---|
| MSS_DEFAULT | 536 bytes (RFC 879 minimum) | tcp/mod.rs |
| RCV_BUF_MAX_SIZE | 2,500 bytes | tcp/endpoint.rs |
| CONNECTION_RTO_PERIOD | 1,200,000,000 cycles (~300 ms at 4 GHz) | tcp/endpoint.rs |
| CONNECTION_RTO_COUNT_MAX | 15 retransmissions, then RST | tcp/endpoint.rs |
| DEFAULT_TCP_PORT | 80 | ns.rs |
| DEFAULT_MAX_CONNECTIONS | 30 | ns.rs |

If the receive buffer fills before a complete HTTP request arrives — possible for unusually large request headers — the connection is reset. Initial sequence numbers are generated via xor_pseudo_rng_u32(); in fuzz builds they are fixed at 0x12345678 to make traces reproducible.

IPv4 packets emitted by Dumbo carry TTL = 1 (DEFAULT_TTL: u8 = 1 in pdu/ipv4.rs). IP options are not supported; the header is always the standard 20-byte form.

HTTP Methods

Dumbo accepts only GET and PUT from the guest. Any other method returns 405 Method Not Allowed with the header Allow: GET, PUT. The only valid target for PUT is /latest/api/token (constant PATH_TO_TOKEN in token.rs); a PUT to any other path returns 404 Not Found. There is no code path by which a guest PUT invokes put_data() or patch_data() on the store. The guest can issue a PUT, but it gets a session token back, never the ability to modify what it reads.

Guest HTTP status codes:

| Code | Condition |
|---|---|
| 200 OK | Successful GET or token PUT |
| 400 Bad Request | Malformed headers, invalid TTL, bad URI |
| 401 Unauthorized | Missing or invalid/expired token (V2 only) |
| 404 Not Found | JSON pointer path not found; PUT to non-token path |
| 405 Method Not Allowed | Any method other than GET or PUT |
| 413 Payload Too Large | Data store size limit exceeded |
| 501 Not Implemented | Unsupported value type (array, number, or boolean at the requested path) |
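The method rules can be summarized as a small dispatch function. The Python sketch below models only the method, path, and token decisions; the function name route and its parameters are hypothetical, the order of the checks is an assumption, and header validation and the actual data lookup are elided.

```python
PATH_TO_TOKEN = "/latest/api/token"


def route(method, path, version="V2", token_ok=False):
    """Return (status, extra_headers) for a guest request; data lookup elided."""
    if method == "PUT":
        # The token endpoint is the only PUT target; nothing here touches the store.
        if path == PATH_TO_TOKEN:
            return 200, {}
        return 404, {}
    if method == "GET":
        if version == "V2" and not token_ok:
            return 401, {}
        return 200, {}  # a real handler would now resolve the path in the store
    return 405, {"Allow": "GET, PUT"}
```

Note that no branch of the PUT case reaches a store-mutating call, which is the property the section relies on.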

URI normalization collapses consecutive / slashes iteratively until stable before resolving the path as a JSON pointer into the data store tree.
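A minimal Python sketch of that normalization, written as a fixed-point loop. The name normalize_uri is hypothetical, and the loop illustrates the "repeat until stable" idea rather than reproducing the Rust code.

```python
def normalize_uri(uri):
    """Collapse runs of '/' by repeated replacement until nothing changes."""
    previous = None
    while previous != uri:
        previous = uri
        uri = uri.replace("//", "/")
    return uri


# One pass over "////" leaves "//", so a second pass is needed.
normalized = normalize_uri("//latest///meta-data////hostname")
```

A single replacement pass is not enough for runs of three or more slashes, which is why the text specifies iteration until the string stops changing.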

V1 and V2: The Session Token Model

The fundamental difference between MMDS V1 and V2 is whether the guest must prove it is the legitimate tenant of the VM before reading metadata. V1 requires nothing. V2 requires a session token obtained from the MMDS itself before each batch of reads.

The threat V2 closes is server-side request forgery (SSRF). A guest workload — particularly a serverless function — might run code that an attacker controls. Without token gating, that code can curl http://169.254.169.254/ and read the metadata unconditionally. With V2, the code still can — nothing prevents in-guest code from minting its own token — but the token is cryptographically bound to the specific firecracker instance, so it cannot be relayed to a different VM and used there.

V1 was the default from MMDS's introduction. V2 entered developer preview in Firecracker v0.23.0 and was promoted to GA in v1.1.0, at which point V1 was deprecated. The v1.0.0 release tightened V2 enforcement: session token validation was made mandatory for V2 GET requests, and X-Forwarded-For on any PUT to the token endpoint became an unconditional rejection.

Acquiring a V2 Token

The V2 flow is a three-step session protocol:

sequenceDiagram
    participant G as Guest workload
    participant D as Dumbo (VMM thread)
    participant S as data_store
    G->>D: PUT /latest/api/token (TTL: 60s)
    Note over G,D: Header: X-metadata-token-ttl-seconds: 60
    Note over D: mint AES-256-GCM token, expiry = now_ms + 60000
    D-->>G: 200 OK, body: base64 token
    G->>D: GET /latest/meta-data/hostname
    Note over G,D: Header: X-metadata-token: base64-token
    D->>S: resolve JSON pointer /latest/meta-data/hostname
    S-->>D: "my-vm-001"
    D-->>G: 200 OK, body: my-vm-001
    G->>D: GET /latest/meta-data/placement
    Note over G,D: Header: X-metadata-token: base64-token
    D->>S: resolve JSON pointer /latest/meta-data/placement
    S-->>D: {"region":"us-east-1"}
    D-->>G: 200 OK, body: {"region":"us-east-1"}

Step one: the guest sends PUT http://169.254.169.254/latest/api/token with the header X-metadata-token-ttl-seconds: <N>, where <N> is the requested lifetime in seconds. Firecracker v1.13.0 added aliases for EC2 IMDS compatibility: the equivalent EC2 header X-aws-ec2-metadata-token-ttl-seconds: <N> is also accepted. TTL bounds are enforced: minimum 1 second, maximum 21,600 seconds (six hours), defined as MIN_TOKEN_TTL_SECONDS: u32 = 1 and MAX_TOKEN_TTL_SECONDS: u32 = 21600 in token.rs. A TTL outside this range returns 400 Bad Request.

Step two: Dumbo mints a token and returns it as a plaintext base64 string in the response body. The token response echoes back whichever TTL header variant the guest sent, matching EC2 IMDS behavior.

Step three: the guest presents the token on subsequent GET requests via X-metadata-token: <token> (or its EC2 alias X-aws-ec2-metadata-token: <token>). A missing or expired token in V2 mode returns 401 Unauthorized.

The X-Forwarded-For header on any PUT to the token endpoint is unconditionally rejected with 400 Bad Request. This blocks the simplest SSRF relay: a guest trying to obtain a token on behalf of a different host by routing the request through a forwarding proxy.
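The header checks on the token PUT can be modeled in a few lines. This Python sketch is an illustration only: parse_ttl is a hypothetical name, a ValueError stands in for the 400 Bad Request response, and treating a missing TTL header as an error is an assumption.

```python
MIN_TOKEN_TTL_SECONDS = 1
MAX_TOKEN_TTL_SECONDS = 21_600  # six hours
TTL_HEADERS = ("x-metadata-token-ttl-seconds",
               "x-aws-ec2-metadata-token-ttl-seconds")


def parse_ttl(headers):
    """Return the requested TTL in seconds, or raise ValueError (maps to 400)."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-forwarded-for" in lowered:
        raise ValueError("X-Forwarded-For is not allowed on token requests")
    for name in TTL_HEADERS:
        if name in lowered:
            ttl = int(lowered[name])  # raises ValueError if not an integer
            if not MIN_TOKEN_TTL_SECONDS <= ttl <= MAX_TOKEN_TTL_SECONDS:
                raise ValueError("TTL out of range")
            return ttl
    # Assumption: a token request without any TTL header is rejected.
    raise ValueError("missing TTL header")
```

For example, parse_ttl({"X-metadata-token-ttl-seconds": "60"}) returns 60, while a value of 0 or 21601 raises.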

Under V1, a GET with no token increments rx_no_token and a GET with an invalid or expired token increments rx_invalid_token, but the request still proceeds and returns 200 OK. Under V2, the same metric is incremented and the request returns 401 Unauthorized. Both metrics were added in Firecracker v1.13.0.

Token Cryptography

The token is not a random opaque nonce that Firecracker stores in a table. It is a self-contained sealed credential: anyone who holds the right key can verify it without a round-trip to a store. The structure (struct Token, #[repr(C)], in token.rs) encodes everything needed for validation in 36 raw bytes:

| Field | Length | Purpose |
|---|---|---|
| IV | 12 bytes (IV_LEN = 12) | Per-token randomized nonce |
| Payload | 8 bytes (PAYLOAD_LEN = size_of::<u64>()) | AES-256-GCM encrypted expiry timestamp |
| Tag | 16 bytes (TAG_LEN = 16) | GCM authentication tag |
| Total | 36 bytes raw / 48 characters base64 | |

The cipher is AES-256-GCM via the aws-lc-rs crate, using RandomizedNonceKey::new(&AES_256_GCM, &key). The 256-bit key (KEY_LEN: usize = 32) is generated fresh when the TokenAuthority is created — once per microVM — and held entirely in the VMM process. The guest never sees it.

The payload is the expiry timestamp: expiry_ms = now_ms + (ttl_seconds * 1000), where now_ms is the monotonic clock at mint time. To validate, Dumbo decrypts the payload and checks that expiry_ms > current_monotonic_ms. The GCM tag covers the ciphertext; forgery is computationally infeasible.
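The expiry arithmetic is simple enough to state in code. The Python sketch below uses Python's own monotonic clock as a stand-in for the VMM's; the function names are hypothetical and the encryption step is omitted.

```python
import time


def now_ms():
    """Monotonic time in milliseconds (stand-in for the VMM's clock)."""
    return time.monotonic_ns() // 1_000_000


def expiry_ms(ttl_seconds, now=None):
    """expiry_ms = now_ms + ttl_seconds * 1000, computed at mint time."""
    now = now_ms() if now is None else now
    return now + ttl_seconds * 1000


def is_live(expiry, now=None):
    """A token is valid only while its expiry is strictly in the future."""
    now = now_ms() if now is None else now
    return expiry > now
```

Because the comparison is strict, a token stops being valid at the exact millisecond of its expiry. Using a monotonic clock means wall-clock adjustments inside or outside the guest do not extend or shorten a token's life.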

The Additional Authenticated Data (AAD) passed to AES-GCM is "microvmid={instance_id}". This binds each token to the specific Firecracker instance that minted it. A token from one microVM presented to a different microVM fails GCM authentication because the AAD does not match; the tag is invalid.

Incoming tokens longer than 70 characters (TOKEN_LENGTH_LIMIT = 70) are rejected before any decryption attempt. The actual emitted token is always 48 characters (the base64 encoding of 36 bytes), so this limit provides a clean denial-of-service guard: a guest cannot cause Firecracker to do arbitrarily large decryption work by submitting a long string.
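The size arithmetic and the length guard can be checked with a short Python sketch. It works on a random placeholder rather than a real token, performs no decryption, and assumes the three fields appear in the encoded bytes in the order the layout table lists them. The name split_token is hypothetical.

```python
import base64
import os

IV_LEN, PAYLOAD_LEN, TAG_LEN = 12, 8, 16
TOKEN_LEN = IV_LEN + PAYLOAD_LEN + TAG_LEN  # 36 raw bytes
TOKEN_LENGTH_LIMIT = 70


def split_token(encoded):
    """Apply the length guard, then split a base64 token into (iv, payload, tag)."""
    if len(encoded) > TOKEN_LENGTH_LIMIT:
        # Rejected before any decoding or decryption work is done.
        raise ValueError("token too long")
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) != TOKEN_LEN:
        raise ValueError("unexpected token size")
    # Assumption: field order follows the layout table (IV, payload, tag).
    iv = raw[:IV_LEN]
    payload = raw[IV_LEN:IV_LEN + PAYLOAD_LEN]
    tag = raw[IV_LEN + PAYLOAD_LEN:]
    return iv, payload, tag


# Random bytes with the right length, standing in for a real sealed token.
placeholder = base64.b64encode(os.urandom(TOKEN_LEN)).decode("ascii")
```

Since 36 is a multiple of 3, the base64 form needs no padding and is exactly 48 characters, comfortably under the 70-character guard.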

Key Rotation

TokenAuthority tracks how many tokens it has minted in num_encrypted_tokens: u32. When that counter reaches u32::MAX (4,294,967,295 tokens), the authority re-seeds with a fresh 256-bit random key, resets the counter to zero, and invalidates all previously issued tokens. In practice this threshold is unreachable: at one token per second it would take over 136 years per microVM. The rotation path exists because AES-GCM nonce reuse with the same key would be catastrophic; exhausting the nonce space — which RandomizedNonceKey is designed to detect — triggers safe regeneration rather than silent failure.
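Both the counter-driven rotation and the 136-year figure can be reproduced in a few lines of Python. The class below is a toy model, not TokenAuthority: the name RotatingKey, the configurable limit, and the choice to rotate just before the next mint are assumptions made so the behavior can be exercised with a small threshold.

```python
import os

KEY_LEN = 32
U32_MAX = 2**32 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600


class RotatingKey:
    """Toy model of a key that is regenerated after `limit` mints."""

    def __init__(self, limit=U32_MAX):
        self.limit = limit
        self.key = os.urandom(KEY_LEN)
        self.num_encrypted_tokens = 0

    def before_mint(self):
        """Call once per token; returns the key to use for this mint."""
        if self.num_encrypted_tokens == self.limit:
            # Fresh key, fresh counter; tokens sealed with the old key
            # can no longer be verified.
            self.key = os.urandom(KEY_LEN)
            self.num_encrypted_tokens = 0
        self.num_encrypted_tokens += 1
        return self.key


# Time to exhaust the counter at one token per second, in years.
years_at_one_per_second = U32_MAX / SECONDS_PER_YEAR
```

With limit=3, the first three calls return the same key and the fourth returns a new one with the counter back at 1.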

The Trust Model

The design document states the rule plainly:

"MMDS related API requests come from the host, which is considered a trusted environment, so there are no checks beside the kind of validation done by HTTP server and serde-json."

"guest traffic should be treated as untrusted, and firewall rules should be put in place at the host-level to prevent guests from accessing restricted IPv4 addresses on the host."

These two statements define the full trust boundary. The host has unconditional write access to the data store via the Unix-domain socket and faces no authentication challenge from MMDS. The guest has read access to whatever the host has placed there and cannot write or modify a byte of it.

What the Guest Cannot Do

The guest cannot reach the Firecracker API socket — that is a file on the host filesystem, inaccessible from inside the VM. The AES-256-GCM key used to mint and verify tokens exists only inside the VMM process; the guest cannot extract it. The host TAP device's real IP and MAC are never revealed: Dumbo's ARP reply gives the guest only the fake MAC 06:01:23:45:67:01. The guest cannot discover data not explicitly placed in the store by the host, cannot determine whether non-TCP packets it sent to 169.254.169.254 were absorbed or dropped, and cannot cause Dumbo to modify the data store via any guest-accessible HTTP method.

Dumbo Is Not a Security Boundary

This point deserves stating directly because Dumbo looks like a firewall. It is not. Dumbo intercepts frames when the guest uses the right interface and the right destination IP. A guest interface not listed in MmdsConfig.network_interfaces has no Dumbo attached; if a guest sends an ARP request for 169.254.169.254 on that interface, the frame reaches the host TAP device and continues into the host network.

The production host-setup guide (docs/prod-host-setup.md) mandates host-level packet filtering to close this gap. Before running any Firecracker workload:

Safety note: the commands below modify host network filtering rules and require root privileges on the host.

# nftables
nft add rule firecracker filter iifname "tap*" ip daddr 169.254.169.254 counter drop

# iptables-nft
iptables-nft -I FORWARD -i tap+ -d 169.254.169.254 -j DROP

These rules drop any frame arriving from a TAP interface that is destined for 169.254.169.254, preventing a guest from reaching any host listener at that address regardless of how it routes the packet. Without them, a guest could potentially bypass Dumbo entirely and reach a host service at the link-local address.

How V2 Closes Cross-VM Token Relay

Consider the attack: a compromised guest A obtains a token minted by guest B's MMDS, for example by tricking B into issuing one, and presents it to A's own MMDS endpoint. Because every TokenAuthority holds a distinct AES-256-GCM key and a distinct instance ID in the AAD, A's TokenAuthority cannot decrypt a ciphertext sealed under B's key. The GCM tag fails verification and the token is rejected with 401 Unauthorized. The X-Forwarded-For rejection on PUT adds a second layer: the simplest mechanical relay, routing the token PUT through an HTTP proxy, fails before the cryptographic check even runs.
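The binding argument can be demonstrated with a stand-in. Python's standard library has no AES-GCM, so the sketch below swaps in HMAC-SHA256 over the associated data and payload: it authenticates but does not encrypt, and it is not how Firecracker seals tokens. ToyAuthority and the instance IDs are hypothetical. What carries over is the property under discussion: verification fails if either the key or the "microvmid=" string differs.

```python
import hashlib
import hmac
import os
import struct


class ToyAuthority:
    """HMAC-based stand-in for a per-microVM token authority (no encryption)."""

    def __init__(self, instance_id):
        self.key = os.urandom(32)
        self.aad = f"microvmid={instance_id}".encode()

    def _tag(self, payload):
        digest = hmac.new(self.key, self.aad + payload, hashlib.sha256).digest()
        return digest[:16]  # truncated to mirror the 16-byte tag length

    def mint(self, expiry_ms):
        payload = struct.pack("!Q", expiry_ms)  # 8-byte expiry timestamp
        return payload + self._tag(payload)

    def verify(self, token):
        payload, tag = token[:8], token[8:]
        return hmac.compare_digest(tag, self._tag(payload))


vm_a = ToyAuthority("vm-a")  # hypothetical instance IDs
vm_b = ToyAuthority("vm-b")
token_from_b = vm_b.mint(61_000)
```

Here vm_b.verify(token_from_b) succeeds and vm_a.verify(token_from_b) fails. Even if two authorities somehow shared a key, a different instance ID in the associated data would still change the tag.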

Metrics

MmdsMetrics in src/vmm/src/logger/metrics.rs tracks MMDS health as SharedIncMetric counters flushed on read:

| Metric | Meaning |
|---|---|
| rx_accepted | Frames rerouted to MMDS |
| rx_accepted_err | Errors handling a rerouted frame |
| rx_accepted_unusual | Non-TCP frames destined for MMDS IP (consumed, no response) |
| rx_bad_eth | Frames not parseable as Ethernet |
| rx_invalid_token | GET with invalid or expired token |
| rx_no_token | GET with no token |
| rx_count | Total successful receives |
| tx_bytes | Total bytes sent to guest |
| tx_count | Total successful sends |
| tx_errors | Send errors |
| tx_frames | Total frames sent |
| connections_created | TCP connections accepted by Dumbo |
| connections_destroyed | TCP connections cleaned up |

In V1 mode, rx_invalid_token and rx_no_token increment but the request succeeds. In V2 mode the same increment accompanies a 401. Distinguishing the two via metrics is how an operator can measure V1 traffic patterns before migrating to V2.
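A migration check along those lines might look like the Python sketch below. It assumes each metrics record is a JSON object with an "mmds" key holding the counters named in the table; that record shape, the function name v2_readiness, and the sample values are assumptions for illustration.

```python
import json


def v2_readiness(metrics_line):
    """Summarize how many GETs in one metrics record would fail under V2."""
    mmds = json.loads(metrics_line).get("mmds", {})
    no_token = mmds.get("rx_no_token", 0)
    invalid = mmds.get("rx_invalid_token", 0)
    return {
        "rx_no_token": no_token,
        "rx_invalid_token": invalid,
        "would_fail_under_v2": no_token + invalid,
    }


# Hypothetical record from a microVM still running in V1 mode.
sample = '{"mmds": {"rx_accepted": 40, "rx_no_token": 12, "rx_invalid_token": 1}}'
report = v2_readiness(sample)
```

A nonzero would_fail_under_v2 on a V1 microVM indicates guest clients that have not yet adopted the token flow and would start seeing 401 after a switch to V2.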

Snapshot and Restore

Firecracker's snapshot mechanism persists MmdsNetworkStackState (in src/vmm/src/mmds/persist.rs): the MAC address, the IPv4 address as a u32, the TCP port as a u16, the MMDS version, and the network interface configuration. What it does not persist is the data store contents or the token authority. On restore, a fresh TokenAuthority is created with a new random key, which immediately invalidates any V2 tokens that were valid at snapshot time. The data store starts empty. Operators must re-populate the store via PUT /mmds after restoring a snapshot — this is an explicit requirement in the user guide, not an implementation detail.

This is a natural consequence of the data store living as an in-memory serde_json::Value. It was never written to disk, so there is nothing to restore. The split is deliberate: the stack's network identity (MAC address, IPv4 address, TCP port, and MMDS version) persists, so the guest keeps addressing MMDS exactly as it did before the snapshot, while the host re-provides content deliberately, not by accident.

Version History

| Firecracker version | Change |
|---|---|
| v0.22.0 | PUT /mmds/config pre-boot endpoint added; allow_mmds_requests per-interface flag removed from the network interface API |
| v0.23.0 | MMDS V2 introduced as developer preview via optional version field on PUT /mmds/config; default remains V1 |
| v1.0.0 | V2 session token enforcement made mandatory for GET requests; X-Forwarded-For on PUT /latest/api/token rejected unconditionally regardless of header casing |
| v1.1.0 | MMDS V2 promoted to GA; V1 deprecated; --mmds-size-limit CLI flag added; MMDS version persisted in snapshots |
| v1.13.0 | imds_compat field added; AWS EC2 header aliases added; rx_invalid_token and rx_no_token metrics added; token response echoes TTL header |

The asymmetric read/write model — host writes, guest reads, no exceptions — is what lets fleet operators inject arbitrary per-instance configuration without trusting the workload. Chapter 18 examines the other side of that relationship: how the guest signals intent back to the host via virtio-vsock, and when an out-of-band channel like that is worth the complexity.

Sources and Further Reading