Question 1

What is actually running today?

Accepted Answer

A Multiboot ELF that boots in QEMU or on bare-metal x86, brings up serial and VGA, walks the PCI bus, drives an Intel e1000 NIC, and answers HTTP/1.1 requests through a llama.cpp-shaped v1 API. The inference engine itself is on the roadmap; the kernel and the network stack ship now.
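
As a rough illustration of the "walks the PCI bus" step, the sketch below builds the address word for the legacy x86 PCI configuration mechanism (ports 0xCF8/0xCFC) and checks register 0 for the Intel 82540EM, the device QEMU's e1000 model reports. The function names pci_config_address and pci_is_e1000 are hypothetical, not taken from the cllm source, and the port I/O itself is left out so the logic can be compiled and checked on a host.

```c
#include <stdint.h>

/* Legacy PCI configuration mechanism #1: an address written to port 0xCF8
 * selects a register, and the value is then read from port 0xCFC. */
#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Intel vendor ID and the 82540EM device ID reported by QEMU's e1000 model. */
#define PCI_VENDOR_INTEL   0x8086u
#define PCI_DEVICE_82540EM 0x100Eu

/* Build the 32-bit CONFIG_ADDRESS value for bus/device/function/offset.
 * Hypothetical helper name; cllm's own enumeration code may differ. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev,
                                   uint8_t func, uint8_t offset)
{
    return (1u << 31)                          /* enable bit */
         | ((uint32_t)bus << 16)               /* bus number, 8 bits */
         | ((uint32_t)(dev & 0x1Fu) << 11)     /* device number, 5 bits */
         | ((uint32_t)(func & 0x07u) << 8)     /* function number, 3 bits */
         | ((uint32_t)offset & 0xFCu);         /* dword-aligned register offset */
}

/* Config register 0 holds the device ID in the high half and the
 * vendor ID in the low half. */
static int pci_is_e1000(uint32_t reg0)
{
    return (reg0 & 0xFFFFu) == PCI_VENDOR_INTEL
        && (reg0 >> 16) == PCI_DEVICE_82540EM;
}
```

In a kernel, the value from pci_config_address would be written to PCI_CONFIG_ADDRESS with a 32-bit out instruction and the result read back from PCI_CONFIG_DATA; a read of 0xFFFFFFFF conventionally means no device is present in that slot.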

Question 2

Which models will cllm support?

Accepted Answer

The API surface mirrors llama.cpp, so the natural target is anything llama.cpp loads (GGUF-format weights for Llama-family architectures, Mistral, Qwen, Gemma, Phi, and so on). Until the llama.cpp inference path is wired into the kernel, no specific model has been benchmarked end-to-end.

Question 3

Which hypervisors are supported?

Accepted Answer

QEMU is the primary supported target and the one used for development. The kernel is Multiboot-compliant, so any Multiboot loader, including GRUB on bare metal, should boot it. Other hypervisors are unverified.
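
For readers unfamiliar with what Multiboot compliance requires, here is a minimal sketch of the Multiboot (version 1) header that a loader searches for near the start of the image. The magic value and the rule that magic, flags, and checksum sum to zero come from the Multiboot specification; the flag choice and the helper names are assumptions for illustration, not cllm's actual header.

```c
#include <stdint.h>

/* Values defined by the Multiboot (version 1) specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0)  /* align boot modules on page boundaries */
#define MULTIBOOT_MEMORY_INFO  (1u << 1)  /* ask the loader for memory information */

/* The three mandatory fields; the loader looks for them, 32-bit aligned,
 * within the first 8192 bytes of the image. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Hypothetical helper: fill in a header so the three fields sum to zero
 * modulo 2^32, which is the validity rule a loader applies. */
static struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = 0u - (h.magic + h.flags);
    return h;
}

/* The same check a loader performs before accepting the image. */
static int multiboot_header_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

A real kernel usually emits these three words from assembly or a dedicated linker section rather than computing them at run time; the C form here only makes the checksum rule easy to verify.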

Question 4

What about GPUs?

Accepted Answer

GPU support is on the roadmap, not shipped. The documentation includes an analysis of how a CUDA ggml backend could be integrated into a unikernel (PCIe access, PTX kernel embedding, host/device memory management), but that work is not yet in the build.

Question 5

What is the network stack?

Accepted Answer

Raw Ethernet frames over the e1000 driver, with a minimal IPv4 + TCP implementation and an HTTP/1.1 subset on top — enough to route the v1 endpoints. There is no socket API in the traditional sense; the server is the packet processing loop.
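
One building block any minimal IPv4 + TCP implementation needs is the Internet checksum (RFC 1071), which covers the IPv4 header and, together with a pseudo-header, the TCP segment. The sketch below is a generic version written for this FAQ, not cllm's code; the name inet_checksum is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: the one's-complement of the one's-complement
 * sum of the data taken as big-endian 16-bit words. The returned value is
 * stored into the header high byte first. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add up full 16-bit words in network byte order. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }

    return (uint16_t)~sum;
}
```

To build an outgoing header, zero the checksum field, run the function over the header, and store the result. To validate an incoming header, run it over the header as received; a result of 0 means the checksum matches.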

Question 6

How do I build and run it?

Accepted Answer

Install gcc with -m32 support, make, and qemu-system-i386. Clone the repo and run make run; serial output appears on your terminal. Press Ctrl-A, then X, to exit QEMU. There are debug, VGA, and GDB-attached variants as well.

Question 7

Is it production-ready?

Accepted Answer

No. The infrastructure scaffolding (kernel, drivers, HTTP) is in place; the inference engine is not. Treat cllm as a working substrate that the llama.cpp integration will land on top of.

Question 8

Why a unikernel instead of just a small Linux?

Accepted Answer

A unikernel removes the kernel/userspace boundary, the scheduler, and every page of the host OS that is not part of the inference path. The target is a single tens-of-kilobytes ELF that boots in milliseconds and spends every cycle on math.

Method	Path	Handler	Status
POST	/v1/completions	handle_v1_completions	wired
POST	/v1/chat/completions	handle_v1_chat_completions	wired
POST	/v1/embeddings	handle_v1_embeddings	wired
GET	/v1/models	handle_v1_models	wired
POST	/tokenize	handle_tokenize	wired
POST	/detokenize	handle_detokenize	wired
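
The table above maps naturally onto a static dispatch table. The sketch below shows one way such a table could look; the handler names are the ones listed, but the handler signature, the stub bodies, the struct layout, and route_lookup are assumptions made for illustration. It uses the hosted <string.h> so it compiles outside the kernel, where cllm would rely on its own string subset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature; cllm's real handlers may take other arguments. */
typedef int (*http_handler_fn)(const char *body);

/* Placeholder stubs so the table links; the real handlers live in the kernel. */
static int handle_v1_completions(const char *body)      { (void)body; return 0; }
static int handle_v1_chat_completions(const char *body) { (void)body; return 0; }
static int handle_v1_embeddings(const char *body)       { (void)body; return 0; }
static int handle_v1_models(const char *body)           { (void)body; return 0; }
static int handle_tokenize(const char *body)            { (void)body; return 0; }
static int handle_detokenize(const char *body)          { (void)body; return 0; }

struct route {
    const char *method;
    const char *path;
    http_handler_fn handler;
};

/* One entry per row of the endpoint table. */
static const struct route routes[] = {
    { "POST", "/v1/completions",      handle_v1_completions },
    { "POST", "/v1/chat/completions", handle_v1_chat_completions },
    { "POST", "/v1/embeddings",       handle_v1_embeddings },
    { "GET",  "/v1/models",           handle_v1_models },
    { "POST", "/tokenize",            handle_tokenize },
    { "POST", "/detokenize",          handle_detokenize },
};

/* Linear scan over the table: exact match on method and path,
 * NULL when no route matches. */
static http_handler_fn route_lookup(const char *method, const char *path)
{
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (strcmp(routes[i].method, method) == 0 &&
            strcmp(routes[i].path, path) == 0) {
            return routes[i].handler;
        }
    }
    return NULL;
}
```

With six routes a linear scan is the simplest choice; a caller would typically answer with a 404 when route_lookup returns NULL.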

A bare-metal C unikernel for serving large language models.

Six pieces, one ELF.

The kernel is the application

PCI enumeration + e1000 NIC driver

llama.cpp-compatible v1 surface

Custom 4 MB heap and string subset

Multiboot ELF — QEMU or bare metal

GDB-attached boots in one command

From Multiboot entry to HTTP response.

HTTP endpoints, in ring 0.

What is done, what is next.

Where cllm sits in the ecosystem.

cllm vs llama.cpp

cllm vs vLLM

Questions engineers ask first.

Boot it, break it, read the source.