# cllm

> cllm is a Multiboot-compliant unikernel written in C, built to serve LLM inference over HTTP. It boots directly on bare metal or in QEMU with no host operating system: the kernel is the application.

The kernel contains a custom libc subset, PCI enumeration, an Intel e1000 NIC driver, an HTTP/1.1 server with REST endpoints, and a llama.cpp-compatible API surface. cllm is built by Cognisoc.

The current target is x86 (i386, Multiboot) in QEMU. Bare-metal boot uses the same Multiboot entry point.

The HTTP API mirrors llama.cpp's v1 endpoints (completions, chat completions, embeddings, models, tokenize, detokenize). No model inference is performed yet. Inference engine integration, GPU/CUDA support, token streaming, and vLLM-style optimizations are on the public roadmap and have not shipped.

## Docs

- [Home](https://cllm.cognisoc.com/): Overview, components, and roadmap
- [About](https://cllm.cognisoc.com/about/): Why a unikernel for LLM serving
- [Blog](https://cllm.cognisoc.com/blog/): Architecture notes and engineering writing
- [Documentation](https://docs.cognisoc.com/cllm/): Full reference docs

## Compare

- [vs llama.cpp](https://cllm.cognisoc.com/compare/llama-cpp/): cllm reuses llama.cpp's API surface but runs without a host OS
- [vs vLLM](https://cllm.cognisoc.com/compare/vllm/): vLLM is a Python serving engine on Linux; cllm is a C unikernel

## Optional

- [RSS](https://cllm.cognisoc.com/rss.xml): Blog feed
- [Source](https://github.com/cognisoc/cllm): GPL repository