# cllm

> cllm is a Multiboot-compliant unikernel written in C that boots directly on bare metal or in QEMU and is designed to serve LLM inference over HTTP. There is no host operating system: the kernel is the application. The kernel currently contains a custom libc subset, PCI enumeration, an Intel e1000 NIC driver, an HTTP/1.1 server with REST endpoints, and a llama.cpp-compatible API surface. Inference engine integration, GPU/CUDA support, token streaming, and vLLM-style optimizations are on the public roadmap and have not yet shipped.

cllm is built by Cognisoc. The current target is x86 (i386, Multiboot) in QEMU; bare-metal boot uses the same Multiboot entry point. The HTTP API mirrors llama.cpp's v1 endpoints (completions, chat completions, embeddings, models, tokenize, detokenize). No model inference is performed yet; engine integration is on the roadmap.
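For readers unfamiliar with what "Multiboot-compliant" requires, here is a minimal sketch of the three mandatory header fields defined by the Multiboot (version 1) specification and the checksum rule that ties them together. It is illustrative only and is not taken from the cllm source: the struct, the function names, and the choice of flags are assumptions made for this example.

```c
#include <stdint.h>

/* Constants defined by the Multiboot (version 1) specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0)  /* align loaded modules on 4 KiB pages */
#define MULTIBOOT_MEMORY_INFO  (1u << 1)  /* ask the loader for memory information */

/* The three mandatory 32-bit fields at the start of a Multiboot header.
 * (Hypothetical struct name; cllm's own layout may differ.) */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* The spec requires magic + flags + checksum == 0 in 32-bit unsigned
 * arithmetic, so the checksum is the two's-complement negation of the sum. */
static uint32_t multiboot_checksum(uint32_t magic, uint32_t flags)
{
    return (uint32_t)0 - (magic + flags);
}

/* Build a header for the given flags (hypothetical helper). */
static struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = multiboot_checksum(h.magic, h.flags);
    return h;
}

/* Mirror the check a Multiboot loader performs before booting the image. */
static int multiboot_header_is_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

In a real kernel image the header is a static object placed by the linker so that it is 4-byte aligned and lies entirely within the first 8 KiB of the file; a loader such as GRUB or QEMU's `-kernel` option scans for it there.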

## Docs

- [Home](https://cllm.cognisoc.com/): Overview, components, and roadmap
- [About](https://cllm.cognisoc.com/about/): Why a unikernel for LLM serving
- [Blog](https://cllm.cognisoc.com/blog/): Architecture notes and engineering writing
- [Documentation](https://docs.cognisoc.com/cllm/): Full reference docs

## Compare

- [vs llama.cpp](https://cllm.cognisoc.com/compare/llama-cpp/): cllm reuses llama.cpp's API surface but runs without a host OS
- [vs vLLM](https://cllm.cognisoc.com/compare/vllm/): vLLM is a Python serving engine on Linux; cllm is a C unikernel

## Optional

- [RSS](https://cllm.cognisoc.com/rss.xml): Blog feed
- [Source](https://github.com/cognisoc/cllm): GPL repository