
Local AI assistant for air-gapped development

A reference architecture for fully offline operation with Ollama: OS-level sandboxing as a compliance argument, with no data leaving the machine.

Design target: an assistant approvable for classified development environments, with a kernel-level policy an assessor can inspect.

Reference architecture: a target scenario Kernex is designed to serve, not a report from a customer deployment. Metrics are design targets.

The scenario

The development environment is classified. No external network connections. No cloud services. No telemetry. The AI assistant either runs entirely on-device or it does not run at all.

This eliminates every cloud-based AI coding tool, every hosted API, and every framework that phones home for licensing or analytics. This reference architecture describes the fully local configuration Kernex is built for.

How Kernex runs fully offline

Kernex supports Ollama as a provider via --provider ollama. Ollama runs local models (Llama, Mistral, CodeLlama, and others) on-device. The kx CLI connects to localhost:11434 with no external network calls.
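Before starting a session, it can help to confirm that the configured endpoint really is loopback and that Ollama has models available. The following is a minimal preflight sketch, not part of Kernex: the helper names are hypothetical, and it assumes Ollama's standard GET /api/tags endpoint, which lists locally installed models.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlsplit

# Ollama's default listen address; adjust if your install differs.
OLLAMA_URL = "http://127.0.0.1:11434"


def is_loopback_endpoint(url: str) -> bool:
    """Return True only if the URL's host is a loopback address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        # Assumption: localhost resolves to loopback on this machine.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname would need DNS resolution; reject it.
        return False


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> list[str]:
    """Query the local Ollama server, refusing any non-loopback URL."""
    if not is_loopback_endpoint(base_url):
        raise ValueError(f"refusing non-loopback endpoint: {base_url}")
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_local_models() with Ollama running returns the installed model names; a non-loopback URL is rejected before any connection is attempted.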

The full stack is local: the kx CLI, the Ollama server, and the model weights all run on the same machine.

In the target enforcement model, network access for the agent process is declared at startup. The allowlist contains only 127.0.0.1:11434, and the OS enforces it, so outbound connections to any other destination fail at the syscall level. Filesystem sandboxing (Seatbelt on macOS, Landlock on Linux) ships today; OS-enforced egress allowlists are on the sandbox roadmap and are the design bar this scenario is specified against.
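The decision logic of a single-entry egress allowlist is small enough to model directly. The sketch below is a user-space illustration only: it does not enforce anything, it is not Kernex's policy format (which this page does not show), and the Endpoint and decide names are hypothetical. In the target design the same decision would be made by the OS, not by Python code.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


# The only destination the agent process may reach: the local Ollama server.
ALLOWLIST = frozenset({Endpoint("127.0.0.1", 11434)})


def decide(ip: str, port: int, allowlist: frozenset = ALLOWLIST) -> str:
    """Return "allow" only for an exact address-and-port match, else "deny"."""
    try:
        normalized = str(ipaddress.ip_address(ip))
    except ValueError:
        # Hostnames and malformed addresses never match a literal allowlist.
        return "deny"
    return "allow" if Endpoint(normalized, port) in allowlist else "deny"
```

Matching on the exact address and port, with deny as the default, means another loopback service on a different port is as unreachable as an external host.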

Sandboxing as a compliance argument

A security approval process for an environment like this requires demonstrating that the tool cannot exfiltrate data even if the model is prompted to try. That is the argument this architecture is designed to make concretely: the policy is inspectable, the enforcement is at the kernel level, and the restriction applies to the model’s tool calls, not just to well-behaved code paths.
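An assessor can complement policy inspection with a direct probe: attempt a connection from inside the sandboxed process and confirm that it fails. The helper below is a hypothetical sketch of such a probe, not a Kernex feature. It cannot tell a sandbox denial apart from an ordinary unreachable destination, so a failure is only meaningful if the same probe succeeds when run outside the sandbox.

```python
import socket


def egress_blocked(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) cannot be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded, so egress is possible
    except OSError:
        # Covers refusals, timeouts, name-resolution failures, and denials.
        return True
```

In an air-gapped environment one workable check is to start a listener on a second loopback port outside the sandbox and probe it from inside. With only 127.0.0.1:11434 on the allowlist, the probe should report the connection as blocked.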

The pitch to an approving officer is that the sandboxing is a property of the tool, not a claim about the vendor.

Operational details

Models run on a local GPU. Response latency is higher than with cloud alternatives: the design target is 3-8 seconds per turn, depending on model size and hardware. For the intended use (code review, documentation, architecture questions), that trade-off is acceptable.

Memory persistence works identically to the cloud-connected configuration. Facts and conversation history accumulate in ~/.kx/projects/{project}/.
