Raft-grade locks, leases & config with an LLM operations agent in the control plane — never the data plane.
Built & illustrated by Abhishek Aditya.
Ed. 0.1 · alpha · 2026
Every Kubernetes control loop, every Cassandra ring, every distributed lock rests on a coordination service. These services are battle-tested, yet operationally brutal: split-brain incidents, lease starvation, slow-follower cascades, and endless consensus-parameter tuning consume disproportionate on-call effort. Postmortems lag the incidents that produced them.
Recent proposals to inject LLM reasoning into the consensus data plane have been rightly rejected by practitioners. Consensus protocols (Paxos, Raft, ZAB) rest on a few narrow correctness invariants and are acutely sensitive to non-determinism. A non-deterministic language model in the commit path breaks the guarantee that two replicas applying the same log entry compute the same state. That guarantee is the safety.
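The replica-agreement argument above can be made concrete with a toy state machine. This is a minimal sketch, not AEGIS code: the log format, the `apply_log` helper, and the `advisor` hook are all hypothetical, and the two lambdas merely stand in for a model that answers differently on different replicas.

```python
import hashlib
import json
from typing import Callable, Optional

Log = list[dict]


def apply_log(log: Log, advisor: Optional[Callable[[dict], bool]] = None) -> str:
    """Apply every entry in order and return a digest of the resulting state."""
    state: dict[str, str] = {}
    for entry in log:
        # A pure state machine applies every committed entry unconditionally.
        # A (hypothetical) advisor in the commit path gets a veto per entry.
        if advisor is not None and not advisor(entry):
            continue
        if entry["op"] == "put":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
    canonical = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


LOG: Log = [
    {"op": "put", "key": "lock/a", "value": "client-1"},
    {"op": "put", "key": "lock/b", "value": "client-2"},
    {"op": "delete", "key": "lock/a"},
]

# Two replicas running the pure state machine always agree.
pure_agree = apply_log(LOG) == apply_log(LOG)

# Stand-ins for a model whose answer differs between replicas.
replica_1 = apply_log(LOG, advisor=lambda entry: True)
replica_2 = apply_log(LOG, advisor=lambda entry: entry["op"] != "delete")
advised_agree = replica_1 == replica_2

print(pure_agree, advised_agree)  # True False
```

Same log, same code, different final state: once anything non-deterministic can influence `apply`, the digest comparison that replicas rely on stops holding.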
The right place for an LLM agent is the control plane: observing, recommending, and documenting, never deciding. AEGIS is the open-source artifact that makes this argument concrete and testable.
The data plane is deterministic Raft: Apache Ratis on Java 25, exposed over gRPC, with leader-stamped wall-clock time so that TTL math is reproducible across replicas.
The control plane is an agentic Python sidecar (LangGraph, OpenRouter-backed) that observes telemetry, recommends config changes as GitHub PRs, and drafts postmortems as GitHub Issues.
The two share a telemetry surface. They do not share a mutation surface. A human holds every merge bit.
An open loop says "anomaly → PR → hope it helped." AEGIS closes it without ever touching the data-plane invariant.
Before a PR is opened, a verifier replays the exact chaos trace in an ephemeral sandbox cluster under both the current and proposed config, and embeds the before/after delta in the PR. A static safety envelope rejects any patch that violates a consensus-safety constraint, so unsafe configs are structurally impossible to propose.
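A static envelope of this kind can be pictured as a pure function from a proposed config to a list of violations. The sketch below is illustrative only: the parameter names and the three rules are assumptions, since the text does not list which constraints the AEGIS envelope actually enforces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaftConfig:
    # Hypothetical parameter names, chosen for illustration only.
    heartbeat_ms: int
    election_timeout_min_ms: int
    election_timeout_max_ms: int


def envelope_violations(cfg: RaftConfig) -> list[str]:
    """Return every violated rule; an empty list means the patch may be proposed."""
    violations = []
    if cfg.heartbeat_ms <= 0:
        violations.append("heartbeat_ms must be positive")
    if cfg.election_timeout_min_ms > cfg.election_timeout_max_ms:
        violations.append("election timeout range is inverted")
    # Assumed rule of thumb: a follower should be able to miss a couple of
    # heartbeats before it times out and starts an election.
    if cfg.election_timeout_min_ms < 3 * cfg.heartbeat_ms:
        violations.append("election_timeout_min_ms must be >= 3 * heartbeat_ms")
    return violations


def may_open_pr(proposed: RaftConfig) -> bool:
    # The check runs before any PR exists, so a failing patch is never proposed.
    return not envelope_violations(proposed)


safe = RaftConfig(heartbeat_ms=100, election_timeout_min_ms=1000, election_timeout_max_ms=2000)
unsafe = RaftConfig(heartbeat_ms=500, election_timeout_min_ms=600, election_timeout_max_ms=400)

print(may_open_pr(safe))            # True
print(envelope_violations(unsafe))  # two violations: inverted range, timeout too short
```

Because the check is a pure function of the proposed values, it needs no cluster access and gives the same verdict on every run.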
Status: safety envelope and RAG root-cause are shipped; counterfactual sandbox verification is on the roadmap.
Five nodes, one elected leader, on-disk log + snapshot. The leader replicates an append-only log to its followers; a quorum acknowledgement commits each entry. Kill the leader and a new term elects a successor.
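The quorum rule can be worked through for the five-node case. The following is a simplified sketch with hypothetical helper names; it omits the Raft requirement that a leader only commits entries from its own term by counting replicas, and it is not how Ratis is implemented internally.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest majority of the cluster."""
    return cluster_size // 2 + 1


def commit_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of nodes (leader included)."""
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum_size(len(match_index)) - 1]


# Last replicated log index per node: leader first, then four followers.
match = [9, 9, 7, 4, 2]

print(quorum_size(5))       # 3
print(commit_index(match))  # 7: three nodes hold index 7 or higher
```

With a quorum of three out of five, the cluster keeps committing after losing any two nodes, which is why killing the leader only costs an election.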
Every command carries leader-stamped wall-clock time in its proto envelope, so lease and TTL math is identical on every replica: determinism by construction.
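A small sketch shows why stamping time into the command makes lease math replay-safe. The `Command` fields and the `LeaseTable` class are hypothetical; the real proto envelope is not shown in this text. The key property is that `apply` never reads a local clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    # Hypothetical envelope fields; the actual proto schema is not shown here.
    op: str
    lease_id: str
    ttl_ms: int
    leader_time_ms: int  # stamped once by the leader before replication


class LeaseTable:
    def __init__(self) -> None:
        self.expires_at: dict[str, int] = {}

    def apply(self, cmd: Command) -> None:
        # "Now" is whatever the leader stamped, never the replica's own clock.
        now = cmd.leader_time_ms
        self.expires_at = {k: v for k, v in self.expires_at.items() if v > now}
        if cmd.op == "grant":
            self.expires_at[cmd.lease_id] = now + cmd.ttl_ms
        elif cmd.op == "revoke":
            self.expires_at.pop(cmd.lease_id, None)


log = [
    Command("grant", "a", ttl_ms=5000, leader_time_ms=1000),
    Command("grant", "b", ttl_ms=1000, leader_time_ms=2000),
    Command("grant", "c", ttl_ms=5000, leader_time_ms=4000),  # "b" has expired by now
]

# Two replicas may apply the log minutes apart and still converge.
replica_a, replica_b = LeaseTable(), LeaseTable()
for cmd in log:
    replica_a.apply(cmd)
for cmd in log:
    replica_b.apply(cmd)

print(replica_a.expires_at)  # {'a': 6000, 'c': 9000}
print(replica_a.expires_at == replica_b.expires_at)  # True
```

A follower that restarts and replays the log an hour later reaches the same table, because expiry depends only on values carried in the log itself.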
A complete coordination service plus an operations agent confined to the control plane — backed by a 99-test agent suite and a reproducible benchmark.
Docker + Compose v2; Python 3.10+ for the agents. The aegis-*.sh scripts bring up a live 5-node cluster with observability and the dashboard, run the test suites, and tear it all down. ./aegis.sh is the interactive launcher.
    # 1 · clone
    $ git clone https://github.com/Abhishek-Aditya-bs/Aegis && cd Aegis

    # 2 · bring up a 5-node cluster + observability + dashboard
    $ ./aegis-up.sh --cluster locks   # or: --cluster kv
    → dashboard   http://localhost:4400
    → grafana     http://localhost:3000 (admin/admin)
    → prometheus  http://localhost:9090

    # 3 · run the test suites (offline, $0)
    $ ./aegis-test.sh --fast   # agents + benchmark + chaos
    ✓ 99 agent tests pass · benchmark baseline 33/33

    # 4 · score ConsensusOps-Bench with a real model (~$0.01)
    $ agents/.venv/bin/python benchmark/run.py --diagnoser rag \
        --provider openrouter --model google/gemini-3.1-flash-lite
    ✓ classification 1.000 · root-cause top-1 0.970 · top-3 1.000

    # tear down (add --volumes to wipe Raft state)
    $ ./aegis-down.sh --cluster all
AEGIS ships ConsensusOps-Bench — 33 labelled Raft incidents scoring anomaly classification and ranked root-cause. The deterministic baseline is already near-perfect, and a one-cent LLM-with-retrieval run on a cheap Gemini Flash model matches it exactly. The determinism lives in the free classifier; the metered model is confined to advisory diagnosis.
Perfect classification across all six anomaly classes. Root-cause top-1 misses exactly once — a cascade where a proximate election storm masks a disk-bound follower — and recovers it at top-3. Both columns share the same deterministic classifier; the LLM never classifies.
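The reported scores follow from simple counting: one top-1 miss in 33 incidents gives 32/33 ≈ 0.970, and recovering it within the top three gives 33/33 = 1.000. The sketch below shows how a top-k score of this kind can be computed. The `top_k_accuracy` helper and the three toy incidents with their cause labels are hypothetical and are not taken from ConsensusOps-Bench.

```python
def top_k_accuracy(ranked: list[list[str]], truth: list[str], k: int) -> float:
    """Fraction of incidents whose true cause appears in the first k ranked guesses."""
    hits = sum(1 for guesses, label in zip(ranked, truth) if label in guesses[:k])
    return hits / len(truth)


# Toy data only: the second incident mimics a cascade where the proximate
# symptom outranks the underlying cause.
ranked = [
    ["network_partition", "slow_disk", "clock_skew"],
    ["election_storm", "slow_disk", "network_partition"],
    ["clock_skew", "election_storm", "slow_disk"],
]
truth = ["network_partition", "slow_disk", "clock_skew"]

print(top_k_accuracy(ranked, truth, k=1))  # 2 of 3 correct at top-1
print(top_k_accuracy(ranked, truth, k=3))  # 1.0

# The figures quoted above: one top-1 miss out of 33 incidents.
print(round(32 / 33, 3))  # 0.97
```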
Yes. The reasoning earns its keep where it helps: classifying anomalies from noisy multi-signal telemetry, drafting tuning PRs with rationale and rollback, and writing readable postmortems from a tool-bounded view. The agent simply never gets to mutate consensus. That separation is the contribution.
Apache Ratis is battle-tested in Apache Ozone and IoTDB. The novelty is the agentic ops layer and the control-plane invariant, not the consensus algorithm. Reinventing Raft is a different project.
A human closes the PR, and the cluster is unchanged. The safety envelope rejects unsafe patches before a PR is even opened. The agent's only mutation pathway is the review queue; bad proposals become logged evidence for the paper.
Determinism. The replay test that anchors the classifier needs the same answer every run. The LLM genuinely earns its keep in diagnosis and postmortem narration — comparison, lesson-extraction, and ranked root-cause — and that path is opt-in.