Cloud-Native Patterns for AI Engineers
Book 1 of the Full-Stack AI Engineering Series. The infrastructure layer. Production is not deployment. It is the architecture of trust under load.
AI reached production in regulated finance faster than the infrastructure built to govern it. A model call that moves money, extends credit, or clears a name is not a feature. It is a regulated event that happens to produce text.
There are two rooms in every AI program. In the first, the model answers and everyone agrees the future has arrived. In the second, a regulator asks what the model received, which version served it, where the data travelled, and what it cost. The first room is crowded. The second is nearly empty. This book lives in the second room.
Data residency, the audit trail, latency, and cost are not four governance problems. They are one decision, made once per request, that a regulated institution cannot afford to make in four hundred different places. NexusCore is the gateway that makes it once: the routing and observability layer between every application and the pool of models behind it, deciding which model may answer under which budget, and recording the decision as evidence.
The gateway owns five responsibilities and refuses three. It owns model selection, residency, cost attribution, the audit record, and the escalation boundary. It refuses prompt authorship, the user experience, and the purpose of the feature. The application owns the question. The gateway owns the institution's accountability in the answering.
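Here is a minimal sketch of that single per-request decision. It assumes a pool of three tiers (small, mid-tier, frontier) and illustrative policy checks for residency, capability, and budget. The names `Model`, `Request`, `Decision`, and `route` are hypothetical, as are the capability score and the per-token prices; none of them is NexusCore's actual interface. The router walks the pool from the smallest tier upward and skips the frontier tier unless policy allows it. It returns the reason for every rejection alongside the choice, so the record of why travels with the decision. When nothing qualifies, the request is escalated rather than sent to the largest model by default.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tier numbering mirroring "small, mid-tier, frontier".
SMALL, MID, FRONTIER = 0, 1, 2


@dataclass(frozen=True)
class Model:
    name: str
    tier: int                    # SMALL, MID or FRONTIER
    regions: frozenset[str]      # where this model may process data
    cost_per_1k_tokens: float    # illustrative unit price
    capability: int              # hypothetical coarse capability score


@dataclass(frozen=True)
class Request:
    required_region: str         # residency constraint for this request
    min_capability: int          # least capability the request can trust
    est_tokens: int
    budget: float                # maximum spend allowed for this call


@dataclass
class Decision:
    model: str | None            # None means no model may answer
    escalate: bool
    reasons: list[str] = field(default_factory=list)  # the evidence: why


def route(req: Request, pool: list[Model], allow_frontier: bool = False) -> Decision:
    """Pick the smallest, cheapest model that passes every policy check."""
    reasons: list[str] = []
    # Walk the pool from smallest tier to largest, cheapest first within a tier.
    for m in sorted(pool, key=lambda x: (x.tier, x.cost_per_1k_tokens)):
        if m.tier == FRONTIER and not allow_frontier:
            reasons.append(f"{m.name}: frontier tier is reserved")
            continue
        if req.required_region not in m.regions:
            reasons.append(f"{m.name}: not resident in {req.required_region}")
            continue
        if m.capability < req.min_capability:
            reasons.append(f"{m.name}: capability {m.capability} < {req.min_capability}")
            continue
        cost = m.cost_per_1k_tokens * req.est_tokens / 1000
        if cost > req.budget:
            reasons.append(f"{m.name}: estimated cost {cost:.4f} exceeds budget {req.budget:.4f}")
            continue
        reasons.append(f"{m.name}: selected at estimated cost {cost:.4f}")
        return Decision(model=m.name, escalate=False, reasons=reasons)
    # Nothing qualified: hand the request to the escalation boundary.
    reasons.append("no model satisfies policy; escalating")
    return Decision(model=None, escalate=True, reasons=reasons)


if __name__ == "__main__":
    pool = [
        Model("small-eu", SMALL, frozenset({"eu"}), 0.1, 1),
        Model("mid-eu", MID, frozenset({"eu", "us"}), 0.5, 2),
        Model("frontier-us", FRONTIER, frozenset({"us"}), 3.0, 3),
    ]
    decision = route(Request("eu", 2, 2000, 2.0), pool)
    print(decision.model)  # mid-eu
    for line in decision.reasons:
        print(" ", line)
```

The sort key orders the pool by tier before cost, and that ordering is what encodes restraint: even a frontier model with a lower price is still considered last.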
The gateway reads the request, selects the smallest model the request can trust, and records why. Routing is not optimization. It is the architecture of restraint.
Research anchor: a versioned, signed, promotable policy that names the model pool and the boundaries the optimizer may never cross.
Policy as config: small, mid-tier, frontier. The frontier model is held in reserve. It is never a default that nothing in the architecture argues against.
Latency as a control surface, not a model trick. The SLO sets the budget; the budget selects the decode strategy.
Research anchor: signed artifacts, provenance records, promotion control. Three lifecycles defended on one plane: the one that learns, the one that serves, and the one that merely persists.
Research anchor: complete, immutable, queryable. Compliance is a property the system already had before the auditor arrived.
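One common way to make an audit record tamper-evident is a hash chain, in which each entry's SHA-256 digest also covers the previous entry's digest. The sketch below assumes that technique; it is an illustration, not the book's design. `AuditLog`, its methods, and the record fields (`request_id`, `model`, `region`, `cost`) are hypothetical, and the log lives in memory. A chain like this detects edited, reordered, or mid-chain deleted entries, but not truncation of the tail. A production system would therefore also need to anchor the latest hash somewhere the log's writer cannot rewrite.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _canonical(record: dict) -> str:
    # Stable serialization so the same record always hashes the same way.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def _link(prev_hash: str, record: dict) -> str:
    # Each entry's hash covers the previous hash plus its own content.
    return hashlib.sha256((prev_hash + _canonical(record)).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only, hash-chained log of routing decisions (in memory, for illustration)."""

    entries: list[dict] = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        stored = json.loads(_canonical(record))  # private copy of the record
        digest = _link(prev, stored)
        self.entries.append({"record": stored, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain; editing any entry, or deleting or reordering
        # one before the tail, makes a stored hash or back-link disagree.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _link(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True

    def query(self, **criteria) -> list[dict]:
        # Return records whose fields match every keyword criterion exactly.
        return [
            e["record"]
            for e in self.entries
            if all(e["record"].get(k) == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    log = AuditLog()
    log.append({"request_id": "r1", "model": "mid-eu", "region": "eu", "cost": 1.0})
    log.append({"request_id": "r2", "model": "small-eu", "region": "eu", "cost": 0.1})
    print(log.verify())  # True
    log.entries[0]["record"]["cost"] = 0.0  # simulate tampering
    print(log.verify())  # False
```

The log stores a private copy of each record, so a later change to the caller's dictionary cannot silently rewrite history. A direct edit to the stored entry, as in the usage above, fails verification.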
Written for readers who think in service level objectives, error budgets, and percentiles. The bar for AI background is deliberately low: if you know that a model takes a prompt and produces text, non-deterministically, you have enough to begin. It is written for the engineer you used to manage, and the one you are now.
A gateway is not plumbing. It is the place where an institution decides what it is allowed to think, and how much that thought may cost.
LLM Systems in Production
Three books, one fictional regulated fintech (Nebula Financial), and three systems that are not three products but three faces of one platform, each owning a layer of the stack.
The book is in draft. Leave a name and a working email, and you get one note when Book 1 publishes. No list, no noise, no second message.
One note when LLM Systems in Production publishes. In the meantime, the Enterprise Playbook is already available.
Speed without governance is debt. Governance is the architecture that lets speed compound.