Dispatch № 42 · Architecture & governance · 9 min read · paper

A Pattern Language for Production LLM Platforms.

Governed routing, agent orchestration, and AI-native delivery in regulated environments, unified by a single rule.

By Nabeel K. · 17 June 2026

A working demonstration of a language model answers a question and turns a pipeline green. A production system inside a regulated institution answers the same question under a different set of obligations. It has to be able to say which model produced the answer and why, what the call cost, what data it touched, whether a human reviewed it, and how the decision would be reconstructed for an auditor a year later. The distance between the demonstration and that obligation is not a matter of scale. It is a change in what the system is for. The demonstration optimizes a response. The production system governs a decision.

I have spent this paper, and the book series behind it, treating that distance as an architectural problem rather than an engineering inconvenience. What follows is a short tour of the result: a pattern language of seventeen named patterns, organized into three layers, and held together by one rule that is small enough to state in a sentence and strong enough to predict where these systems fail. The full text is below as a paper to download. This dispatch is the map, not the territory.

The paper · PDF

A Pattern Language for Production LLM Platforms

Full text · 17 patterns · three layers · CC BY-NC-ND 4.0

Get the paper →

The one rule

A boundary is a clause the optimizer may not cross.

Every production platform makes two kinds of decision, and most of its trouble comes from confusing them. An optimization decision improves an objective: lower latency, lower cost, higher quality, fewer tests run. A boundary decision fixes a constraint the optimizer is not permitted to relax for any gain: a residency rule, a least-privilege scope, a human-review threshold, an evidence record that must exist. Name the two clearly and they stop competing. The optimizer is set free inside the boundary, and the boundary does not move when the optimizer pushes against it.

Nearly every pattern in the language is a way of stating a boundary so precisely that an optimizer can be turned loose within it. That is the whole trick. It is also why these systems can be efficient and accountable at the same time, two properties the field keeps insisting are a trade. They are only a trade when the boundary and the optimizer are written in the same clause, so that improving one quietly erodes the other.
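The separation can be made concrete in a few lines. What follows is a minimal sketch in Python, not the paper's implementation: the names `Option`, `choose`, and `NoAdmissibleOption`, and the example regions and costs, are all hypothetical. The boundary is a list of predicates applied first; the objective only ever sees what survives them, and when nothing survives the function fails rather than relaxing a constraint.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Option:
    name: str
    cost: float   # hypothetical unit cost per call
    region: str   # where the option would process data


Boundary = Callable[[Option], bool]    # True means the option is admissible
Objective = Callable[[Option], float]  # lower is better


class NoAdmissibleOption(Exception):
    """Raised instead of relaxing a boundary when nothing is admissible."""


def choose(options: Iterable[Option],
           boundaries: Sequence[Boundary],
           objective: Objective) -> Option:
    # Boundary clause: applied first, with no access to the objective.
    admissible = [o for o in options if all(b(o) for b in boundaries)]
    if not admissible:
        raise NoAdmissibleOption("no option satisfies every boundary")
    # Optimization clause: free to pick anything that survived.
    return min(admissible, key=objective)


options = [
    Option("small-offshore", cost=0.2, region="us"),
    Option("small-local", cost=0.5, region="eu"),
    Option("large-local", cost=2.0, region="eu"),
]
in_region: Boundary = lambda o: o.region == "eu"
picked = choose(options, [in_region], objective=lambda o: o.cost)
print(picked.name)  # small-local
```

The cheapest option overall is never considered, because the objective is not consulted until the boundary has been applied. Writing the two as separate arguments is the point of the sketch: no change to the objective can widen the admissible set.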

"A boundary is a clause the optimizer may not cross. Everything else is optimization." · The governing invariant

The rule survives the obvious objection, that boundaries shift over time and that an optimizer might even propose new ones. A boundary may evolve, but only through the same governed promotion any policy change follows, never through the optimizer relaxing it at run time for a local gain. When a boundary is loosened or tightened, that is itself a reviewed, recorded act, and the optimizer then works against the new boundary exactly as it did against the old. An optimizer may surface a candidate constraint it believes would lower risk, but a candidate is a proposal, not a boundary, until it passes through promotion and is recorded. Boundaries are not static. They move only by governance, and never by optimization.
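One way to picture governed promotion is a versioned registry in which proposing and promoting are different operations. This is a hedged sketch under stated assumptions: `BoundaryRegistry`, `Proposal`, the `review_threshold` field, and the rule that the reviewer must differ from the proposer are illustrative choices of mine, not definitions taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BoundaryVersion:
    version: int
    review_threshold: float  # hypothetical: amounts above this need human review
    approved_by: str
    reason: str


@dataclass(frozen=True)
class Proposal:
    review_threshold: float
    proposed_by: str
    reason: str


@dataclass
class BoundaryRegistry:
    history: List[BoundaryVersion]
    proposals: List[Proposal] = field(default_factory=list)

    @property
    def current(self) -> BoundaryVersion:
        return self.history[-1]

    def propose(self, proposal: Proposal) -> None:
        # Anyone, including an optimizer, may propose. The boundary is unchanged.
        self.proposals.append(proposal)

    def promote(self, proposal: Proposal, reviewer: str) -> BoundaryVersion:
        if proposal not in self.proposals:
            raise ValueError("only a recorded proposal can be promoted")
        # Assumption: the reviewer must be someone other than the proposer.
        if not reviewer or reviewer == proposal.proposed_by:
            raise PermissionError("promotion needs an independent reviewer")
        new = BoundaryVersion(self.current.version + 1,
                              proposal.review_threshold, reviewer, proposal.reason)
        self.proposals.remove(proposal)
        self.history.append(new)  # older versions stay in the history
        return new


registry = BoundaryRegistry([BoundaryVersion(1, 1000.0, "risk-committee", "initial policy")])
candidate = Proposal(500.0, "optimizer", "lower threshold would reduce exposure")
registry.propose(candidate)
before = registry.current.version  # still 1: a proposal is not a boundary
registry.promote(candidate, reviewer="risk-committee")
after = registry.current.version   # 2, with version 1 kept in the history
```

The optimizer can reach `propose` but has no path to `history` except through `promote`, and every version carries who approved it and why.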

The architecture

One decision, seen at three altitudes.

A production AI platform is not three systems stacked on one another. It is one control path observed from three heights, each answering a question a platform team must answer in order.

Layer 1

Infrastructure

Which model may serve this request, under which latency, cost, and risk budget, and how is that decision recorded?

Routing record

Layer 2

Application

How do specialized agents compose into a workflow that can be inspected, reviewed, and improved without going opaque?

Trajectory record

Layer 3

Operations

How do models, agents, and code reach production on one platform that holds authority by degrees rather than all at once?

Delivery record

Fig. 1 · Three layers, one control path. Each governs a decision and writes its own record; the three records compose into one account an institution can defend.

Each layer has the same internal shape, because the rule applies to all three. At the infrastructure layer, the boundary is the routing policy and the residency and risk constraints it encodes; the optimizer is the router that chooses, inside that policy, the cheapest model that will still meet the quality and latency target. At the application layer, the boundary is the capability contract each agent declares and the human-review threshold; the optimizer is the planner deciding how to decompose and execute the task. At the operations layer, the boundary is the delivery guardrail and the trust tier; the optimizer is the pipeline deciding which tests to run and when to act without waiting for a person. Three records, written at three altitudes, are the same idea repeated: a decision, its justification, and the evidence that it happened.

The seventeen patterns fill in that shape. They are not seventeen inventions. Learned routing, speculative decoding bound to a latency target, agents split into planner and executor and verifier and generator, workflows modeled as inspectable graphs, structure-only logging that keeps the shape of a run without its payloads, tiered human review, golden paths, policy-bounded delivery, benchmark-before-authority: each already exists in the literature, and the paper cites the work it draws from. The contribution is not any one of them. It is the claim that they share one internal shape, and that stating that shape out loud is what lets a platform raise its autonomy without losing the ability to account for it.¹

The composition

Follow one decision through all three layers.

The patterns are stated layer by layer, but their worth is in how they compose, and the clearest way to see that is to follow a single incident. The scenario is deliberately fictional. Nebula Financial is an institution invented for teaching; it exists only to give the patterns a concrete pressure to resolve. It deploys nothing, runs no benchmark, and reports no measurement.

A transaction is flagged as potentially fraudulent. The infrastructure layer routes the analysis. A governed routing policy recognizes the request as carrying regulated customer data and constrains it to a permitted model tier inside the required jurisdiction, then writes a routing record naming the policy version and the basis for the route. None of that is an optimization choice. The residency rule is a boundary, and the router optimizes only within it.
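The routing step in this scenario might be sketched as follows. Everything named here is an assumption made for illustration: the `Model` and `RoutingRecord` fields, the `POLICY` table, the policy version string, and the example models. The sketch concentrates on the record, which names the policy version and the basis for the route alongside the chosen model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    jurisdiction: str
    cost_per_call: float


@dataclass(frozen=True)
class RoutingRecord:
    request_id: str
    policy_version: str
    basis: str
    model: str
    routed_at: str


# Hypothetical policy: data class -> (permitted tiers, required jurisdiction).
POLICY_VERSION = "routing-policy-v7"
POLICY: Dict[str, Tuple[FrozenSet[str], str]] = {
    "regulated_customer_data": (frozenset({"approved"}), "eu"),
}


def route(request_id: str, data_class: str,
          models: List[Model]) -> Tuple[Model, RoutingRecord]:
    tiers, jurisdiction = POLICY[data_class]
    # Boundary: tier and residency are fixed by policy, not by cost.
    permitted = [m for m in models
                 if m.tier in tiers and m.jurisdiction == jurisdiction]
    if not permitted:
        raise LookupError("no model satisfies the routing policy")
    # Optimization: cheapest model among those the policy permits.
    chosen = min(permitted, key=lambda m: m.cost_per_call)
    record = RoutingRecord(
        request_id=request_id,
        policy_version=POLICY_VERSION,
        basis=f"{data_class}: tier in {sorted(tiers)}, jurisdiction {jurisdiction}",
        model=chosen.name,
        routed_at=datetime.now(timezone.utc).isoformat(),
    )
    return chosen, record


models = [
    Model("cheap-general", "general", "us", 0.1),
    Model("approved-small", "approved", "eu", 0.4),
    Model("approved-large", "approved", "eu", 1.5),
]
chosen, record = route("req-001", "regulated_customer_data", models)
```

Returning the record together with the model, rather than logging it as a side effect, makes it harder for a caller to obtain a route without also holding the evidence for it.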

The task is more than a single call, so it enters the application layer as a workflow. A planner decomposes the investigation; an executor gathers the transaction history under a capability contract that forbids any state-changing action; a verifier checks the assembled case against the institution's disclosure rules; a generator drafts the finding. A structure-only trajectory record captures the plan, the tool calls, and the verifier's corrections, without retaining the customer payloads. Because the finding may lead to a regulated action, that node carries a full-supervisory-review tier: a human signs before anything leaves the system, and throughput pressure does not lower the bar.
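A capability contract and a structure-only record can be sketched together. This is an illustration under assumptions, not the paper's design: `CapabilityContract`, `Trajectory`, `call_tool`, and the two example tools are hypothetical names. The contract is an allow-list of read-only tools, and the trajectory stores which agent called which tool and how many bytes came back, never the payload itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class CapabilityContract:
    agent: str
    allowed_tools: FrozenSet[str]  # read-only tools only, by declaration


@dataclass
class Trajectory:
    steps: List[dict] = field(default_factory=list)

    def log(self, agent: str, tool: str, allowed: bool, result_bytes: int) -> None:
        # Structure only: which agent called which tool, and how much came back.
        self.steps.append({"agent": agent, "tool": tool,
                           "allowed": allowed, "result_bytes": result_bytes})


def call_tool(contract: CapabilityContract,
              tools: Dict[str, Callable[[str], str]],
              name: str, argument: str, trajectory: Trajectory) -> str:
    if name not in contract.allowed_tools:
        # A refused call is still part of the shape of the run.
        trajectory.log(contract.agent, name, False, 0)
        raise PermissionError(f"{contract.agent} may not call {name}")
    result = tools[name](argument)
    trajectory.log(contract.agent, name, True, len(result.encode("utf-8")))
    return result


tools = {
    "read_history": lambda account: f"3 transactions for {account}",
    "freeze_account": lambda account: f"{account} frozen",  # state-changing
}
executor = CapabilityContract("executor", frozenset({"read_history"}))
trajectory = Trajectory()

history = call_tool(executor, tools, "read_history", "acct-123", trajectory)
try:
    call_tool(executor, tools, "freeze_account", "acct-123", trajectory)
except PermissionError:
    pass  # the refusal is recorded in the trajectory
```

The record deliberately keeps a byte count rather than a hash of the payload, since a hash of low-entropy customer data can sometimes be reversed by guessing.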

The investigation exposes a gap in the institution's own controls, and the fix is a change to the platform itself. It enters the operations layer through a governed pipeline. The change follows a golden path. An agent proposes the remediation, bounded by a delivery guardrail that records what it decided and forbids it from acting past a suggestion until a human raises its trust tier. The never-skip test set always runs in full, and the learned test selection chooses only among the remaining tests. The agent that proposed the fix had been measured against real failures before it was trusted with even suggestion authority. A delivery record is written.
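Two of the operations-layer boundaries are small enough to sketch directly. The tier names, the test names, and the functions `select_tests` and `guardrail` below are hypothetical, chosen only to illustrate the idea: the learned selector can add tests to a run but cannot remove a never-skip test, and the guardrail caps an agent's action at its current trust tier while recording the decision.

```python
from enum import IntEnum
from typing import FrozenSet, Set


class TrustTier(IntEnum):
    SUGGEST = 1            # may only propose a change
    ACT_WITH_APPROVAL = 2  # may act once a human approves
    ACT = 3                # may act without waiting


# Hypothetical never-skip set: these run on every change.
NEVER_SKIP: FrozenSet[str] = frozenset({"test_residency", "test_audit_log"})


def select_tests(learned_selection: Set[str]) -> Set[str]:
    # The union means the learned selector can only add to the never-skip set.
    return set(NEVER_SKIP) | learned_selection


def guardrail(agent_tier: TrustTier, requested: TrustTier) -> dict:
    # The agent never receives more authority than its current tier.
    permitted = requested <= agent_tier
    return {
        "agent_tier": agent_tier.name,
        "requested": requested.name,
        "decision": "allow" if permitted else "suggest_only",
    }


tests_to_run = select_tests({"test_pricing"})
decision = guardrail(TrustTier.SUGGEST, TrustTier.ACT)
```

The dictionary returned by `guardrail` stands in for an entry in the delivery record; a real record would also carry the change identifier and the policy version.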

The loop closes. From three records written at three layers, the institution can reconstruct who routed what, which agents investigated it under which contracts and review, and how the fix reached production and under whose authority. That reconstruction is the product. The text the model generated along the way is almost incidental.
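The reconstruction itself can be sketched as a join across the three records. The dispatch does not say how the records are linked, so the shared `case_id` key, the `layer` field, and the `reconstruct` function are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

LAYERS = ("routing", "trajectory", "delivery")


def reconstruct(case_id: str, records: Iterable[dict]) -> dict:
    # Group every record for this case by the layer that wrote it.
    account: Dict[str, List[dict]] = {layer: [] for layer in LAYERS}
    for rec in records:
        if rec["case_id"] == case_id:
            account[rec["layer"]].append(rec)
    # A layer with no record is a question the institution cannot answer.
    missing = [layer for layer in LAYERS if not account[layer]]
    return {"case_id": case_id, "account": account, "missing": missing}


records = [
    {"case_id": "case-9", "layer": "routing", "policy_version": "routing-policy-v7"},
    {"case_id": "case-9", "layer": "trajectory", "review": "full-supervisory"},
    {"case_id": "case-9", "layer": "delivery", "authority": "suggest"},
    {"case_id": "case-4", "layer": "routing", "policy_version": "routing-policy-v6"},
]
complete = reconstruct("case-9", records)
partial = reconstruct("case-4", records)
```

Here `complete["missing"]` is empty, while `partial["missing"]` names the two layers that left no record for that case.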

Adoption

Autonomy is earned, not assumed.

Not every setting needs every pattern. The language is a menu ordered by the strength of the accountability requirement. A startup adopts a governed routing policy and a golden path and defers the rest; an unregulated enterprise takes the infrastructure patterns and a capability contract; a regulated institution adopts all seventeen, because each boundary it omits is a question it cannot later answer.

The order matters more than the count. A platform that reaches full autonomy without first making its records, contracts, and review tiers first-class has built power it cannot account for. The safe sequence is a ladder: a routing layer that turns a model call into a governed event, then an agent layer that composes calls into inspectable workflows, then a governance layer that makes evidence and contracts first-class across both, and only then an autonomous platform that raises an agent's authority by degrees, each increase earned against evidence and bounded by budget.

How these systems fail

A pattern fails not when it is absent but when its boundary is set wrong. A routing policy updated often and reviewed loosely drifts until sensitive traffic flows somewhere no one intended. A trust tier raised after every success and never lowered after a failure ratchets agents past any latitude the evidence justifies. An audit store that records everything records nothing findable. Each failure is the same confusion wearing a new disguise: a boundary and an optimizer mistaken for one another. That is why naming the rule is a safeguard, not a slogan.
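The trust-tier ratchet is the easiest of these failures to show in code. The sketch below is an assumption-laden illustration, not a rule from the paper: `recommended_tier`, the window of twenty outcomes, and the three-tier scale are hypothetical. It recommends a lower tier after any failure in the recent window and a higher one only after a full window of successes. In keeping with the rule, the upward recommendation is a proposal that a human would still have to promote.

```python
from typing import Sequence


def recommended_tier(tier: int, recent_outcomes: Sequence[bool],
                     promote_after: int = 20, max_tier: int = 3) -> int:
    # Any failure in the window lowers the tier by one, never below tier 1.
    if any(not ok for ok in recent_outcomes):
        return max(1, tier - 1)
    # Promotion needs a full window of successes, not a single one.
    if len(recent_outcomes) >= promote_after:
        return min(max_tier, tier + 1)
    # Too little evidence either way: hold the current tier.
    return tier


after_clean_window = recommended_tier(2, [True] * 20)         # 3
after_one_failure = recommended_tier(2, [True] * 19 + [False])  # 1
after_thin_evidence = recommended_tier(2, [True] * 5)          # 2
```

The ratchet described above corresponds to deleting the first branch, so that failures are counted nowhere and the tier can only rise.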

The honest part

An architecture to be tested, not a result to believe.

This is a reference architecture and a pattern language, not an experiment. It reports no benchmark and claims no measured result. Where a number appears in the paper, it is a worked illustration and is labeled as such. The patterns earn their place by resolving a stated set of forces under stated constraints, and by their grounding in the cited literature, not by a study this work does not run. The limits are stated plainly in the paper: there is no empirical evaluation, the unifying scenario is fictional, and generalization beyond regulated finance is argued rather than shown. For each pattern, the paper also records the quantity a controlled study would measure and the direction the pattern predicts it should move, so that a claim that predicts nothing measurable is not allowed to hide as one that does.²

The conceptual artifacts behind the patterns (routing policies, capability contracts, delivery guardrails, golden paths) are developed in full in the companion Full-Stack AI Engineering Series, across infrastructure, application, and operations. Production is not deployment. In a regulated institution a model call is a governed decision that happens to produce text, an agent action is an audit record that happens to do work, and a deployment is a change that has to be defended. This language is one way to build for that. A platform can raise its autonomy as fast as it can prove its judgment, provided every decision is bounded by policy and recorded as evidence.

Download · the paper

Get the full paper.

The paper is licensed CC BY-NC-ND 4.0: share it with attribution, but not for commercial use and not as a modified version. Attribution: Dr. Nabeel A. Khan, nabeelkhan.com.

Notes

¹ On novelty. None of the seventeen mechanisms is new in isolation. The language composes the routing, decoding, agent, and delivery research it cites; the claim is the composition under one invariant, not the parts. The works it draws on are listed in the references below and in the full paper.
² On the institution. The patterns assume an organization that values the ability to reconstruct and defend a decision above raw throughput. Where that value does not hold, several patterns lose their justification, because the boundaries they protect are not required.

References

The work this language draws from.

The pattern language composes established research; it does not replace it. These are the twenty-seven sources cited in the paper, listed as they appear there. The full text, with each citation in place, is in the downloadable paper above.

Workstream: A Local-First Developer Command Center for the AI-Augmented Engineering Workflow. arXiv:2604.17055, 2026.
Daman Arora, Atharv Sonwane, Nalin Wadhwa, Abhav Mehrotra, Saiteja Utpala, Ramakrishna Bairi, Aditya Kanade, and Nagarajan Natarajan. MASAI: Modular Architecture for Software-engineering AI Agents. arXiv:2406.11638, 2024.
Mohammad Baqar, Saba Naqvi, and Rajat Khanda. AI-Augmented CI/CD Pipelines: From Code Commit to Production with Autonomous Decisions. arXiv:2508.11867, 2025.
Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, and Xia Hu. Confident or Seek Stronger: Uncertainty-Based On-Device LLM Routing. arXiv:2502.04428, 2025.
Jingzhi Fang, Yanyan Shen, Yue Wang, and Lei Chen. Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation. arXiv:2503.16893, 2025.
Taher A. Ghaleb. When AI Agents Touch CI/CD Configurations: Frequency and Success. arXiv:2601.17413, 2026.
Kaiyu Huang, Hao Wu, Zhubo Shi, Han Zou, Minchen Yu, and Qingjiang Shi. AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving. arXiv:2503.05096, 2025.
Kunal Jain, Anjaly Parayil, Ankur Mallick, Esha Choukse, Xiaoting Qin, Jue Zhang, Íñigo Goiri, Rujia Wang, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, and Saravan Rajmohan. Intelligent Router for LLM Workloads: Workload-Aware Load Balancing. arXiv:2408.13510, 2024.
Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Congchao Wang, Zifeng Wang, Alec Go, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, and Sanjiv Kumar. Universal Model Routing for Efficient LLM Inference. arXiv:2502.08773, 2025.
Satyadhar Joshi. A Review of Generative AI and DevOps Pipelines: CI/CD, Agentic Automation, MLOps Integration, and Large Language Models. SSRN Electronic Journal, 2025. SSRN:5290005.
Sandeep Reddy Kaidhapuram. Human-in-the-Loop (HITL) Orchestration for Agentic Use-Cases: A Practical Framework for Supervising Autonomous AI Agents in Production Environments. International Journal of Computer Techniques, 12(6), 2025.
Baolin Li, Yankai Jiang, Vijay Gadepally, and Devesh Tiwari. LLM Inference Serving: Survey of Recent Advances and Opportunities. arXiv:2407.12391, 2024.
Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, and Pan Lu. In-the-Flow Agentic System Optimization for Effective Planning and Tool Use. arXiv:2510.05592, 2025.
Qiqi Lin, Xiaoyang Ji, Shengfang Zhai, Qingni Shen, Zhi Zhang, Yuejian Fang, and Yansong Gao. Life-Cycle Routing Vulnerabilities of LLM Router. arXiv:2503.08704, 2025.
Xing Liu, Lizhuo Luo, Ming Tang, Chao Huang, and Xu Chen. FlowSpec: Continuous Pipelined Speculative Decoding for Distributed LLM Inference. arXiv:2507.02620, 2025.
Raian Latif Nabil, Hao-Nan Zhu, and Cindy Rubio-González. CI-Bench: A Framework for Evaluating Large Language Model Tools on CI Failures. Proc. IEEE/ACM 48th International Conference on Software Engineering (ICSE), Demonstrations, 2026.
Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, and Tongliang Liu. Flow: Modularized Agentic Workflow Automation. Proc. International Conference on Learning Representations (ICLR), 2025. arXiv:2501.07834.
Gabriele Oliaro, Zhihao Jia, Daniel Campos, and Aurick Qiao. SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications. arXiv:2411.04975, 2024.
OpsLevel. The 2025 Ultimate Guide to Building a High-Performance Developer Portal. 2025. opslevel.com.
Harshad Pitkar. Platform Engineering and Developer Experience: A Systematic Review of Concepts, Benefits and Future Directions. World Journal of Advanced Engineering Technology and Sciences, 18(2):241–248, 2026. doi:10.30574/wjaets.2026.18.2.0112.
Red Hat. Why Developer Portals Matter More in the Age of AI Agents. Red Hat Blog, 2025. redhat.com.
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, and Beidi Chen. MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding. arXiv:2408.11049, 2024.
Jimmy Song. The Second Half of Cloud Native: The Era of AI-Native Platform Engineering Has Arrived. 2025. jimmysong.io.
Aniket Abhishek Soni, Milan Parikh, Rashi Nimesh Kumar Dhenia, Jubin Abhishek Soni, Ayush Raj Jha, and Sneja Mitinbhai Shah. Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines. arXiv:2601.11647, 2026.
xDevOps. DevOps and SRE AI Platforms: 2025 and 2026 Atlas. GitHub Pages, 2025. xdevops-ai.github.io.
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, and Zhifang Sui. Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding. Findings of the Association for Computational Linguistics: ACL 2024, pages 7655–7671, 2024.
Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. AFlow: Automating Agentic Workflow Generation. Proc. International Conference on Learning Representations (ICLR), 2025. arXiv:2410.10762.

Written by

Nabeel K.

Enterprise AI architect and governance advisor. Founder of Simplification and Director, Solutions Architect at iSystematic, advising regulated enterprises on governed production AI: routing, agent orchestration, LLMOps, and AI governance.

© 2026 Nabeel A. Khan. The paper "A Pattern Language for Production LLM Platforms" and this article are licensed under CC BY-NC-ND 4.0, Attribution-NonCommercial-NoDerivatives. You may share them with credit to the author; you may not sell them or distribute modified versions. The frameworks, patterns, and named systems described here are the intellectual property of the author.