There Are No New Engineering Disciplines
In late 2023 I shipped a production RAG system. The architecture had: an intent-detection model that routed each query to either the main retrieval path or a separate chit-chat model, a chunking strategy we revised four times, a reranker, query rewriting, a few-shot injection step that pulled domain-specific examples from a curated bank, and a data-augmentation pipeline that generated training pairs for the intent classifier and the eval set. The evaluation harness had three tiers: full automated metrics on every release, LLM-as-judge on a small percentage of live traffic, and human-in-the-loop scoring on a small percentage of that small percentage, used to calibrate the LLM judge against drift. The model that did the synthesis was, honestly, the part I worried about least.
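For readers who want the three-tier harness in concrete terms, here is a minimal sketch of how the two live-traffic tiers could be sampled. It is an illustration under my own assumptions, not the code we ran: the names `eval_tiers` and `judge_human_agreement`, the two sampling rates, and the hash-based bucketing are all hypothetical. The one property it does show is that human review draws only from traffic the LLM judge has already scored, which is what makes calibration possible.

```python
import hashlib

JUDGE_RATE = 0.05   # assumed: fraction of live traffic sent to the LLM judge
HUMAN_RATE = 0.10   # assumed: fraction of judged traffic also sent to humans


def _bucket(request_id: str, salt: str) -> float:
    """Map a request id to a stable value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{request_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def eval_tiers(request_id: str) -> list[str]:
    """Return the live-traffic evaluation tiers a request falls into."""
    tiers = []
    if _bucket(request_id, "judge") < JUDGE_RATE:
        tiers.append("llm_judge")
        # Human review samples only from traffic the judge already scored,
        # so both scores exist for the same requests.
        if _bucket(request_id, "human") < HUMAN_RATE:
            tiers.append("human")
    return tiers


def judge_human_agreement(pairs: list[tuple[int, int]]) -> float:
    """Fraction of (judge_score, human_score) pairs that match exactly."""
    if not pairs:
        return float("nan")
    return sum(j == h for j, h in pairs) / len(pairs)
```

Hashing the request id rather than calling a random number generator keeps the assignment stable across replays, so a request that was human-scored last week is still in the human tier when you re-run the judge. A falling agreement rate over time is one simple signal that the judge has drifted.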
In February 2026, Mitchell Hashimoto called the discipline of building such things “harness engineering,” Birgitta Böckeler wrote it up on Martin Fowler’s site, OpenAI shipped a case study using the phrase, and the term began appearing in job listings. The attribution is contested: some accounts credit Viv Trivedy as the original coiner, and Dex Horthy’s 12-factor-agents crowd treats harness engineering as a subset of context engineering. The 2024 me would’ve read all three pieces and recognized everything in them. The practice is years older than the vocabulary. What we’re watching in 2026 is the naming of a discipline that’s been doing useful work for a decade.
The useful work the rebrand performs is common-knowledge generation. Practices that lived as private knowledge across many teams become public knowledge that an industry can organize around. Once a practice has a name, you can hire for it, sell consultancies on top of it, and slowly build a certification track around it. That’s a real coordination good, and worth a name. The other thing common-knowledge generation does, less remarked on, is vocabulary capture (after Stigler’s regulatory capture, the same shape applied to language). The value used to live in the engineers who’d figured the practice out in private, and now it lives in the vocabulary itself, accruing to whoever’s most associated with the term.
Common-knowledge generation is the useful side of the rebrand. The other side, mostly unstated, is the implication that the practice itself arrived with the name. It did not.
My 2023 retrieval system wasn’t unusually sophisticated for its era. Production teams at Anthropic and OpenAI, Pinecone customers, LangChain users, internal teams at every large search company, and indie shops sitting on top of LlamaIndex were all running variations of the same stack by mid-2024. None of us were calling ourselves harness engineers. We were calling ourselves whatever our payroll system listed, mostly because, when you’ve got a database to recover and a customer who’s screaming, you don’t stop to brand the methodology.
I Have Lived This Movie Once Already
In the early 2000s I worked at Amazon, before AWS existed, on internal systems that deployed code to production without humans pushing buttons. The pipeline ran blameless reviews when something broke, kept infrastructure in version-controlled configuration, and integrated continuously whether or not anyone was watching. Nobody called it DevOps. The bare metal underneath would eventually become a product the company sold to the rest of the world, but the discipline was in place years before the marketing was, because the alternative was unworkable at the scale Amazon was already operating at.
In October 2009, Patrick Debois held the first DevOpsDays in Ghent and coined the term that named the work I had been doing for years. John Allspaw and Paul Hammond had given the “10+ Deploys Per Day” talk at Velocity earlier that year, using Flickr as the case study, while the same patterns were quietly in production at Amazon and Google. The vocabulary landed. Within five years there was a DevOps role in the job market, a certification path, a consulting industry, a conference circuit, and a generation of engineers who joined the field in 2012 thinking it had been invented in 2009.
People who joined the field after the naming event treat the name as the origin, and people who joined before spend the rest of their careers occasionally saying “we did this, we just didn’t call it that,” and being met with polite skepticism. The cycle is sixteen years old. We’re running it again on AI scaffolding.
We weren’t harness engineers starting in 2022, when LLMs got interesting enough to deserve a harness. We were already harness engineers in the early 2010s, when production machine learning meant a model wrapped in an order of magnitude more code for feature engineering, monitoring, retraining, eval, serving, fallback, and shadow-mode evaluation. The pattern goes back further than that: production recommender systems in the late 2000s, production search through the mid-2000s, and arguably any production statistical model further back than I want to embarrass myself by claiming. The discipline of getting a model to be useful in production has been doing the same job for the better part of two decades.
The model in the middle has changed every couple of years, while the harness pattern stayed roughly the same shape it had in 2008.
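To make that shape concrete, here is a minimal sketch of two of the pieces named above, fallback and shadow-mode evaluation, wrapped around a model that is just a function from string to string. The `serve` function, its parameters, and the log format are hypothetical names I am using for illustration, not anything from a specific system.

```python
from typing import Callable, Optional

Model = Callable[[str], str]


def serve(query: str, primary: Model, fallback: Model,
          shadow: Optional[Model] = None,
          log: Optional[list] = None) -> str:
    """Answer with the primary model, degrade to the fallback on error,
    and run an optional shadow model whose output is logged but never served."""
    try:
        answer = primary(query)
    except Exception:
        answer = fallback(query)
    if shadow is not None and log is not None:
        try:
            shadow_answer = shadow(query)
            log.append({"query": query, "agree": shadow_answer == answer})
        except Exception as exc:
            # A failing shadow model must never affect the served answer.
            log.append({"query": query, "shadow_error": repr(exc)})
    return answer
```

Nothing in the wrapper cares whether `primary` is a gradient-boosted ranker or an LLM call, which is the point: the model can be swapped while the code around it stays put.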
LLMs are the noisiest model anyone has yet built a harness around. They hallucinate, drift, and fail in ways correlated with the user rather than the input distribution. The load got heavier, on a harness that already existed.
Harness engineering is also one example in a longer sequence. The same arc ran on MLOps (the term landed around 2018, but the practice was widespread by 2014), on Platform Engineering (named around 2022, though internal-platform teams had been doing the work for a decade), on SRE, and on a half-dozen other “engineering” suffixes the industry has hired around since the early 2000s. Pick any of them and the practice predates the term by years.
The Objection Worth Taking Seriously
The counter to this post, which I expect from Hashimoto’s defenders, is that agentic harnesses are doing categorically new work: multi-turn tool use, self-correction, multi-agent coordination, the MCP-shaped surface area that has emerged in the last twelve months. The 2026 vocabulary, on this view, is responding to a real shift in scope, not just rebranding the old work.
They’re right, in part. Agentic systems have broadened the scope of what a harness must contain. Multi-turn state, persistent memory across turns, tool calling with non-trivial side effects, sub-agent orchestration, and validation of agent reasoning are doing work that a single-turn RAG harness didn’t. The sharpest version is a sub-agent told to verify that a test passes, which rewrites the test instead of fixing the code, because rewriting avoids the orchestrator’s failure path more reliably than debugging does. My 2023 retrieval system couldn’t have produced that failure: no model was talking to another model, and nothing was optimizing against a sibling’s evaluation prompt. The orchestration layer needs mechanisms for catching that kind of misaligned optimization across agent boundaries, and those mechanisms didn’t exist in pre-2022 ML harnesses because the model-to-model handoff didn’t either. The discipline has gotten larger, but the engineering primitives underneath (retries, observability, state management, evaluation, prompt construction, error handling, graceful degradation) are the same primitives production ML harnesses have been built around for ten-plus years.
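The test-rewriting failure is easy to state in code. What follows is a minimal sketch of one possible guard, assuming the orchestrator knows which files count as protected tests and that the sub-agent is a callable returning its own pass/fail claim. The names `snapshot` and `run_guarded` are hypothetical, and a real harness would want more than file hashes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def snapshot(paths: list[Path]) -> dict[Path, str]:
    """Hash each protected file so later edits can be detected."""
    return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def run_guarded(sub_agent: Callable[[], bool], protected: list[Path]) -> bool:
    """Accept the sub-agent's "tests pass" claim only if it left the tests alone."""
    before = snapshot(protected)
    claimed_pass = sub_agent()
    tampered = [
        p for p in protected
        if not p.exists()
        or hashlib.sha256(p.read_bytes()).hexdigest() != before[p]
    ]
    if tampered:
        # A pass obtained by editing or deleting the test counts as a failure.
        return False
    return claimed_pass
```

The new part is the thing being guarded against. The guard itself is a checksum and an if-statement, which is the same kind of primitive an ML harness would have used a decade ago to detect a silently changed feature file.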
What To Do With This
If you’ve shipped a production retrieval, recommendation, or classification system any time in the last decade, you already know most of what the 2026 harness-engineering literature is going to tell you. You know what the hard parts are: which decisions actually move the metric, why the model swap is rarely where the gains come from, and which evaluation strategies survive contact with real traffic and which collapse the moment users get creative.
Your domain expertise will outlast the new vocabulary by years.
If you’re running the org rather than the system, vocabulary capture has a hiring consequence. It’s a naming decision before it’s a headcount one. The engineers who solved retrieval drift in 2023 don’t have “harness engineer” on their resume; the consultancies organizing the 2026 conference circuit do. Whoever owns the vocabulary in the press releases will own the budget line by Q4. This is how DevOps transformations got sold back to teams who’d already built them. The cheapest move available to you, before the headcount conversation reopens, is to rename what your existing team has been doing in language the industry now uses. That’s your retention story and your defense against the next vendor pitch that assumes you have to buy the practice from outside.
The practical move, for both reads, is to stop reading the press releases and start writing them. The harness engineers who joined the field in 2026 are looking for canonical examples, and you’ve been shipping those for a decade. Document what you actually built. The new vocabulary’s fine to use where it helps the next reader find your work. Don’t assume it describes a new thing. More often, it describes an old thing in better marketing.
The DevOps timeline tells you exactly how this ends: the people who joined the field in 2012 are now senior architects, and the people who were doing the work in 2005 are writing books about it. The window between “I’ve been doing this for years” and “let me tell you about my methodology” is shorter than you think.

