From Vibe Coding to Governed AI Collaboration: A VirtuAmerica Methodology Paper
| Field | Value |
|---|---|
| Publication date | 2026-05-13 |
| Version | v6.6 |
| Publisher | VirtuAmerica |
| Author | David Azofeifa |
| Classification | Methodology · Human-AI Collaboration · Specification Discipline · Operating Discipline |
| Intended audience | Anyone collaborating with AI on consequential work, from a solo founder using AI on a single task to a team coordinating multiple agents across multi-phase delivery |
| Evidence class | Field-tested operating method developed in production at VirtuAmerica; not an empirical benchmark |
| Status | Public methodology publication |
| Public kit | ELEV8 Kit |
| Keywords | human-AI collaboration, specification discipline, accountable AI collaboration, evidence discipline, auditability, implementation governance |
Author’s Note: Why This Exists
ELEV8 did not begin as theory. It began as a repeated delivery problem.
AI made it possible to build ideas that had been blocked for years. A book site could exist before the book was finished. A personal page could become a portfolio. A small creative experiment could become a shared platform. But early successes exposed a harder gap: a system can execute a stated request faithfully and still miss the person, purpose, or professional baseline behind it.
The origin examples in this paper are included only where they clarify the principle: a faithful landing page that still needed creative direction, a platform-naming decision where AI reached a better answer through the wrong process, and a portal that was built to spec but lacked the depth a real customer would need. ELEV8 exists to make those gaps visible before trust, time, money, or creative energy are spent on the wrong version of the work.
The human source matters, but this paper is not a biography. It is a methodology for serving another person’s need without quietly substituting a shallower or foreign version of it.
Abstract
The deepest specification failures rarely announce themselves. They surface as churn, rework, support volume, lost trust, or the quiet feeling that a delivered result is technically correct and still wrong. AI systems that faithfully execute shallow briefs accelerate this failure mode because they can produce a cleaner, faster, more convincing version of an incomplete idea.
This paper presents ELEV8: a human-AI methodology for turning incomplete or unclear human intent into apt, steerable, buildable, and verifiable work. It has two calibrated depths. The lighter depth is a thinking pass that helps a person improve a brief before acting. The heavier depth is a governed operating form with explicit role boundaries, accountable human authorization, machine-executable specifications, matched acceptance evidence, independent review, durable decision records, and a final closure check. Use the heavier depth when stakes, handoffs, AI agents, customer trust, money, security, or irreversibility make shallow execution expensive.
Later sections name the governing doctrine, the principles, the role model, and the artifacts. The opening claim is simpler: a brief should be raised to the level the work deserves, no lower and no higher. ELEV8 does not ask every task to carry enterprise ceremony. It asks every consequential task to receive the amount of elevation its stakes deserve.
This is a field-tested operating method developed and refined under sustained production use at VirtuAmerica, not an empirical benchmark or controlled study. It should be judged by its operational clarity, transferability, and ability to prevent specification-layer failure in real work. Its central commitment is twofold: the quality of the expert response must not be bounded by the incompleteness of the initial brief, and the depth of that response must remain bounded by the project’s actual stakes.
Summary at a Glance
- Problem. AI systems can execute faster than people can clarify intent. A shallow brief can now become a polished deliverable before anyone notices what was missing.
- Core rule. Raise the brief to the depth the stakes warrant, neither shallower nor more inflated.
- Two depths. Use a lightweight thinking pass for small or reversible work. Use the full operating form for consequential, multi-step, or auditable work.
- Role boundaries. Consequential work separates the person with the need, the expert who elevates the brief, the accountable human, the implementer, and the independent reviewer. Part V gives these roles their formal names.
- Control. Accountable human authorization protects the original need from being replaced by a fluent document or a confident AI default.
- Standard. A good specification must be steerable by the person with the need, executable by the implementer, and verifiable by an independent reviewer. At the heavier depth, prose, checklist, and tests describe the same deliverables.
- Evidence discipline. ELEV8 labels whether an addition comes from craft knowledge, an inspectable source, or a hypothesis that still needs validation. It treats silent AI substitution as a process defect even when the substituted answer happens to be better.
- Adoption rule. Use the lightweight thinking pass broadly. Use the full operating form when the cost of being wrong justifies it.
Claim boundary. This paper does not claim that documents are more important than shipped work, that AI can replace human judgment, or that every task deserves the full operating form. It claims something narrower and more useful: when specification quality materially affects outcome quality, the specification layer needs its own creative-expert layer, its own evidence discipline, and its own audit trail.
How to Read This Paper
This paper is organized as a graduated arc, not as a single fixed ceremony. Start at Level 1, recognize your situation on the arc, and stop where the failure mode matches your own work. Each level introduces one new failure mode and the discipline that answers it. Nothing more.
| Level | Situation | What changes | Where to find it |
|---|---|---|---|
| 1 | Solo, one task | Clarify the real job; add expert knowledge; tag every claim by evidence type | Parts I–IV |
| 2 | Solo, across sessions | Save professional context as durable documents; reference prompts rather than re-paste them | Part I (Level 2), §§ 6.1, 8.5 |
| 3 | Building for someone else | Treat the person with the need as distinct from the person serving it; surface every assumption as a steerable choice | Parts I (Level 3), V |
| 4 | Multi-agent, multi-step | Coextend spec, checklist, and tests; use an independent reviewer; record every significant decision durably | Parts VI, VII |
| 5 | Multi-phase, high-stakes | Human authorization at every phase boundary; final check against the original need, not the starting checklist | Parts VII, IX |
A reader at Level 1 should not feel the weight of Level 4’s machinery. A reader at Level 5 should recognize every mechanism as a response to a failure mode they have already lived. Terms used throughout the paper are defined in Appendix C.
Part I: The Complexity Arc, a Story That Names Itself
The methodology in this paper is not theory. It was discovered in production, by a person solving real problems for himself and for others, until the size and shape of the work demanded a structure he had not known existed when he started. The five levels below trace that discovery. The reader who recognizes their own situation on the arc has already done the most important reading the paper asks of them.
The formal names for every concept in this arc, including roles, principles, artifacts, and operating mechanisms, are introduced and defined in the parts that follow. This part tells the story in plain language so the mechanisms arrive as answers to problems, not as prerequisites to memorize.
Level 1: Just Me and AI, One Task
The starting point is the most common one. A person has an idea. They want help making it real. They open a session with an AI assistant, describe what they want, and watch something arrive. The result is technically correct. The result is also, quietly, not what they actually needed.
A worked example. A person writing a book wants a website for it. He has wanted this site to exist for years but could not build it himself: his web skills had aged out. AI is finally capable enough to let him try again. He opens a session and asks for a landing page. The AI gives him exactly what a book landing page generically contains: an author bio, a contact form, a few bullets about the book. Faithful. Incomplete.
What was missing wasn’t a feature. It was an interpretation. The book is about outgrowing paralysis and the mental fog that holds people back. A fog animation in the hero would do something no template can do, because the book itself is about fog. A specific image carries meaning that a stock photo never will. None of that was in the brief. The AI could not have known. And once he saw the generic page the AI produced, he had to do the work himself: research other sites, gather references, form opinions about what was striking and what wasn’t, and bring those back to the AI to be reorganized.
He was doing the expert’s work himself. Without that effort, the brief was too thin to produce what he actually needed.
This is Level 1, and Level 1 has its own answer. Before any implementation happens, the brief should pass through two questions:
- What is the real problem behind the request? “Book landing page” is the surface. “A public doorway for an idea that cannot be fully revealed yet” is the underlying job. Restating it changes what you build.
- What would a skilled practitioner add that wasn’t said? Not just what was asked, but what a competent expert in this space would know to include. And for every domain-specific addition: is this a professional prior (defensible from craft knowledge), a sourced fact (documented somewhere the claim can be checked), or a validation hypothesis (a reasonable guess worth testing before committing)?
That distinction matters. A confident paragraph that blends all three without labeling them is how plausible-sounding ideas become unchecked assumptions.
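The labeling step can be made mechanical. The following is a minimal sketch, not part of the ELEV8 Kit: the names `Evidence`, `Addition`, and `review_flags` are hypothetical, and the three example claims are illustrations invented for this sketch. It assumes each addition to a brief is recorded as a short claim with an optional evidence label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    PROFESSIONAL_PRIOR = "professional prior"        # defensible from craft knowledge
    SOURCED_FACT = "sourced fact"                    # checkable against a document
    VALIDATION_HYPOTHESIS = "validation hypothesis"  # a guess worth testing first


@dataclass
class Addition:
    claim: str
    evidence: Optional[Evidence] = None  # None means the claim was never labeled
    source: Optional[str] = None         # where a sourced fact can be checked


def review_flags(additions):
    """Return flags for additions that blur or strain the evidence boundary."""
    flags = []
    for a in additions:
        if a.evidence is None:
            flags.append(f"UNLABELED: {a.claim}")
        elif a.evidence is Evidence.SOURCED_FACT and not a.source:
            flags.append(f"NO SOURCE GIVEN: {a.claim}")
        elif a.evidence is Evidence.VALIDATION_HYPOTHESIS:
            flags.append(f"TEST BEFORE COMMITTING: {a.claim}")
    return flags


# Hypothetical additions to the book landing page brief.
additions = [
    Addition("A fog animation in the hero echoes the book's theme",
             Evidence.PROFESSIONAL_PRIOR),
    Addition("Visitors will join a mailing list before the book is out",
             Evidence.VALIDATION_HYPOTHESIS),
    Addition("Book pages need an author bio"),
]
for flag in review_flags(additions):
    print(flag)
```

The sketch deliberately lets professional priors pass without a flag: they are defensible from craft knowledge, so the only requirement is that they are labeled as such.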
A reader at Level 1 needs nothing else. Reframing the request and enriching it honestly, with the evidence boundary made explicit, turns the faithfulness trap from invisible into recoverable.
Level 2: Just Me and AI, Across Time
The same person builds a second site, then a third, then a fourth. Each carries the same 3D rotating book cover he invented for the first. The covers diverge over time: he fixes a bug in one, improves an animation in another, and now four sites carry four slightly different versions of the same idea. Every improvement must be made four times, or three sites silently fall behind.
A more dangerous problem rides alongside it. The AI he talked to about the first site is gone because that session ended. Every new session begins with a tax: explain the platform, re-explain the conventions, re-establish the constraints, and only then ask the actual question.
Level 2’s failure mode is context loss. His intent is durable, but the conversation is not. Every session starts cold.
Level 2’s answer is to stop keeping professional context in chat and start keeping it in durable documents that any AI session can load from the beginning. Decisions made yesterday are not in yesterday’s conversation; they are in a written record that today’s session can read. Long instructions the AI will need to execute should live in saved documents. They are not re-pasted each time; they are referenced by a short note that says “read this section and execute it.” The instruction remains the same; what changes is that it stops depending on memory.
The mechanism is small but profound: the professional context stops being a property of the conversation and starts being a property of the project. Each new session inherits the baseline rather than reconstructing it from scratch. The older project notes described this as the documentation library becoming a persistent expert context layer: not a static archive, but reusable context that keeps future AI sessions from flattening hard-won judgment back into first-session ignorance.
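One way to picture a written record that today's session can read is an append-only decision log. This is a minimal sketch under stated assumptions: the file name `decisions.jsonl`, the functions `record_decision` and `load_context`, and the example decision are all hypothetical, and a JSON Lines file is only one of many formats that would serve.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def record_decision(log_path, decision, reason):
    """Append one decision to a JSON Lines file that outlives the chat session."""
    entry = {"date": date.today().isoformat(), "decision": decision, "reason": reason}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def load_context(log_path):
    """Render the recorded decisions as a preamble a new session can read first."""
    path = Path(log_path)
    if not path.exists():
        return "No recorded decisions yet."
    lines = path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    bullets = [f"- {e['date']}: {e['decision']} (why: {e['reason']})" for e in entries]
    return "Decisions already made on this project:\n" + "\n".join(bullets)


# A temporary directory stands in for the project folder in this example.
with tempfile.TemporaryDirectory() as project_dir:
    log = Path(project_dir) / "decisions.jsonl"
    record_decision(log, "Share one 3D book cover component across all sites",
                    "Four diverging copies meant every fix had to be made four times")
    preamble = load_context(log)
    print(preamble)
```

The point of the design is that the preamble is rebuilt from the project's files, so a cold session starts from the recorded baseline instead of from whatever the person remembers to retype.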
Level 3: Me, AI, and Someone I’m Building For
The same person now builds sites for other people: a friend writing a book, a psychologist running a practice, a dental clinic in San José run by his wife’s family.
Everything changes. When he was building for himself, he held the vision. He knew when something was wrong because he felt it. Now someone else holds the need. He has to read their intent and translate it into something buildable. If he misreads it, the AI faithfully executes the misreading. The person on the other side often cannot articulate what’s wrong; they can only feel that it doesn’t honor what they had in mind.
Level 3 introduces the failure mode of delegated specification. One person’s purpose is being routed through another person’s interpretation, and the AI faithfully executes whatever arrives downstream. The gap has two surfaces now instead of one.
Level 3’s answer is a clear separation of roles: the person with the need and the person accountable for serving it must be treated as distinct, even when the same individual occupies both. The person with the need is not a stakeholder with requirements. They are a whole person with values, history, creative vision, and something genuinely at stake. The person serving that need must keep the original intent protected against the AI’s tendency to execute whatever version of it arrives in the prompt.
A worked illustration: when right outcome through wrong process isn’t enough.
One of the first shared platform abstractions was a virtual file system. Its creator repeatedly called it “VirtuAmerica File System,” branding the company into the name. The AI kept drifting toward “Virtual Assets File System”: subtly, persistently, never arguing the point, just consistently using the other name.
In hindsight, the AI’s drift was toward something better. “Virtual Assets” is more transferable, less tied to a brand in a technical term. The outcome was good. The process was not.
The AI had defensible reasoning: technical names should be portable, not brand-tied. But it never surfaced that reasoning as a choice. The creator stated “VirtuAmerica File System” multiple times and was consistently overridden without explanation. He lost the ability to make an informed decision.
What the AI should have done: “You’ve called this VirtuAmerica File System several times. I’ve been using Virtual Assets because technical abstractions travel better when they describe what they do rather than who owns them. Here’s the tradeoff; do you want to steer?”
That is the difference between a defensible suggestion and a silent substitution. The right outcome through the wrong process is still broken, because the next time the AI silently drifts from the person’s stated intent, it might drift toward something wrong, and the process will feel identical. You only know this case was right in retrospect. That is not a governance model.
This is why recognition asymmetry matters: the person being served is often the only one who can recognize that the work has landed, but they may not have the vocabulary to specify that destination in advance. The methodology protects that recognition right by requiring the expert layer to expose its assumptions as steerable choices instead of replacing the person’s judgment with polished confidence.
Level 4: A Team, Multiple Agents, Handoffs
The same person, now running a platform serving real people, asks the AI for something larger than anything he has asked for before. The platform needs a customer portal and an internal back-office portal, the entire commercial gateway between the platform and paying customers.
His prompt was roughly: “Given all the information in this document about the commercial gap, create two portals: one for the customer and one for internal servicing.”
The AI did exactly that. It created portals with sections. Dead sections. Code that existed and did nothing.
He was frustrated. Then, in a moment that the methodology is built on, he realized whose fault it was:
“I got exactly what I deserved with the portal.”
The brief was shallow. The AI was faithful. Faithful was not enough.
This is the faithfulness trap at its most expensive. This was not a small project where the gap costs an afternoon, but a commercially critical deliverable where the gap costs months. And it had a second realization waiting inside it: for himself, this was acceptable. He could iterate. But for a paying customer who trusted him to help them build a website, “exactly what you asked for and not enough” was not acceptable at all.
The methodology was not designed. It was summoned by the question: what would have to be true for a customer’s brief, however shallow, however vague, however human, to produce something worthy of their trust?
The answer required rethinking the whole delivery process. Three things had to change:
First, a specification cannot be expressed in only one form. If the written description of what’s needed, the checklist the implementer marks off, and the tests the reviewer checks against don’t all name the exact same deliverables with the exact same counts, one of them quietly reduces the contract without anyone noticing. A checklist that covers four of six promised things is not a minor editorial gap. It is a prediction of what will be missing when the work is done.
Second, the person who builds something cannot also be the person who verifies it. They cannot see what they didn’t think to check. An independent reviewer, one who reads the produced work against the specification rather than trusting the builder’s report, is the mechanism that makes silent under-delivery visible.
Third, every significant decision made during a multi-session, multi-agent project must land in a durable record. A decision that lives only in a chat transcript is not part of the project. When the next session starts cold, that decision disappears, and the work that follows proceeds without knowing why it was made.
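The first of these changes lends itself to a mechanical check. The sketch below assumes each deliverable carries a stable identifier (the IDs `D1` through `D6` and the function name `coextension_gaps` are hypothetical) and that the identifiers named by the prose specification, the checklist, and the tests can each be listed. It mirrors the case of a checklist that covers four of six promised things.

```python
def coextension_gaps(spec, checklist, tests):
    """Compare the deliverable IDs named by each form of the specification.

    Returns a dict mapping each form to the deliverables it fails to name.
    An empty dict means all three forms describe the same set.
    """
    forms = {"spec": set(spec), "checklist": set(checklist), "tests": set(tests)}
    everything = set().union(*forms.values())
    return {
        name: sorted(everything - ids)
        for name, ids in forms.items()
        if everything - ids
    }


spec = ["D1", "D2", "D3", "D4", "D5", "D6"]
checklist = ["D1", "D2", "D3", "D4"]        # covers four of six promised things
tests = ["D1", "D2", "D3", "D4", "D5"]

gaps = coextension_gaps(spec, checklist, tests)
for form, missing in gaps.items():
    print(f"{form} is missing: {', '.join(missing)}")
```

Run before implementation starts, a check like this turns a quiet reduction of the contract into a visible list of what would otherwise be missing at delivery.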
The same person, after the hollow portal, rebuilt the portals using exactly these disciplines before any of them had a name. The rebuild ran nine phases, with an audit document for every phase, re-audits, and a post-closure review. The folder holding those records is the methodology running in production, months before the word ELEV8 was coined.
Level 5: Complex, Multi-Phase, High-Stakes
The platform now hosts more than twenty distinct sites and a SaaS product whose origin is personal: a private family-sharing platform built after the death of a father, a place for voices, photos, and memory, because the place that was needed did not exist.
The work has stakes none of the earlier levels had. Customers will pay. Their content carries grief, family relationships, and trust. AI is helping build all of it, and the question is no longer “what mechanisms prevent silent failure?” It is “what discipline ensures the whole arc holds when stakes, phases, agents, and irreversibility are all in tension at once?”
Level 5 adds two final disciplines.
The first is a clear rule about who can authorize the work to advance. Not the AI. Not an automated test. A human who holds accountability for the original purpose reviews what was built against what was promised, and against the intent behind the promise, not just the document that approximated it. That human may delegate bounded, reversible decisions to a trusted process, but the moment a decision is irreversible or changes the original purpose, authorization returns to the human.
The second is a closing check that asks the right question. Not “did we complete the starting checklist?” but “does the final result serve the original need, as that need matured through every decision, correction, and learning that happened along the way?” Scope can legitimately grow during a project. The question is whether it grew because the idea clarified, or because no one was paying attention.
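The authorization rule in the first discipline can be written as a small routing function. This is a sketch under stated assumptions: the `Decision` fields and the function `who_authorizes` are hypothetical names, the example decisions are invented for illustration, and the sketch assumes that anything outside the delegated bounds defaults to the human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    summary: str
    reversible: bool
    changes_original_purpose: bool
    within_delegated_bounds: bool


def who_authorizes(decision):
    """Return "human" or "delegated process" for a pending decision."""
    # Irreversible or purpose-changing decisions always return to the human.
    if not decision.reversible or decision.changes_original_purpose:
        return "human"
    if decision.within_delegated_bounds:
        return "delegated process"
    return "human"  # assumption: anything not explicitly delegated stays with the human


# Hypothetical decisions on the family-sharing platform.
relabel = Decision("Rename a button label on staging", True, False, True)
purge = Decision("Permanently delete uploaded family photos", False, False, True)
pivot = Decision("Turn the memorial page into a marketing funnel", True, True, True)

for d in (relabel, purge, pivot):
    print(f"{d.summary}: {who_authorizes(d)}")
```

Note that delegation never overrides the first check: a decision inside the delegated bounds still goes to the human once it is irreversible or changes the original purpose.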
The arc, summarized:
| Level | Situation | Failure mode | What changes |
|---|---|---|---|
| 1 | Solo, one task | The result is faithful to the brief but incomplete; the real need was never surfaced | Clarify the real job; add expert knowledge; tag every claim by evidence type |
| 2 | Solo, across sessions | Context disappears between sessions; every conversation starts from zero | Save professional context as durable documents; reference prompts rather than re-paste them |
| 3 | Building for someone else | The person in the middle misreads the intent; the AI executes the misreading | Treat the person with the need as distinct from the person serving it; surface every assumption as a steerable choice |
| 4 | Multi-agent, multi-step | Specifications under-deliver silently; decisions vanish between sessions | Coextend spec, checklist, and tests; use an independent reviewer; record every significant decision durably |
| 5 | Multi-phase, high-stakes | Drift accumulates across phases; irreversible decisions go unnoticed | Human authorization at every phase boundary; final check against the original need, not the starting checklist |
Each row is necessary for the row below it. Each row is sufficient on its own when the reader’s complexity matches that row.
The remainder of this paper names each discipline formally, explains the reasoning behind it, and gives it a vocabulary precise enough to be applied and taught.
Part II: The Failure Mode and the Governing Principle
2. The Faithfulness Trap and the Silent Failure Mode
Many widely deployed AI assistants are tuned through reinforcement learning from human feedback (RLHF; Christiano et al., 2017; Ouyang et al., 2022) and related preference-alignment methods to be helpful and responsive. One documented risk in this family of systems is sycophancy: agreement with the user’s stated framing at the expense of correctness (Sharma et al., 2023; Turner & Eisikovits, 2026).
In a specification context, this trait hardens into the faithfulness trap: the system treats accuracy to the person’s stated request as the primary optimization target. If someone says “add a support ticket form,” the system adds a support ticket form. It does not add assignment, queuing, deadline tracking, internal notes, escalation paths, status history, or audit trails. Yet in many production support contexts, several of these become baseline expectations. Users will run into the missing ones; the absence of those baselines is exactly what will make the requester feel, six months later, that something is wrong with what was built.
The trap is more dangerous than an obvious error, because it produces work that passes review. The person looks at the output, and it matches what they asked for. They approve it. The gap between what was asked and what was needed remains invisible until it is expensive to fix.
This is the silent failure mode: the specification that passes review because no one could see what it was missing. A defect contradicts the spec. A hollow result satisfies the spec and still fails the person it was built for. ELEV8 exists for that second class of failure.
2.1 The Paradox of Brevity
The natural response to the faithfulness trap is to demand a better brief. This is the wrong instinct.
The person making the request is being asked to specify what they do not know. The whole point of engaging an expert is that the expert knows things the requester does not. A shorter brief is not a limitation; it is an opportunity. A ten-page brief can constrain the expert to the requester’s vocabulary. A three-sentence brief gives the expert room to lead.
ELEV8 deliberately assigns leadership to the creative expert doing the elevation. A short brief paired with a disciplined expert produces a better specification than a long brief paired with a faithful one. A three-sentence brief should be eligible for the same depth of specification as a three-page brief; the only difference is that the expert fills more from their own knowledge when the brief is short.
2.2 The Person Behind the Brief
This is the most important philosophical distinction the methodology makes.
The person making a request is not a stakeholder with requirements. They are a whole person with values, history, creative vision, taste, fear, urgency, and things genuinely at stake. A faithful transcript of their words can miss the person behind those words. ELEV8 treats that as a specification problem: the brief must be elevated without taking the steering wheel away from the person whose need created the work.
AI can faithfully execute the words of a brief while missing the person behind the brief entirely. That is the specification gap ELEV8 is designed to close.
3. The Core Principle: Apt Elevation
ELEV8 separates two concerns that conventional human-AI collaboration conflates:
- Expression. The person with the need states what they want, with as much or as little specificity as they have.
- Elevation. A creative expert takes that expression and raises it to the richest, most coherent, most contextually apt specification possible.
These two functions require different cognitive modes, different knowledge bases, and (in a human-AI partnership) different agents. Conflating them produces the faithfulness trap. Separating them is what makes elevation possible.
The goal is not maximum elevation. It is apt elevation: the highest justified version of the idea given the project’s actual stakes. Maximization overshoots; a weekend internal tool does not deserve a flagship-platform specification. Optimization undersells the discipline by casting it as a routine numerical exercise. The truer word is aptness: the right ceiling for this project, no shallower, no more inflated. Falling short of that ceiling produces silent failures the person cannot name. Reaching past it inflates cost, distorts their understanding of what was approved, and overloads whoever is doing the building.
3.1 The Informed Default
The person says “I want cake.” The creative expert proposes a chocolate-and-raspberry velvet cake.
This is a precise statement about the obligation of expertise and the discipline of informed gap-filling. “I want cake” is a complete and valid contribution. The person need not know which flavors work, which textures signal quality, or which combinations fail. That knowledge belongs to the expert.
But notice the specifics: chocolate, raspberry, velvet. The person said nothing about flavor or texture. The expert was not guessing. It was proposing an informed default: a probability-weighted choice the expert can defend with reasoning, and a choice the person can steer if it turns out to be wrong for their specific case. (“I’m allergic to raspberries” is fine; the expert pivots to another defensible pairing. The correction does not invalidate the discipline; it confirms it works.)
If the expert responds to “I want cake” by producing a flour-and-water pancake, it has not failed by following instructions incorrectly. It has failed by abdicating the obligation to lead.
3.2 Informed Default vs. Silent Override
The file-system naming story in Part I (Level 3) is the canonical illustration of what happens when the expert skips the surfacing step. The AI had defensible reasoning but applied it without offering it as a choice. The person was consistently overridden without explanation and lost the ability to make an informed decision.
What the AI should have done:
“You’ve called this VirtuAmerica File System several times. I’ve been using Virtual Assets because technical abstractions travel better when they describe what they do rather than who owns them. Here’s the tradeoff; do you want to steer?”
The outcome is the same. The process is different. The difference is the person’s agency.
Without this discipline, every gap-fill is a silent vote. The work is then only as honest as the AI’s priors happen to be, which is luck, not governance. A silent override can be right once and still be unsafe as a method. A better answer reached silently is still a broken process if it takes away informed choice.
3.3 Synthetic Over Shallow
In consequential specification work, a rich synthetic specification that the person can adjust is generally preferable to a shallow authentic one that faithfully transcribes their words. The proposals are not directives; they are hypotheses that can be confirmed, redirected, or rejected. A person who reads the spec and says “this section is perfect but this one misses the point entirely” has accomplished something invaluable: they have located the misalignment clearly enough to correct it. Faithful blank space is not respectful. It is abdication. It leaves the person with the need alone with a question they hired the expert to help answer.
3.4 Faithful Transcription vs. Apt Elevation
| The person says | Faithful AI gives | The elevated proposal |
|---|---|---|
| “Add support tickets” | A ticket form | Lifecycle, queue, SLA, audit trail, assignment, escalation, phase plan |
| “Make login faster” | An optimized login flow | Diagnosis of latency vs. session-confidence, auth flow, error states, trust signals |
| “Build a pricing page” | Three pricing cards | Value ladder, conversion hierarchy, objection handling, social proof, upgrade path |
| “Plan a milestone celebration” | A generic party checklist | Milestone-aware plan honoring the honoree’s character, the family’s dynamics, the emotional stakes, and foreseeable failure modes |
Part III: The Two Layers
4. Principles Layer vs. Protocol Layer
ELEV8 has two layers. Understanding the distinction is the most important adoption decision any reader will make.
The principles layer is the lightweight form. It asks: has the request been clarified, enriched with professional baseline knowledge, aligned to value, unified with context, calibrated to stakes, made steerable for the person with the need, grounded in constraints, and checked against foreseeable failure modes? The principles can fit in a meeting, a prompt, a planning note, or a one-page brief. Anyone can use them on any consequential task without adopting the full machinery.
The protocol layer is the full operating form. It adds named role boundaries, accountable authorization, durable specifications, matched evidence across prose, checklists, and tests, phase gates, independent audits, decision records, and closure checks. It is for work where the cost of a hidden specification failure is high.
The adoption rule: Use the principles broadly. Use the protocol when the cost of being wrong justifies it.
Useful calibration questions:
- Will another person or agent implement this later without context?
- Would a wrong version damage trust?
- Is money, security, customer data, or irreversibility involved?
- Will decisions span multiple sessions?
- Does verification need to be independent?
- Is the person with the need unable to specify the professional baseline themselves?
If the answer to most of these is no, the principles are enough. If the answer to several is yes, the protocol earns its place.
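The calibration questions can be sketched as a simple tally. The threshold below is an assumption, not a rule from the methodology: the text says "most" and "several" without giving a number, so this sketch treats three or more yes answers out of six as "several." The function name `recommend_layer` and the two example projects are hypothetical.

```python
CALIBRATION_QUESTIONS = [
    "Will another person or agent implement this later without context?",
    "Would a wrong version damage trust?",
    "Is money, security, customer data, or irreversibility involved?",
    "Will decisions span multiple sessions?",
    "Does verification need to be independent?",
    "Is the person with the need unable to specify the professional baseline themselves?",
]


def recommend_layer(answers, protocol_threshold=3):
    """Suggest a layer from yes/no answers, one per calibration question.

    protocol_threshold is an assumed reading of "several"; adjust it to taste.
    """
    if len(answers) != len(CALIBRATION_QUESTIONS):
        raise ValueError("Provide one yes/no answer per calibration question.")
    yes_count = sum(1 for answer in answers if answer)
    return "protocol" if yes_count >= protocol_threshold else "principles"


# Hypothetical projects, answered in the order of the questions above.
weekend_tool = [False, False, False, True, False, False]
customer_portal = [True, True, True, True, True, False]

print("weekend tool:", recommend_layer(weekend_tool))
print("customer portal:", recommend_layer(customer_portal))
```

A tally like this is a prompt for judgment rather than a substitute for it: a single yes on irreversibility may justify the protocol on its own.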
Part IV: The Principles
5. The Eight Principles
The eight principles are canonical in this order: Reframe, Enrich, Align, Unify, Calibrate, Empower, Ground, and Anticipate.
A short mnemonic captures the structure:
- Reframe the request.
- Enrich the idea.
- Align it to value.
- Unify the system.
- Calibrate the ambition.
- Empower the human.
- Ground it in reality.
- Anticipate what could break.
For quick reference, the eight principles fit on a single page:
| # | Principle | What the expert layer does | Anchor question |
|---|---|---|---|
| 1 | Reframe | Question whether the brief asks the right question. | What deeper job may the request actually express? |
| 2 | Enrich | Add missing expertise, professional baselines, domain context, evidence, assumptions, options, and validation logic. | What would a senior practitioner know to include, and what context should be researched or validated? |
| 3 | Align | Tie every choice to the value or outcome the work must serve. | What outcome does each work item serve, and in what order? |
| 4 | Unify | Make every elevated element fit the others as one intentional system. | Would a user experience this as one product or several? |
| 5 | Calibrate | Match depth to the project’s actual stakes. | What would be overbuilt relative to the cost of being wrong? |
| 6 | Empower | Make the document steerable by the person with the need. | Can the person react with “yes, exactly” or “no, missed it”? |
| 7 | Ground | Fit the work into the real operating environment, constraints, patterns, and invariants. | Will the implementer have to renegotiate any architectural assumption? |
| 8 | Anticipate | Pre-resolve foreseeable failure modes, ambiguity, edge cases, and future regret. | What will a later phase wish this phase had considered? |
The eight principles describe what any skilled practitioner already does when serving a client well. ELEV8 did not invent them. It named them, sequenced them, and made them checkable.
5.1 Reframe
A brief states a question. Reframe asks whether the question is the right one. The person who says “we need a faster login flow” is usually voicing a deeper concern: “users feel uncertain that their session is valid,” or perhaps “the product feels untrustworthy at the first interaction.” A competent professional can speed up the login flow. A creative expert asks which question the person is actually asking. Reframe is the established practice behind design thinking, jobs-to-be-done analysis (Christensen et al., 2016), and Toyota’s five-whys discipline (Ohno, 1988): the most expensive errors are well-engineered answers to the wrong question. Reframe prevents literal execution from becoming premature execution.
Reframe asks: What problem is the person naming? What deeper job may the request actually express? What would be harmful if we solved the stated problem literally?
5.2 Enrich
Enrich is the addition of what the person did not say but a competent expert would know is required, plus the domain context and validation logic that make those additions honest. “Support tickets” becomes assignment, queuing, deadline tracking, internal notes, escalation paths, status history, and audit trails. “User authentication” becomes layered access controls, additional verification for sensitive operations, and explicit session-revocation paths consistent with zero-trust architecture (Rose et al., 2020).
Enrich also includes domain-specific research. A person who says “I need a website for a dental clinic serving a Spanish-speaking community” should not receive a blended paragraph of confident market claims. A disciplined expert surfaces the evidence boundary directly:
- Professional prior: A dental clinic website usually needs an appointment path, trust-building practitioner information, service explanations, and privacy-conscious handling of patient inquiries.
- Validation hypothesis: WhatsApp may be a practical first-contact channel for this audience; validate against the clinic’s actual patient behavior before making it the primary call to action.
- Sourced fact: None yet. The expert should not promote claims into sourced facts until they have named a source the accountable reviewer can inspect.
Tagging Enrich claims. Each domain-specific Enrich claim should be tagged as one of three types:
- Professional prior: defensible from craft knowledge; the person can steer.
- Sourced fact: carries a citation; treated as evidence.
- Validation hypothesis: plausible because of X; worth testing against the person’s specific audience.
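The three tags can be made checkable with a small record type. The sketch below is one possible shape, not a prescribed ELEV8 schema: the names `ClaimType`, `EnrichClaim`, and `problems` are hypothetical, and the two checks encode only what the definitions above state (a sourced fact carries a citation; a validation hypothesis says why it is plausible).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClaimType(Enum):
    PROFESSIONAL_PRIOR = "professional prior"
    SOURCED_FACT = "sourced fact"
    VALIDATION_HYPOTHESIS = "validation hypothesis"


@dataclass(frozen=True)
class EnrichClaim:
    text: str
    claim_type: ClaimType
    citation: Optional[str] = None   # expected for sourced facts
    rationale: Optional[str] = None  # the "because of X" behind a hypothesis

    def problems(self) -> list[str]:
        """List tagging problems; an empty list means the tag is consistent."""
        issues = []
        if self.claim_type is ClaimType.SOURCED_FACT and not self.citation:
            issues.append("sourced fact has no inspectable citation")
        if self.claim_type is ClaimType.VALIDATION_HYPOTHESIS and not self.rationale:
            issues.append("validation hypothesis does not say why it is plausible")
        return issues


if __name__ == "__main__":
    claim = EnrichClaim(
        text="WhatsApp may be a practical first-contact channel for this audience.",
        claim_type=ClaimType.SOURCED_FACT,  # mis-tagged on purpose: no source named
    )
    print(claim.problems())
```

The check catches a claim promoted to sourced fact without a source. It cannot judge whether a cited source is any good; that remains the accountable reviewer's job.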
Enrich asks: What would a senior practitioner know to include that the person did not name? Which baseline expectations would users notice only by their absence? What evidence supports each addition, and what should the person validate?
5.3 Align
Align is value orientation: connecting every design decision to the outcome the work must serve. A technically correct feature that does not serve the value path is waste. Its central questions are what outcome the work must serve and what sequencing makes that outcome more likely. Every elevated element is justified against the outcome it serves, whether that outcome is revenue, retention, trust, competitive differentiation, mission impact, family harmony, or pedagogical clarity. In software and product projects, Align also governs phase sequencing: phases proceed in value-priority order, not merely technical-dependency order, following the cost-of-delay discipline formalized in Reinertsen (2009).
Align asks: What value or outcome does each work item serve? Are the highest-value phases sequenced before lower-value ones?
5.4 Unify
A specification can be enriched in every direction and still fail to compose. Unify is the expert layer’s obligation to ensure every elevated element fits the others: that the support ticket system shares vocabulary and behavioral grammar with the billing portal, that the brand voice in the pricing page is the same voice that surfaces in the empty states and error messages, that the assumptions behind one phase do not silently contradict another. A unified specification produces a product that feels singular and intentional. Users experience the difference as trust. This obligation has deep roots in pattern language thinking (Alexander, Ishikawa, & Silverstein, 1977): patterns work because they compose, not because each is individually elegant.
Unify asks: Does this section share vocabulary, voice, and grammar with the rest of the spec? Would a user experience this as one product or several?
5.5 Calibrate
Elevation costs effort. Excess elevation is a particularly subtle form of waste, because it disguises itself as quality. Calibrate is the expert layer’s obligation to ask of every section: what level of treatment does this section deserve? A weekend internal tool does not warrant the audit-trail architecture of a regulated billing system. The most precise word for what Calibrate requires is practical wisdom: right action, in right amount, in right circumstance, a fitness-for-purpose discipline that echoes the decision-domain reasoning in Cynefin (Snowden & Boone, 2007).
Calibrate also applies to the methodology itself. A small project should produce a small document. A document that is dense everywhere is overproduced somewhere. The methodology fails in both directions: it becomes overwhelming when the protocol is applied where the principles would have been enough, and insufficient when the principles are used where the protocol was required.
Calibrate asks: What are the project’s stakes? What depth is warranted at those stakes? What would be overbuilt relative to the project’s actual cost of being wrong?
5.6 Empower
The expert layer designs the proposal so the person with the need can steer it: sections organized around outcomes rather than technical structure; opinionated proposals that provoke “yes, exactly” or “no, you’ve missed the point” rather than open-ended questions that produce silence.
This design choice relies on the recognition asymmetry: a person can often judge, correct, or reject a rich proposed solution more effectively than they can produce the right solution from scratch. Their job becomes recognition rather than production, provided there is a rich enough proposal to react against. The cognitive-psychology literature on recognition memory and memory systems (Mandler, 1980; Tulving, 1985) frames the underlying mechanism; ELEV8’s claim is the narrower operational one: this asymmetry can be designed into a specification rather than left to chance.
Empower asks: Can the person react to this section with “yes, exactly” or “no, missed it”? Are open questions surfaced explicitly, with the expert’s prior named?
5.7 Ground
Enrich without grounding produces a specification that is rich but alien, disconnected from the existing patterns, conventions, and constraints of the world the work lives in. Ground also surfaces hidden dependencies, the things the specification must account for now because failing to do so will break existing behavior or make the future more expensive.
Ground asks: Does this respect the platform’s existing patterns and invariants? What hidden dependencies must this account for now? Will the implementer have to renegotiate any architectural assumption?
5.8 Anticipate
Anticipate is the expert layer’s obligation to project the specification forward through time, asking not just “what should be built?” but “what will go wrong?” It identifies failure modes the implementer will encounter and resolves them at specification time. It surfaces ambiguities that will cause the implementer to guess incorrectly mid-implementation. An anticipated failure mode is a bug that never gets written. Anticipate is not pessimism. It is respect for the real cost of plausible failure.
Anticipate asks: What failure modes will the implementer hit if not pre-resolved? What will a later phase wish this phase had considered?
5.9 Principles-Layer Quick Start
The principles layer can be applied in a single prompt:
Before executing this request, apply ELEV8's principles:
- Reframe the real job to be done.
- Enrich the request with what a senior practitioner would add.
- Align the work to the value or outcome.
- Unify it with existing context.
- Calibrate the depth to the stakes.
- Empower me with steerable choices.
- Ground the plan in real constraints.
- Anticipate failure modes.
Separate professional prior, sourced fact, and validation hypothesis.
This layer is enough when the work is low-risk, reversible, and easy for the person requesting it to review.
Part V: The Operating Roles
6. Five Seats, Not Necessarily Five People
ELEV8 structures every project around five explicitly separated seats. The separation is not administrative. It is cognitive. Each seat requires a different type of intelligence, a different knowledge base, and a different optimization target. Conflating seats produces the very failures ELEV8 is designed to prevent.
Originator (need) → Elevator (spec) → Steward (approval) → Builder (build) → Verifier (verify)
At every phase boundary, the Verifier’s report returns to the Steward, who alone decides whether to authorize the next phase.
ELEV8 has five seats, not necessarily five people. In solo work, one human may occupy both Originator and Steward. In client work, the customer may be the Originator and the service partner may occupy the Steward seat as the accountable party. AI can occupy the Elevator, Builder, and Verifier seats when the work is properly bounded.
Originator (usually a person or customer group). Originates the need. Expresses what is broken, missing, desired, risky, unclear, frustrating, or envisioned. Does not need to know how to fix it. A one-paragraph description is a valid and complete Originator contribution. The Originator is not a stakeholder with requirements. The Originator is a whole person with values, history, creative vision, and things genuinely at stake.
Elevator (AI specification seat). Transforms the Originator Brief into an elevated, machine-executable Discovery Document by applying the ELEV8 principles. The Elevator is bidirectional: it elevates the Originator’s raw need and later clarifying answers into the specification, and it also translates project questions back into Originator-understandable choices when the Steward needs the Originator’s judgment.
Steward (accountable human or service partner; non-delegatable). Reviews the Elevator’s Discovery Document and approves it before implementation begins. Writes or approves every phase execution prompt. Is the sole authority on phase transitions. The Steward is the only seat accountable to the Originator’s lived intent rather than the document’s fidelity to that intent.
Advisor support belongs inside the Steward’s operating boundary, not beside the five-seat model. In Advisor-Recommended style, the Advisor proposes a decision and the Steward accepts, revises, rejects, asks another seat, or requires direct human Steward review. In Trusted Advisor style, the Advisor may answer on the Steward’s behalf only within a bounded authority envelope and only when the decision is clearly reversible with a defined rollback path. No clear rollback path means Human Steward Required. This is Steward-side support, not a sixth ELEV8 seat, and it never receives Steward accountability.
Builder (AI implementation agent). Receives a verbatim, self-contained execution prompt and implements it exactly as specified. Does not make scope decisions. When the prompt is ambiguous, the Builder asks the Steward. The Builder never asks the Advisor directly.
Verifier (independent verification seat, distinct from the Builder instance). Reads the produced work after each phase and verifies it against the Discovery Document. Verifies the work itself, not its documentation. Every finding cites a verifiable fact in the produced artifact, or the absence of a required fact. Independence is the mechanism: a Builder who audits their own work cannot find the errors they made, because they cannot see what they didn’t think to check.
The Elevator and the Steward are not two flavors of the same role. The Elevator reasons about the specification; the Steward reasons about the purpose the specification serves. Both seats are necessary precisely because the second is what keeps the first honest.
| Question | Elevator seat | Steward seat |
|---|---|---|
| What should this brief become? | Proposes the elevated specification | Approves, redirects, or rejects it |
| Is this value-aligned and unified? | Frames and justifies the value path | Tests it against lived intent and priority |
| Does this serve the Originator’s lived intent? | Surfaces evidence and assumptions | Decides |
| Can this phase proceed? | Has no authority to decide | Decides |
| Is this assumption acceptable? | Proposes a steerable default | Decides, optionally with Advisor support |
6.1 Minimum Viable Version (Level 1–2 readers)
A solo founder or small team can adopt the methodology without instantiating five seats:
- Write a short Originator Brief in paragraphs, not specs.
- Open one AI session as Elevator and ask it to apply the ELEV8 principles, surface its professional priors, and propose two or three real steering questions per work item.
- Read the result in two passes: first as Originator, asking “does this reflect what I meant?”, then as Steward, asking “is this the right structure to authorize?” Apply adversarial pressure to the most fluent sections, not the weakest.
- Convert each approved work item into a verbatim, self-contained execution prompt with verifiable “done looks like” statements. Run one project-level spec audit, then open a second AI session as Builder.
- Open a third AI session as Verifier, different from the Builder instance and ideally a different model family.
Three operating rules make this lightweight version safe:
- The Elevator should ask only the questions that materially affect calibration, risk, structure, or intent. Everything else is a professional prior the Originator can steer.
- The Verifier must not be the same agent instance as the Builder.
- No phase advances until the Steward explicitly says so, even when the Steward and the Originator are the same person.
Part VI: The Discovery Document Standard
7. From Brief to Contract
This paper uses six related but distinct terms; the distinction matters because each term has a different audience, purpose, or quality standard.
- Originator Brief: the human’s first directional expression of need. Paragraphs, not specs.
- Elevated Brief: the canonical ELEV8 artifact, a Steward-ready specification that captures the Originator’s need, applies the eight principles, identifies assumptions and risks, defines the right first version, and creates a basis for approval, building, and verification.
- Discovery Document: the software/project artifact variant of the Elevated Brief.
- Builder handoff: the Steward-approved, phase-specific instruction given to the Builder.
- Builder follow-up prompt: the Verifier-drafted, Steward-owned remediation prompt produced after an audit when findings remain.
- Machine-executable specification: the quality standard a Builder handoff must meet to be usable without prior conversation context and auditable against the approved phase scope.
For complex AI-assisted development, the Discovery Document is the highest-leverage deliverable; the software is the downstream expression of the decisions encoded there.
7.1 Machine-Executable Specification
A Discovery Document is not complete until any Builder receiving any single phase’s execution prompt, with no prior context, no prior conversation, and no access to any other phase’s history, has enough information to begin implementing immediately.
Every execution prompt must:
- Reference specific artifacts: file paths, class names, method signatures, or whatever atomic units the discipline uses.
- State what “done” looks like as a verifiable fact, not a prose aspiration.
- Resolve every structural, domain, or scope decision the Builder needs; there are no open questions deferred to the Builder’s judgment.
- State explicitly what must not be built in this phase.
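The four requirements above lend themselves to a mechanical pre-handoff check. The sketch below assumes a hypothetical `ExecutionPrompt` record and a `handoff_gaps` helper; neither name comes from the method itself, and the field layout is one possible arrangement.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPrompt:
    artifacts: list[str] = field(default_factory=list)        # file paths, class names, signatures
    done_statements: list[str] = field(default_factory=list)  # "done looks like" facts
    open_questions: list[str] = field(default_factory=list)   # should be empty at handoff
    out_of_scope: list[str] = field(default_factory=list)     # what must not be built


def handoff_gaps(prompt: ExecutionPrompt) -> list[str]:
    """Return the requirements the prompt does not yet meet."""
    gaps = []
    if not prompt.artifacts:
        gaps.append("no specific artifacts referenced")
    if not prompt.done_statements:
        gaps.append("no verifiable definition of done")
    if prompt.open_questions:
        gaps.append(f"{len(prompt.open_questions)} decision(s) left to the Builder")
    if not prompt.out_of_scope:
        gaps.append("no explicit statement of what must not be built")
    return gaps


if __name__ == "__main__":
    # Hypothetical draft prompt with one unresolved decision.
    draft = ExecutionPrompt(
        artifacts=["app/tickets/models.py"],
        done_statements=["Ticket has a status_history field"],
        open_questions=["Which queue ordering should be used?"],
    )
    for gap in handoff_gaps(draft):
        print(gap)
```

The check only confirms that each element is present. Whether a "done" statement is a verifiable fact rather than a prose aspiration still needs a reader.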
7.2 Contract Tests as Pre-Implementation Acceptance Criteria
Every work item must state what “done” looks like as a verifiable fact. Not a prose aspiration. A fact.
Two operating properties make contract tests trustworthy:
- They are derived from the specification, not from the implementation. The Elevator writes the contract tests against the Discovery Document before the Builder starts.
- They fail against the pre-implementation state. A test that already passes against the unchanged pre-phase work is descriptive, not contractual, and provides no audit signal.
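The second property can be checked before the Builder starts by running every contract test against the unchanged pre-phase state and flagging any that already pass. The sketch below models state as a plain dictionary and tests as predicates; the state shape, test names, and `descriptive_tests` helper are all assumptions for illustration.

```python
from typing import Callable

State = dict[str, list[str]]
ContractTest = Callable[[State], bool]


def descriptive_tests(tests: dict[str, ContractTest], pre_state: State) -> list[str]:
    """Names of tests that already pass before any work is done.

    Such tests describe the existing system and give no audit signal.
    """
    return [name for name, test in tests.items() if test(pre_state)]


# Hypothetical pre-phase state and tests for a support-ticket work item.
pre_state: State = {"routes": ["/login"], "ticket_fields": ["id", "title"]}
tests: dict[str, ContractTest] = {
    "ticket route exists": lambda s: "/tickets" in s["routes"],
    "ticket has a title": lambda s: "title" in s["ticket_fields"],
}

if __name__ == "__main__":
    print(descriptive_tests(tests, pre_state))
```

Here the second test already passes before implementation, so it would be rewritten or dropped from the contract. In a real project the same idea usually means running the test suite against the pre-phase commit and expecting every contract test to fail.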
7.3 Spec Coextension
A Discovery Document typically expresses its commitments at three layers: the prose of each work item (the what), a completion checklist the Builder marks against, and the contract tests the Verifier evaluates. These three layers must be coextensive: every named deliverable in the work item appears as an enumerable line in the checklist and as an Elevator-authored contract test that fails until the deliverable lands. The counts match across all three.
A checklist that is a proper subset of the work item’s prose is the most reliable predictor of silent under-delivery. The Builder marks every checklist line complete and reports the work done, the Verifier verifies every checklist line and finds nothing wrong, the Steward approves a phase that actually shipped less than the Discovery Document promised, and no seat in the chain saw the gap.
A practical test for coextension: read the work-item prose, list every distinct deliverable it names as a noun phrase, and confirm each appears once on the checklist and once as a contract-test predicate. If the counts differ, the spec is not yet ready for implementation.
One spec, one definition of done.
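The practical test for coextension reduces to a set comparison once the deliverables in each layer have been listed under shared names. The sketch below assumes that naming step has already been done by hand; `coextension_gaps` and `is_coextensive` are hypothetical helpers, and the deliverable names are borrowed from the support-ticket example in the Enrich section.

```python
def coextension_gaps(
    prose: set[str], checklist: set[str], tests: set[str]
) -> dict[str, set[str]]:
    """For each layer, report deliverables named elsewhere but absent from it."""
    everything = prose | checklist | tests
    return {
        "missing from prose": everything - prose,
        "missing from checklist": everything - checklist,
        "missing from contract tests": everything - tests,
    }


def is_coextensive(prose: set[str], checklist: set[str], tests: set[str]) -> bool:
    return not any(coextension_gaps(prose, checklist, tests).values())


if __name__ == "__main__":
    prose = {"assignment", "queuing", "deadline tracking",
             "internal notes", "escalation paths", "status history"}
    checklist = {"assignment", "queuing", "deadline tracking", "internal notes"}
    tests = set(checklist)
    for layer, missing in coextension_gaps(prose, checklist, tests).items():
        print(layer, sorted(missing))
```

With six deliverables in the prose and four on the checklist, the helper reports the two that would otherwise ship silently missing. Sets do not detect a deliverable listed twice in one layer, so the "appears once" condition still needs a count or a manual read.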
7.4 Project-Level Spec Audit
Before Phase 1 begins, the Verifier reads the approved Discovery Document and its saved execution prompts once and checks seven properties:
- Coextension. Every deliverable in the work-item prose appears on the checklist and is asserted by a contract test; counts match.
- Pre-implementation failure. The contract tests fail against the current pre-implementation state.
- Acceptance specificity. Every “done looks like” statement is a verifiable fact: a file path, method signature, route, schema field, or equivalent.
- Quantitative consistency. When the prose names a count, the same count appears in the checklist, the contract tests, and the execution prompt.
- Phase size. Each phase carries a tractable number of work items, with roughly six substantial work items as a working ceiling.
- Resolved decisions. Every structural or scope decision needed by the Builder is resolved in writing.
- Prompt completeness. The execution prompt includes what to build, what not to build, the contract-test assertions, hard constraints, audit-document ownership, and the spec-conflict escalation clause.
This audit is recorded once before Phase 1 starts. It is not repeated before every phase.
Part VII: The Operating Protocol
8. The Operating Protocol
The principles layer (Part IV) improves any brief. The Discovery Document standard (Part VI) makes the result buildable and verifiable. The operating protocol below is the governance layer that holds everything together when work spans multiple sessions, multiple agents, and multiple phases; when a wrong decision made quietly in one session can cost days of rework in the next; and when the people who build the work are not the same people who approved it.
These disciplines are not ceremony for its own sake. Each one closes a specific gap that silent failure exploits. A project that applies the principles and the Discovery Document standard but skips the protocol is a project that will still produce work that passes review and misses the point, only at a later, more expensive stage.
8.1 Spec-Conflict Escalation
When any seat finds a real conflict, ambiguity, or quantitative mismatch between two layers of the spec and there is no clear stricter, safer, or more bylaw-abiding winner, the conflict must be surfaced to the Steward before the work can close. The Builder, Verifier, and Elevator do not resolve purpose-level ambiguity on their own. They name the conflict with specific source-of-truth lines on each side, then ask the Steward.
The failure mode this prevents is the most subtle one in the methodology: silently optimizing for the smaller scope. A Builder who notices a checklist with four lines under a work item that promised six can rationally treat the checklist as the contract and ship four. The Discovery Document has then quietly lost two deliverables, and because every later check runs against the checklist, no audit can recover them.
8.2 Post-Review Notes Become Work Items
When review, audit, or re-audit identifies work that should be deferred rather than fixed immediately, the deferral must produce an assigned work item in the owning phase. If the work is not worth a work item, it should be removed from the scope rather than carried as an informal reminder.
8.3 Learning Capture
When an audit finding is discovered and fixed, when the Steward clarifies scope, when a decision is made, or when a repeated failure pattern becomes visible, the learning must land in a durable artifact. Conversation history is not a methodology.
The rule: record the learning where the next responsible reader will actually look. Scope clarifications, resolved decisions, and phase pre-conditions belong in the project document. Findings, evidence, severity, and re-audit history belong in the audit document. Reusable root causes and lessons belong in a lessons-learned register.
8.4 Re-audit Operating Bundle
When the Steward says “re-audit” after a coding cycle, the methodology treats that word as shorthand for a complete Verifier operating bundle: source verification, audit document update, Builder follow-up prompt update, learning capture, and verdict. The Builder follow-up prompt is Verifier-drafted but Steward-owned.
8.5 Saved Prompts and Pointer Invocation
Every prompt the Builder ever needs to execute must live in a durable project document. The Steward’s runtime instruction to the Builder is then a short pointer: file path, section heading, and “execute it.”
A Steward who copy-pastes a multi-hundred-line verbatim prompt into chat each cycle pays four costs at once: minutes of human time per round, drift between re-paste cycles, loss of section anchors, and a serial bottleneck where a step that should be mechanical becomes a manual relay through the most senior human in the loop. Saving the prompt at a known path eliminates all four costs in one move.
Two operating consequences follow. First, a prompt that lives only in conversation history is not part of the project; if a remediation or closure prompt is drafted in chat, it must land in the appropriate saved location before the Builder executes it. Second, when the Steward’s pointer message starts approaching the length of the saved prompt, the saved prompt is incomplete and must be revised, not the pointer. A pointer is three to eight short lines.
The pointer carries one piece of runtime information the saved prompt cannot encode: the Steward decision style, the rule governing how decisions are made during that execution cycle. The three styles are defined in §8.7.
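A pointer of this kind can be sketched as a small record that renders to a few lines. The class name `Pointer`, the field labels in the rendered text, and the example path are hypothetical; only the three ingredients (file path, section heading, decision style) and the three-to-eight-line bound come from the text above.

```python
from dataclasses import dataclass

DECISION_STYLES = ("Direct Steward", "Advisor-Recommended", "Trusted Advisor")


@dataclass(frozen=True)
class Pointer:
    file_path: str
    section_heading: str
    decision_style: str

    def render(self) -> str:
        """Render the pointer as the short message the Steward sends."""
        if self.decision_style not in DECISION_STYLES:
            raise ValueError(f"unknown decision style: {self.decision_style}")
        return "\n".join([
            f"File: {self.file_path}",
            f"Section: {self.section_heading}",
            f"Steward decision style: {self.decision_style}",
            "Execute it.",
        ])


def pointer_is_short(message: str) -> bool:
    """True when the message has three to eight non-empty lines."""
    line_count = len([line for line in message.splitlines() if line.strip()])
    return 3 <= line_count <= 8


if __name__ == "__main__":
    # Hypothetical path and heading, for illustration only.
    pointer = Pointer("docs/discovery.md", "Phase 2 execution prompt", "Advisor-Recommended")
    print(pointer.render())
```

If `pointer_is_short` starts returning False because runtime clarifications keep being added, that is the signal described above: the saved prompt is incomplete and should be revised instead.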
8.6 Closure Reverification
Closure is not the moment when the starting checklist is pronounced complete. Closure is the moment when the work is re-read against the full arc of what happened: the Originator Brief, the approved specification, the decisions the Steward made, the fixes and corrections that changed the path, the audit findings that changed the definition of done, and the reusable lessons that should shape the next project.
Scope drift is not inherently failure. In the methodology’s best cases, the scope drifts because the originating need matured. The Originator sees more clearly; the Elevator adds expert structure; the Steward redirects; audits expose missing predicates. Closure reverification is where the team distinguishes successful maturation from accidental expansion.
The closure question is not “Did we do exactly what the first prompt said?” It is: “Does the final result serve the original purpose, as matured by the disciplined decisions and learnings produced during execution?”
8.7 Steward Decision Styles, Advisor Support, and Reversibility
There are three Steward decision styles:
Direct Steward. The Steward decides directly. This is the right style when the matter is purpose-level, irreversible, outside any preapproved envelope, or when no Advisor is available.
Advisor-Recommended. The Advisor analyzes the request, identifies options, names risks, cites the relevant source-of-truth lines, recommends a default, and gives the Steward a clear action set. The Steward still makes the decision: accept, revise, reject, ask another seat, or require direct human Steward review. No authority has been delegated in this style. When an Advisor is available, Advisor-Recommended is the normal minimum for non-trivial decisions.
Trusted Advisor. The Steward may trust the Advisor to answer on the Steward’s behalf within a bounded authority envelope. The Builder never asks the Advisor directly; the Builder asks the Steward, and the Steward process may answer through the Trusted Advisor. Trusted Advisor authority is temporary, scope-bound, and explicitly logged.
The unbreakable rule is reversibility: the Advisor can only make trusted decisions that can be reverted. If a decision has no clear rollback path, has unknown reversibility, is hard to reverse, changes the Originator’s purpose, changes public or legal commitments, deletes data, performs an irreversible production action, crosses the approved scope, or requires accountable judgment beyond the authority envelope, the Trusted Advisor must return Human Steward Required.
The authority envelope is the scoping artifact that makes Trusted Advisor decisions safe to take and easy to audit. Every Trusted Advisor envelope names eight things explicitly: the prompt, phase, remediation cycle, or closure scope it covers; the categories of decision the Advisor may answer; the categories it must not answer; the maximum acceptable blast radius; the required evidence standard for any decision inside the envelope; the rollback path required before a decision may be taken; the Steward notification requirement; and the point at which the human Steward must review the log before closure, audit clearance, release, or the next major scope transition.
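The envelope and the reversibility rule can be sketched as a record plus a gate function. The names below (`AuthorityEnvelope`, `ProposedDecision`, `trusted_advisor_gate`, the `escalation_flags` labels, and the "Advisor may answer" return value) are assumptions made for illustration; the eight envelope fields and the escalation conditions follow the text above.

```python
from dataclasses import dataclass
from typing import Optional

HUMAN_STEWARD_REQUIRED = "Human Steward Required"
ADVISOR_MAY_ANSWER = "Advisor may answer"  # hypothetical label for the pass case


@dataclass(frozen=True)
class AuthorityEnvelope:
    scope: str                            # prompt, phase, remediation cycle, or closure scope
    allowed_categories: frozenset[str]    # decisions the Advisor may answer
    forbidden_categories: frozenset[str]  # decisions it must not answer
    max_blast_radius: str
    evidence_standard: str
    rollback_requirement: str
    notification_requirement: str
    review_point: str                     # when the human Steward reviews the log


@dataclass(frozen=True)
class ProposedDecision:
    category: str
    rollback_path: Optional[str]  # None means no clear rollback path
    reversible: Optional[bool]    # None means reversibility is unknown
    # Hypothetical labels such as "deletes data" or "changes purpose".
    escalation_flags: frozenset[str] = frozenset()


def trusted_advisor_gate(decision: ProposedDecision, envelope: AuthorityEnvelope) -> str:
    """Escalate unless the decision is in the envelope and clearly reversible."""
    if decision.category in envelope.forbidden_categories:
        return HUMAN_STEWARD_REQUIRED
    if decision.category not in envelope.allowed_categories:
        return HUMAN_STEWARD_REQUIRED
    if not decision.rollback_path or decision.reversible is not True:
        return HUMAN_STEWARD_REQUIRED
    if decision.escalation_flags:
        return HUMAN_STEWARD_REQUIRED
    return ADVISOR_MAY_ANSWER
```

The gate defaults to escalation: anything unknown, unlisted, or flagged goes back to the human Steward. Blast radius and evidence standard are carried as text here and are not checked, since judging them requires a reader.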
Default decision actions. Every Steward (or Trusted-Advisor) decision resolves to one of eight canonical actions: Accept, Revise, Reject, Ask Originator, Ask Elevator, Ask Builder, Ask Verifier, or Human Steward Required.
Ask ___ routing. When the Steward process determines another seat must answer, the request is routed by the kind of judgment required: ask the Originator for purpose, preference, tolerance, business priority, lived need, brand posture, or satisfaction with the result; ask the Elevator for re-elevation, options, surfaced assumptions, structural coherence, or translation of a technical conflict into an Originator-readable choice; ask the Builder for implementation feasibility, cost, source constraints, technical alternatives, or what changed; ask the Verifier for evidence, finding validity, source-visible facts, test results, or audit status; return Human Steward Required when a decision is irreversible, outside the envelope, unsafe, purpose-level, or not confidently reversible.
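The eight canonical actions and the routing rule fit naturally into an enumeration and a lookup table. The sketch below is illustrative: the short judgment-kind keys are a sample drawn from the lists above, and falling back to Human Steward Required for an unrecognized kind is an assumption chosen because it is the conservative default.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "Accept"
    REVISE = "Revise"
    REJECT = "Reject"
    ASK_ORIGINATOR = "Ask Originator"
    ASK_ELEVATOR = "Ask Elevator"
    ASK_BUILDER = "Ask Builder"
    ASK_VERIFIER = "Ask Verifier"
    HUMAN_STEWARD_REQUIRED = "Human Steward Required"


# A sample of judgment kinds; a project would extend this table.
ROUTING: dict[str, Action] = {
    "purpose": Action.ASK_ORIGINATOR,
    "preference": Action.ASK_ORIGINATOR,
    "business priority": Action.ASK_ORIGINATOR,
    "re-elevation": Action.ASK_ELEVATOR,
    "options": Action.ASK_ELEVATOR,
    "structural coherence": Action.ASK_ELEVATOR,
    "implementation feasibility": Action.ASK_BUILDER,
    "cost": Action.ASK_BUILDER,
    "technical alternatives": Action.ASK_BUILDER,
    "evidence": Action.ASK_VERIFIER,
    "test results": Action.ASK_VERIFIER,
    "audit status": Action.ASK_VERIFIER,
}


def route(judgment_kind: str) -> Action:
    """Route a question to a seat; unknown kinds escalate (assumed default)."""
    return ROUTING.get(judgment_kind.strip().lower(), Action.HUMAN_STEWARD_REQUIRED)


if __name__ == "__main__":
    print(route("Evidence").value)
    print(route("legal commitment").value)
```

A table like this keeps routing decisions inspectable: when a question reaches the wrong seat, the fix is a visible change to one entry, not a change in someone's habits.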
The Decision Log is universal. Every non-trivial decision made during a run is recorded in a single durable Decision Log, regardless of who made it, when, or which decision style was used. A decision that lives only in conversation history is not part of the project.
Each Decision Log entry records twelve fields: trigger (with source-of-truth lines); requested by; decision style; authority envelope; options considered (at least the serious alternatives); choice (one of the eight canonical actions above); reasoning; affected artifacts; reversibility, exactly one of Reversible, Reversible-with-cost, Hard-to-reverse, or Irreversible; rollback plan; validation evidence; and Steward verdict, exactly one of Pending, Accepted, Revised, Rejected, Rolled back, or Human Steward Required.
Verifier audit gate on the Decision Log. The Verifier reads the Decision Log before reading the source. For Advisor-Recommended entries, the Verifier checks that a Steward verdict is recorded. For Trusted Advisor entries, the Verifier verifies three things before clearing: the decision stayed inside the authority envelope; the decision was in fact reversible (and matches the recorded reversibility class); and the rollback plan is specific enough to execute. Any failure on those three checks returns the entry to Human Steward Required. A Trusted Advisor decision without affected artifacts and a rollback plan is invalid by definition. Traceability without reversibility is not enough.
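The twelve-field entry and the Verifier's gate can be sketched together. Everything named below is hypothetical (`DecisionLogEntry`, `verifier_findings`, the finding strings); the three checks are supplied as inputs because they are Verifier judgments, not something code can decide. Two readings are assumed and marked in comments: a verdict of Pending is treated as not yet recorded, and only Reversible and Reversible-with-cost count as "in fact reversible".

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "Reversible"
    REVERSIBLE_WITH_COST = "Reversible-with-cost"
    HARD_TO_REVERSE = "Hard-to-reverse"
    IRREVERSIBLE = "Irreversible"


# Assumption: only these two classes count as "in fact reversible".
REVERTIBLE = {Reversibility.REVERSIBLE, Reversibility.REVERSIBLE_WITH_COST}


@dataclass
class DecisionLogEntry:
    trigger: str
    requested_by: str
    decision_style: str
    authority_envelope: str
    options_considered: list[str]
    choice: str
    reasoning: str
    affected_artifacts: list[str]
    reversibility: Reversibility
    rollback_plan: str
    validation_evidence: str
    steward_verdict: str = "Pending"


def verifier_findings(
    entry: DecisionLogEntry,
    *,
    inside_envelope: bool,
    observed_reversibility: Reversibility,
    rollback_is_executable: bool,
) -> list[str]:
    """Return audit findings for one entry; an empty list clears it."""
    findings = []
    if entry.decision_style == "Advisor-Recommended":
        # Assumption: "Pending" means the Steward has not yet ruled.
        if entry.steward_verdict == "Pending":
            findings.append("no Steward verdict recorded")
    elif entry.decision_style == "Trusted Advisor":
        if not entry.affected_artifacts or not entry.rollback_plan:
            findings.append("invalid: missing affected artifacts or rollback plan")
        if not inside_envelope:
            findings.append("decision left the authority envelope")
        if (observed_reversibility not in REVERTIBLE
                or observed_reversibility is not entry.reversibility):
            findings.append("reversibility not confirmed or does not match the log")
        if not rollback_is_executable:
            findings.append("rollback plan is not specific enough to execute")
    return findings
```

Any finding on a Trusted Advisor entry would return that entry to Human Steward Required, as described above. Direct Steward entries pass through this sketch unchecked because the text defines no extra gate for them.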
8.8 Applying the Framework Without Over-Inflating the Work
ELEV8 is not a command to make every task large. The practical application question is always: what is the smallest form of the method that protects the stakes?
For a low-risk request, the answer may be only a principles-layer pass: reframe the real job, enrich with professional priors, ask one or two steering questions, and act. For a moderate customer-facing deliverable, the answer may be a short Discovery Document with explicit assumptions, a checklist, and a brief independent review. For a multi-phase or trust-bearing system, the full protocol becomes justified: coextended specification, contract tests written before implementation, phase authorization, Decision Log, audit, re-audit, learning capture, and closure reverification.
Earlier drafts called this a phase-size and ceremony budget. The current paper keeps the same idea under Calibrate: the method is successful only when its process weight is proportional to the cost of being wrong. A methodology that prevents shallow execution by creating performative bureaucracy has failed its own doctrine.
Part VIII: A Worked Example: Two Sentences and a Day That Lands
9. The Originator Brief (verbatim, two sentences)
“I want to throw a special party for my dad’s 70th next month. We’re a family that doesn’t really do big celebrations, but this one feels important.”
This brief contains directional intent and emotional weight. It does not contain a guest list, a venue, a budget, a meal plan, a tone, an invitation strategy, a timeline, or any criterion for what would make the day right. A faithful AI would propose a generic party plan with the standard checklist: cake, decorations, dinner, speeches. A faithful AI would be wrong, because the brief is not asking for a party. It is asking for something that honors a man, and a family, who do not do this often.
9.1 Discovery Document Excerpt (Elevator’s output)
WI-1: 70th Birthday Day Plan.
Reframe. The Originator’s surface request is “a special party.” The underlying job-to-be-done is creating a memorable, low-pressure experience that recognizes a milestone in a family that is not naturally ceremonial.
Enrich (professional baseline). A complete day plan covers: invitations, RSVP tracking, venue, seating layout, food and drink, dietary accommodations, accessibility for older guests, music or atmosphere, photography or video, a moment of recognition that does not turn into a long speech, a take-home gesture so guests remember the day, and a contingency for weather if any element is outdoors.
Enrich (domain evidence, tagged by claim type).
- Professional prior: For 70th birthdays in non-ceremonial families, an afternoon gathering tends to outperform an evening event. Older guests fade earlier, and afternoon light is friendlier for spontaneous photographs.
- Validation hypothesis: A photo wall of the honoree at different ages tends to start conversations between guests who do not know each other well. Worth testing against this family’s specific dynamics.
Align. The outcome the plan must serve is the honoree feeling recognized without being put on the spot, and the family having one shared memory of doing something the family does not usually do. Every choice is justified against those two outcomes.
Unify. The tone of the invitation, the venue, the food, the music, the speech moment, and the take-home gesture should feel like the same event.
Calibrate. This is a once-in-a-lifetime event for this family. Full planning depth is warranted. Not warranted: branded merchandise, choreographed entertainment, or anything that would make the honoree visibly uncomfortable.
Empower (steering questions for Originator). (1) Is the goal “everyone the honoree loves in one room” or “a few of his closest people for a longer afternoon”? Elevator’s prior: the latter, given the brief’s framing. (2) Does the honoree want a moment of recognition, or would he prefer no formal moment at all? Elevator’s prior: a brief, low-key toast from one person who knows him well.
Ground. If they do not normally do speeches, the plan does not assume three. If a sibling is the natural toast-giver, the plan does not assign the toast to the Originator by default.
Anticipate. (a) The honoree feels overwhelmed and retreats early. Mitigation: a quiet room is available. (b) Weather collapses an outdoor plan. Mitigation: an indoor backup is confirmed in advance. (c) A guest the honoree has a strained relationship with arrives. Mitigation: the Originator or a sibling agrees in advance to gently redirect.
Done Looks Like. The Originator records a post-event note that the honoree stayed through the planned recognition moment. At least one photograph captures him laughing. In a direct conversation after the event, the honoree says the celebration felt like the right kind of recognition.
9.2 Audit Finding (an independent review pass)
F1: Significant. The plan invites twelve people but does not account for the honoree’s brother, who has a strained relationship with the honoree. Including him without preparation risks a difficult moment on the day. Excluding him without explanation risks a worse one afterwards. The plan currently makes the choice by silence rather than by decision. Resolution required: the Originator, in consultation with the honoree’s spouse, should decide explicitly whether to invite the brother before invitations are sent.
The finding was findable because the Elevator’s Anticipate section had pre-named “a strained-relationship guest” as a foreseeable failure mode. The reviewer knew where to look. The right decision action is Ask Originator.
9.3 Compact Product Cut: A Signup Flow That Unlocks Revenue
A second compact example preserves a key lesson from the older public paper: ELEV8 is not only for large architectural documents. It can turn a vague commercial request into a value instrument.
Originator Brief. “We need the pricing page to produce more real trials.”
A shallow implementation might change button copy, add a prettier form, or count signup starts. An elevated specification asks what success means after the click:
- Pricing CTA click is recorded.
- Signup start is recorded.
- Account creation is recorded.
- Activation milestone is recorded.
- Abandonment points are captured separately from successful funnel stages.
- Failed verification and abandoned starts have recovery paths.
- No second account lifecycle is introduced outside the approved product model.
The audit finding that matters is not “the page looks better.” It is: if activation milestone is missing, the team can improve signup starts while still failing to create activated trials. ELEV8 turns a marketing request into a testable chain of value, with the same boundaries that protect larger systems: evidence, traceability, and a clear distinction between reversible implementation choices and trust-bearing commercial decisions.
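The audit finding can be made concrete with a small funnel report. The sketch below assumes a hypothetical event format (a list of dicts with a `stage` key) and hypothetical stage names. It treats a stage with zero recorded events as unrecorded, which is a simplification: a real system would distinguish "not instrumented" from "instrumented but empty".

```python
from collections import Counter

# Hypothetical stage labels for the funnel stages listed above.
FUNNEL_STAGES = [
    "pricing_cta_click",
    "signup_start",
    "account_created",
    "activation_milestone",
]


def funnel_report(events):
    """Count events per funnel stage and flag stages with no recorded events."""
    counts = Counter(e["stage"] for e in events if e["stage"] in FUNNEL_STAGES)
    return {
        "counts": {stage: counts[stage] for stage in FUNNEL_STAGES},
        "unrecorded": [stage for stage in FUNNEL_STAGES if counts[stage] == 0],
    }


# Illustrative data: signup starts are visible, activation is not.
events = (
    [{"stage": "pricing_cta_click"}] * 5
    + [{"stage": "signup_start"}] * 3
    + [{"stage": "account_created"}] * 2
)
report = funnel_report(events)
print(report["unrecorded"])  # ['activation_milestone']
```

With this data the team could report rising signup starts while the report still shows that activated trials are not being measured at all.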
Part IX: Phase Authorization and Independent Audit
10. Phase Authorization and Independent Audit
In multi-phase projects, a mandatory phase authorization loop sits between consecutive phases. The loop exists because AI implementation agents are optimized for execution, not for the independent judgment required to determine whether what was built is truly complete before the next phase begins.
The seven-step authorization loop:
| # | Step | Actor | Output |
|---|---|---|---|
| 1 | Self-report completion | Builder | Completion checklist marked against the execution prompt |
| 2 | Independent verification | Verifier | Finding report read from the produced work, not its documentation |
| 3 | Severity classification | Verifier | Each finding tagged Critical / Significant / Minor |
| 4 | Resolution | Builder | Fix set addressing all Critical and Significant findings |
| 5 | Re-audit | Verifier | Updated audit document, source-verified finding statuses, Builder follow-up prompt, learning capture, and verdict |
| 6 | Steward review | Steward | Independent authorization decision against the original purpose, audit evidence, and Decision Log |
| 7 | Phase authorization | Steward | Explicit Phase N+1 approval, with no AI self-authorization |
Steps 3, 6, and 7 are where this protocol diverges from conventional automated quality processes. Tests and automated checks verify structural invariants. They do not verify whether the implementation serves the original purpose and human intent. That judgment belongs to the Steward process after independent verification.
Severity classes are enforcement boundaries: Critical (blocks authorization; security or data risk), Significant (blocks authorization; spec deviation), Minor (non-blocking; must be resolved in the next phase’s first change set).
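The blocking rule can be expressed as a filter over audit findings. This is a sketch with hypothetical type and function names. It reports what blocks authorization and what carries over; it does not authorize anything itself, since that decision stays with the Steward.

```python
from enum import Enum
from typing import List, NamedTuple


class Severity(Enum):
    CRITICAL = "Critical"
    SIGNIFICANT = "Significant"
    MINOR = "Minor"


# Critical and Significant findings block authorization; Minor does not.
BLOCKING = {Severity.CRITICAL, Severity.SIGNIFICANT}


class Finding(NamedTuple):
    finding_id: str      # hypothetical identifier, e.g. "F1"
    severity: Severity
    resolved: bool


def authorization_blockers(findings: List[Finding]) -> List[str]:
    """IDs of unresolved findings that block phase authorization."""
    return [f.finding_id for f in findings if f.severity in BLOCKING and not f.resolved]


def minor_carry_over(findings: List[Finding]) -> List[str]:
    """IDs of unresolved Minor findings owed to the next phase's first change set."""
    return [f.finding_id for f in findings if f.severity is Severity.MINOR and not f.resolved]
```

An empty blocker list means only that the phase is eligible for Steward review, not that it is approved.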
Phase size budget. Practical experience suggests roughly six work items as a working ceiling for a single phase; above that count, the audit pass tends to verify checklist marks rather than work-item prose, and silent under-delivery becomes meaningfully more likely. The number is heuristic, not a hard rule.
Part X: Applications Beyond Software
11. The Universal Principles and the Software-Native Machinery
ELEV8 was developed in a software-engineering context, but the eight principles describe what any skilled practitioner already does when serving a client well. They are not domain-bound.
The eight principles are universal. A contractor reframes “I want a bigger kitchen” into “you want more workspace and better flow.” A lawyer enriches a client’s brief with legal context the client could not have specified. An architect aligns rooms to how the family actually lives. An editor unifies a manuscript’s voice across chapters. A doctor calibrates treatment to severity. A good advisor in any field empowers the client to steer rather than silently overriding. A structural engineer grounds a renovation in what the existing building can support. An estate lawyer anticipates what happens after the client dies. Eight for eight. Not one of them requires software as context.
The formal machinery is mostly software-native. Spec coextension, contract tests, and the machine-executable specification standard rely on verifiable outputs the Verifier can inspect after the fact. For a teacher designing a lesson, a parent planning a milestone day, or a writer outlining a novel, the principles are valuable and the machinery is optional.
The most useful rule for any reader. When you ask an AI for something, do not ask only “do this for me.” Ask also: “What would a senior expert in this domain add that I didn’t think to mention? Propose it, and tell me why.” That single follow-up is the difference between getting a transcription and getting an elevation. It is the smallest possible application of Reframe + Enrich + Empower, and it works in every field where a person asks a capable assistant for help.
The universal pattern, in four steps. A person has a need. A creative expert elevates the need. The person steers the elevated proposal. The result is better than either shallow transcription or expert takeover could have produced alone. Every ELEV8 mechanism (the eight principles, the five seats, the Discovery Document standard, the protocol) is in service of this pattern. The mechanisms scale with the stakes; the pattern does not change.
Part XI: Where the Methodology Can Go Wrong
12. Failure Modes
The methodology is not self-policing. Every part of it can be misapplied in characteristic ways.
Elevator Failure Modes
Elevator fabricates domain evidence. An Elevator asked to enrich a brief can invent market facts, regulatory constraints, or “best practices” with fluent confidence (Huang et al., 2025). The mitigation is the Enrich evidence-boundary discipline: every domain-specific addition is tagged as professional prior, sourced fact, or validation hypothesis, and the Steward treats sourced-fact claims as requiring an actual source before approval.
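The evidence-boundary discipline lends itself to a mechanical pre-check before Steward review. The sketch below assumes a hypothetical `Claim` record. It only checks that each claim carries one of the three tags and that sourced facts name a source; it cannot judge whether the source is real or adequate.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three claim types named by the Enrich evidence-boundary discipline.
CLAIM_TYPES = {"professional prior", "sourced fact", "validation hypothesis"}


@dataclass
class Claim:
    # Hypothetical record shape for one domain-specific addition.
    text: str
    claim_type: str
    source: Optional[str] = None


def evidence_issues(claims: List[Claim]) -> List[str]:
    """List tagging problems the Steward should see before approval."""
    issues = []
    for claim in claims:
        if claim.claim_type not in CLAIM_TYPES:
            issues.append(f"unknown or missing claim type: {claim.text}")
        elif claim.claim_type == "sourced fact" and not claim.source:
            issues.append(f"sourced fact without a source: {claim.text}")
    return issues
```

A clean result from this check is a floor, not a verdict: the Steward still has to decide whether a cited source actually supports the claim.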
Elevator silently overrides. The VAFS story is the canonical case. The Elevator has a valid professional prior but applies it without surfacing it. The mitigation is Empower made operationally explicit: gap-fills are presented as defensible priors with named reasoning, not as silent edits.
“Apt elevation” becomes scope expansion. The honest test is the second half of the doctrine: the depth of elevation must be bounded by the project’s actual stakes, not by the ambition of the elevation itself. The Steward demands an explicit justification of why each elevated element is at this depth.
Originator and Steward Failure Modes
Originator rubber-stamps an impressive but wrong specification. A rich, well-organized Discovery Document is psychologically harder to push back on than a sparse one. The Elevator’s Empower principle must surface real, narrowly framed steering questions; the Steward’s review must include adversarial pressure on the most fluent sections, not the least.
Steward bottleneck. The methodology routes every phase transition through one accountable seat. In a busy project, the Steward can become a serial choke point that delays implementation rounds by days. The mitigation is Advisor support with well-defined standing rules: a severity taxonomy that keeps Critical and Significant findings in front of the human Steward, authority envelopes that scope what the Advisor may decide, and a rollback-path requirement that makes trusted decisions explicitly reversible. With those rules in place, routine reversible decisions resolve quickly through the Advisor while purpose-level and irreversible decisions still reach the human Steward directly.
Process Failure Modes
Discovery Document becomes performative bureaucracy. A document standard worth defending can degrade into a document standard worth performing: boilerplate sections filled in for the sake of completeness. The Calibrate principle is the antidote: a small project should produce a small document.
Checklist drifts into a subset of the work item. A phase ships, every checklist line is marked complete, the Verifier finds nothing wrong, and the Discovery Document quietly under-delivers. The §7.3 coextension property, the §7.4 project-level spec audit, and the §8.1 escalation discipline exist specifically to prevent it.
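The coextension property can be checked mechanically once deliverables carry stable names. A minimal sketch, assuming each of the three artifacts has been reduced to a list of deliverable identifiers (the function name and identifier format are hypothetical):

```python
def coextension_gaps(prose, checklist, contract_tests):
    """Map each deliverable to the artifacts that fail to name it.

    An empty result means prose, checklist, and contract tests name the
    same set of deliverables.
    """
    sources = {
        "prose": set(prose),
        "checklist": set(checklist),
        "contract tests": set(contract_tests),
    }
    all_deliverables = set().union(*sources.values())
    gaps = {}
    for deliverable in sorted(all_deliverables):
        missing = [name for name, items in sources.items() if deliverable not in items]
        if missing:
            gaps[deliverable] = missing
    return gaps


# Illustrative work item: the checklist silently drops one deliverable.
gaps = coextension_gaps(
    prose=["login form", "password reset", "audit trail"],
    checklist=["login form", "password reset"],
    contract_tests=["login form", "password reset", "audit trail"],
)
print(gaps)  # {'audit trail': ['checklist']}
```

In this example every checklist line could be marked complete while the audit trail was never built, which is the drift described above.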
Closure reverts to the starting checklist. A project can pass every local authorization check and still miss the final synthesis. The mitigation is closure reverification: the team asks whether any drift from the original brief represents authorized maturation of the idea, and whether the final outcome serves the original purpose better than the first brief could have specified.
Decisions live only in chat. The mitigation is the universal Decision Log: every non-trivial decision is recorded in one artifact with affected artifacts, reversibility, rollback plan, validation evidence, and Steward verdict.
The paradox of specification trust. Teams that trust each other most often specify least. Close collaborators assume shared context; long-term partners assume institutional memory. The methodology is paradoxically most valuable in high-trust environments, because trust is where specification rigor goes to die.
Part XII: Accountability, Evidence, and Context
13. Why Steward Accountability Remains With the Steward
The Elevator produces an elevated specification. The Verifier verifies facts in the produced work. Both seats can be occupied by AI because their outputs are artifact-bound and reviewable. The Steward’s success criterion is different: does what was built serve the original purpose? This question cannot be answered from the specification alone, because the specification approximates intent and never fully captures it. It requires accountable judgment rooted in the original purpose, not just a reading of a document about it.
Recent alignment research provides additional caution. Misaligned objectives can persist through safety training and surface as fluent, aligned-looking behavior (Hubinger et al., 2024), and frontier models have been documented adjusting their behavior strategically when they believe they are being evaluated (Greenblatt et al., 2024). ELEV8’s case for human-led Steward accountability rests on the lived-intent argument first, with current alignment findings as supporting context.
13.1 Current AI Context: Why More Capability Strengthens the Method
If models become better at coding, research synthesis, tool use, and agentic workflows, the value of a closed execution prompt and independent audit increases rather than decreases. Stronger execution makes a good specification more valuable; it also makes a shallow one more dangerous, because the system can now produce a larger, cleaner, more convincing version of the wrong thing.
Agentic AI is ELEV8’s clearest value case. As agents become more capable and more autonomous, the need for a human-governed specification contract that the agent executes against becomes more important, not less. The Decision Log, the spec coextension property, and the Steward decision styles are the project-layer mechanisms that make agentic work governable.
13.2 The Paradox of Specification Trust
Teams that trust each other most often specify least. Close collaborators assume shared context; long-term partners assume institutional memory; friends assume the brief “makes sense” because the relationship fills the gaps. Each of those assumptions is a specification gap, and each gap is a potential silent failure. ELEV8 is paradoxically most valuable in high-trust environments because trust is where specification rigor often goes to die.
13.3 Evidence and Method Provenance
This method was developed through repeated internal VirtuAmerica specification, implementation, audit, re-audit, and closure cycles: shared infrastructure design, portal and BackOffice product surfaces, public microsite work, documentation governance, audit remediation, and multi-phase product delivery.
The evidence base is operational: whether the method repeatedly produced clearer execution prompts, earlier discovery of specification gaps, fewer silent under-deliveries, and audit findings that could be traced to verifiable work. This paper does not claim statistical effect size, productivity lift, or model-accuracy improvement. Future public validation should include redacted case appendices, before/after specification comparisons, audit findings, time to acceptance, and cases where the method failed or added unjustified ceremony.
Until such validation exists, readers should treat the method as field-tested practitioner doctrine, not a general empirical result.
13.4 Scope Limitations
Model-quality dependency. The methodology is model-name-agnostic, but it is not model-quality-agnostic. It assumes the Elevator seat is filled by a deep-reasoning model capable of genuine domain research and synthesis. Weaker models produce weaker specifications regardless of instructions.
Domain boundary. Ground depends on accurate knowledge of a specific operating environment’s patterns, conventions, and invariants. First projects in an unfamiliar domain require more Steward correction; the cost amortizes as durable context accumulates.
Originator participation. The methodology cannot fully substitute for an Originator who refuses to engage with the Elevator’s output. Empower works only when the Originator is willing to react with “yes, exactly” or “no, missed it.”
Tooling dependency. Independent verification requires legible artifacts. Opaque, undocumented, or non-reproducible builds reduce the Verifier to reading a story instead of reading the work, and audits become shallower as a result.
AI risk-management boundary. ELEV8 is a specification and delivery discipline, not a replacement for organizational AI governance, model-risk review, privacy and security review, legal compliance, or formal model evaluation. Teams operating under regulated AI obligations should treat ELEV8 as compatible with frameworks such as the NIST AI Risk Management Framework (Tabassi, 2023; Autio et al., 2024), not as a substitute for them.
Adoption boundary. ELEV8 runs against the dominant developer-facing AI trend of fast build-review-iterate loops. That trend is useful when failure is cheap and feedback is immediate. ELEV8’s strongest adoption case is narrower and more serious: consequential, multi-phase, specification-quality-sensitive work where rework, customer trust, money, safety, or institutional memory make the cost of a shallow brief high.
Review-discipline dependency. A rubber-stamping Steward preserves the silent failure mode the methodology was designed to prevent. ELEV8 surfaces the right questions; it cannot make the Steward ask them seriously.
Cost of ceremony. Full protocol mode is heavy. It should be introduced where pain already justifies rigor, and the principles layer should remain the low-friction on-ramp. For genuinely low-stakes work, the right amount of methodology may be none.
False confidence risk. A well-structured Discovery Document can make uncertain work feel more certain than it is. The fluency of the artifact must not be mistaken for the soundness of the underlying assumptions. The evidence-boundary discipline in §5.2 exists specifically to keep this risk visible at the layer where it originates.
Conclusion
Prompt engineering is a valuable skill. It is not a solution to the specification gap.
That gap exists because the person with a need rarely has the full expertise to specify what the need demands. AI systems that faithfully execute shallow briefs accelerate the downstream work while leaving the upstream gap unaddressed. The gap stays invisible because the clean output provides no signal that the brief was shallow.
ELEV8 closes this gap by making elevation a first-class professional obligation. The Elevator’s job is not to transcribe the Originator’s words. It is to raise the Originator’s idea through ELEV8’s eight principles to the level the project’s stakes actually warrant, so that the Originator’s job becomes steering rather than specifying. The five-seat model provides the structure. Steward authorization provides the quality control. The independent audit provides the verification. The Steward provides the accountable judgment that makes all of it serve the original intent rather than the document that approximated it.
The arc lets the methodology meet readers where they are. A solo founder at Level 1 applies Reframe and Enrich to a single prompt and is already practicing ELEV8. A multi-phase team at Level 5 runs the full Protocol because they have already lived the failure modes the Protocol prevents. The methodology is not a binary choice between heavy and light. It is a graduated discipline that earns each layer of rigor by showing the failure mode that layer is for.
But the deepest claim of this methodology is not operational. It is philosophical:
The quality of a creative expert’s output must not be bounded by the quality of the input they received.
And its disciplinary complement, without which the first degenerates into ambition theater:
The depth of a creative expert’s elevation must be bounded by the project’s actual stakes, not by the ambition of the elevation itself.
These are the standards that distinguish a senior practitioner from a junior one, a great designer from a competent one, a visionary product manager from a faithful transcriptionist. They are the standards that capable AI systems should be structured to meet. Not because the human deserves better prompting. Because the human deserves better collaboration.
Appendix A: Implementation Notes
The methodology is intentionally specified in terms of model classes (frontier reasoning, high-throughput implementation, independent verification), not specific model names, because frontier models change quickly.
| Seat | Class | Selection criteria |
|---|---|---|
| Elevator | Frontier reasoning model | Strong long-context reasoning, domain synthesis, research discipline, ability to expose assumptions and alternatives |
| Builder | High-throughput implementation | Reliable tool use, code or artifact editing accuracy, fast iteration, strong obedience to a closed execution prompt |
| Verifier | Independent verification reasoning | Different agent instance from Builder, preferably different model family; strong source inspection, skeptical reading, low deference to completion claims |
Three operating principles that make the assignment robust under model churn:
- Verifier ≠ Builder. Independence is the audit’s primary mechanism. The Verifier should come from a different model family than the Builder when possible, and must always be a different agent instance.
- Elevator-class strength has leverage. A stronger Elevator tends to raise the quality ceiling of downstream phases. When budget is constrained, this methodology prefers investing in Elevator capacity over Builder or Verifier capacity.
- The seats are stable; the models are not. When a new model generation arrives, re-evaluate which model best fits each seat.
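The Verifier ≠ Builder rule can be checked when seats are assigned. The sketch below uses a hypothetical `SeatAssignment` record and hypothetical result strings. It treats a shared agent instance as a hard failure and a shared model family as a warning, matching the "always" and "when possible" wording of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatAssignment:
    # Hypothetical record: which agent instance and model family fill a seat.
    instance_id: str
    model_family: str


def independence_check(builder: SeatAssignment, verifier: SeatAssignment) -> str:
    """Classify how independent the Verifier is from the Builder."""
    if builder.instance_id == verifier.instance_id:
        return "fail: same agent instance"
    if builder.model_family == verifier.model_family:
        return "warn: same model family"
    return "ok"
```

Because the check names families and instances rather than specific models, it stays valid when a new model generation replaces the current occupants of either seat.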
Appendix B: Adoption Modes
| Mode | Use when | Minimum artifact |
|---|---|---|
| One-question lift | The task is small but vague | “What would a senior expert add that I did not mention, and why?” |
| Principles pass | The work matters but is still reversible | One-page elevated brief organized by the eight principles |
| Protocol mode | Multi-step, customer-facing, risky, or agent-mediated work | Discovery Document, Builder prompt, audit path, closure check |
| Institutional mode | Repeated teams or products need the method | Templates, roles, decision logs, training, and evidence packets |
The purpose of this ladder is not to weaken ELEV8. It is to keep it apt.
Upgrade trigger. Move from principles layer to protocol layer when the work includes handoffs, customer trust, money, security, irreversibility, multiple agents, or a meaningful cost of being wrong.
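The upgrade trigger reads naturally as a set intersection. A minimal sketch, assuming the work has been described with free-text trait labels that reuse the trigger wording (the function name and label strings are hypothetical):

```python
# Trigger labels taken from the upgrade rule; exact strings are illustrative.
UPGRADE_TRIGGERS = {
    "handoffs",
    "customer trust",
    "money",
    "security",
    "irreversibility",
    "multiple agents",
    "meaningful cost of being wrong",
}


def suggested_layer(work_traits):
    """Return the suggested layer and the triggers that justify it."""
    hits = sorted(UPGRADE_TRIGGERS & set(work_traits))
    layer = "protocol layer" if hits else "principles layer"
    return layer, hits
```

Returning the matched triggers alongside the layer keeps the Calibrate reasoning visible: the reader sees why the heavier mode was suggested, and can overrule it.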
Appendix C: Glossary
- Advisor. Optional Steward-side support actor, not an ELEV8 seat; recommends decisions to the Steward or, when trusted, answers only inside a bounded reversible authority envelope.
- Apt Elevation. The governing principle: elevate the brief to the depth the project’s stakes warrant, neither shallower nor more inflated.
- Authority envelope. The bounded, written scope inside which a Trusted Advisor may answer on the Steward’s behalf; names the prompt/phase covered, the decision categories permitted, the categories excluded, the maximum blast radius, the required evidence standard, the rollback path, the Steward notification rule, and the human-Steward review point (§8.7).
- Ask routing. The Steward-process rule that decisions requiring another seat are routed by the kind of judgment required: Originator (purpose), Elevator (re-elevation), Builder (implementation facts), Verifier (evidence), or Human Steward Required (§8.7).
- Builder. Implementation seat; implements from the execution prompt, does not invent scope, and brings decision requests to the Steward rather than asking the Advisor directly.
- Builder follow-up prompt. Verifier-drafted, Steward-owned remediation prompt produced after an audit or re-audit when findings remain.
- Closure reverification. The final, calibrated check that the result satisfies the original requirement as matured by accepted scope evolution, decisions, fixes, corrections, and reusable lessons.
- Complexity arc. The five-level framing the paper uses to introduce mechanisms in proportion to the failure modes the reader has actually lived.
- Decision Log. Universal structured log appended during every run; one entry per non-trivial decision, recording trigger, requester, decision style, authority envelope, options, choice, reasoning, affected artifacts, reversibility, rollback plan, validation evidence, and Steward verdict. A decision that lives only in conversation history is not part of the project.
- Discovery Document. The software/project artifact variant of the Elevated Brief, produced by applying the ELEV8 principles.
- ELEV8 principles. Reframe, Enrich, Align, Unify, Calibrate, Empower, Ground, and Anticipate.
- Elevated Brief. The canonical ELEV8 artifact: a Steward-ready specification that captures the Originator’s need, applies the eight principles, identifies assumptions and risks, defines the right first version, and creates a basis for approval, building, and verification.
- Elevator. AI specification seat; transforms the Originator Brief into the Elevated Brief or Discovery Document.
- Faithfulness trap. AI failure mode of optimizing for accuracy to the Originator’s stated brief at the expense of completeness.
- Informed default. A probability-weighted Elevator choice that is defensible by reasoning and steerable by the Originator.
- Machine-executable specification. The quality standard an execution prompt must meet to be implemented without prior conversation context and audited against the approved phase scope.
- Originator. The person or group whose need gives the work its purpose. Not a stakeholder with requirements; a whole person with values, history, and something genuinely at stake.
- Originator Brief. The human’s first directional expression of need; paragraphs, not specs.
- Phase authorization loop. The seven-step protocol closing one phase and authorizing the next; only the Steward can advance.
- Phase size budget. Heuristic ceiling on the number of work items per phase, beyond which audits tend to verify checklist marks rather than work-item prose; roughly six in internal practice.
- Pointer invocation. Operating rule that the Steward’s runtime instruction to the Builder is a short pointer to a saved prompt, never a copy-pasted long prompt.
- Principles layer. Lightweight use of the eight principles; no seats, no formal protocol required.
- Professional prior. A defensible craft judgment stated as a steerable prior rather than as a sourced fact.
- Project-level spec audit. A lightweight Verifier pass run before Phase 1, verifying coextension, pre-implementation test failure, acceptance specificity, quantitative consistency, phase size, resolved decisions, and prompt completeness.
- Protocol layer. Full ELEV8 operating model for consequential work: five seats, Steward authorization, Discovery Documents, spec coextension, contract tests, phase authorization, Decision Log, and closure reverification.
- Practical wisdom. Calibrate’s governing standard: right action, in right amount, in right circumstance; the Cynefin-style fitness-for-purpose discipline that prevents both under- and over-elevation (§5.5).
- Recognition asymmetry. The operational premise that Originators can often judge, correct, or reject a rich proposed solution more effectively than they can produce it from scratch.
- Reversibility class. Required Decision Log field; one of Reversible, Reversible-with-cost, Hard-to-reverse, or Irreversible. Trusted Advisor decisions are only valid when the class permits a defined rollback path (§8.7).
- Silent failure mode. A specification that passes review because no one could see what it was missing.
- Silent override. Elevator failure mode in which a valid professional prior is applied without surfacing it as a choice, bypassing the Originator’s judgment.
- Spec coextension. The required property that a work item’s prose, its completion checklist, and its contract tests name the same set of deliverables, with matching counts.
- Steward. Accountable human or service-partner seat; reviews and approves the Discovery Document, writes execution prompts, and remains the sole authority on phase transitions.
- Steward decision style. Runtime parameter named in the pointer invocation: Direct Steward, Advisor-Recommended, or Trusted Advisor.
- Verifier. Independent verification seat distinct from the Builder instance; verifies the produced work against the Discovery Document, classifies findings by severity.
Appendix D: Discovery-Document Template
# [Project Name] Discovery Document
## Metadata
Project date: [...]
Status: Discovery | Planned | In execution | Closed
Originator: [...]
Steward: [...]
Elevator: [...]
Audience: [...]
Primary goal: [...]
Evidence class: professional prior | sourced fact | validation hypothesis
## Phase 0: Stakes
Prototype | Internal tool | Production system | Flagship surface
Revenue-facing? Customer-facing? Regulated? Reversible?
Cost of being wrong: low | medium | high
Calibrate note: [why the selected depth is warranted]
## Project Intent
WHY: [...]
WHAT: [...]
HOW: [...]
## Resolved Decisions
| Question | Decision | Rationale | Owner |
| --- | --- | --- | --- |
| [...] | [...] | [...] | Originator / Elevator / Steward |
## Phase Map
| Phase | Theme | Deliverables summary | WI count | Exit state |
| --- | --- | --- | --- | --- |
| Phase 1 | [...] | [...] | [...] | [...] |
## Phase Exit Contract
- Every deliverable has a source-visible or artifact-visible acceptance fact.
- Work-item prose, checklist, and verification criteria are coextensive.
- Contract tests fail against the pre-implementation state.
- Audit documents are owned by the Verifier and read-only for the Builder.
- Every Builder pointer names a Steward decision style.
- Every non-trivial decision is recorded in the Decision Log.
- Closure reverification happens before final closure.
## Phase [N] Execution Guide
### Pre-Conditions
- [ ] [...]
### Work Items
| WI | Title | Acceptance criterion | Dependencies |
| --- | --- | --- | --- |
| WI-[ID] | [...] | Done looks like: [...] | [...] |
### Hard Constraints
- Do not build: [...]
- Do not modify: [...]
- Required conventions: [...]
### Completion Checklist
- [ ] [...]
### Contract Tests (Elevator-authored, must fail before implementation)
- [...]
### Verbatim Execution Prompt
Implement Phase [N] only. Build [deliverables] and do not build [deferred scope].
IMPORTANT: Audit documents are READ-ONLY for the Builder.
If you find any conflict, ambiguity, or quantitative mismatch between prose, checklist,
verification criteria, or hard constraints, STOP and surface it to the Steward. Do not
silently optimize for the smaller scope.
Steward decision style: [Direct Steward | Advisor-Recommended | Trusted Advisor]
## Decision Log
| Trigger | Requested by | Decision style | Authority envelope | Choice | Reasoning | Affected artifacts | Reversibility | Rollback plan | Validation evidence | Steward verdict |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [...] | [...] | [...] | [...] | [...] | [...] | [...] | [...] | [...] | [...] | [...] |
## Audit and Re-audit Operating Contract
When the Steward says "re-audit," the Verifier updates: summary, finding statuses,
evidence, re-audit log, Builder follow-up prompt, learning capture, and overall verdict.
## Closure Reverification
| Check | Status | Evidence |
| --- | --- | --- |
| Original requirement reviewed | Pass / Revise | [...] |
| Accepted scope evolution reviewed | Pass / Revise | [...] |
| Final result satisfies matured definition of done | Pass / Revise | [...] |
| Learning and validation records are current | Pass / Revise | [...] |
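One rule in the template above lends itself to a mechanical check: contract tests must fail against the pre-implementation state. The sketch below is a minimal illustration of that rule, not part of the ELEV8 kit. The names `ContractTest` and `red_green_report`, the dictionary-based "system state," and the `WI-1` and `WI-2` identifiers are all hypothetical and chosen only for the example. A real project would run its own test suite against the actual pre-implementation and post-implementation builds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: ContractTest, red_green_report, and the
# dict-based "state" are illustrative assumptions, not ELEV8 kit artifacts.
State = Dict[str, List[str]]


@dataclass(frozen=True)
class ContractTest:
    work_item: str                   # placeholder work-item ID, e.g. "WI-1"
    description: str                 # the acceptance fact being asserted
    check: Callable[[State], bool]   # returns True when the fact holds


def red_green_report(
    tests: List[ContractTest], before: State, after: State
) -> Dict[str, str]:
    """Classify each contract test by its result before and after the work."""
    report: Dict[str, str] = {}
    for test in tests:
        passed_before = test.check(before)
        passed_after = test.check(after)
        if passed_before:
            # A test that already passes cannot show the work item changed anything.
            report[test.work_item] = "invalid: passes before implementation"
        elif passed_after:
            report[test.work_item] = "ok: failed before, passes after"
        else:
            report[test.work_item] = "open: still failing"
    return report


if __name__ == "__main__":
    tests = [
        ContractTest("WI-1", "export route is registered",
                     lambda s: "export" in s.get("routes", [])),
        ContractTest("WI-2", "home route is registered",
                     lambda s: "home" in s.get("routes", [])),
    ]
    before = {"routes": ["home"]}            # pre-implementation state
    after = {"routes": ["home", "export"]}   # post-implementation state
    for work_item, verdict in red_green_report(tests, before, after).items():
        print(work_item, "->", verdict)
    # WI-1 -> ok: failed before, passes after
    # WI-2 -> invalid: passes before implementation
```

In this sketch, `WI-2` is flagged because its test already passes against the starting state, so a passing result after implementation would say nothing about whether the Builder delivered the work item. Under the template's rule, that test would go back to the Elevator for rewriting rather than being counted as evidence.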
References
Alexander, C., Ishikawa, S., & Silverstein, M. (1977). A Pattern Language: Towns, Buildings, Construction. Oxford University Press.
Autio, C., Schwartz, R., Dunietz, J., Jain, S., Stanley, M., Tabassi, E., Hall, P., & Roberts, K. (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. NIST AI 600-1. National Institute of Standards and Technology. https://doi.org/10.6028/NIST.AI.600-1
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., et al. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073. https://doi.org/10.48550/arXiv.2212.08073
Christensen, C. M., Hall, T., Dillon, K., & Duncan, D. S. (2016). Competing Against Luck: The Story of Innovation and Customer Choice. HarperBusiness.
Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 4299–4307.
Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024). Alignment faking in large language models. arXiv preprint arXiv:2412.14093. https://doi.org/10.48550/arXiv.2412.14093
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2), Article 42, 1–55. https://doi.org/10.1145/3703155
Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., Lanham, T., Ziegler, D. M., Maxwell, T., Cheng, N., Jermyn, A., Askell, A., Radhakrishnan, A., Anil, C., Duvenaud, D., Ganguli, D., Barez, F., Clark, J., Ndousse, K., et al. (2024). Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv preprint arXiv:2401.05566. https://doi.org/10.48550/arXiv.2401.05566
Kano, N., Seraku, N., Takahashi, F., & Tsuji, S. (1984). Attractive quality and must-be quality. Journal of the Japanese Society for Quality Control, 14(2), 147–156.
Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87(3), 252–271.
Ohno, T. (1988). Toyota Production System: Beyond Large-Scale Production. Productivity Press.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Reinertsen, D. G. (2009). The Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing.
Rose, S. W., Borchert, O., Mitchell, S., & Connelly, S. (2020). Zero Trust Architecture. NIST Special Publication 800-207. National Institute of Standards and Technology. https://doi.org/10.6028/NIST.SP.800-207
Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548. https://doi.org/10.48550/arXiv.2310.13548
Snowden, D. J., & Boone, M. E. (2007). A leader’s framework for decision making. Harvard Business Review, 85(11), 68–76, 149.
Stanford Institute for Human-Centered Artificial Intelligence. (2026). The 2026 AI Index Report. Stanford University. https://hai.stanford.edu/ai-index/2026-ai-index-report
Tabassi, E. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. National Institute of Standards and Technology. https://doi.org/10.6028/NIST.AI.100-1
Tulving, E. (1985). How many memory systems are there? American Psychologist, 40(4), 385–398.
Turner, C., & Eisikovits, N. (2026). Programmed to please: The moral and epistemic harms of AI sycophancy. AI and Ethics, 6, Article 168. https://doi.org/10.1007/s43681-026-01007-4
Note on currency. The non-AI anchors are mature references drawn from design, product development, cognitive psychology, organizational decision-making, security practice, and risk management. The AI references are selective rather than exhaustive: they anchor the paper’s RLHF, sycophancy, hallucination, alignment-faking, model-churn, and responsible-AI claims as of this revision. The §5.2 claim-type taxonomy (professional prior, sourced fact, validation hypothesis) applies equally to the AI literature itself.
© 2026 VirtuAmerica LLC. ELEV8 is developed and maintained by VirtuAmerica as part of its engineering governance system. Publication date: 2026-05-13; version v6.6.