The Model Is Only 10%: The Real Lesson of the New SDLC

TL;DR

A recent Google whitepaper argues that the model accounts for only about 10% of an AI system's behavior; the other roughly 90% comes from the harness, verification, and context engineering that developers build around it. For teams, that moves the main lever from model selection to engineering the scaffolding.

A new Google whitepaper titled ‘The New SDLC With Vibe Coding’ states that the AI model itself accounts for only about 10% of the system’s behavior, with the remaining 90% determined by the surrounding harness, verification, and context engineering. This challenges the common assumption that model choice is the main driver of results, and it suggests that organizations adopting AI tools should shift investment toward the engineering around the model.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, emphasizes that the dominant factor in AI system performance is how the AI is configured, tested, and integrated. It argues that many failures attributed to models are actually caused by misconfigured harnesses, missing tools, or poor context management. Concrete experiments cited include a coding agent that improved performance by 13.7 points solely through tweaks to prompts and middleware, with the model unchanged.

The paper introduces the concept of agentic engineering, where AI is embedded within a structured framework of rules, tools, and verification processes, contrasting with ‘vibe coding,’ which relies on quick prompts and minimal oversight. The authors highlight that cost-efficiency and reliability are driven by investment in harness and context engineering, not just model improvements.
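The "rules, tools, and verification" framing can be made concrete with a small sketch. The code below is not from the whitepaper: the `Harness` class, its fields, and the stub model are hypothetical names chosen for illustration, and the model is treated as a plain function from prompt to text. It shows one possible shape of a harness, which assembles rules and context into a prompt, runs verifiers on the output, and retries with feedback when a check fails.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; none of these names come from the whitepaper.
Model = Callable[[str], str]      # takes a prompt, returns generated text
Verifier = Callable[[str], bool]  # returns True if the output is acceptable


@dataclass
class Harness:
    model: Model
    rules: list[str] = field(default_factory=list)      # standing instructions
    context: list[str] = field(default_factory=list)    # project facts given to the model
    verifiers: list[Verifier] = field(default_factory=list)
    max_attempts: int = 3

    def build_prompt(self, task: str, feedback: str = "") -> str:
        parts = ["Rules:", *self.rules, "Context:", *self.context, "Task:", task]
        if feedback:
            parts += ["Previous attempt was rejected:", feedback]
        return "\n".join(parts)

    def run(self, task: str) -> str:
        feedback = ""
        for _ in range(self.max_attempts):
            output = self.model(self.build_prompt(task, feedback))
            failed = [v.__name__ for v in self.verifiers if not v(output)]
            if not failed:
                return output  # every check passed
            feedback = "failed checks: " + ", ".join(failed)
        raise RuntimeError(
            f"no verified output after {self.max_attempts} attempts ({feedback})"
        )


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call: it only gets the answer right after feedback.
    if "rejected" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b): pass"


def defines_add(output: str) -> bool:
    return "def add" in output


def returns_sum(output: str) -> bool:
    # A real harness would run generated code in a sandbox, not with a bare exec.
    namespace: dict = {}
    exec(output, namespace)
    return namespace["add"](2, 3) == 5


agent = Harness(
    model=stub_model,
    rules=["Return only Python code."],
    context=["Target: Python 3.11."],
    verifiers=[defines_add, returns_sum],
)
result = agent.run("Write add(a, b).")
print(result)
```

In this sketch the model function never changes between attempts; only the prompt assembled by the harness does, which is the kind of lever the paper attributes most behavior to.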

At a glance
When: published early 2026
The development: The Google whitepaper ‘The New SDLC With Vibe Coding’ highlights that the true engineering challenge is in harnessing and verifying AI, not just selecting or improving the model itself.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness (prompts · tools · context · hooks · sandboxes · observability): ~90%, and it is your surface area, not the provider’s
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: Low CapEx · High OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: High CapEx · Low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com
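
The "intelligent model routing" lever named in the infographic above can be sketched in a few lines. Everything here is an assumption made for illustration: the tier names, the prices, and the keyword heuristic are invented, and a production router would more plausibly use evals or a classifier than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real model ID
    cost_per_1k_tokens: float  # made-up price, for illustration only


CHEAP = ModelTier("small-model", 0.10)
STRONG = ModelTier("large-model", 2.00)

# Assumed signals that a task needs the stronger model.
HARD_KEYWORDS = ("refactor", "architecture", "migrate", "concurrency", "security")


def route(task: str, files_touched: int) -> ModelTier:
    """Pick the cheapest tier that plausibly handles the task."""
    text = task.lower()
    if files_touched > 3 or any(keyword in text for keyword in HARD_KEYWORDS):
        return STRONG
    return CHEAP


def estimated_cost(tasks: list[tuple[str, int]], tokens_per_task: int = 4000) -> float:
    """Sum the token cost of a batch, assuming a flat token budget per task."""
    return sum(
        route(task, files).cost_per_1k_tokens * tokens_per_task / 1000
        for task, files in tasks
    )


batch = [("Fix typo in README", 1), ("Migrate database schema", 5)]
for task, files in batch:
    print(f"{task!r} -> {route(task, files).name}")
print(f"estimated batch cost: {estimated_cost(batch):.2f}")
```

The routing rule lives in the harness rather than in any one provider's product, which is what makes it possible to route across models from different vendors.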

Impact of Harness and Context Engineering on AI Success

This insight shifts the focus for AI teams from constantly chasing the latest model to investing in building robust scaffolding, tools, and verification processes. Organizations that master harness and context engineering can achieve better, more reliable AI outcomes at lower costs, avoiding the pitfalls of superficial model upgrades.

For decision-makers, this means reevaluating AI budgets and strategies, prioritizing infrastructure and process improvements over model subscription costs. The approach can lead to significant savings and more predictable AI performance, especially as models evolve rapidly.
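
A back-of-the-envelope break-even calculation shows how the budget trade-off plays out. The figures below are hypothetical, not taken from the whitepaper; the only detail borrowed from it is that the ad-hoc approach is assumed to cost a multiple (here 4×, inside the paper's 3–10× range) of the engineered approach per feature.

```python
import math


def total_cost(upfront: float, per_feature: float, features: int) -> float:
    """Linear cost model: one-time investment plus a constant cost per feature."""
    return upfront + per_feature * features


def break_even_features(
    high_upfront: float, low_per_feature: float,
    low_upfront: float, high_per_feature: float,
) -> int:
    """Smallest feature count at which the high-upfront approach is no more expensive."""
    if high_per_feature <= low_per_feature:
        raise ValueError("the upfront investment never pays back under these inputs")
    gap = high_upfront - low_upfront
    return max(0, math.ceil(gap / (high_per_feature - low_per_feature)))


# Hypothetical numbers, in arbitrary currency units.
AGENTIC_UPFRONT, AGENTIC_PER_FEATURE = 50_000, 1_000  # specs, evals, context built first
VIBE_UPFRONT, VIBE_PER_FEATURE = 5_000, 4_000         # fix-it loops and rework per feature

n = break_even_features(
    AGENTIC_UPFRONT, AGENTIC_PER_FEATURE, VIBE_UPFRONT, VIBE_PER_FEATURE
)
print(f"break-even at {n} features")
print("agentic total:", total_cost(AGENTIC_UPFRONT, AGENTIC_PER_FEATURE, n))
print("vibe total:   ", total_cost(VIBE_UPFRONT, VIBE_PER_FEATURE, n))
```

The linear model is a simplification: it ignores that maintenance and security costs may grow faster than linearly, which would pull the break-even point earlier.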

Evolution of AI Development Practices and New SDLC Framework

The whitepaper builds on the ongoing shift in AI development from vibe coding—quick prompts and minimal oversight—to agentic engineering, which involves structured, verified workflows. As of early 2026, 85% of professional developers use AI coding agents regularly, with 51% doing so daily. The focus has increasingly moved toward how AI is integrated into systems, rather than the AI models themselves.

This development follows broader industry trends of emphasizing verification, tooling, and structured context to ensure AI reliability and cost efficiency. Previous practices prioritized model access, but recent experiments show that tuning the surrounding infrastructure yields more substantial performance gains.

“The behavior you experience in AI tools is dominated by the scaffolding you build around the model, not the model itself.”

— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

While the whitepaper provides compelling evidence and experiments, it remains to be seen how quickly organizations will adopt this framework at scale. Specific best practices for harness design, context management, and verification are still evolving, and industry-wide standards are not yet established. Additionally, the long-term impact on AI model development and pricing strategies is still uncertain.

Next Steps for AI Developers and Organizations

Organizations should evaluate their current AI workflows to identify opportunities for improving harness and context engineering. Developing standardized tools, frameworks, and best practices for verification and configuration will be critical. Industry groups may also begin to formalize guidelines based on these insights, accelerating the shift toward more structured AI development. Monitoring emerging case studies and benchmarks will help refine these strategies further.

Key Questions

Why is the model only 10% of the system’s behavior?

According to the whitepaper, the model itself provides the core generative capability, but the overall system performance depends heavily on how it is configured, tested, and integrated through scaffolding, tools, and verification processes.

How does this shift affect AI development costs?

Investing in harness and context engineering may have higher upfront costs but leads to lower marginal costs, better reliability, and reduced long-term expenses compared to frequent model upgrades or ad-hoc prompting.

What is agentic engineering?

Agentic engineering involves embedding AI within a structured framework of rules, tools, and verification processes, enabling more reliable and cost-effective AI systems than vibe coding approaches.

Will this change how AI models are developed or priced?

While the whitepaper suggests a focus shift away from model improvements toward system engineering, the long-term impact on model pricing and development strategies remains to be seen as the industry adapts.

What should organizations do now?

Organizations should start assessing their AI workflows, invest in developing robust harnesses, improve context management, and establish verification protocols to maximize system reliability and cost-efficiency.

Source: ThorstenMeyerAI.com