TL;DR
AI systems now automate most core engineering tasks in AI R&D, according to recent benchmarks and expert analysis, and several of those benchmarks are approaching saturation. Research automation is less advanced, and it remains unclear how quickly that gap will close or what it means for the future of AI development.
Multiple AI benchmarks, including CORE-Bench and MLE-Bench, show rapid progress toward near-complete automation of engineering-relevant skills. For instance, CORE-Bench, which measures research reproduction, has reached a 95.5% success rate, with some experts declaring it ‘solved.’ Similarly, MLE-Bench, assessing Kaggle competition performance, has seen scores rise to levels comparable with mid-tier human practitioners, prompting the benchmark organizers to pause new submissions.
These developments suggest that the engineering side of AI research—reproducing experiments, optimizing kernels, and managing infrastructure—is increasingly handled by AI systems. The pattern across multiple independent benchmarks indicates a saturation point, where further measurable improvements are limited by the benchmarks themselves.
In contrast, the ability of AI to conduct original research—formulating hypotheses, designing experiments, and generating novel insights—remains less automated. While some progress is evident through recent papers on kernel design and model optimization, experts note that research at scale might itself be a form of engineering, potentially reducing the residual gap faster than anticipated.
Thorsten Meyer, analyzing Clark’s recent work, emphasizes that the structural question is whether research is inherently a form of engineering at a larger scale, which could accelerate automation beyond current expectations. The key uncertainty is how much of research can be automated and how soon, given the current trajectory.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration, and more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
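The optimistic "shots on goal" reading can be sketched as a simple probability model. This is an illustration only: the function name, the per-attempt success rate `p`, and the attempt counts are hypothetical and do not come from Clark. The sketch assumes every attempt is independent and has the same small chance of producing a discovery, which is exactly the assumption the pessimistic reading disputes.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance of at least one discovery in n independent attempts,
    each succeeding with probability p (hypothetical model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # assumed per-attempt yield, chosen only for illustration
    for n in (10, 1_000, 100_000):
        print(f"n={n:>7}: P(at least one discovery) = {prob_at_least_one(p, n):.3f}")
```

Under this model, volume alone turns rare moments into near-certain ones. If creative insight is qualitatively different, `p` for the hardest problems may be effectively zero, and no number of attempts changes the result.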
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions still matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two equilibria, labeled “Productivity multiplier years” and “Recursive loop operational”.]
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
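The asymmetric cost-of-being-wrong argument can be sketched as a minimax-regret calculation. The cost figures below are hypothetical placeholders in arbitrary units, chosen only to encode the claim that building capacity is cheap if the moat holds, while waiting is expensive if it closes. The action and state names are likewise illustrative.

```python
# Hypothetical costs in arbitrary units: outer keys are actions, inner keys are states.
COSTS = {
    "build_now": {"moat_holds": 1.0, "moat_closes": 0.0},
    "wait": {"moat_holds": 0.0, "moat_closes": 10.0},
}


def max_regret(costs: dict) -> dict:
    """Worst-case regret per action: cost minus the best achievable cost in each state."""
    states = list(next(iter(costs.values())).keys())
    best = {s: min(row[s] for row in costs.values()) for s in states}
    return {a: max(row[s] - best[s] for s in states) for a, row in costs.items()}


def minimax_regret_choice(costs: dict) -> str:
    """Pick the action whose worst-case regret is smallest."""
    regrets = max_regret(costs)
    return min(regrets, key=regrets.get)


if __name__ == "__main__":
    print(max_regret(COSTS))
    print(minimax_regret_choice(COSTS))
```

With these assumed numbers, building now has the smaller worst-case regret. The conclusion depends on the asymmetry itself: if building were as costly as being caught unprepared, the calculation would no longer favor either action.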
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications of Engineering Automation for AI Development
The rapid automation of engineering tasks suggests that future AI development could become more efficient and less dependent on human intervention for routine research activities. This shift could accelerate innovation cycles, reduce costs, and reshape the organizational structures of AI labs and companies. However, the slower progress in automating research itself raises questions about the limits of current AI capabilities and whether breakthroughs in research automation are imminent or will require fundamentally new approaches.
Understanding this divide is crucial for policymakers, industry leaders, and researchers planning the next decade of AI development, investment, and regulation. The potential for AI to handle engineering at scale could also influence the pace of technological progress and the distribution of AI-related expertise globally.
Recent Advances in AI Engineering Capabilities
Over the past 18 months, multiple benchmarks have demonstrated AI’s rapid progress in core engineering skills. CORE-Bench, which tests AI’s ability to reproduce research experiments, has jumped from 21.5% to 95.5%, with some experts calling it ‘solved.’ MLE-Bench, which scores Kaggle competition performance, has risen from 16.9% to 64.4%, nearing professional-level results. Recent research papers report parallel advances in kernel design, GPU optimization, and infrastructure automation, suggesting that these capabilities are moving from experimental to production grade.
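The two score changes quoted above can be put side by side with a little arithmetic. This sketch uses only the percentages reported in the text. The function name and the three summary measures (point gain, relative multiple, headroom below 100%) are illustrative choices, not metrics defined by the benchmarks.

```python
# Scores as reported in the text (percent): (earlier, latest). No new data.
SCORES = {
    "CORE-Bench": (21.5, 95.5),
    "MLE-Bench (Kaggle)": (16.9, 64.4),
}


def summarize(before: float, after: float) -> dict:
    """Return the point gain, relative multiple, and headroom left below 100%."""
    return {
        "point_gain": round(after - before, 1),
        "multiple": round(after / before, 2),
        "headroom": round(100.0 - after, 1),
    }


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        s = summarize(before, after)
        print(f"{name}: +{s['point_gain']} pts, {s['multiple']}x, "
              f"{s['headroom']} pts of headroom")
```

The headroom figure is what makes the saturation claim concrete: a benchmark with only a few points left below 100% cannot register much further progress, however capable the systems become.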
This pattern of rapid progress aligns with the broader trend of AI systems handling increasingly complex engineering tasks, reducing the need for human intervention in routine and even advanced engineering activities. However, the capacity of AI to conduct original research—such as formulating novel hypotheses or designing entirely new experiments—remains less clear and is the subject of ongoing investigation.
“The pattern across multiple benchmarks indicates a saturation point where AI can automate most engineering tasks, but research automation is still emerging.”
— Thorsten Meyer
Unconfirmed Limits of Research Automation
It remains unclear how much of AI research—such as hypothesis generation, experimental design, and theoretical innovation—can be automated in the near term. While progress in engineering tasks is evident, the pace at which research automation will catch up is uncertain. Some experts suggest that research may itself be a form of large-scale engineering, which could accelerate automation, but definitive evidence or timelines are lacking.
Next Milestones in AI Engineering and Research
In the coming months, researchers and industry will likely focus on pushing the boundaries of research automation, potentially through new benchmarks or pilot projects. There is also anticipation of further improvements in existing engineering capabilities, possibly reaching a plateau or prompting the development of more sophisticated measurement tools. Monitoring these developments will be essential to understanding how quickly AI can fully automate the entire research cycle.
Key Questions
What does it mean that engineering is now automated?
It means AI systems can now handle most core engineering tasks, such as reproducing research, optimizing kernels, and managing infrastructure, with minimal human input, significantly reducing manual effort.
Why is research automation still uncertain?
Because research involves creative and theoretical work that is harder to replicate with current AI systems, and there is limited evidence on how quickly these capabilities will develop.
How might this impact AI development in the future?
If research automation accelerates, it could lead to faster innovation cycles and reduce costs, but it also raises questions about the future role of human researchers and the nature of scientific discovery.
Are there any risks associated with automation of engineering tasks?
Potential risks include over-reliance on AI systems, reduced human oversight, and challenges in verifying AI-generated results, which could impact reliability and safety.
Source: ThorstenMeyerAI.com