SECTION 01 / OPENING
Notes on harness engineering
Where reliable AI agent capability actually comes from — and how to build it.
In engineering, AI capability does not live in the model alone. It is distributed across tools, verifiers, control flow, and the design of the operating environment. The Harness is a working notebook on that problem — writing from the intersection of agentic AI and architecture, engineering, and construction.
SECTION 02 / TOPICS
Harness Engineering
The tools, verifiers, orchestration, and process design that make agentic work reliable. Not the model — the system around it.
3 articles 02Agent Evaluation
Benchmarking agents on real engineering tasks. What to measure, how to measure it, and why most evals miss what matters.
2 articles 03AI in AEC
Applying agentic AI to architecture, engineering, and construction — where the work is instance-bound, constraint-heavy, and intolerant of generic answers.
5 articlesSECTION 04 / RECENT
- aec The Harness Is All You Need Why domain-specific agent harnesses, not bigger models, are what close the AI performance gap on real engineering tasks — and why the AEC industry needs proper benchmarks to prove it.
- autoresearch What If the Harness Could Improve Itself? Applying the autoresearch pattern to self-improve an engineering agent harness. Automated prompt optimisation across HVAC audit tasks on Claude and GPT-4.1-mini, showing how harness engineering compounds when the improvement loop runs itself.
- hvac-ai Benchmarking Agents on Real Engineering Work Is Already Teaching Us Something Important Benchmarking AI agents on real HVAC engineering tasks across Claude and GPT models. Results on harness-dependent capability, agent evaluation design, and why AEC-domain benchmarks reveal what general benchmarks miss.
- engineering-ai Where Capability Actually Lives in Agentic Engineering In AEC and domain-specific engineering, AI agent capability lives not in the model alone but in harness engineering — the tools, verifiers, orchestration, and process design that make agentic work reliable.