The migration plan said we needed four months of factory tuning to handle the Opus 4.7 model change. We needed zero.
Our agentic coding factory runs on Claude Code. It has 29 specialist agents organized into seven categories. 37 helper scripts for automation. A 17-phase feature development workflow with five hard-coded approval gates. 23 slash commands. Every phase produces an artifact, and a read-back verifies that artifact before the next phase starts. Any gate can block a ship.
Anthropic released Opus 4.7 on April 14, 2026. Our account got provisioned two days later. We wrote a migration plan before switching models. It listed five steps covering tokenizer re-baselining, agent schema updates, prompt trimming, task budgets beta adoption, and cost observability. Most steps would take weeks.
We did none of them. We switched to Opus 4.7 xhigh and asked the factory to build a feature.
The feature shipped in 12 hours. Nothing broke.
What Shipped
The feature slug was peer-benchmark-page. It added a sixth page to our security scanner's underwriting PDF report. The page shows where a company's security posture ranks against industry peers, with a gap analysis section explaining how to improve the rank.
Complex tier by our classification. 22 tasks broken into four waves. The feature touched four surfaces at once: a new page in the underwriting PDF, a new panel in the broker portal, a conversion widget on the public scan results page, and one new database migration. It also made real external API calls to Perplexity for industry resolution, with privacy constraints on cross-tenant data aggregation.
End state after 12 hours:
- 1,744 backend tests passing, up from 1,649 before the feature started
- 14 frontend tests passing with TypeScript clean
- Two CI fixes landed during the ship (missing pypdf dependency, corrupt venv on self-hosted runner)
- PR #21 merged as commit
7dec4efto main - Vercel and Render auto-deployed
The factory did all of this without changing its prompts or its governance rules. No agent definition was touched either.
What We Expected to Break
The pre-ship migration plan named three behavior risks that Anthropic's release notes called out. We expected each to require defensive factory changes.
Risk 1: Fewer subagent spawns by default
Anthropic's 4.7 release notes said the model would prefer reasoning over delegation. Our factory depends on delegation at every phase. If the orchestrator stopped spawning the analyst or the architect, the whole workflow would collapse.
It didn't happen. Every phase delegated correctly. 52 subagent spawns total by our count. No delegation language changes required.
Risk 2: Fewer tool calls by default
The release notes said the model would reason rather than use tools. Our agents depend on tool use for real work. The analyst needs to read the brief. The developer needs to read existing code before editing it.
It didn't happen. During Phase 1, the analyst ran a live API call against Hunter's /v2/domain-search endpoint to verify our assumption that it returned industry data. The response didn't contain industry. This empirical catch triggered a user-approved design pivot from Hunter to Perplexity as the layer-3 industry resolver. That's the opposite of "prefers reasoning over tool use."
Risk 3: Existing prompts need trimming
We budgeted a broad audit of feature-dev.md and the 29 agent prompts. We were ready to remove reminders and format-demonstration examples that no longer earned their tokens. We were also ready to strip defensive prompting written for 4.6 quirks.
No evidence any of it was needed. 4.7 followed the existing prompts as written. The prompts weren't broken.
60% of the migration plan got deprioritized. The fears were theoretical.
What Actually Surfaced
Two real issues appeared during the ship. Both had been there all along. The migration observation made them visible. Neither was caused by the model change.
Issue 1: Session log drift
The orchestrator writes PRDs, architectures, task lists, and test specs during each phase. It updates sessions/{date}.md later, in batched catch-ups. During peer-benchmark-page, the session log went 10 hours without updates while three phases completed. The trailing "Next step" pointer stayed stale even after the batched catch-up caught everything else up.
This is an orchestrator prompt issue. It would happen on 4.6 too. We shipped a one-line fix requiring the session log update before each phase delegation. Halt-before-delegate if the log is stale.
Issue 2: Architect scope gap on promote-to-public tasks
When the architect designs a task that moves a private function to a public utility module, it greps for callers of that function. It missed a lazy inline import inside api/app/portal/underwriting.py, where the import sat inside a function body rather than at the top of the file. The developer caught it at Phase 6 via our DONE_WITH_CONCERNS protocol and fixed it surgically. The pattern got flagged for prevention.
This is an architect prompt issue. It would happen on 4.6 too. We shipped a one-line fix requiring a full grep for every reference, including lazy imports, when promoting private functions.
Neither fix touched model behavior. Both fixes are pure prompt discipline.
Why the Factory Held
The factory has a lot of gates. Approval gates at Phases 1, 4, 6.5, 7, and 9. Read-back verification after every subagent edit. Fresh-agent-per-file refactor review in Phase 6.5. A DONE_WITH_CONCERNS status subagents use when they find a problem the architect didn't anticipate. A dress rehearsal phase that runs the feature against production data before ship.
A lot of workflows treat gates as ceremony to minimize. Our factory treats gates as the product. The friction is load-bearing.
Jaime Teevan, Microsoft's Chief Scientist, has built a research program around what she calls positive friction: AI that deliberately slows people down at the right moments to make them surface assumptions and keep thinking, rather than handing over an answer that removes the thinking. In a June 2025 conversation with Scott Galloway on staying sharp as machines take over the hard thinking, she named the cost directly.
"AI increases the metacognitive demand."
Jaime Teevan, Chief Scientist, MicrosoftEvery gate in our workflow exists because a past failure required it. The Phase 1 craftsmanship self-audit caught four shortcut recommendations last week where the analyst's PRD would have scoped down by default. Phase 6.5 deep refactor catches smells Phase 6 developers can't see because they're deep in the current file. DONE_WITH_CONCERNS caught the lazy-import scope gap two phases before it would have caused a test failure. Dress rehearsal once caught a batch page crash on real data that had passed every other test.
When the model changes underneath this kind of workflow, the workflow doesn't depend on the model doing anything clever. It depends on the model following structured prompts and respecting gates. Both are durable properties across Claude 4.6 and 4.7.
What This Means for Anyone Building Agentic Systems
The lesson from this migration isn't about Opus 4.7 specifically. It's about what kind of workflow survives model changes.
If your workflow depends on the model being unusually smart at inferring intent or unusually disciplined about delegation, then every model change is a migration project. New model quirks surface. Old hacks stop working.
If your workflow instead depends on the model following structured gates and writing artifacts that get read back and verified, then model changes become mostly neutral. The structure catches whatever the model does differently.
Our factory took several months to build. It has 29 agents, 17 phases, 37 helper scripts, and roughly 40 accumulated craftsmanship rules encoded in prompts. A competitor couldn't replicate it in a month because most of the rules exist to prevent specific past failures, and those failures haven't happened in their workflow yet.
The Opus 4.7 migration dry run showed us the factory absorbs model change. That's what the gates are for.
It all traces back to one principle. Structure beats cleverness. Documented gates beat implicit discipline.
The factory survived 4.7. It'll survive 4.8 too.
Building agents that have to survive production?
We design agentic workflows around gates, read-back verification, and an audit trail, so a model swap stays a non-event. Book a scoping call and we'll map where your workflow breaks first.
Sources
- VKTR. "AI Reality Check: Microsoft's Chief Scientist on the Future of Work" (2025), reporting on Jaime Teevan's June 12, 2025 conversation with Scott Galloway.
- First-party records: the factory's git history, PR #21, the
peer-benchmark-pagesession log, and.claude/lessons.md. Commit hashes, test counts, and spawn counts trace to those records.