Under the Hood of Microsoft Cowork
Seven Patterns Anthropic Just Showed Us
For the first time, we can read the source code of the layer Microsoft Cowork runs on.
Anthropic has unbundled the agentic AI stack into three licensable layers: the model, the harness, and the application. Microsoft has licensed the middle two. Until three weeks ago, the harness was a black box. Now it isn’t. Here is what 512,000 lines of TypeScript tell us about where Microsoft Cowork is going, and why the architectural pattern language matters more than the model choice.
The Three-Layer Stack
Most coverage of modern agentic AI still treats the model as the product. That framing is a year out of date. Look at how the serious labs ship agentic AI in 2026 and you see three distinct layers, each licensable on its own terms.
The bottom layer is the model. Claude Opus 4.7, GPT-5.2, Gemini 3 Ultra. This is the part every analyst benchmarks and every procurement team interrogates. It is also the part where the differentiation gap is narrowing fastest.
The middle layer is the harness. This is the agentic runtime that wraps the model. It is the while-loop over tool calls, the context compaction pipeline, the permission gates, the memory system, the sub-agent orchestrator, the MCP integration layer. Anthropic’s version of this middle layer is exposed to customers as the Claude Agent SDK. Until March 31, most people outside of Anthropic had no real sense of how deeply engineered this layer actually is.
The top layer is the application. Claude Code for developers, Claude Cowork for knowledge workers, and now Microsoft’s own Copilot Cowork built on licensed Anthropic primitives. The application layer is where the brand and the workflow context live.
Microsoft has done something interesting with this stack. They are buying the bottom two layers from Anthropic (as a subprocessor, with all the compliance plumbing that implies) and building the top layer themselves inside the Microsoft 365 trust boundary. In Microsoft’s own words, they have integrated “the technology behind Claude Cowork” into Copilot Cowork. That is product marketing language for “we licensed Anthropic’s model and SDK and wrote our own orchestrator on top.”
The Claude Code source, now public whether Anthropic likes it or not, gives us our first real look at what that middle layer actually contains. Not the sanitized developer docs. The production code.
Why the Middle Tier Is the Interesting One
Here is the punchline from the analysis of the leaked codebase: the agentic loop itself is about twenty lines of code. It is a while-loop over tool calls, with message history as the core data structure. That is not where the engineering lives.
The engineering lives in everything wrapped around that loop. Context management. Permission systems. Memory compaction. Tool schemas. Error recovery. Sub-agent orchestration. All told, roughly 512,000 lines of TypeScript across 1,906 files, just to make a language model behave reliably inside a bounded environment for longer than five minutes.
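The shape of that trivial loop is worth seeing. Here is an illustrative sketch, with all type and function names invented for the example rather than taken from the actual source: a while-loop over tool calls, with message history as the only state.

```typescript
// Minimal sketch of the harness core loop. All names here are illustrative,
// not the leaked implementation: the point is that the loop itself is small.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCalls: ToolCall[] };

type Model = (history: Message[]) => ModelReply;
type ToolRegistry = Record<string, (args: Record<string, unknown>) => string>;

function agentLoop(model: Model, tools: ToolRegistry, task: string): string {
  const history: Message[] = [{ role: "user", content: task }];
  while (true) {
    const reply = model(history);
    history.push({ role: "assistant", content: reply.text });
    if (reply.toolCalls.length === 0) return reply.text; // no tools left: done
    for (const call of reply.toolCalls) {
      const result = tools[call.name](call.args); // each tool is just a function
      history.push({ role: "tool", content: result });
    }
  }
}
```

Everything the rest of this article describes, compaction, permissions, memory, orchestration, lives in what gets wrapped around `model`, `tools`, and `history`, not in the loop.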
If you are a technical leader evaluating agentic AI for your enterprise, this is the insight that should change how you think about the procurement decision. Model choice is becoming commodity. Harness choice is not. The harness determines whether your agents can run for hours without context rot, whether they can safely execute privileged operations without a human in the loop, whether they can remember what they learned last week, and whether they leave an audit trail your compliance team will accept.
Here is what the source tells us the production-grade harness actually does.
Seven Patterns That Define the Middle Tier
1. Memory as hint, not truth
The source reveals a three-tier memory architecture that deliberately rejects the RAG-everything approach most enterprise agents ship with today. At the core is a file called MEMORY.md, a lightweight index of pointers, roughly 150 characters per line, perpetually loaded into every prompt. This index does not store data. It stores locations.
Actual project knowledge lives in separate topic files fetched on demand. Raw transcripts are never fully reloaded into context; they are grep’d for specific identifiers. Critically, the agent is instructed to treat its own memory as a hint, not as ground truth. It must re-verify any cached fact against the primary source before acting on it.
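The mechanics can be sketched in a few lines. This is an assumption-laden illustration, not the actual file format: a small index maps topics to file paths, topic files are fetched only on demand, and a cached fact is discarded unless it survives re-verification against the primary source.

```typescript
// Sketch of the memory-as-hint pattern. The index format and function names
// are assumptions for illustration, not the leaked implementation.
type MemoryIndex = Map<string, string>; // topic -> path of the topic file

function parseIndex(memoryMd: string): MemoryIndex {
  // Each index line is a short pointer, e.g. "auth: notes/auth.md".
  const index: MemoryIndex = new Map();
  for (const line of memoryMd.split("\n")) {
    const [topic, path] = line.split(":").map((s) => s.trim());
    if (topic && path) index.set(topic, path);
  }
  return index;
}

// A cached fact is only a hint; verify() must confirm it against the source.
function recall(
  index: MemoryIndex,
  topic: string,
  fetchTopicFile: (path: string) => string,
  verify: (cached: string) => boolean
): string | null {
  const path = index.get(topic);
  if (!path) return null;                // no pointer means nothing cached
  const cached = fetchTopicFile(path);   // loaded on demand, never preloaded
  return verify(cached) ? cached : null; // stale hints are discarded, not trusted
}
```

The design choice to notice: the perpetually loaded part is only the index of pointers, so the cost of memory in every prompt stays flat no matter how much the agent has learned.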
If you come from clinical informatics, this pattern will feel immediately familiar. It matches how experienced clinicians actually reason: cached knowledge is always provisional until reconfirmed against the patient in front of you. For any compliance-sensitive deployment, the memory-as-hint pattern is the correct starting point. The alternative, which most enterprise agents still ship, is a confident agent with stale assumptions. That is not a posture you want inside a regulated workflow.
2. autoDream, or what happens while the agent is idle
The source reveals a background subsystem called autoDream, modeled explicitly after REM sleep in biological systems. It runs every 24 hours or on demand via a /dream command, and it operates in four phases. Pruning removes outdated or contradictory entries. Merging combines duplicate fragments and unifies different phrasings of the same idea. Refreshing updates stale information and re-weights importance. Synthesis compiles recent learnings into structured memory files with new indexes for faster retrieval.
The subtle and somewhat unsettling detail: autoDream rewrites tentative observations as assertions once enough supporting evidence accumulates. “This function might handle authentication” becomes “this function handles authentication.” Hedging language gets erased from the agent’s own memory. There is no human approval step in this loop.
For regulated industries, this is simultaneously the most exciting and most governance-relevant feature in the entire harness. An agent that can consolidate institutional knowledge between sessions is a step-change in capability. An agent that can silently upgrade guesses to facts is a step-change in risk. Any enterprise deployment will need a policy posture on this one, and I suspect the first wave of enterprise-ready autoDream implementations will include a review queue the human actually has to sign off on before provisional facts get promoted.
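What would that governed promotion step look like? A minimal sketch, entirely my own assumption rather than anything in the leaked code: a hedged observation is rewritten as an assertion only when it clears both an evidence threshold and an explicit human approval gate.

```typescript
// Hypothetical sketch of hedge promotion with a review-queue gate. Nothing
// here is from the leaked implementation; the threshold and field names
// are invented to illustrate the governance posture.
type Observation = { text: string; evidenceCount: number; approved: boolean };

const PROMOTION_THRESHOLD = 3; // hypothetical evidence bar

function promote(obs: Observation): string {
  const eligible = obs.evidenceCount >= PROMOTION_THRESHOLD && obs.approved;
  // Strip the hedging language only when BOTH gates pass; otherwise the
  // observation stays tentative in memory.
  return eligible ? obs.text.replace(/\bmight handle\b/, "handles") : obs.text;
}
```

The contrast with the leaked behavior is the `approved` flag: in the source as described, evidence accumulation alone is enough and no human ever sees the promotion.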
3. KAIROS, the daemon that decides when to act
KAIROS is referenced more than 150 times in the source. It is not yet publicly enabled, but it is clearly finished code behind a feature flag. The Greek root is deliberate: kairos means the opportune moment, contrasted with chronos, sequential time. The agent does not run on a schedule. It decides when to engage based on context.
Architecturally, KAIROS is an always-on background daemon. It outlives individual conversations. It receives periodic tick prompts and autonomously decides whether to act. It has a 15-second blocking budget to prevent any single decision from monopolizing system resources. And here is the audit-friendly detail: all of its actions are written to an append-only log that the agent itself cannot erase.
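A tick handler in that shape might look like the following sketch. The function names and log format are my assumptions; the 15-second budget and the append-only log are the details described above.

```typescript
// Illustrative sketch of a KAIROS-style tick handler. Names are invented;
// the blocking budget and append-only audit log mirror the described design.
const BLOCKING_BUDGET_MS = 15_000; // the 15-second budget from the source

const auditLog: string[] = [];
function appendAudit(entry: string): void {
  // Append-only: there is deliberately no delete or rewrite path.
  auditLog.push(`${new Date().toISOString()} ${entry}`);
}

function onTick(
  shouldAct: () => boolean, // context-based decision, not a schedule
  act: () => string,
  now: () => number = Date.now
): void {
  if (!shouldAct()) {
    appendAudit("tick: no action"); // even inaction leaves a trail
    return;
  }
  const start = now();
  const result = act();
  const elapsed = now() - start;
  appendAudit(
    elapsed > BLOCKING_BUDGET_MS
      ? `tick: acted (OVER BUDGET ${elapsed}ms): ${result}`
      : `tick: acted: ${result}`
  );
}
```

Note that the "no action" branch is logged too: an auditor can distinguish "the daemon chose not to act" from "the daemon never ran," which is exactly the property a compliance review needs.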
This is the move from reactive chat to autonomous agent. The append-only audit trail is the compliance-safe version of that autonomy. Any CISO evaluating agentic AI should understand that this is the direction the frontier is heading, and that the audit-log primitive already exists in production code at Anthropic. Microsoft’s equivalent will live inside Copilot’s existing auditing and data loss prevention boundary. If you are building governance policy now, the pattern to encode is “autonomous action is fine, silent action is not.”
4. Tool-call orchestration and sub-agent forking
We already covered the headline: the loop is trivial, the harness is not. Where it gets interesting is sub-agent orchestration. Claude Code can spawn sub-agents, but it does not do so through a fancy orchestration framework. Sub-agents are just another tool call in the registry. The AgentTool is a tool like any other.
When the primary agent forks a sub-agent, it creates a byte-identical copy of the parent context so they share the KV cache. Sub-agents process only their unique instructions, not the entire shared context. Parallelism becomes nearly free in token cost. This is the mechanism that makes multi-agent workflows economically viable at scale, and it is the single most important economic insight in the entire leak. Most enterprise agent frameworks today do not share cache across sub-agents, which is why they break the budget the moment anyone tries to run them in parallel.
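The token economics of that design are easy to make concrete. A rough sketch, with a deliberately naive character-count stand-in for a tokenizer: when every fork shares a byte-identical prefix, the prefix is computed once and served from the KV cache, so marginal cost per sub-agent is only its unique instructions.

```typescript
// Sketch of the cache-sharing economics. The token accounting is
// illustrative; real KV-cache behavior depends on the serving stack.
type Fork = { sharedPrefix: string; uniqueInstructions: string };

// Cache-aware cost: the byte-identical prefix is processed once, then
// every fork hits the KV cache and pays only for its unique suffix.
function cachedTokenCost(forks: Fork[], tokensOf: (s: string) => number): number {
  if (forks.length === 0) return 0;
  const prefixCost = tokensOf(forks[0].sharedPrefix);
  const uniqueCost = forks.reduce((sum, f) => sum + tokensOf(f.uniqueInstructions), 0);
  return prefixCost + uniqueCost;
}

// Naive frameworks re-process the full context for every sub-agent instead.
function naiveTokenCost(forks: Fork[], tokensOf: (s: string) => number): number {
  return forks.reduce(
    (sum, f) => sum + tokensOf(f.sharedPrefix) + tokensOf(f.uniqueInstructions),
    0
  );
}
```

With a 100-token shared context and three forks of 10 unique tokens each, the cache-aware cost is 130 tokens against 330 for the naive approach, and the gap widens linearly with every additional parallel sub-agent.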
The broader architectural lesson: keep the orchestration flat. Most agent frameworks in the wild introduce complex state machines, DAG-based planners, or custom runtimes. Claude Code does none of that. It proves that the right answer is a simple loop with sophisticated tooling around it. If your current agent framework requires a diagram to explain its control flow, you are probably over-engineering the wrong layer.
5. The two-mind permission model
This one deserves its own section. Every tool in Claude Code is independently sandboxed. The agent does not have direct filesystem access. The agent can use the Read tool, and Read has its own permission gate that evaluates deny, ask, and allow rules before anything executes. Deny always wins.
The architectural principle is: the model decides what to attempt. The tool system decides what is permitted. These are two separate minds, and the tool system does not trust the model.
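The evaluation order described above can be sketched in a few lines. The rule shape is my assumption; the semantics, deny beats ask beats allow, with deny winning regardless of rule order, is the described behavior.

```typescript
// Sketch of a deny/ask/allow permission gate. Rule representation is
// assumed; the precedence (deny always wins) follows the described design.
type Verdict = "deny" | "ask" | "allow";
type Rule = { pattern: RegExp; verdict: Verdict };

function evaluate(rules: Rule[], request: string): Verdict {
  const matches = rules.filter((r) => r.pattern.test(request)).map((r) => r.verdict);
  if (matches.includes("deny")) return "deny"; // deny always wins, any order
  if (matches.includes("ask")) return "ask";   // escalate to the human
  if (matches.includes("allow")) return "allow";
  return "ask"; // no rule matched: fail toward human review, not access
}
```

The fail-closed default on the last line is the part worth copying: an unrecognized request escalates to a human rather than silently passing.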
Operationally brilliant detail: permission checks are run by Claude Haiku, the smallest and cheapest model in the Anthropic family, not by the main Opus model handling the reasoning. Permission evaluation is framed as a cheap cascading classifier, not as a reasoning task. This keeps the economics of safety sustainable, which matters enormously once you are running thousands of agent-hours per month.
For HIPAA-regulated deployments, the architectural separation between intent and authorization is not a nice-to-have. It is the pattern the regulators are going to expect. If you are building an agent for a covered entity, your permission system should not live inside the same reasoning context as the agent itself. Put a different mind in charge of the lock.
6. MCP and lazy tool discovery
Model Context Protocol is Anthropic’s open standard for connecting AI agents to external services. Microsoft has adopted it. OpenAI has adopted it. It is becoming the connector standard of agentic AI, and Claude Code’s implementation is now the production reference.
The clever detail in the source: when MCP servers are connected, Claude Code does not load all their tool schemas into context upfront. It loads only tool names at session start, then uses a search mechanism to discover relevant tools when a task actually needs them. This is the only way to scale tool counts into the hundreds without blowing out the context window.
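A registry with that behavior might be shaped like the sketch below. The class and method names are invented for illustration, and the MCP fetch is stubbed; the pattern is the one described: names upfront, schemas on demand.

```typescript
// Sketch of lazy tool discovery. Names are illustrative and the schema
// fetch is a stub standing in for a real MCP round-trip.
type ToolSchema = { name: string; description: string; params: string[] };

class LazyToolRegistry {
  private schemas = new Map<string, ToolSchema>(); // cache, filled on demand

  constructor(
    private names: string[], // loaded at session start: names only
    private fetchSchema: (name: string) => ToolSchema // remote MCP call
  ) {}

  // Discovery runs over names alone, so no schema enters context yet.
  search(keyword: string): string[] {
    return this.names.filter((n) => n.includes(keyword));
  }

  // A schema is fetched (and cached) only when a task actually needs the tool.
  resolve(name: string): ToolSchema {
    let schema = this.schemas.get(name);
    if (!schema) {
      schema = this.fetchSchema(name);
      this.schemas.set(name, schema);
    }
    return schema;
  }

  loadedCount(): number {
    return this.schemas.size;
  }
}
```

With hundreds of connected tools, context cost scales with the tools a task touches, not with the size of the integration surface.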
For enterprise deployments wiring an agent into dozens of line-of-business systems, this lazy-discovery pattern is not optional; it is the primitive that makes the entire integration story work. That is exactly Microsoft's position with M365, and the position of every major healthcare system running Epic plus a dozen niche clinical tools. If your current agentic platform eagerly loads every tool schema at startup, it does not scale to the enterprise integration surface you actually have.
7. The three-stage context compaction pipeline
Long sessions are the unsolved problem of agentic AI. Every engineer who has built an agent has hit the same wall: the longer the session runs, the more confused the model gets. Anthropic internally calls this context entropy.
The harness contains a three-stage compaction pipeline that is arguably the single most valuable pattern in the entire codebase. Stage one truncates cached tool outputs locally, preserving the decisions without the raw data. Stage two generates a structured 20,000-token summary when the conversation approaches the context limit. Stage three compresses the full conversation and adds recently accessed files (up to 5,000 tokens per file), active plans, and relevant skills back into the rebuilt context.
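The staging can be sketched as follows. The function names are mine and the summarizer is stubbed (in production it is a model call); the 20,000-token summary budget and 5,000-token per-file cap are the figures described above, applied here to character counts for illustration.

```typescript
// Sketch of a three-stage compaction pass. Structure is illustrative;
// the budgets mirror the figures described in the article, and the
// summarizer is a stub standing in for a model call.
type Turn = { role: string; content: string };

const SUMMARY_BUDGET = 20_000; // summary size cap
const PER_FILE_BUDGET = 5_000; // per re-added file cap

// Stage 1: truncate cached tool outputs locally, keeping the decisions
// in the surrounding turns while dropping the raw data.
function truncateToolOutputs(turns: Turn[], cap: number): Turn[] {
  return turns.map((t) =>
    t.role === "tool" && t.content.length > cap
      ? { ...t, content: t.content.slice(0, cap) + " [truncated]" }
      : t
  );
}

// Stages 2 and 3: summarize the conversation, then rebuild the context
// from the summary plus recently accessed files and the active plan.
function compact(
  turns: Turn[],
  summarize: (turns: Turn[], budget: number) => string, // model call, stubbed
  recentFiles: string[],
  activePlan: string
): Turn[] {
  const summary = summarize(turns, SUMMARY_BUDGET);
  const rebuilt: Turn[] = [{ role: "system", content: summary }];
  for (const file of recentFiles) {
    rebuilt.push({ role: "system", content: file.slice(0, PER_FILE_BUDGET) });
  }
  rebuilt.push({ role: "system", content: activePlan });
  return rebuilt;
}
```

The design choice to notice is stage three: compaction is not just deletion. The rebuilt context is actively re-seeded with the files and plans the agent is most likely to need next.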
The operational insight for technical leaders: context management is the hardest problem in agentic systems, and it deserves the most engineering investment. Most teams spend their time tuning prompts. The teams that ship working agents spend their time engineering what goes into, and out of, the context window. If you are funding an agentic AI initiative right now, ask your team what their context compaction strategy is. If the answer is “we just use a longer context window,” the initiative will fail at scale.
What This Means for Microsoft Cowork
Walk the seven patterns against Microsoft’s own description of Copilot Cowork and the translation becomes obvious.
Microsoft says Cowork “runs within Microsoft 365’s security and governance boundaries. Identity, permissions, and compliance policies apply by default, and actions and outputs are auditable.” That is the permission and hook model, re-implemented on top of Microsoft Entra and Purview instead of Claude Code’s local sandbox.
Microsoft says Cowork “runs in a protected, sandboxed cloud environment, so tasks can keep progressing safely as you move across devices.” That is KAIROS, re-implemented on Azure instead of your laptop.
Microsoft says Cowork “turns your request into a plan. The plan continues in the background, with clear checkpoints so you can confirm progress, make changes, or pause execution at any time.” That is the coordinator-plus-sub-agent pattern with the append-only audit log, expressed in product language.
Microsoft says Cowork is “powered by Work IQ” and “draws on signals across Outlook, Teams, Excel, and the rest of Microsoft 365.” That is MEMORY.md plus the MCP integration layer, re-implemented on top of the Microsoft Graph.
None of this is coincidence. Microsoft is consuming the Anthropic pattern language. They are not copying the code. They are licensing the architectural primitives via the Claude Agent SDK and wrapping them in Microsoft’s identity, compliance, and data boundaries. The model is Claude Opus 4.7 (now in Copilot Cowork as of last week). The harness is Anthropic’s SDK. The application is Microsoft’s.
And that is precisely why the Anthropic codebase is the most useful document you can read right now if you want to understand where Copilot Cowork is going. The features sitting behind feature flags in the Anthropic source today are the features that will ship in Copilot Cowork in the next two to three quarters.
Why This Matters for Technical Leadership
If you are a healthcare CIO, a hospital informatics lead, or a CTO of any regulated enterprise evaluating where to place your agentic AI bets, here is the read.
Model choice is becoming less important than harness choice. Harness choice determines whether your agent can safely persist across sessions, whether it leaves an audit trail, whether it respects your data boundaries, whether it can scale to the tool counts your actual business requires, and whether it can handle long-running workflows without hallucinating its way into a compliance incident.
The Anthropic harness, now visible in unprecedented detail, represents the current state of the art. Microsoft is consuming it. Other platforms will follow. The pattern language itself is the differentiator for the next eighteen months, not the underlying model.
For healthcare specifically, three of the seven patterns are immediately relevant. Memory-as-hint matches how clinical reasoning works and should be the default for any clinical-adjacent agent. The two-mind permission model is the pattern your compliance team will accept, because it separates intent from authorization at an architectural layer regulators can audit. And the append-only audit log that KAIROS introduces is the pattern that makes autonomous agents defensible under HIPAA and the emerging state AI governance laws.
The leak was framed as a security story. It is actually an industry story. For the first time, we can see the shape of what the middle tier of agentic AI looks like in production, and we can read Microsoft’s product roadmap by looking at the features currently flagged off in the Anthropic codebase. The features that ship next in Claude Code will almost certainly appear in Copilot Cowork a few months later, with a Microsoft wrapper and a different billing mechanism.
Pay attention to the middle tier. It is where the real competition is happening, and it is where your architectural bets for the next three years will either pay off or strand.
Paul J. Swider is the CEO and Chief AI Officer of RealActivity, a Microsoft partner building healthcare AI solutions. He is an analyst-practitioner with Cloud Wars and the Acceleration Economy, a Microsoft MVP and MCT, and the founder of BOSHUG, the Boston Healthcare Cloud & AI Community.