The Cooperation Problem in Autonomous AI
We deploy AI agents for a living. We build the infrastructure they run on. And one thing has become obvious to us over the past year: the biggest risk in multi-agent systems is not that agents fail. It is that they succeed at the wrong thing.
A single agent executing tasks for its principal is extremely useful. But the moment that agent interacts with another agent serving a different principal, each optimising for different objectives in a shared environment, the dynamic changes. It becomes a game. Your procurement agent negotiates with a supplier's agent. Both are optimising for their principal. Neither controls the outcome. And games have a well-studied tendency to produce outcomes that are rational for each player and terrible for the group.
51% of enterprises now run AI agents in production.1 The market is projected to hit $50.31 billion by 2030.1 The capability is real and the appetite is there. But McKinsey's 2026 AI Trust Maturity Survey, covering approximately 500 organisations, found that only 27% currently trust agents to operate with full autonomy, down from 43% the year before.2 The technology is moving faster than the infrastructure required to trust it.
That gap is a reason to build better foundations.
The Visibility Problem
The obvious explanation is that enterprises doubt AI capability. The data tells a different story. Capability is there; visibility is not.
69% of enterprises have deployed AI agents, but only 21% have the tooling to secure them. Fewer than half of deployed agents are actively monitored at all.3 We have seen this in live deployments: teams deploy agents on sensitive data, making decisions with real consequences, and discover months later that no one can trace exactly what those agents did on a given Tuesday. The deployment outpaced the instrumentation.
The deeper issue goes beyond observation. 63% of organisations cannot technically enforce purpose limitations on the agents they have already built.4 They can define what an agent should do. But they cannot prevent it from doing something else.
This translates into real costs. 64% of large enterprises have absorbed more than $1 million in AI-related failures. Shadow AI adds $670,000 per breach over standard incidents.5 And here is what makes it worse: 82% of executives report confidence in their AI security posture.6 That confidence comes from policy documents and governance frameworks, not from runtime enforcement or cryptographic proof.
This is an infrastructure problem. Companies can build agents that work. Proving that those agents do what they claim requires a different kind of foundation than the one most organisations have today. Keep this in mind: monitoring tells you what happened; enforcement stops what shouldn't.
What Game Theory Actually Tells Us
When autonomous agents interact repeatedly, each serving a different principal, each optimising for different objectives, the structure is game-theoretic in the formal sense. A compliance agent auditing a workflow agent. A financial services agent coordinating with a counterparty's risk model. Each interaction is a repeated game where cooperation, exploitation, and deception are all available strategies.
The Prisoner's Dilemma captures the core tension: two players, each better off defecting regardless of what the other does, yet both worse off when both defect. The equilibrium is mutual defection. The cooperative outcome is unreachable, because neither player can credibly commit to cooperation.
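To see the structure concretely, here is a minimal sketch in Python with the textbook payoff values (temptation 5, reward 3, punishment 1, sucker 0). The numbers are illustrative, not drawn from any study cited below; the shape of the trap is what matters.

```python
# One-shot Prisoner's Dilemma with standard illustrative payoffs.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation: both collect the reward
    ("C", "D"): (0, 5),  # I cooperate, you defect: sucker vs temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: the only Nash equilibrium
}

def best_response(opponent_action: str) -> str:
    """The action that maximises my payoff against a fixed opponent action."""
    return max(["C", "D"], key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Defection dominates: it is the best response whatever the opponent does...
assert best_response("C") == "D" and best_response("D") == "D"
# ...yet mutual defection pays (1, 1) while mutual cooperation pays (3, 3).
```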
Decades of work on repeated games, from Axelrod's tit-for-tat tournaments7 through Nowak's evolutionary cooperation rules8, showed that cooperation can emerge when players interact repeatedly and can observe each other's history. But all of these frameworks share a constraint: agents can only observe actions after the fact. They cannot inspect intentions or verify commitments in advance. A player who promises to cooperate might defect, and you only find out once the damage is done.
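Tit-for-tat, the strategy that won Axelrod's tournaments, shows the constraint in a few lines; a sketch (our rendering, with history as a list of (own, opponent) move pairs):

```python
def tit_for_tat(history: list[tuple[str, str]]) -> str:
    """Cooperate first, then mirror the opponent's previous move. Purely
    backward-looking: it learns about a defector one exploited round too late."""
    if not history:
        return "C"
    _, opponent_last_move = history[-1]
    return opponent_last_move
```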
In 2004, Moshe Tennenholtz proposed something different: program equilibria.9 What if agents didn't submit actions but programs? And what if each agent could read the other's program before execution? If I can verify that your code will cooperate when mine does, I can safely commit to cooperation. Defection becomes detectable before it happens, and credible commitment becomes possible.
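Tennenholtz's simplest construction fits in a few lines. A sketch of the idea in Python (the source-comparison trick is from the 2004 paper; the rendering is ours):

```python
import inspect

def mirror_cooperator(opponent_source: str) -> str:
    """Cooperate if and only if the opponent submitted this exact program.
    Two copies of this strategy cooperate, and that is an equilibrium: any
    edit to the source is visible before play and is answered with defection."""
    my_source = inspect.getsource(mirror_cooperator)
    return "C" if opponent_source == my_source else "D"
```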
For twenty years, program equilibria remained theoretical. The barrier was practical: no system existed where agents could reason about each other's strategic code at scale.
LLMs Change the Game
A NeurIPS 2025 paper out of the University of Washington just cleared that barrier.
Sistla and Kleiman-Weiner's "Evaluating LLMs in Open-Source Games" is the first large-scale empirical study of LLM agents playing open-source games: game-theoretic settings where each player submits a program, both programs are exchanged, and then execution happens.10 The agents write Python, read their opponent's code, and adapt.
The setup is deliberate. The researchers evaluated three agent types across the Iterated Prisoner's Dilemma and a spatial coordination game: cooperative payoff maximisers (instructed to seek mutual benefit), deceptive payoff maximisers (instructed to mislead), and unconstrained payoff maximisers (no restrictions).
Agents read strategic code with high reliability. The researchers built SPARC, a benchmark of 239 Iterated Prisoner's Dilemma strategies from the Axelrod library.10 Leading LLMs classified whether each strategy would cooperate at 85-88% accuracy. This held even after aggressive obfuscation: stripping all comments, replacing class names with random strings, removing every semantic cue. DeepSeek-V3 hit 87.6% on masked strategies. o4-mini reached 84.2% on fully obfuscated code.10 These models are not reading variable names. They are reasoning about control flow, branching logic, and game-theoretic structure embedded in code.
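To illustrate what that obfuscation removes, here is our own paraphrase of the kind of masking described (not the benchmark's actual transformations), applied to tit-for-tat:

```python
# Before masking: name and docstring give the strategy away.
def tit_for_tat(history):
    """Cooperate first, then copy the opponent's last move."""
    return "C" if not history else history[-1][1]

# After masking: no comments, no semantic names. Classifying this as
# cooperative requires reasoning about the control flow itself.
def f_9qx2(a):
    return "C" if not a else a[-1][1]
```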
Agents develop distinct strategic repertoires based on their objectives. Cooperative agents primarily deployed counter-measures and direct imitation: defensive, reciprocal behaviour. Deceptive agents showed the highest rates of exploitation attempts and were the only type to use feints, code specifically designed to misrepresent its own behaviour.10 Unconstrained agents behaved opportunistically, mixing counter-measures with exploitation depending on context.
LLM agents do not converge on a single behavioural pattern. They develop strategic diversity, and each type sharpens its approach over repeated interactions. What we find particularly striking is that this happens even without explicit adversarial instruction: agents designated as cooperators still developed counter-measures and imitation strategies, while unconstrained agents discovered exploitation on their own. The boundary between "cooperative agent" and "adversarial agent" is more porous than most deployment frameworks assume.
Cooperation is evolutionarily stable. Deception is not. When the researchers ran evolutionary simulations, starting with equal populations of all three agent types and letting selection pressure operate, cooperative and unconstrained agents survived. Deceptive agents were eliminated across both games.10 In the Prisoner's Dilemma, cooperative and unconstrained types produced multiple stable equilibria. In the Coin Game, all evolutionary trajectories converged away from deception.
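The paper's simulation code is its own; as an intuition pump, here is a generic replicator-dynamics sketch with an assumed payoff matrix in which inspected deception pays poorly. The numbers are ours, chosen only to reflect the qualitative finding:

```python
import numpy as np

# Types: [cooperative, unconstrained, deceptive]. A[i, j] is the assumed
# average payoff of type i against type j under code inspection, where
# feints are spotted and punished. Values are illustrative, not the paper's.
A = np.array([
    [3.0, 2.5, 2.8],   # cooperative: reciprocates, punishes detected deceivers
    [2.5, 2.0, 2.6],   # unconstrained: opportunistic
    [1.0, 1.2, 1.0],   # deceptive: exploitation fails once it is detectable
])

x = np.array([1 / 3, 1 / 3, 1 / 3])  # equal starting populations
for _ in range(500):
    fitness = A @ x                   # expected payoff of each type
    x = x * fitness / (x @ fitness)   # discrete replicator update

print(x.round(3))  # the deceptive share collapses toward zero
```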
The mechanism is exactly what Tennenholtz theorised. When agents can inspect each other's code, defection is detectable before execution. Cooperative strategies that condition on verified opponent behaviour (cooperate if and only if the other's code reciprocates) become stable. Deceptive strategies, which depend on the opponent not noticing the deception, collapse under code inspection.
Transparency makes cooperation the rational choice, and the paper provides measurable, reproducible evidence across hundreds of simulated interactions.
The Transparency Paradox
So transparency enables cooperation. The natural follow-up is simple: how do you get transparency between agents that belong to different organisations?
The paper's framework assumes perfect code visibility. Both agents see each other's complete source code. The authors acknowledge this directly as a limitation.10 In enterprise environments, it is exactly that. No procurement agent will expose its reservation pricing logic to a supplier's system. No financial institution will share its risk model with a counterparty. Proprietary logic, competitive dynamics, and regulatory constraints make voluntary code disclosure a non-starter across organisational boundaries.
The research tells us what is needed: verifiable behaviour. The open question is how to deliver it without requiring agents to reveal their internals.
Mechanism design solves exactly this: constructing rules that produce desirable outcomes when participants have private information and misaligned incentives.11 You don't ask them to reveal their strategy. You design infrastructure that makes truthful behaviour dominant.
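The canonical instance predates AI entirely: in a second-price (Vickrey) auction, the winner pays the second-highest bid, which makes bidding your true value a dominant strategy. No bidder reveals anything beyond a single number, yet truthfulness is rational by construction. A sketch:

```python
def vickrey(bids: dict[str, float]) -> tuple[str, float]:
    """Second-price sealed-bid auction: highest bidder wins, pays the
    runner-up's bid. Shading your bid can only lose you auctions you wanted
    to win; inflating it can only win you auctions at a price above value."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

assert vickrey({"a": 10.0, "b": 7.0, "c": 4.0}) == ("a", 7.0)
```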
Verification Without Exposure
This is a problem we work on every day.
Trusted Execution Environments provide the hardware foundation. A TEE is an isolated enclave embedded in the processor: code and data inside it are protected during execution, invisible to the operating system, the hypervisor, and the cloud operator.12 The capability that matters here is attestation: before an agent begins processing, the TEE generates a cryptographic proof of exactly which code is loaded, which model is running, and in what configuration. A counterparty can verify this attestation remotely. They know what is running and that it has not been tampered with, without seeing how it works internally.
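From the counterparty's side, the handshake reduces to a small check. The sketch below is ours and every name in it is hypothetical; real attestation flows (Intel SGX DCAP, AMD SEV-SNP) differ in mechanics but follow this shape:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Quote:
    code_hash: str     # measurement of the agent code loaded in the enclave
    model_hash: str    # measurement of the model weights
    config_hash: str   # measurement of the runtime configuration
    signature: bytes   # produced inside the CPU with the vendor's attestation key

def accept_counterparty(
    quote: Quote,
    verify_vendor_sig: Callable[[bytes, bytes], bool],  # vendor PKI check
    expected: dict[str, str],                           # measurements we trust
) -> bool:
    """Enter the interaction only if the counterparty proves, via the
    hardware root of trust, exactly what it is running."""
    payload = f"{quote.code_hash}|{quote.model_hash}|{quote.config_hash}".encode()
    if not verify_vendor_sig(payload, quote.signature):
        return False  # not a genuine enclave
    return (quote.code_hash == expected["code"]
            and quote.model_hash == expected["model"]
            and quote.config_hash == expected["config"])
```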
We think of attestation as verifiable opacity: you prove the constraints on your behaviour without revealing your strategy. The game-theoretic logic from the open-source games research carries over directly. Cooperation becomes rational when commitments are verifiable. TEEs make commitments verifiable without requiring code disclosure.
But attestation alone is not enough. An attested agent is still an AI system, and AI systems can be jailbroken, manipulated, or simply wrong. We design around this with a two-layer trust model. The first layer is probabilistic and AI-powered: the agent reasons, negotiates, and makes decisions using its LLM capabilities. The second layer is deterministic: a rule engine that enforces hard invariants regardless of what the AI layer decides. An agent whose AI layer has been convinced to "approve a million-dollar payment" is still stopped by a deterministic constraint capping payments at ten thousand. Even if Layer 1 is compromised, Layer 2 holds.
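A minimal sketch of the two layers, using the payment example above (the cap, field names, and shapes are illustrative):

```python
PAYMENT_CAP = 10_000  # deterministic invariant; Layer 2 never consults the LLM

def execute_payment(ai_decision: dict) -> dict:
    """Layer 1 (the LLM) proposes; Layer 2 (this function) disposes.
    Even a fully jailbroken Layer 1 cannot push an action past this check."""
    amount = ai_decision["amount"]
    if amount > PAYMENT_CAP:
        return {"status": "blocked", "reason": f"{amount} exceeds cap {PAYMENT_CAP}"}
    return {"status": "approved", "amount": amount}

# A compromised AI layer "approves" a million-dollar payment...
assert execute_payment({"amount": 1_000_000})["status"] == "blocked"
# ...and the deterministic layer stops it regardless.
```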
This architecture extends naturally to multi-agent coordination. Consider two agents negotiating across organisational boundaries. Neither should see the other's private constraints. A network of TEE-attested mediator nodes can verify each proposal against both parties' policies without revealing those policies to either side. The agents negotiate directly; the mediators confirm compliance and produce attested verification traces. For operations where neither agent should see the other's inputs at all, the mediators can evaluate sealed inputs inside TEEs and return only the result. Neither agent sees the other's data at any point.13
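In sketch form, with hypothetical types, the mediator's job is narrow: evaluate inside the enclave, attest, reveal nothing else.

```python
import hashlib
import json
from typing import Callable

Policy = Callable[[dict], bool]  # one party's private constraints, sealed to the enclave

def mediate(proposal: dict, policy_a: Policy, policy_b: Policy) -> dict:
    """Runs inside an attested enclave: checks the proposal against both
    private policies, returns a verdict plus a commitment to what was checked.
    Neither party learns the other's constraints, only compliance."""
    compliant = policy_a(proposal) and policy_b(proposal)
    digest = hashlib.sha256(json.dumps(proposal, sort_keys=True).encode()).hexdigest()
    return {"compliant": compliant, "proposal_digest": digest}

# Example: a price cap on one side, an SLA floor on the other.
deal = {"price": 9_500, "sla_hours": 24}
print(mediate(deal, lambda p: p["price"] <= 10_000, lambda p: p["sla_hours"] <= 48))
```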
The performance overhead is marginal. Policy validation adds milliseconds per request. Production TEE deployments already process billions of LLM tokens daily at 0.5-5% performance impact.14
The hardware is not the bottleneck. What is missing, and what we are building, is the infrastructure layer on top: the contract languages agents use to express binding commitments in machine-verifiable form, the enforcement engines that compile policy into runtime invariants, the attestation protocols that let agents verify each other's execution environment before entering any interaction.15
We build confidential agentic systems for regulated industries. We monitor agent behaviour, track costs and actions, and instrument agentic flows in production. Every design choice in our infrastructure comes from the same insight that game theory formalises: agents interacting under private constraints need mechanisms that make cooperation verifiable without requiring disclosure.
The Coasean Stakes
Economists have a name for what AI agents are about to do to coordination costs. Coase observed in 1937 that firms exist because coordinating through markets is more expensive than coordinating within a hierarchy.16 Krier, and separately Shahidi et al., argue that AI agents are collapsing those transaction costs: the cost of finding counterparties, negotiating terms, verifying compliance, and enforcing agreements is approaching zero.17 If that holds, we are looking at entirely new forms of economic interaction: agent-mediated data sharing between institutions that currently cannot collaborate due to privacy constraints, real-time micro-negotiations between autonomous systems, contract generation between parties that have never interacted before.17
But this vision has a precondition. The coordination infrastructure must be trustworthy. The cooperative AI literature has been converging on this point for years, from Dafoe et al. in Nature18 to Conitzer and Oesterheld's institutional design work at AAAI.19 The NeurIPS results we discussed above are the first large-scale empirical evidence that LLMs actually achieve cooperation through code-level verification. Confidential computing provides the hardware layer that makes verification practical without sacrificing privacy.
Agents that can verify each other converge on cooperation. Agents that cannot will default to defensive, suboptimal strategies, because the game theory gives them no better option.
Why We Are Here
This is what drew us to autonomous agent coordination in the first place. We come from years of building blockchain infrastructure, operating adversarial systems, and shipping confidential computing to production. When we looked at the agentic landscape, we saw the same pattern we had spent our careers working on: multiple parties that need to coordinate under private constraints, without a trusted intermediary, with real stakes. The game theory, the mechanism design, the cryptographic enforcement: it all converges on a problem we already know how to think about. What made it compelling was realising that the tools we had been building for decentralised trust were exactly what autonomous agents would need to cooperate at scale.
VeraZero is where these threads come together: private AI agents running inside hardware-encrypted enclaves, integrated into existing enterprise workflows, with cryptographic attestation on every decision. We are also actively working on the coordination primitives that sit above the enclave layer: the contract languages, the attested negotiation protocols, the enforcement engines that make multi-agent cooperation verifiable at the infrastructure level.
The agent economy is arriving faster than the trust infrastructure to support it. The Coasean promise depends on whether the infrastructure beneath it can make commitments credible, constraints enforceable, and behaviour provable.
Build accordingly.
Go R0GUE.
Footnotes
1. Index.dev. "AI Agent Enterprise Adoption Statistics 2026." 2026.
2. McKinsey. "State of AI Trust in 2026: Shifting to the Agentic Era." 2026. Survey of approximately 500 organisations.
3. Akto. "State of Agentic AI Security 2025." 2025. See also Gravitee, "State of AI Agent Security," survey of 919 respondents.
4. MintMCP. "AI Agent Security Enterprise Guide 2026." 2026.
5. EY. "EY Survey: Autonomous AI Is No Longer Theoretical as Adoption Grows Despite Ongoing Trust Concerns." March 2026.
6. Cloud Security Alliance. "Securing Autonomous AI Agents." Survey of 285 IT and security professionals.
7. Axelrod, R. and Hamilton, W.D. "The Evolution of Cooperation." Science, 211(4489):1390-1396, 1981.
8. Nowak, M.A. "Five Rules for the Evolution of Cooperation." Science, 314(5805):1560-1563, 2006.
9. Tennenholtz, M. "Program Equilibrium." Games and Economic Behavior, 49(2):363-373, 2004.
10. Sistla, S. and Kleiman-Weiner, M. "Evaluating LLMs in Open-Source Games." 39th Conference on Neural Information Processing Systems (NeurIPS 2025). arXiv:2512.00371.
11. For mechanism design foundations, see Hurwicz, L. "On Informationally Decentralized Systems." Decision and Organization, 1972. Hurwicz shared the 2007 Nobel Memorial Prize in Economic Sciences for this work.
12. Confidential Computing Consortium. For a detailed treatment of TEE architecture, known vulnerabilities, and the defence-in-depth argument, see R0GUE, "The AI Army Is Coming (And It Needs Blockchain Infrastructure)," February 2026.
13. For a formal treatment of TEE-mediated multi-agent coordination, see Stavrakakis et al. "Omega: Trusted Multi-Agent AI in the Cloud." arXiv:2512.05951, 2025.
14. Phala Network. Production benchmarks, 2025.
15. "Attestable Audits: Verifiable AI Safety Benchmarking via TEEs." arXiv:2506.23706, 2025. See also ARIA, "Scaling Trust Programme Thesis," 2026, for a comprehensive framing of the tooling and research required for secure agent-to-agent coordination.
16. Coase, R.H. "The Nature of the Firm." Economica, 4(16):386-405, 1937.
17. Krier, S. "Coasean Bargaining at Scale." Cosmos Institute, 2026. See also Shahidi, P., Rusak, G., Manning, B.S., Fradkin, A., and Horton, J.J. "The Coasean Singularity?" MIT / Harvard / BU, 2026.
18. Dafoe, A., Bachrach, Y., Hadfield, G., Horvitz, E., Larson, K., and Graepel, T. "Cooperative AI: Machines Must Learn to Find Common Ground." Nature, 593(7857):33-36, 2021.
19. Conitzer, V. and Oesterheld, C. "Foundations of Cooperative AI." Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, 2023.