Battle of the PokerBots: Nine LLMs Clash at $10/$20 in a Five-Day Cash-Game Test

Samantha Doyle, 03 Nov 2025
  • Nine AI models compete in a five-day poker cash-game exhibition to test decision-making.
  • Event highlights AI limitations vs. solver-grade systems in poker.
  • Impacts on poker regulations & tool usage in gaming environments.
Phil Galfond - The Battle of the PokerBots
The Battle of the PokerBots is live. Nine large language models are battling across three $10/$20 no-limit hold'em tables for five straight days, each starting with a virtual $100,000. The exhibition, organized under the PokerBattle.ai banner, has already jumped beyond tech Twitter, with spotlight nods from Elon Musk and a cheeky challenge from Phil Galfond. 

Format at a Glance

The field features nine headline models: Gemini 2.5 Pro, Grok 4, Claude Sonnet 4.5, DeepSeek R1, OpenAI o3, Kimi K2, Mistral Magistral, Z.AI GLM 4.6 and Meta LLaMA 4.

The premise is simple: play continuously, take notes, adapt. Preparation leaned on publicly available poker material rather than custom solver stacks, so this is a stress test of general-purpose LLM decision-making, not a rerun of academic solver victories. Expect only a few thousand hands, enough for a useful signal but not for definitive rankings.
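To put a rough number on that sample-size caution, here is a minimal sketch of the margin of error on a measured win rate. It assumes a per-player standard deviation of 100 big blinds per 100 hands, a commonly quoted ballpark for no-limit hold'em and not a figure from the organizers. It also treats 100-hand blocks as independent. The function name and the hand counts are illustrative only.

```python
import math


def win_rate_margin(hands, sd_per_100=100.0, z=1.96):
    """Approximate 95% margin of error on a win rate, in big blinds per 100 hands.

    sd_per_100 is an ASSUMED standard deviation (bb per 100 hands), not event data.
    """
    blocks = hands / 100  # number of 100-hand blocks in the sample
    return z * sd_per_100 / math.sqrt(blocks)


# Hypothetical sample sizes: a short exhibition versus much longer matches.
for hands in (3_000, 50_000, 1_000_000):
    print(f"{hands:>9} hands: +/- {win_rate_margin(hands):.1f} bb/100")
```

Under these assumptions, the uncertainty after a few thousand hands is several times larger than the win rates that typically separate strong players from weak ones, which is why the leaderboard is best read as a snapshot.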

Early Leaderboard Noise

Style splits appeared quickly. Reports flagged LLaMA 4 as notably loose and o3 as tighter, with Gemini 2.5 Pro holding a profit lead at one checkpoint and LLaMA 4 nursing the largest drawdown. 

Variance looms large at this sample size, hence the organizers' reminders not to crown a champion on a weekend's worth of hands. Musk's amplification put extra eyes on Grok's graph, while Galfond floated a 50,000-hand PLO heads-up match for a seven-figure side bet, a far harsher arena if it ever happens.

Why the Bout Matters

For regulated rooms, the exhibition is a public demo of the gap between chatty LLMs and solver-grade systems. It draws a clear line between prohibited real-time assistance tools and general-purpose models that still struggle with hidden information, bankroll management and long-horizon exploitation.

Expect integrity teams to use the moment to clarify tool policies. Product-wise, the allowed "note-taking" hints at safer, post-session coaching features that regulators can audit. 

For investors, this is mindshare now and telemetry later: a marketing spike today and a renewed case for spending on detection analytics as AI-assisted collusion risks evolve.

A Few Closing Remarks

It is theatre, but useful theatre. The Battle of the PokerBots will not settle who plays best, yet it sharpens the conversation around what today's LLMs can, and cannot, do at a live table. If the Galfond exhibition materialises, the real fight starts there. 
