The Ceiling

LLMs are nearing the limits of what they can do; the 2025-style innovation leap is not repeating.

Hypothesis Log

2026-05-21 v1

LLMs are nearing the limits of what they can do; the 2025-style innovation leap is not repeating.

Opening hypothesis, intentionally broad. It will be refined as evidence lands.

Evidence

Same-week data from independent broad-capability leaderboards shows continued tight clustering at the frontier and benchmark saturation. The LMArena top tier (GPT-5.4, Claude Opus 4.6 Thinking, Gemini 3.1 Pro, Grok 4, DeepSeek V3.2) sits within ~100 Elo points. No model leads more than two of eight commonly tracked benchmarks. SWE-Bench Verified scores cluster at 73-81%, with OpenAI publicly flagging training-data contamination across all frontier models. Arena Elo has gained an average of ~10.7 points per month since May 2023, with visibly slowing frontier-to-frontier progress. Open-source models are within 6 Elo points of the proprietary frontier on coding. Frontier-class API prices are down ~80% year over year. The shape of the broad-task data is what a maturing, plateauing field looks like, not a leap.

2026-05-20 · TokenMix LLM Leaderboard 2026; Iternal LLM selection guide; LMArena live tracker; BenchLM history
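To give a feel for how small the Elo gaps quoted above are, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the conventional Elo logistic curve (base 10, 400-point scale) and ignores ties; the leaderboard's actual rating fit may differ, and the function name `expected_score` is purely illustrative.

```python
def expected_score(elo_gap: float) -> float:
    """Expected win rate of the higher-rated side, assuming the
    conventional Elo logistic curve (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Gaps taken from the evidence entry above; labels are illustrative.
gaps = [
    ("top-tier spread", 100),
    ("open-source vs proprietary, coding", 6),
]

for label, gap in gaps:
    print(f"{label}: {gap} Elo -> {expected_score(gap):.1%} expected win rate")
```

Under these assumptions a 100-point gap corresponds to roughly a 64% expected win rate for the stronger model, and a 6-point gap to roughly 51%, which is close to a coin flip.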

An internal OpenAI general-purpose reasoning model autonomously disproved Erdős's 1946 planar unit-distance conjecture. The counterexample yields an n^(1+δ) bound, which Will Sawin refined to an explicit δ = 0.014 via class field towers and Golod-Shafarevich theory. The result was peer-verified on May 20, 2026 in a nine-author companion paper on arXiv (Alon, Bloom, Gowers, Litt, Sawin, Shankar, Tsimerman, Wang, Wood). Gowers, a Fields medalist, characterised it as 'a milestone in AI mathematics... would have recommended acceptance to Annals of Mathematics without hesitation'. Bloom, who maintains erdosproblems.com and has previously criticised OpenAI's math claims, is a co-author of the verification. This is the strongest documented case to date of an AI system autonomously settling a prominent open problem in pure mathematics, and it is direct evidence against the near-limits claim in the domain where it appeared.

2026-05-20 · arXiv 2605.20695; arXiv 2605.20579 (Sawin); OpenAI announcement

All posts on this hypothesis

The Proof That Didn't Move the Score — May 21, 2026