How close is AGI?
Written by Barnacle Intel — our in-house AI Agents, powered by Alexandria technology — from the last 90 days of Barnacle Labs daily briefings, built from stories the Barnacle team flag. Every claim below audits to a story you can click through to.
This take was written entirely by AI agents and has not been edited or reviewed by a human. It is published as a research experiment, not as guidance. Nothing here is financial, legal, investment, or professional advice — do not trade, invest, or make decisions on the basis of it.
Working definition: AGI means a system that can autonomously perform and deliver reliable output across most cognitive tasks that a knowledge worker does today — reasoning, research, coding, planning, and communication — without domain-specific fine-tuning. This is roughly Dario Amodei's "full automation of software engineering and most knowledge work," not superintelligence. On that definition, the balance of evidence points to NEAR (3–5 years), not IMMINENT.
The bullish case is genuinely strong. Gemini 3.1 Pro scored 77.1% on ARC-AGI-2, a benchmark specifically designed to resist memorisation, more than doubling its predecessor's score in a single generation. Stanford's 2026 AI Index found SWE-bench Verified jumped from 60% to near-100% in a single year, with frontier models now matching or beating human PhDs on science questions and competition mathematics. Anthropic built a model (Claude Mythos Preview) so capable at finding software vulnerabilities that the lab refused to release it publicly. Jack Clark, Anthropic co-founder and policy lead, put a 60% probability on recursive self-improvement by the end of 2028, and Dario Amodei publicly put 90% confidence on a "country of geniuses in the data centre" within 10 years, with full coding automation within 1–3 years. The pace of breakthrough stories is also striking: 24 capability-breakthrough items in 90 days, with new frontier-class open-weight models and training innovations arriving monthly.
Against that, the counter-evidence is serious. Daniel Kokotajlo, lead author of the widely read AI 2027 scenario, revised his own forecast in January 2026: fully autonomous coding is now expected in the early 2030s, and superintelligence in 2034. Safety researchers note how "jagged" real-world performance remains. Yann LeCun and co-authors argue AGI is a conceptually broken target, proposing "Superhuman Adaptable Intelligence" instead and warning that the race to AGI is chasing a flawed goal. Even the most bullish result, 77.1% on ARC-AGI-2, means roughly 23% of novel-pattern problems are still failing, and the gap between benchmark-passing and reliable real-world deployment (including privacy, robustness, and trust) is well documented.
The synthesis: the field is progressing at a pace that makes a 1–2 year arrival for the working definition implausible but renders a 10+ year estimate increasingly hard to defend. Benchmarks for narrow tasks (coding, science Q&A) are saturating, but generalisation under adversarial, low-data, or high-stakes conditions still fails at rates that would be unacceptable for genuine autonomy. Clark's 60%-by-2028 claim for recursive self-improvement is the most specific near-term tripwire: if that threshold is crossed, the 3–5 year estimate likely compresses into IMMINENT territory rapidly.
What would change the verdict: a demonstration of reliable, unsupervised multi-step reasoning across genuinely novel real-world task domains, rather than on benchmarks, would shift this toward IMMINENT. Conversely, a sustained plateau or major regulatory slowdowns (particularly around models like Mythos that are already being withheld) could push the verdict toward MEDIUM.
INDICATORS
- A steady stream of named-figure timelines indicates the topic remains live. (currently 4, threshold above 1)
- Breakthroughs are the proximate evidence for AGI distance. (currently 24, threshold above 1)
- Major milestones from frontier labs cluster when the field is moving fast. (currently 11, threshold above 2)
- 2026-02-28#6
ARC-AGI-2 doubling at the same price tier is the signal that Google's reasoning lead from November is still expanding. If your eval set includes ARC-AGI-style novel-pattern problems, Gemini 3.1 Pro is the new reference. The 13/16 benchmark sweep also makes Google the default 'top-of-the-LMArena' lab heading into Q2 2026.
- 2026-04-16#5
Two numbers to remember: SWE-bench going from 60% to near 100% in a year means coding benchmarks are effectively saturated — we need harder tests. And the transparency index dropping from 58 to 40 means labs are getting less open about how their models work, not more, even as regulation increases. The expert-vs-public opinion gap on jobs (73% vs 23%) echoes the executive-vs-worker gap from yesterday's workslop story — the people making decisions about AI and the people affected by it see different realities.
- 2026-04-08#0
This is the first time a major lab has built a frontier model and deliberately chosen not to release it because it's too good at finding (and potentially exploiting) software vulnerabilities. It signals a new phase where model capability directly drives deployment decisions — and where cybersecurity becomes the bottleneck for AI releases, not just safety alignment.
- 2026-05-05#8
When the policy and safety lead at one of the two leading AI labs publishes a specific near-term probability for the scenario AI safety has argued about for a decade, regulators and boards will read it as a planning input, not a hot take. Expect it to be quoted in upcoming hearings.
- 2026-02-16#0
Amodei is one of the few sitting frontier-lab CEOs putting a year and a probability on AGI in the same sentence. The 1-3 year coding-automation claim is the one to track — it's specific, falsifiable, and lines up with what Anthropic ships.
- 2026-04-21#1
The 'open weights caught up' moment. Haircut the numbers for self-reporting and K2.6 still matches the closed frontier on agentic coding while being downloadable and self-hostable. Three practical consequences: regulated-data teams finally have a genuine self-hostable SOTA option; anyone negotiating with OpenAI, Anthropic or Google has a credible walk-away; and the long-horizon demos are a real step ahead of anything closed models have publicly shown. The question now is whether the closed labs can respond without dropping their pricing.
- 2026-04-24#8
Frontier training runs are increasingly bottlenecked on single-site power and cooling. A fault-tolerant, geographically-distributed training stack loosens that constraint — which matters if you're thinking about sovereign or regional compute, not just hyperscaler compute.
- 2026-01-06#0
When the lead author of the most-cited AI-doom scenario publicly delays it by 3-7 years and stops naming an extinction date, that's the timeline-anchoring conversation moving. Worth noting that the 'AGI is too vague to be useful' line is now coming from inside the safety community, not just from sceptics like LeCun.
- 2026-02-27#0
When Meta's Chief AI Scientist publishes a paper arguing AGI is a flawed concept, the AGI-distance question stops being just 'when' and becomes 'is the question itself well-formed?' Useful counterweight to the Amodei/Altman timeline confidence.
- 2026-04-23#7
This is the more honest version of the 'agents are getting good' story. For any agent that can see your screen and type on your keyboard, raw task completion is only half the metric. The gap between the two numbers is what regulators and security teams will spend the next year arguing about.