Normalising a Short AGI Horizon

Jye Sawtell-Rickson · June 3, 2025

AI 2027 was recently released and is another great forecast of how the next few years could play out, similar to another favourite of mine, Situational Awareness. While listening to the authors on the Dwarkesh Podcast, I tried to dig a bit deeper into my own thoughts. Why do I also believe AGI is coming soon? Why does it still seem so fantastical, despite my belief? To help myself, I decided to jot down the key points that have influenced that thinking. It’s far from an exhaustive list, but it covers some of the latest facts that help me stay grounded in my predictions.

Benchmark Progress

The easiest thing to keep abreast of is benchmark progress. Every month, when an AI lab announces its latest and greatest model, it touts improvements on key benchmarks such as MMLU or GPQA. In short, models are quickly saturating most benchmarks, with just a few standing strong, such as Humanity’s Last Exam (20%), SWE-Bench (30%) and ARC-AGI (3%). Figure 1 shows this rapid progress across a variety of fields.

Figure 1: Benchmark progress relative to humans across a variety of fields.

The figures on those key remaining benchmarks hide a few things. ARC-AGI has a v1 and a v2, with v1 basically solved by o3. SWE-Bench has a “Lite” version on which accuracy is closer to 70%. Both have shown rapid progress over the last two years.

More importantly, a key reason this isn’t just ‘more improvement on benchmarks which will then be replaced’ is that benchmarks are approaching human-level difficulty across the board. Exceeding human level would mean we’ve probably reached AGI and are on the path to ASI.

In general, the biggest gaps remain in the area of complex reasoning and longer task horizons.

Time Horizons and Agents

2025 was named the year of the Agent (back in 2024). This has largely held true, with the likes of DeepResearch and AlphaEvolve bringing agentic behaviour to the public’s attention. More generally, we expect AGI to be able to operate over long time horizons in order to achieve more complex goals.

To that end, METR released their research: Measuring AI Ability to Complete Long Tasks. This work tracks the length of task that AI can reliably complete and found that it’s currently doubling every 7 months, with values in early 2025 sitting at around 1 hour (see Figure 2). At the current rate, this would mean day-long tasks being accomplished in around 2 years.
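A quick back-of-envelope check of that extrapolation. This is just a sketch: the 7-month doubling period and the 1-hour early-2025 starting point are the figures quoted above, and “day-long” is read here as an 8-hour working day (an assumption, not METR’s).

```python
from math import log2

# Assumptions (see lead-in): task length doubles every 7 months,
# starting from ~1-hour tasks in early 2025.
DOUBLING_MONTHS = 7
current_task_hours = 1.0
target_task_hours = 8.0  # one working day

# Number of doublings needed, then time at the current rate.
doublings = log2(target_task_hours / current_task_hours)
months = doublings * DOUBLING_MONTHS

print(f"{doublings:.0f} doublings -> ~{months:.0f} months")  # 3 doublings -> ~21 months
```

Three doublings take about 21 months, which is where the “around 2 years” figure comes from; reading “day-long” as a full 24 hours instead would push it out to roughly 32 months.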

Figure 2: Length of task that AI can do over time.

MUST. GO. FASTER.

AI has had impact in many ways throughout its history, but nothing compares to 2025, when it has meaningfully changed the lives of countless workers worldwide. For example, surveys suggest that around half of workers use it to help with work tasks through tools such as ChatGPT or Perplexity’s new Perplexity Labs. And it’s not only usage: GitHub and MIT (2023) found that developers using Copilot completed tasks 55% faster on average in a randomised controlled trial, while McKinsey (2023) estimated a $2.6–4.4 trillion annual productivity uplift from generative AI adoption. AI is impacting the majority of people and driving significant economic value, which will lead to even faster changes.

Human progress is accelerating, but we’re also starting to see hints of AI’s direct contributions. Sakana AI released the AI Scientist, which was reported to create high-quality research results (though feedback is mixed). More recently, largely AI-written papers have even been accepted at A* venues. As AI contributes scientific value directly, it may begin to snowball more noticeably.

While the above scientific publications weren’t particularly groundbreaking, there are recent key milestones from AI-based systems which show they can go beyond (or at least match) human capabilities. AlphaFold is a classic example, which led to a Nobel Prize in Chemistry for its notable contributions to the field. AlphaEvolve was reported to have a meaningful impact on Google’s revenue by solving complex coding problems, as well as discovering a new matrix multiplication method.

Finally, with efforts like the Model Context Protocol (MCP) (2024) to create standardised ways for AI systems to interface with the world, it’s likely that vast ecosystems of AIs will soon exist, able to run experiments in the physical world through robotics or other means.

Leveraging Others

As a human, I constantly look to others and am influenced by their thoughts. The key works mentioned earlier, AI 2027 and Situational Awareness, both point to AGI systems by 2030. Betting markets are another valuable source of forecasts. While the individual contributors may not be as well researched as the above-mentioned authors, there’s real money on the line, and markets have been shown to be effective at complex predictions by leveraging networks of knowledge.

Manifold is one of my favourite prediction markets, and it has a great page dedicated to AI predictions. There’s a specific market for when AGI will pass a ‘high quality Turing test’, currently forecast at 2031. Polymarket doesn’t have a specific market for AGI in general, but even its narrow market on AGI within OpenAI by 2025 sits at 13%.

Checking In

Even as I write this, even after seeing these patterns for many years, it’s still hard to accept. The stories seem like sci-fi. My gut reaction is that it’s just unrealistic. But it’s not. I have to constantly remind myself of how I thought five years ago and how much things have changed.

We’ve had ChatGPT forever, it’s a part of my routine! ChatGPT was released in November 2022, 2.5 years ago.

OK, well it’s hardly changed! Task length and general competency have actually drastically improved; it’s just that the smaller steps, with new releases every few months, have made it less noticeable. We’ve also gained a bunch of new tools, e.g. DeepResearch.

Well you still can’t count r’s in Strawberry! Fair, there are only 2.

And it’s not only language modelling. Robotics is shocking. Somehow I can see a video of a robot walking around the house cleaning and cooking and be desensitised enough to think “sure, but it can probably only do those things, it’s not that amazing!”. We have robot demos in Neo Gamma, Tesla’s Optimus and Atlas from Boston Dynamics, and robots in homes aren’t an impossibility (the robotics company Figure prices their smaller model at $16K).

What Now

So we’re beating all the benchmarks, we’re creating agents that can operate over longer and longer timeframes, AI is accelerating human progress and beginning to make its own, and experts in the field are making their thoughts clear.

Of course we’re not there yet, and of course there are plenty of reasons that current approaches might not work out, but the data points more and more to a quick (before 2030) arrival of AGI systems, and I (we) need to normalise this view, because the cost of not doing so is shocking.

If you want to read more, please refer to AI 2027 and Situational Awareness which dive deeply into this topic. Even if you don’t buy the timings for the predictions, the paths they lay out are still likely ones.
