Another Great Year for AI, 2024 -> 2025

Jye Sawtell-Rickson · January 1, 2025

2024 was a great year. What interested me:

  • Models, models, models: they got bigger and better. GPT-4o (May), o1 (preview in September), and o3 (teased in December); Bard became Gemini (February), followed by Gemini 2.0 (December).

  • Capabilities:
    • Video: long-form video generation is now possible, with Sora (teased a few times) and Meta’s MovieGen.
    • Voice: OpenAI released Voice Mode, allowing realistic voice chats with LLMs.
    • Where we live: Copilot built into Windows, and Google’s AI Overviews integrated into Search (May).
    • Agents: in October, Anthropic’s model gained “computer use”, letting it take over your screen; the AI Scientist showed what end-to-end automated research could look like.
    • Tools: NotebookLM generates podcasts and supports chat over your documents; Cursor, the AI code editor, took off.
  • Society: AI pushed itself further and further to the front of everyone’s minds.
    • Scientific recognition: (October) Nobel Prizes in Physics (Hinton and Hopfield) and Chemistry (DeepMind).
    • Law: the EU AI Act entered into force in August.
    • Money: NVIDIA was a top performer, reaching a $3.5 trillion market cap in October.
    • Risks: investment in nuclear power to head off an impending energy crisis; a looming training-data shortage (though Claude 3 used synthetic data); Situational Awareness and the years ahead.
  • Reasoning: one of my favourite benchmarks, ARC, was solved to an accuracy of 85% by OpenAI’s teased o3. SWE-bench reached 50% (compared to 3% in 2023).

All that, in one year. I’m looking forward to 2025 bringing even more amazing advancements.

Hot topics for 2025:

  • Agents: can LLMs act autonomously? Is the quality high enough for long, uninterrupted chains of actions? Can we align them properly over long timescales? Can AI do research?
  • Data: models have largely consumed the internet; can they start consuming themselves? What does self-play look like in the world of LLMs? Can models make more efficient use of the available data (you don’t skim a textbook once, you study it)?
  • Reasoning: how far will investment in inference-time compute get us? What innovations does o3 contain? Can we combine symbolic and connectionist approaches (e.g. paper)?
  • Power: as LLMs continue to grow in capabilities, is the current ownership model sustainable? What global conflicts could this cause?
  • Benchmarks: many key benchmarks were already at breaking point in 2023, and in 2024 they became largely saturated. For 2025 we’re expecting a new ARC and a completely new reasoning benchmark, but I won’t be surprised to see a bunch more.
