2024 was a great year. What interested me:
- Models, models, models: they got bigger and better. GPT-4o (May), o1 (preview in September), o3 (teased in December); Bard became Gemini (February), followed by Gemini 2.0 (December).
- Capabilities:
  - Video: long-form video generation is now possible. Sora (teased several times), Meta's MovieGen.
  - Voice: OpenAI released Voice Mode, enabling realistic voice conversations with LLMs.
- Where we live: Copilot built into Windows, Google's AI Overviews integrated into Search (May).
- Agents: in October, Anthropic's model gained the ability to take over your screen with computer use; the AI Scientist, an end-to-end automated research pipeline, showed what's possible.
- Tools: NotebookLM generates podcasts and offers chat; Cursor, the AI code editor.
- Society: AI is increasingly at the front of everyone's minds.
- Scientific Recognition: (October) Nobel Prizes in Physics (Hinton and Hopfield) and Chemistry (DeepMind's Hassabis and Jumper, alongside David Baker).
- Law: the EU AI Act entered into force in August.
- Money: NVIDIA the top performer, reaching a $3.5 trillion market cap in October.
- Risks: investing in nuclear power to solve an impending energy crisis, a looming training-data shortage (though Claude 3 used synthetic data), Situational Awareness and the year ahead.
- Reasoning: one of my favourite benchmarks, ARC, was solved to 85% accuracy by OpenAI's teased o3; SWE-bench reached 50% (compared to 3% in 2023).
All that, in one year. I’m looking forward to 2025 bringing even more amazing advancements.
Hot topics for 2025:
- Agents: are LLMs capable of acting autonomously? Is the quality high enough for long, uninterrupted chains of actions? Can we align them properly over long timescales? Can AI do research?
- Data: models have largely consumed the internet; can they start consuming themselves? What does self-play look like in the world of LLMs? Can models make more efficient use of the available data (you don't skim a textbook once, you study it)?
- Reasoning: how far will investing in inference-time compute get us? What innovations does o3 contain? Can we combine symbolic and connectionist approaches (e.g. paper)?
- Power: as LLMs continue to grow in capabilities, is the current ownership model sustainable? What global conflicts could this cause?
- Benchmarks: many key benchmarks were already at the breaking point in 2023, and in 2024 they became largely saturated. For 2025 we're expecting a new ARC and a completely new reasoning benchmark, but I won't be surprised to see a bunch more.