Learning has been at the centre of the field of AI since its inception; however, its meaning and focus have continued to evolve over time. An effective AGI must be able to learn from its environment. In this post, let’s explore the different ways that systems can learn.
Part of my AGI Framework Series. Check the other requirements for AGI.
The following are the key learning methods that have been applied in the field of AI:
- Learn through example: the perceptron was one of the first learning algorithms; when first introduced, it was shown to be capable of learning various simple functions purely from labelled examples (a minimal sketch follows this list). It was the start of many great things to come.
- Learn the rules of the game: after the middling performance of early AI, symbolic / rule-based systems became popular. It was possible to learn interpretable rules from data (e.g. decision trees). This led to many hand-crafted systems that took a lot of effort to build and whose performance was hard to generalise.
- Learn while acting: with the introduction of reinforcement learning (RL), we created AI systems that learn while acting. RL agents collect experience by interacting with an environment, receive rewards, and update their policies to act in ways that maximise that reward (see the Q-learning sketch after this list).
- Learn with gradients (and a lot of data): with the advent of deep learning we began to throw lots and lots of data at our systems and let them learn through gradient descent and backpropagation (sketched below). This was shockingly effective.
- Learn without supervision: while previous methods required labels (e.g. what’s good and what’s bad), effective unsupervised and self-supervised learning showed that it was possible to do away with the labels and leverage even more data, the whole internet in fact. Transformers are a great example of this when applied to text data, as they learn the inherent structure of the data and represent it in their weights.
- Learn how to learn: lastly, meta-learning is an approach where we learn how to learn, training models that generalise well to out-of-distribution data, as long as the structure of the problem is sufficiently similar to ones seen before.
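To make the ‘learn through example’ bullet concrete, here’s a minimal sketch of the classic perceptron update rule; the toy AND dataset, learning rate and epoch count are made up for illustration:

```python
import numpy as np

# Toy, linearly separable data: learn a logical AND purely from examples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if (xi @ w + b) > 0 else 0
        # Classic perceptron rule: only update the weights on mistakes.
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print([1 if (xi @ w + b) > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```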
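For ‘learn while acting’, here’s a small tabular Q-learning sketch; the chain environment, rewards and hyperparameters are invented purely to illustrate the update:

```python
import random

# A made-up 5-state chain: move left/right, reward 1 only for reaching the last state.
N_STATES, ACTIONS = 5, [0, 1]           # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.9, 0.1        # learning rate, discount, exploration rate

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1   # next state, reward, done

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda a: Q[s][a])
        nxt, r, done = step(s, a)
        # Q-learning update: move Q(s,a) towards reward + discounted best future value.
        Q[s][a] += alpha * (r + gamma * max(Q[nxt]) - Q[s][a])
        s = nxt

# The greedy policy it learns should be "go right" everywhere: [1, 1, 1, 1]
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```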
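And for ‘learn with gradients’, a tiny example of fitting a linear model by gradient descent on a made-up dataset; deep learning applies the same idea to millions of parameters via backpropagation:

```python
import numpy as np

# Made-up regression data: y = 3x + noise. Fit w and b by gradient descent on MSE.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    pred = w * x + b
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean((pred - y) * x)
    grad_b = 2 * np.mean(pred - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # w should end up close to 3.0, b close to 0.0
```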
The above learning approaches represent powerful ways to train an AI. Once you have a trained model, there are further ways to learn behaviours which don’t require training from scratch, including:
- Learn in context: related to the meta-learning discussed above, it’s worth specifically mentioning zero-/one-/few-shot learning in LLMs. In our current systems, models can ‘learn’ how to respond simply by processing examples provided in their context window.
- Learn from a starting point: with transfer learning, models use pre-trained weights as a starting point, then add new layers on top or fine-tune the existing ones (see the sketch after this list). This is useful because we can train one foundational model on a much wider dataset and then make it task-specific with far smaller datasets.
- Learn from others: another option for leveraging existing work is knowledge distillation. Here, the outputs of a pre-trained ‘teacher’ model are used as soft targets for a new ‘student’ model so that it can learn strong representations (a minimal example follows this list). This is great if you want to train smaller or less complex models from a bigger one.
- Learn in parts: given how large modern models can be, fully training or fine-tuning them is expensive in both compute and storage. Low-Rank Adaptation (LoRA) is an innovative method that learns a small, low-rank difference between the base model and the desired model (sketched below). With this, it’s possible to swap fine-tuned adapters in and out relatively quickly, allowing multiple task-specific models to be trained and served efficiently.
- Learn on the fly: typically learning happens before the task, with RL being a notable exception. Another option is test-time training, in which a model is presented with extra training samples similar to the current task and briefly trained on them right before prediction. These methods can nudge models towards the desired outcomes by sharpening what they have already learned.
- Learn from self: with self-play, a model plays against itself in an RL setting, continually raising its level of play. This has been applied effectively to games such as Go (AlphaGo) and chess (MuZero).
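To make the transfer-learning bullet concrete, here’s a rough PyTorch sketch, assuming torchvision is installed and can download the ImageNet weights; the 5-class target task and the fake batch are purely illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (downloads weights on first use).
backbone = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained weights so only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a fresh head for our 5-class task.
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

# Only the new head's parameters are handed to the optimiser.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

# One illustrative training step on a fake batch of images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 5, (8,))
loss = nn.functional.cross_entropy(backbone(images), labels)
loss.backward()
optimizer.step()
```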
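For knowledge distillation, here’s a minimal sketch of a student matching a teacher’s softened outputs; the two networks, the temperature and the batch are stand-ins, and real setups usually also mix in a hard-label loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in teacher (large) and student (small) classifiers over 10 classes.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

x = torch.randn(64, 32)  # a fake batch of inputs
with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)

# The student is trained to match the teacher's soft targets via KL divergence.
student_log_probs = F.log_softmax(student(x) / T, dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (T ** 2)

loss.backward()
optimizer.step()
```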
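And for LoRA, a bare-bones sketch of the core idea, a frozen layer plus a trainable low-rank update; real implementations (e.g. the peft library) add more machinery around this:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the base weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Original output plus the low-rank correction; only A and B are trained.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8,192 trainable parameters vs 262,656 in the full layer
```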
Looking at the above, we can remark on some of the key changes we’ve seen in the field:
- Move to more general learning: over time we’ve seen models shift from very narrow systems (e.g. a checkers player, a dog vs. cat classifier) to massive models that can handle a wide variety of tasks, such as image classifiers covering 1,000 classes or language models that can handle most linguistic tasks without fine-tuning. There’s a continuing focus on ‘foundational models’ that can be trained once and then used to power further work.
- Static to active learning: while both approaches remain, active learning has become more and more popular, especially as we imagine future robotics use-cases where the need to learn and adapt to new scenarios with no available data keeps increasing.
- Features to raw data: in the past we hand-crafted features to feed into models, as this helped us squeeze out more performance, but with the growth of deep learning it became clear that letting models learn these features themselves could lead to even better results.
- Efficient and adaptable learning: with the massive size of modern models, the focus on efficient and adaptable learning has sky-rocketed. Techniques like LoRA, which give teams without 10K GPUs access to meaningful improvements, have opened up many applications.
Given what we’ve seen so far and the field’s continuing evolution, what are the current problems that hold learning back?
- What to learn: it’s not clear exactly what we should be learning. For example, LLMs have been shown to be very effective at certain tasks, but some predict they will fall short (see Yann LeCun’s takes) and argue we should instead focus on world models, which directly model environments.
- Sample efficiency: classic RL techniques require millions of steps (or orders of magnitude more) to learn. They typically assume no priors, meaning they must learn from scratch, which is one reason for such inefficiency. We know that humans need very few samples to learn (e.g. the ARC challenge), so we need to improve the sample efficiency of our learning algorithms.
- More data: according to many of the key laboratories working on AGI systems, we’re running out of text data. Most modern LLMs have been trained on the majority of the (filtered) internet. Figuring out how to get more data effectively is one of the next big challenges. Some exciting directions include: multi-modality, as we have plenty of image-based content yet to be used (though it’s expensive to process); synthetic data, where we manufacture data using LLMs which can then be fed back into them; and, similarly, self-play, where algorithms interact with themselves to learn complex new behaviours.
Finally, I’d like to make it explicit that the discussed learning methods are not used in isolation. The cutting-edge systems being trained today combine several of these methods to get the best results. We can call this “Integrated learning”. As an example, LLMs are typically trained with the following process:
- start off with self-supervised learning to get a foundational model
- use supervised learning (instruction tuning) to fine-tune the model on demonstrations of the desired behaviour
- apply Reinforcement Learning from Human Feedback (RLHF) to align the model’s outputs with human preferences
- finally, apply meta-learning principles by providing in-context examples of the task at inference time (a small illustration follows)
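As a small illustration of that final step, here’s what a few-shot, in-context prompt might look like; the reviews are made up and the client call is a hypothetical placeholder:

```python
# A made-up few-shot prompt: the model 'learns' the task from the examples
# in its context window, with no weight updates at all.
prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: positive

Review: "Stopped working after two weeks, complete waste of money."
Sentiment: negative

Review: "Setup took five minutes and it just works."
Sentiment:"""

# response = some_llm_client.complete(prompt)  # hypothetical client call
```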
Wrapping up, we’ve seen the multitude of learning methods that exist today, how they’ve evolved over time, their drawbacks, and how they can be combined. It’s exciting to see how the field continues to evolve and what new, more effective learning mechanisms we’ll come up with.