A New Keyboard

We had a big family gathering for Christmas, so we decided to do a Secret Santa for gifts. I made the bold choice of a new keyboard. I wanted to share my experience so far for two reasons: 1. I’m a big fan of custom setups, and 2. it requires a decent amount of work.

Lessons Learned Since Graduation

My undergraduate university recently posed a question to its alumni: if you could pass one message to yourself at the time of graduation, what would it be? I thought this was a decent question to reflect on, so I spent some time on it and am now sharing my thoughts below.

Embeddings: a Tool for Compression and Expansion

Embeddings are at the heart of machine learning. They allow us to represent any imaginable object as a list of numbers that can be processed by models. This idea is shockingly powerful. Literally anything can be embedded: a picture of a car, a poem you wrote in fifth grade, the sound of your favourite song, or something as abstract as a stream of vibrations in the Earth’s crust. By putting all these different inputs into a consistent form, we can leverage similar techniques to do useful work such as description, prediction and prescription.

Embedding Dimensions: from VAEs to VSAs

Vectors are at the heart of modern machine learning techniques. LLMs are powered by the transformer, which operates on language tokens that are mapped to embeddings with around 1000 dimensions. Similarly, vision models represent images as pixels of colour, which are translated to vectors (or tensors), and audio models represent sound as frequencies and amplitudes, which are again translated to vectors. Vectors are a core computational input and processing component throughout machine learning.

How Language Models Have Scaled

LLMs have grown from 200M parameters to the estimated 6T-parameter models we see today, an increase of more than 10,000-fold. Let’s explore what made that possible by stepping through the largest models of each period and their specific innovations.

Nelder Mead Optimisation

Optimisation is at the core of AI research. We spawn instances of massive models with trillions of parameters and then try to optimise those parameters towards some goal, typically represented by a loss function. We’ve become really good at this type of optimisation because it has a key property: we can calculate the gradient. Packages such as PyTorch automatically calculate how our loss function would change if we were to tweak the parameters (the gradients), which allows us to make meaningful progress towards the goal. But what if you don’t have gradients?
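
To give a flavour of gradient-free optimisation, here is a minimal sketch of a simplified Nelder-Mead loop in plain Python. It only ever calls the function itself, never a gradient. The function name `nelder_mead`, its parameters and the test function are illustrative assumptions rather than anything taken from the post, and this variant uses only an inside contraction, so treat it as a teaching sketch rather than a reference implementation.

```python
def nelder_mead(f, x0, step=0.5, max_iter=500, tol=1e-10):
    """Minimise f from x0 using only function evaluations (no gradients)."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid, moving away from the
            # worst vertex for t > 0 and back towards it for t < 0.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            # Reflection was a new best, so try stepping even further.
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Hypothetical test function: a bowl with its minimum at (1, -2).
def bowl(p):
    return (p[0] - 1) ** 2 + (p[1] + 2) ** 2


result = nelder_mead(bowl, [0.0, 0.0])
print(result)
```

The simplex of n + 1 points is sorted on every pass and the worst point is replaced by a reflected, expanded or contracted candidate, which is how the search makes progress without ever seeing a gradient.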
