Hello from Chinatown!
Reading the mysterious occurrences surrounding the company building the Silicon God has become something of an arcane art. What did Ilya see? The question, famously posed by many on X when Ilya Sutskever voted to fire Sam Altman from OpenAI, was made all the more poignant when he and Jan Leike (co-lead of the superalignment team) recently announced they were leaving the company.
I'm going to try my hand at reading the tea leaves here, based on what another star OpenAI exit said about Meta's LLaMa-3. When Andrej Karpathy wrote this tweet, the prevailing wisdom was that a model the size of LLaMa-3 would max out its abilities after roughly 1 trillion tokens of training, around 10 million books' worth. It turns out Meta found it still improving after 10 times that quantity of training data, however obscene that number of books already sounded. (Where did Meta get all that data, anyway?)
This seems to suggest that the common wisdom that training set size should scale proportionally with model size, known as the Chinchilla scaling law, might not quite hold. Might the relation even be exponential?
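For a rough sense of the numbers, here's a back-of-envelope sketch in Python. The 20-tokens-per-parameter figure is the usual rule-of-thumb reading of Chinchilla, and the roughly 15 trillion tokens is what Meta reported for LLaMa-3's training run; treat both as ballpark inputs, not gospel.

```python
# Back-of-envelope: Chinchilla-optimal token budget vs. what Meta reportedly used.
# Assumptions: the common ~20 tokens-per-parameter rule of thumb for Chinchilla,
# and the ~15 trillion training tokens Meta reported for LLaMa-3.

CHINCHILLA_TOKENS_PER_PARAM = 20    # rough rule of thumb, not an exact law
LLAMA3_REPORTED_TOKENS = 15e12      # ~15T tokens, per Meta's announcement

for params in (8e9, 70e9):          # LLaMa-3's 8B and 70B variants
    optimal = params * CHINCHILLA_TOKENS_PER_PARAM
    ratio = LLAMA3_REPORTED_TOKENS / optimal
    print(f"{params / 1e9:>4.0f}B params: Chinchilla-optimal ~ {optimal / 1e12:.2f}T tokens, "
          f"actual ~ 15T ({ratio:.0f}x over)")
```

Run the arithmetic and even the 70B variant comes out around an order of magnitude past its compute-optimal budget, which is roughly the "10 times" gap Karpathy was pointing at.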
Unverified leaks say that GPT-4 was trained for three to four months. It's now been over a year since its release, but if we're imagining a bigger model trained on a substantially larger dataset, it's possible that GPT-5 is still improving even today, and it may not even be clear when it will stop doing so.
It's also worth noting that, to meet the unfathomable training set requirements we're speculating about, OpenAI might at this point be relying on purely synthetic data, likely having GPT-5 provide itself with feedback to satisfy its voracious appetite for data, an idea akin to self-play. Having to actively generate the very training set GPT-5 is learning from would also make things extra slow. That would explain why OpenAI hasn't released it yet, opting for other launches to keep the hype alive, such as Sora or GPT-4o.
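To make the self-play idea a bit more concrete, here's a purely illustrative sketch of what a self-feedback synthetic-data loop could look like. Every function and object in it is hypothetical; nothing here is claimed about how OpenAI actually does it.

```python
# Purely illustrative sketch of a self-feedback synthetic-data loop.
# The `model` object, its .sample and .rate methods, and all function names
# are hypothetical stand-ins, not any real API.

def generate_candidates(model, prompt, n=4):
    """Have the model draft several answers to its own prompt."""
    return [model.sample(prompt) for _ in range(n)]

def self_score(model, prompt, answer):
    """Ask the same model to grade one of its own answers."""
    return model.rate(prompt, answer)

def self_play_round(model, prompts, keep_top=1):
    """One round: generate, self-grade, keep the best answers as new training data."""
    new_data = []
    for prompt in prompts:
        candidates = generate_candidates(model, prompt)
        ranked = sorted(candidates,
                        key=lambda a: self_score(model, prompt, a),
                        reverse=True)
        new_data.extend((prompt, a) for a in ranked[:keep_top])
    return new_data  # fed back into the next training step
```

The point of the sketch is just the shape of the loop: the model is both the author and the judge of its next batch of training data, which is why the process could be slow and compute-hungry.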
It might also explain a general feeling of unease within the superalignment team. Facing a model with a seemingly boundless capacity to improve, while alignment as a discipline is still struggling to catch up to the current generation of models, might have led to growing anxiety on Leike and Sutskever's part.
I now wonder if GPT-5 will ever be publicly released, as it might get so good that private interests in Altman's orbit will want to reserve access for themselves. They certainly have leverage, given Altman's seemingly bottomless appetite for funding. Anyway, just a theory.
See you next Sunday (or Monday if I’m late)!