Why the Future of LLMs is 8x Faster & Smarter | Deep Dive into SSMs & Nested Learning
ehNWO8v4CG0 • 2025-12-02
Welcome to the explainer. You know that
engine that powers pretty much all of
modern AI? Well, it's starting to
sputter. Today, we're going to look at
that foundational tech, the cracks that
are starting to show, and what might, just
might, come next. Wow. Okay, so this is a
pretty shocking statement, right? It
comes from Llion Jones, one of the
co-authors of that game-changing 2017
paper, Attention Is All You Need. You
know, the paper that literally invented
the transformer and kicked off this
entire AI boom. So yeah, let's dive into
this. To really get his frustration and
what it means for the future of AI, we
kind of have to understand the
incredible world-changing impact his
invention had in the first place. So
let's start with act one of our story.
The undisputed dominance of this one
architecture born from that 2017 paper,
an architecture that just completely
took over the entire field. I mean,
before the transformer, progress in AI
was slow, incremental. But after, it was
exponential. This was the breakthrough
that made things like large language
models (LLMs) even possible. It really
became the fundamental building block
for well everything we think of as
modern AI. It's the DNA inside models
like GPT-4 and Gemini. It unlocked these
weird and powerful emergent
capabilities, these unexpected skills
that just kind of pop up when the model
gets complex enough. And of course, this
drove billions in investment and made
the transformer the unquestioned default
way to build AI. And its success was
built on this really simple, powerful
idea. The bigger and deeper you build
the model, the smarter it gets. But what
if that core idea has a hidden critical
flaw? And that brings us to act two, the
conflict. This is where the cracks start
to appear in the king's armor and not
from the outside but from deep within
the math of the model itself. So the
central promise was always that stacking
more and more layers, you know, making
the model deeper would make it way more
powerful. But some really recent
research is flipping that whole idea on
its head, showing that after a certain
point, deeper actually becomes weaker. A
recent paper with the very provocative
title Attention Is Not All You Need.
They identified a fundamental problem
with this deep stacking approach. They
call it rank collapse. And honestly, the
best way to think of it is like a game
of telephone. You know how the first
message is perfectly clear, but as it
passes from person to person, or in this
case from layer to layer, it gets all
distorted and simplified. By the time
that information travels through a ton
of layers, the message gets so garbled
that the signal is just lost in the
noise. The deepest parts of the network,
they basically stop learning anything
new. Their output becomes no better than
just a random guess. And check this out.
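That game-of-telephone picture can be sketched numerically. The toy model below (a hypothetical illustration, not the experiment from the paper) treats each attention layer as a pure averaging step: token vectors are mixed with row-stochastic softmax weights, with none of the skip connections or MLPs a real transformer has. Stacked deep enough, every token collapses toward the same vector, so the deep layers carry almost no token-specific signal.

```python
import numpy as np

# Toy sketch of "rank collapse": a pure-attention layer mixes token
# vectors with a row-stochastic weight matrix. Stacking many such
# mixing layers pulls every token toward a shared average, so the
# representation degenerates toward rank 1.
rng = np.random.default_rng(0)

def attention_like_mix(x):
    """One pure-mixing layer: random softmax weights, rows sum to 1."""
    scores = rng.normal(size=(x.shape[0], x.shape[0]))
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

x = rng.normal(size=(8, 16))   # 8 tokens, 16-dim embeddings
spread = {}                    # how different the tokens still are
for layer in range(1, 25):
    x = attention_like_mix(x)
    if layer in (1, 2, 6, 24):
        # Frobenius distance from the rank-1 "all tokens equal" state.
        spread[layer] = float(np.linalg.norm(x - x.mean(axis=0)))

for layer, s in spread.items():
    print(f"after {layer:2d} layers, token spread = {s:.6f}")
```

The spread shrinks rapidly with depth: by layer 24 the tokens are nearly indistinguishable, which is the "garbled message" from the analogy in numerical form.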
The data actually proves it. Researchers
tested how well information paths of
different lengths performed. And as you
can see, these short paths crossing just
one or two layers, super effective. But
look at the long paths, the ones
crossing six layers, the accuracy just
plummets down to a level that is barely
better than a coin flip. I mean, this is
a massive finding. It suggests there's a
hard mathematical limit to that bigger
is better approach. But this technical
flaw, it's only half the story. The
other problem, the one that really
frustrates Llion Jones is a human one.
Which brings us right back to the
inventor Llion Jones and why he basically
thinks the entire field has lost its
way. He argues that this flood of money
and talent into AI, it hasn't sparked
creativity, it's actually killed it. The
immense pressure from investors who are
demanding quick returns has forced
everyone to just double down on the one
thing they know works, the transformer.
And that's stifled any real fundamental
innovation. And he is not alone in
feeling this way. You go online, you
look at discussions among researchers
and they're all echoing this exact same
thing. You see phrases like the myopic
vision of industry or that it's become a
race to the bottom focused on a shinier
product, not a smarter model. One
developer put it perfectly. He said,
"The field feels stuck, just fine-tuning
the same 2017 paper like it's the
Bible." And this right here illustrates
the problem perfectly. The entire focus
of the research community isn't on
inventing something new. It's on finding
more efficient ways to patch the old
model. They're trying to fix the engine,
not design a whole new one. So, this is
the critical question, right? If the
king is flawed and the kingdom is afraid
of change, where does the next
revolution even come from? Well, this
leads us to act three, a potential new
path forward, one that learns from the
mistakes of the past. So, frustrated by
all this stagnation, some researchers
are now looking for inspiration in the
most complex and efficient learning
machine we know of, the human brain.
Okay, so one of the most promising new
ideas is called nested learning. Instead
of treating an AI as one giant single
network, it reimagines it as a system of
smaller interconnected modules. And each
module learns at a different speed, kind
of mimicking how our brains turn
short-term experiences into long-term
knowledge. And by doing that,
it tries to solve a huge problem in AI
called catastrophic forgetting. Now,
here's a comparison that really shows
the difference. Current models, they're
static. Their knowledge is frozen after
training. And when they learn a new
task, they often forget the old ones.
Nested learning, on the other hand, aims
to create systems that can learn
continuously, consolidating memories and
building a real spectrum of short and
long-term knowledge, just like we do.
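The multi-timescale idea can be caricatured in a few lines. The sketch below is a minimal illustration of the principle described above, not the actual nested-learning or Hope algorithm: a "fast" module updates on every example, while a "slow" module only consolidates the fast weights occasionally, the way short-term experience gets folded into long-term knowledge.

```python
import numpy as np

# Minimal caricature of multi-timescale learning (illustrative only):
# fit a noisy linear task y = 2x with a fast weight that chases every
# example and a slow weight that consolidates only periodically.
rng = np.random.default_rng(1)

fast_w, slow_w = 0.0, 0.0
FAST_LR = 0.5           # fast module: normalized step on each example
CONSOLIDATE_EVERY = 10  # slow module: consolidates every 10 steps
SLOW_BLEND = 0.2        # slow module moves 20% toward the fast module

def stream(true_w, steps):
    """Yield (x, y) pairs from a noisy linear task y = true_w * x."""
    for _ in range(steps):
        x = rng.normal()
        yield x, true_w * x + 0.01 * rng.normal()

for step, (x, y) in enumerate(stream(true_w=2.0, steps=100), start=1):
    # Fast weights: normalized least-mean-squares step per example.
    fast_w -= FAST_LR * (fast_w * x - y) * x / (1.0 + x * x)
    # Slow weights: occasional consolidation instead of per-step chasing.
    if step % CONSOLIDATE_EVERY == 0:
        slow_w += SLOW_BLEND * (fast_w - slow_w)

print(f"fast weight = {fast_w:.3f}, slow weight = {slow_w:.3f}")
```

If the task then shifts, the fast weights chase the new data immediately while the slow weights change only gradually, which is the basic intuition for how a spectrum of timescales can soften catastrophic forgetting.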
And to prove this isn't just some
theory, researchers actually went and
built a whole new architecture from the
ground up based on these ideas. They
call it Hope. And the results are
incredibly promising. When they tested
it on a bunch of common sense reasoning
tasks, the Hope architecture
consistently and significantly beat a
standard transformer model of about the
same size. This is a fundamental shift.
It's the difference between building
static tools that we have to constantly
throw away and replace and creating
truly dynamic systems that can adapt,
evolve, and improve all on their own
over time. So, what are the key
takeaways here? Well, first, the
transformer, the king of modern AI, has
real mathematical limits. Second, the
industry's obsession with it has created
a research bottleneck. And third, these
new ideas inspired by neuroscience, like
nested learning, are showing a potential
way out. The era of just making AI
bigger might be ending. And the era of
making it smarter in totally new ways
could be just beginning. This shift from
just scaling up to truly scaling smarter
could very well be the thing that
defines the next decade of artificial
intelligence. Thanks for joining us for
this explainer.