Transcript
0w3PWzm9Jx4 • Forget LLMs: MIT's New Recursive Language Model (RLM) Explained
Kind: captions
Language: en
You ever feel that weird paradox with AI
right now? I mean, we're getting these
huge million token context windows,
which is amazing. But then the second
you ask the model to do any real
thinking over all that data, it just
falls apart. It's kind of like having
this massive city-sized library, but the
librarian can only remember the last
sentence they just read. Well, today
we're going to tear into a
groundbreaking paper from MIT that
offers a solution. And it's not just
about building a bigger library. It's
about getting a way, way smarter
librarian. This could seriously be the
end of the context window crisis for
good. Okay, so here's our road map for
this deep dive. First up, we're going to
really diagnose the problem, what I'm
calling the LLM reasoning crisis. Then
we'll get into MIT's pretty radical
solution, the recursive language model
or RLM. After that, we'll pop the hood
and see how this thing actually works
step by step. Then comes the good stuff,
the proof, the performance, the price.
We'll also check out what the community
is saying, the good and the critical.
And finally, we'll zoom out and talk
about what this means for the whole
future of AI. All right, let's start
with a big problem. There's this
phenomenon that researchers are starting
to call context rot. And the idea is
pretty simple, but also kind of scary.
As you pour more and more information
into a large language model, its ability
to actually reason about that
information doesn't just get a little
bit worse. It can completely
collapse. And this isn't some tiny issue
with small models. Nope, it's happening
to the best of the best. Even the kinds
of top tier systems that the researchers
in the paper call GPT-5. So this right
here, this is the marketing slide. This
is the picture we all get sold on. It's
the classic needle-in-a-haystack test.
The job is super simple. Just find one
specific fact inside a giant mountain of
text. And no surprise, a powerful model
nails it. A perfect score even with a
massive 272,000 tokens. Looks
incredible, right? It suggests the model
has perfect memory. But this is where
the story starts to get a lot more
interesting. Because here's a reality
check. The moment we switch from just
finding something to actually reasoning
about it, the whole picture just falls
apart. This chart shows a much harder
test. One that asks the model to find
all pairs of things that share a certain
quality. A task that needs a way deeper
understanding. And look what happens. At
just 16,000 tokens, which is a fraction
of its max, performance is already
struggling. But by 33,000 tokens, it
drops to almost zero. The model
completely and utterly fails. This right
here is context rot in action. And the
really crucial thing to understand is
that this isn't some gradual gentle
decline. The MIT researchers use a very
specific term, a phase transition. You
should think of it like water hitting
its freezing point and suddenly turning
to solid ice. It's a sudden catastrophic
failure point where the model's core
attention mechanism just breaks. It
literally can't keep its thoughts
straight anymore. This is the proof that
just making the box bigger, the context
window, doesn't fix the fundamental
problem with the architecture. And that
brings us to MIT's solution. So instead of
trying to build an even bigger, stronger
box to cram all the information into,
they just decided to get out of the box
completely. This isn't just a small
tweak or an improvement. It's a totally
different way of thinking about how AI
should interact with huge amounts of
information. It's a paradigm shift.
Yeah, this slide lays it out perfectly.
On the left, that's the old way, the
brute force method. Just keep stuffing
more and more text into the model's
context window and hope for the best.
But on the right, that's the new
recursive language model way. And here,
the prompt isn't jammed inside the model
at all. Instead, it's treated like a
file on your computer's hard drive. It's
an external environment that the LLM can
look at, analyze, and operate on using
code. What they've built here is a
textbook example of a neurosymbolic
system. And that just means it's a
hybrid, right? It takes the best of both
worlds. It lets the neural network, the
LLM do all the stuff it's great at, you
know, the fuzzy things like
understanding nuance, generating
creative summaries, and writing code.
And then it hands off all the rigid
logical tasks like counting things,
looping through data a thousand times,
and perfect recall to a symbolic system.
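That division of labor can be shown with a toy sketch: a stubbed-out "planner" stands in for the LLM and proposes a bit of code, and a plain Python interpreter does the exact counting. Everything here, including the fake planner and the sample corpus, is an illustrative assumption, not the paper's implementation.

```python
# Toy sketch of the neurosymbolic split: a (stubbed) neural planner
# proposes a program, and a plain Python interpreter executes the
# rigid, exact part. In a real RLM the planner would be an LLM call.

def fake_planner(task: str) -> str:
    # A real LLM would generate this code from the task description.
    return "result = sum(1 for line in corpus if 'pageant' in line.lower())"

def run_symbolic(code: str, corpus: list) -> int:
    # The symbolic side: deterministic execution with perfect recall.
    scope = {"corpus": corpus}
    exec(code, scope)
    return scope["result"]

corpus = [
    "The beauty pageant was held in 1987.",
    "Unrelated document text.",
    "Another pageant mention here.",
]
print(run_symbolic(fake_planner("count pageant mentions"), corpus))  # → 2
```

The counting never touches the neural side at all, which is exactly the point: the fuzzy component only decides *what* to compute.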
In this case, that's just a simple
Python interpreter. It's all about using
the right tool for the right job. The
researchers use this brilliant analogy
that really makes the whole thing click.
So, picture the entire long prompt, all
that text, as a giant file on your
computer's hard drive. The main LLM, the
root model, is like the operating
system, like Windows or macOS. Its job
isn't to read the whole file itself. Its
job is to manage access to it. The
model's actual tiny context window,
that's your computer's limited RAM. It
can only hold a little piece of the file
at a time. And the recursive calls, what
are those? They're like multi-threaded
child processes. When the OS finds a
tricky part of the file, it just spawns
a whole new specialized program to
handle just that one piece and report
back what it finds. Okay, so that's the
big idea. But how does an RLM actually
handle a request? Let's walk through the
whole execution step by step because
it's actually incredibly slick. The
whole thing really boils down to four
main phases. First, the model probes the
data. It writes a little bit of code to
see what it's dealing with. Is this a
text file, a spreadsheet? Then it moves
into decomposition. Here, it writes a
program, usually a simple for loop to
break the big problem down into smaller,
manageable pieces. Third is the magic
step, recursion. Inside that loop, it
starts spawning brand new independent
sub-LLMs to chew on each little chunk. And
finally, there's aggregation. The main
model acts like a manager, collecting
all the answers from its little workers
and putting them all together into one
final polished answer. Let's make this
real with an example straight from the
paper. The mission is to find the name
of a beauty pageant winner from a
certain year and that name is buried
somewhere in a collection of 10,000
documents. I mean, this is a classic
multi-step reasoning problem that a
standard LLM would have almost no chance
on if you just tried to cram all 10,000
documents into its prompt. So, what's
the first thing the RLM does? It doesn't
start reading, it starts coding. The
root LLM acts just like a human
programmer. It writes a tiny bit of
Python code using a regular expression.
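A minimal sketch of what that probe might look like, assuming the corpus has been split into fixed-size chunks (the chunk size and pattern here are illustrative, not values from the paper):

```python
import re

# Sketch of the "probe" step: regex-scan a huge corpus for interesting
# spots instead of reading all of it. Chunk size is an assumption.

def probe(corpus: str, pattern: str, chunk_size: int = 2000) -> list:
    # Return the indices of chunks that contain a match.
    chunks = [corpus[i:i + chunk_size]
              for i in range(0, len(corpus), chunk_size)]
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, chunk in enumerate(chunks) if rx.search(chunk)]

corpus = ("filler text " * 1000
          + "the beauty pageant winner was announced "
          + "more filler " * 1000)
hits = probe(corpus, r"beauty pageant|festival")
print(hits)  # → [6]
```

The scan itself costs no model calls; only the chunks it flags ever get handed to an LLM.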
It's basically telling the computer,
"Hey, don't read this whole 10,000
document library. Just do a super fast
scan and tell me all the places where
you see the words beauty pageant or
festival." It's using a sharp tool to
find the interesting spots without
wasting time reading everything. So
after it runs that search, the computer
reports back and says, "Okay, I found
something that looks interesting over
here in chunk number six." Now, does the
root LLM read it? No. It does the most
important thing. It makes a recursive
call. It uses this special LLM query
function to spin up a totally new,
totally fresh sub-LLM with a clean slate.
It gives just that one little snippet of
text to this new sub-LLM and says, "Hey,
you analyze just this piece and tell me
what's in it." So, the sub-LLM does its
job, finds the winner's name, and
reports back to the main model. But, and
this is the really cool part, the main
model doesn't just blindly trust it. It
acts like a good manager. Before it
declares victory, it decides to
double-check the work. It makes two more
recursive calls asking other fresh
sub-LLMs, hey, can you look at this document
and confirm that this person actually
won? It's only after it gets that
confirmation that it aggregates
everything together and gives you the
final verified answer. It's a whole
process of delegating, verifying, and
then synthesizing. It's beautiful. All
right, we've gone through the theory and
the step by step. Now for the moment of
truth. Does this whole elegant process
actually perform any better in the real
world? Well, the numbers aren't just
good, they are absolutely jaw-dropping.
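As a quick recap before the numbers, that whole probe, recurse, verify, aggregate flow can be compressed into a few lines. The llm_query stub and the winner's name below are invented for illustration; in the real system llm_query spawns a fresh model call.

```python
# Sketch of the RLM loop: probe, decompose, recurse with a fresh
# sub-LLM per chunk, verify, then aggregate. llm_query is a stub
# standing in for a real sub-LLM call; the name is hypothetical.

def llm_query(prompt: str, snippet: str) -> str:
    # Stand-in for spawning a fresh sub-LLM on one snippet.
    return "Maria Santos" if "winner" in snippet else ""

def rlm_answer(chunks):
    candidates = [c for c in chunks if "pageant" in c]   # probe
    for snippet in candidates:                           # decompose
        answer = llm_query("Who won?", snippet)          # recurse
        if answer:
            # verify with two more fresh calls before trusting it
            checks = [llm_query("Confirm " + answer + " won?", snippet)
                      for _ in range(2)]
            if all(checks):
                return answer                            # aggregate
    return None

chunks = ["irrelevant text", "pageant winner: Maria Santos", "more text"]
print(rlm_answer(chunks))  # → Maria Santos
```

Note the root model's context never holds the full corpus: it only ever sees code, chunk indices, and the short answers its workers report back.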
Just look at this slide. This is that
same brutal reasoning test from before.
The one where the standard model totally
bombed with a success rate of basically
zero for about 16 cents a try. Now look
at the RLM wrapped version. The success
rate skyrockets from 0.04%
all the way up to 58%. It goes from
complete and total failure to getting
the answer right more often than not.
And the price? It only doubles, to about 33 cents. For
just a few extra pennies, you get a
performance boost of over 1,400 times. I
mean, that is just an insane return on
investment. A jump in performance this
big isn't just an improvement. It's a
fundamental game-changer. It signals a major
shift in how we're going to build and
use AI from now on. And if you want to
stay ahead of these kinds of massive
shifts, this is a great time to make
sure you're subscribed to the channel.
The RLM method also makes it finally
affordable to reason over context sizes
we could only dream of before. This
chart shows the average cost for a query
on a 1 million token context. Simple
stuff like finding a needle in a
haystack is super cheap, like 10 cents.
As the reasoning gets more complex, the
cost goes up because it has to make more
of those recursive calls. So maybe a
dollar for a linear task, $2.50 for a
quadratic one. But here's the kicker.
These tasks are now actually possible on
a million tokens, and the costs are
predictable and manageable. This just
blows the doors open for entirely new
kinds of applications. Of course, with
any big new paper, there's going to be
some push back. When this research
dropped, the developer community was
buzzing. A mix of real excitement, but
also a healthy dose of skepticism. So,
let's be balanced here and look at some
of the critiques that popped up on
places like Hacker News. So, a few key
arguments kept coming up. A lot of
people said, "Hey, wait a minute. This
just sounds like the AI agent frameworks
we've been playing with for a while."
Others got into the weeds on the
terminology, arguing that a simple loop
that calls a function isn't technically
the same as true recursion. A really
sharp point was that in the paper's own
examples, the recursive depth was only
one. The main model called sub-LLMs, but
those sub-LLMs never spawned further sub-LLMs.
And finally, folks pointed out that it
looked a lot like earlier research on
models like ViperGPT. And you know
what? These are all really fair points.
I think this quote from one of the
commenters on Hacker News just sums up
the feeling perfectly. The sentiment was
basically that while the results are
amazing, the core ideas, having an AI
use tools, breaking problems down, these
aren't brand new. So maybe the best way
to see this is as a really elegant,
powerful, and well-tested
implementation of existing concepts
finally proven on modern powerful
models. Okay, so even with those
critiques, why is this RLM approach such
a big deal? Well, I think it's because
this is way more than just a clever
trick to get better benchmarks. It's
pointing towards a massive architectural
shift in how we're going to build AI in
the future. Let's zoom out and look at
the real big picture here. What this RLM
framework really gives us is a new super
clear division of labor. The neural
network or LLM isn't the entire brain
anymore. It's more like the creative
director, the architect. It handles all
the fuzzy stuff, intuition, getting the
gist of a text, and most importantly,
generating the plan, the code, and the
symbolic system, the Python part,
becomes the construction crew. It
handles all the rigid black and white
logic, counting things perfectly,
iterating through a list a million
times, and remembering things with
perfect precision. The LLM dreams up the
plan, and the code makes it happen. This
also highlights a huge difference
between RLMs and the retrieval-augmented
generation, or RAG, systems that are
everywhere right now. RAG is at its
heart a game of probability. It uses
vector search to hope it finds the right
data to answer your question. RLM is the
opposite. It's deterministic. By writing
and executing a loop, it follows a
precise program that guarantees it will
look at 100% of the data. It's not
hoping to find the right answer. It's
systematically working through a plan to
ensure it does. That's a giant leap
forward in terms of reliability. And all
of this leads to a pretty profound
conclusion. You know, for years, the
mantra in AI from the original
transformer paper has been attention is
all you need. What this work from MIT
shows us is a powerful update to that
idea. Attention is amazing for
understanding things at a local level,
for making sense of a sentence or a
paragraph. But for complex global
reasoning across a huge data set, it's
not enough. Attention isn't all you
need. You also need an architect. This
shift from thinking about AI as a single
giant brain to thinking about it as this
neurosymbolic team with an architect
and a crew. It's one of the most
exciting things happening in the field
right now. We're covering this future as
it unfolds, breaking down the research
that's going to define the next
generation of tech. So, if you want to
understand where all of this is headed,
make sure you subscribe so you don't
miss the next big thing. Thanks for
watching.