Why the Future of LLMs is 8x Faster & Smarter | Deep Dive into SSMs & Nested Learning
ehNWO8v4CG0 • 2025-12-02
Welcome to the explainer. You know that engine that powers pretty much all of modern AI? Well, it's starting to sputter. Today, we're going to look at that foundational tech, the cracks that are starting to show, and what might, just might, come next. Wow. Okay, so this is a pretty shocking statement, right? It comes from Llion Jones, one of the co-authors of that game-changing 2017 paper, "Attention Is All You Need," you know, the paper that literally invented the transformer and kicked off this entire AI boom. So yeah, let's dive into this. To really get his frustration and what it means for the future of AI, we kind of have to understand the incredible, world-changing impact his invention had in the first place. So let's start with act one of our story: the undisputed dominance of this one architecture born from that 2017 paper, an architecture that just completely took over the entire field. I mean, before the transformer, progress in AI was slow, incremental. But after, it was exponential. This was the breakthrough that made things like large language models (LLMs) even possible. It really became the fundamental building block for, well, everything we think of as modern AI. It's the DNA inside models like GPT-4 and Gemini. It unlocked these weird and powerful emergent capabilities, these unexpected skills that just kind of pop up when the model gets complex enough. And of course, this drove billions in investment and made the transformer the unquestioned default way to build AI. And its success was built on this really simple, powerful idea: the bigger and deeper you build the model, the smarter it gets. But what if that core idea has a hidden, critical flaw? And that brings us to act two, the conflict. This is where the cracks start to appear in the king's armor, and not from the outside but from deep within the math of the model itself.
So the central promise was always that stacking more and more layers, you know, making the model deeper, would make it way more powerful. But some really recent research is flipping that whole idea on its head, showing that after a certain point, deeper actually becomes weaker. A recent paper with the very provocative title "Attention Is Not All You Need" identified a fundamental problem with this deep stacking approach. They call it rank collapse. And honestly, the best way to think of it is like a game of telephone. You know how the first message is perfectly clear, but as it passes from person to person, or in this case from layer to layer, it gets all distorted and simplified? By the time that information travels through a ton of layers, the message gets so garbled that the signal is just lost in the noise. The deepest parts of the network basically stop learning anything new. Their output becomes no better than just a random guess. And check this out. The data actually proves it. Researchers tested how well information paths of different lengths performed. And as you can see, the short paths, crossing just one or two layers, super effective. But look at the long paths, the ones crossing six layers: the accuracy just plummets, down to a level that is barely better than a coin flip. I mean, this is a massive finding. It suggests there's a hard mathematical limit to that bigger-is-better approach. But this technical flaw is only half the story. The other problem, the one that really frustrates Llion Jones, is a human one. Which brings us right back to the inventor, Llion Jones, and why he basically thinks the entire field has lost its way. He argues that this flood of money and talent into AI hasn't sparked creativity; it's actually killed it. The immense pressure from investors demanding quick returns has forced everyone to just double down on the one thing they know works, the transformer. And that's stifled any real fundamental innovation.
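The "game of telephone" effect described here can actually be demonstrated in a few lines of code. The sketch below repeatedly applies a bare self-attention step, with no residual connections, no MLP, and, for simplicity, no value matrix, to a random token matrix, and measures how far the result is from a rank-1 matrix whose rows are all identical. All the names and constants are invented for illustration; this is a toy reproduction of the collapse phenomenon, not the paper's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 32  # number of tokens, hidden dimension

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rank1_residual(X):
    """Relative distance from X to the rank-1 matrix that repeats X's mean row."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True)) / np.linalg.norm(X)

X = rng.normal(size=(n, d))                              # random "token" matrix
Wq, Wk = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(2))

history = [rank1_residual(X)]
for layer in range(12):
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    A = softmax(scores)   # row-stochastic attention matrix: each row sums to 1
    X = A @ X             # pure attention step: no skip connection, no MLP
    history.append(rank1_residual(X))

print(f"residual after layer 0:  {history[0]:.3f}")
print(f"residual after layer 12: {history[-1]:.3e}")
```

Because every attention step replaces each token with a convex combination of the others, the differences between rows shrink layer after layer, and the residual collapses toward zero: exactly the "deeper becomes weaker" signal loss the paper describes (the skip connections and MLPs in real transformers are what slow this down).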
And he is not alone in feeling this way. You go online, you look at discussions among researchers, and they're all echoing this exact same thing. You see phrases like "the myopic vision of industry," or that it's become "a race to the bottom" focused on a shinier product, not a smarter model. One developer put it perfectly. He said, "The field feels stuck, just fine-tuning the same 2017 paper like it's the Bible." And this right here illustrates the problem perfectly. The entire focus of the research community isn't on inventing something new; it's on finding more efficient ways to patch the old model. They're trying to fix the engine, not design a whole new one. So, this is the critical question, right? If the king is flawed and the kingdom is afraid of change, where does the next revolution even come from? Well, this leads us to act three: a potential new path forward, one that learns from the mistakes of the past. So, frustrated by all this stagnation, some researchers are now looking for inspiration in the most complex and efficient learning machine we know of: the human brain. Okay, so one of the most promising new ideas is called nested learning. Instead of treating an AI as one giant, single network, it reimagines it as a system of smaller, interconnected modules. And each module learns at a different speed, kind of mimicking how our brains turn short-term experiences into long-term knowledge. And by doing that, it tries to solve a huge problem in AI called catastrophic forgetting. Now, here's a comparison that really shows the difference. Current models are static. Their knowledge is frozen after training. And when they learn a new task, they often forget the old ones. Nested learning, on the other hand, aims to create systems that can learn continuously, consolidating memories and building a real spectrum of short- and long-term knowledge, just like we do.
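To make the "modules learning at different speeds" idea concrete, here is a deliberately tiny sketch: a fast parameter that chases whatever task is currently being observed, and a slow parameter that gradually consolidates what the fast one has learned. The variable names, learning rates, and the whole one-dimensional setup are invented for illustration; this is a caricature of the multi-timescale principle, not the actual nested learning algorithm.

```python
import numpy as np

# Two "modules" learning the same scalar quantity at different timescales.
# (Illustrative sketch only; real nested learning nests full optimizers.)
rng = np.random.default_rng(1)

fast, slow = 0.0, 0.0
FAST_LR = 0.5       # fast module: big updates, tracks the current task
CONSOLIDATE = 0.02  # slow module: slowly absorbs the fast module's state

def train_on(target, steps):
    """Feed noisy observations of a task and update both modules."""
    global fast, slow
    for _ in range(steps):
        sample = target + rng.normal(scale=0.1)  # noisy observation
        fast += FAST_LR * (sample - fast)        # adapts within a few steps
        slow += CONSOLIDATE * (fast - slow)      # consolidates over many steps

train_on(target=1.0, steps=200)    # long exposure to task A
after_a = (fast, slow)
train_on(target=-1.0, steps=10)    # brief exposure to task B
after_b = (fast, slow)

print(f"after task A:            fast={after_a[0]:+.2f}  slow={after_a[1]:+.2f}")
print(f"after 10 steps of task B: fast={after_b[0]:+.2f}  slow={after_b[1]:+.2f}")
```

After the brief switch to task B, the fast parameter has already flipped to the new task, while the slow parameter still carries task A, so the system as a whole hasn't catastrophically forgotten it. A single-speed learner with the fast learning rate would have overwritten task A entirely in those same ten steps.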
And to prove this isn't just some theory, researchers actually went and built a whole new architecture from the ground up based on these ideas. They call it Hope. And the results are incredibly promising. When they tested it on a bunch of common-sense reasoning tasks, the Hope architecture consistently and significantly beat a standard transformer model of about the same size. This is a fundamental shift. It's the difference between building static tools that we have to constantly throw away and replace, and creating truly dynamic systems that can adapt, evolve, and improve all on their own over time. So, what are the key takeaways here? Well, first, the transformer, the king of modern AI, has real mathematical limits. Second, the industry's obsession with it has created a research bottleneck. And third, these new ideas inspired by neuroscience, like nested learning, are showing a potential way out. The era of just making AI bigger might be ending. And the era of making it smarter in totally new ways could be just beginning. This shift from just scaling up to truly scaling smarter could very well be the thing that defines the next decade of artificial intelligence. Thanks for joining us for this explainer.