Transcript
PFPMaT7gOKw • VLA + RL: The Breakthrough Combining Vision-Language Action Models with Reinforcement Learning
/home/itcorpmy/itcorp.my.id/harry/yt_channel/out/FoundationModelsForRobotics/.shards/text-0001.zst#text/0030_PFPMaT7gOKw.txt
Kind: captions
Language: en
Have you ever seen a video of a robot trying to do something simple, and it just fails in the most ridiculous way? Well, it turns out that teaching a robot is way harder than just showing it what to do. Today, we're going to dive into some amazing new AI that lets robots stop just copying us and actually start learning from their own experiences.
So let's start with one big question: what if a robot could actually learn from its own mistakes, you know, the way we do? Not just following some perfect pre-programmed script, but figuring things out when they go a little sideways. Because that one single idea is changing everything in robotics.
Okay, to really get how this works, we first need to meet the hero of our story. It's a new kind of AI called a vision-language-action model, or just VLA for short. So what is a VLA? Well, it basically combines three superpowers into one brain. You've got vision, so it can see the world around it. You've got language, so it can understand a command like, "Hey, pick up the red apple." And then, most importantly, you've got action, which translates all that understanding into the robot's actual physical movements.
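If you like to think in code, here's a rough sketch of that three-superpowers idea. Everything in it, the class name, the encoders, the seven-number action format, is a made-up illustration, not any real model's API:

```python
# A minimal sketch of the VLA idea: one network maps (image, instruction)
# to low-level robot actions. All names here are hypothetical.
import numpy as np

class TinyVLA:
    def __init__(self, vision_encoder, language_encoder, action_head):
        self.vision_encoder = vision_encoder      # image -> feature vector
        self.language_encoder = language_encoder  # text  -> feature vector
        self.action_head = action_head            # features -> motor command

    def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # "Vision": encode what the robot sees.
        visual = self.vision_encoder(image)
        # "Language": encode the command, e.g. "pick up the red apple".
        text = self.language_encoder(instruction)
        # "Action": fuse both and decode a movement, e.g. an arm delta
        # [dx, dy, dz, droll, dpitch, dyaw, gripper].
        fused = np.concatenate([visual, text])
        return self.action_head(fused)
```

Real VLAs fold all three pieces into one big transformer trained end to end, but the shape of the problem, pixels and words in, motor commands out, is exactly this.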
So where do all these smarts come from? It's not some programmer writing endless lines of code. Nope. These models get their start by learning from massive amounts of data from the internet, billions of image and text pairings. And this gives them something kind of like common sense. It lets them understand concepts and ideas they were never specifically trained on in the lab. And this chart really shows you what a huge leap this is. Google's RT-2, which is a VLA model, was tested on a bunch of tasks it had never seen before, and it succeeded 62% of the time. That's nearly double the success rate of its predecessor, RT-1, which didn't have all that rich internet pre-training. But even with all this power, there's a pretty serious catch.
And that brings us to a huge problem in robotics that researchers call the imitation trap. It turns out that just learning to copy what a human does has a major flaw. To really get this, you have to think about how these robots learn: they study a very specific set of perfect examples, their training data. It's like their perfect little classroom world. This quote really nails the problem: the second that robot makes one tiny mistake in the real world, it's suddenly in a situation it's never seen before. It's outside the training distribution, and that is when everything starts to fall apart. Think of it like a chef who's only ever cooked in a perfect TV-studio kitchen. They can follow a recipe to the letter, but the moment you put them in a real, messy kitchen where the lighting is weird or an ingredient is just a little different, they totally freeze up. The robot is exactly the same way. It's just too brittle to handle the chaos of the real world. And what happens is a cascade of errors, like a domino effect. The robot makes one tiny mistake. Maybe its grip is off by a millimeter. Suddenly the world looks unfamiliar, which makes its next move even more likely to be wrong. And pretty soon all these little errors pile up until the robot fails the task completely. That right there is the imitation trap.
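You can see the shape of this trap with a few lines of arithmetic. If each step of a task has even a small chance of knocking the robot somewhere unfamiliar, the odds of finishing a long task decay exponentially. These are toy numbers, not figures from any paper:

```python
# Why one tiny mistake snowballs: if each step succeeds with probability
# (1 - eps), a T-step task succeeds with probability (1 - eps)**T.
def task_success_rate(eps: float, T: int) -> float:
    return (1 - eps) ** T

for eps in (0.01, 0.05, 0.10):
    print(f"per-step error {eps:.0%} -> "
          f"50-step task succeeds {task_success_rate(eps, 50):.0%} of the time")
# per-step error 1%  -> roughly 61%
# per-step error 5%  -> roughly 8%
# per-step error 10% -> roughly 1%
```

That exponential decay is why a robot that nails 95% of individual steps can still fail a long task far more often than not.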
So how in the world do you get a robot out of this trap? Well, the answer is a totally different way of learning, one that isn't just about copying, but about actually gaining experience. And this is where reinforcement learning swoops in to save the day. Reinforcement learning, or RL, is pretty much what it sounds like: it's learning by doing. Just think about how you learned to ride a bike. Nobody gave you a perfect manual, right? You tried, you wobbled, you probably fell down a few times, but you got a little reward in your brain for every second you stayed upright. RL gives robots that exact same ability to learn through good old-fashioned trial and error. And when you put these two side by side, the difference is just night and day. Imitation learning is all about the how: do exactly what I do. But reinforcement learning is all about the what: here's the goal, you figure out the best way to get there. And that gives it this incredible power to adapt and problem-solve in a way that just copying could never, ever do.
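Here's that trial-and-error loop in its simplest possible form: tabular Q-learning on a toy one-dimensional task. The "robot" is never shown the answer; it only gets a reward when it stumbles onto the goal, and the value estimates it builds up from those stumbles eventually become a policy. This is a textbook toy, not a real robot-learning setup:

```python
# Trial and error in miniature: a "robot" on a 1-D track must reach
# position 4. It gets a reward of +1 only at the goal.
import random

N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)   # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2          # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != GOAL:
        # Explore sometimes (the "wobble"), otherwise exploit what worked.
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # The reward nudges the estimate of "how good was that move here?"
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy policy marches straight to the goal.
s, path = 0, [0]
while s != GOAL:
    s += max(ACTIONS, key=lambda act: Q[(s, act)])
    path.append(s)
print(path)  # converges to [0, 1, 2, 3, 4]
```

The same loop, reward in, better behavior out, is what scales up to real robots, just with neural networks in place of the lookup table.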
Okay, so this is where it gets really clever. How do you actually combine these two incredible ideas? Well, researchers have come up with three really smart strategies for merging the huge knowledge of VLAs with the adaptive power of reinforcement learning, and the result is robots that are both super knowledgeable and super resilient. Here are the three paths they're taking. One is to practice safely in a simulation. The second is to learn right on the real robot, but with a little help from a human. And the third one is wild: using RL to create training data that's even better than what a human could make.
All right. Path number one is all about being safe and efficient. Instead of letting a real, super-expensive robot just bang into things, you first train what's called a world model. It's basically a learned simulation of reality. The robot can then practice millions of times in this virtual sandbox, learning from its mistakes with zero real-world risk. It's like a flight simulator for robots, giving them a whole lifetime of experience before they ever touch a real object.
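Here's a tiny, stripped-down version of that recipe. A real world model is a big neural network trained on video and action data; this sketch swaps in a one-line least-squares model just to show the three steps, collect a little real experience, fit a model, then plan entirely inside it:

```python
# Path one in miniature: learn a crude model of the world from a handful
# of real transitions, then do all the risky trial and error in the model.
import numpy as np

rng = np.random.default_rng(0)

def real_world(s, a):
    # The true (unknown) dynamics, with noise. The robot only gets a
    # few expensive, risky samples of this.
    return 0.8 * s + 0.5 * a + rng.normal(scale=0.01)

# 1) Collect a small, safe batch of real experience.
S = rng.uniform(-1, 1, size=200)
A = rng.uniform(-1, 1, size=200)
S_next = np.array([real_world(s, a) for s, a in zip(S, A)])

# 2) Train the "world model": here, just least squares on [s, a] -> s'.
X = np.stack([S, A], axis=1)
theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def world_model(s, a):
    return theta[0] * s + theta[1] * a

# 3) "Flight simulator" phase: search for an action sequence that drives
#    the state to 0, using only imagined rollouts (zero real-world risk).
def imagined_cost(actions, s=1.0):
    for a in actions:
        s = world_model(s, a)
    return s ** 2

best = min((tuple(rng.uniform(-1, 1, 3)) for _ in range(10_000)),
           key=imagined_cost)
print("plan found in imagination:", np.round(best, 2))
```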
Now, the second path is called online RL, and it's all about learning on the job. A great example of this is a system called RECAP. The robot tries a task for real, and the moment it gets stuck, a human expert jumps in and gives a quick correction. This is so powerful because the AI isn't learning from some random dataset; it's learning directly from its own actual mistakes, which makes the process incredibly efficient. And get this: the results were remarkable. This method of learning with real-time human corrections doubled or even tripled the number of tasks the robot could successfully complete in an hour. That's the kind of jump you need to make these things actually useful in the real world.
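The core intervention loop is easy to sketch. What follows is a DAgger-style illustration of the idea, the robot acts, a human overrides when needed, and the override becomes training data; the real RECAP system is considerably more sophisticated, and every interface name here is hypothetical:

```python
# Path two in miniature: the robot acts on its own until it gets stuck,
# then a human steps in, and the correction goes straight into training.
# All objects (policy, env, human, dataset) are hypothetical stand-ins.
def online_learning_loop(policy, env, human, dataset, episodes=100):
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)
            if human.would_intervene(obs, action):
                # The expert overrides the bad action; the robot learns
                # from its own mistake, in the exact state it made it.
                action = human.correct(obs)
                dataset.append((obs, action))
            obs, done = env.step(action)
        policy.update(dataset)   # retrain on everything gathered so far
    return policy
```

The key design choice is where the data comes from: every training example is a state the robot actually got itself into, which is exactly the data that imitation learning alone never sees.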
Okay, this third path is kind of mind-bending, but it's so cool. Instead of using RL to train the robot itself, you use an RL algorithm to generate thousands of near-perfect examples of how to do a task. And these computer-generated examples are often way smoother and more efficient than what a person can do. So you're basically creating this superhuman data to then teach the VLA. And what do you know? It works. On a really tough benchmark with 130 different tasks, the VLA that learned from the RL-generated data actually did better than the one that learned from real humans. The student literally created a better teacher for itself. How wild is that?
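In code, the recipe looks something like this: let a trained RL policy roll out the task over and over, record those rollouts as demonstrations, and then train the VLA on them with plain supervised imitation. All the interfaces below are hypothetical stand-ins:

```python
# Path three in miniature: an RL policy that's already good at a task
# generates clean demonstrations, and the VLA imitates those instead
# of imitating humans.
def distill_rl_into_vla(rl_policy, vla, env, n_demos=1000):
    demos = []
    for _ in range(n_demos):
        obs, done = env.reset(), False
        trajectory = []
        while not done:
            action = rl_policy.act(obs)   # often smoother than a human demo
            trajectory.append((obs, action))
            obs, done = env.step(action)
        demos.append(trajectory)
    # Plain "copy the teacher" supervised training, but the teacher is
    # an RL policy rather than a person.
    vla.fit([step for traj in demos for step in traj])
    return vla
```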
So what happens when you put all of these amazing ideas together? Well, you get robots that are finally starting to look genuinely capable and genuinely adaptable enough to handle our messy, unpredictable world. And we're not talking about simple lab demos anymore. We're seeing robots like Physical Intelligence's π*0.6 and Mobile ALOHA that can reliably do complex, multi-step jobs. They can make you an espresso, fold your laundry, cook shrimp, and even call and use an elevator to get around. I mean, this stuff was pure science fiction just a handful of years ago.
So, at the end of the day, here's the big takeaway. The big-picture knowledge from those vision-language-action models gives robots common sense, but it's the trial-and-error learning from reinforcement learning that gives them real adaptability. And when you fuse those two things together, that's the key that's finally unlocking robots that can actually function outside the lab. It really leaves us with a pretty mind-blowing question, doesn't it? For decades, we have been painstakingly trying to program robots step by step, but now we're building robots that can teach themselves. So when that becomes the new normal, what problems are they going to solve next?