TWIST2: Scalable, Portable, Mocap-Free Humanoid Data Collection and Whole-Body Control (Unitree G1)
1L6ffIBrvHk • 2025-12-04
Kind: captions
Language: en
Okay, let's be real. Have you ever been
wrestling with a fitted sheet and just
thought, "Man, there has to be a better
way." Well, that simple frustration
actually gets us to a really big
question in robotics. Seriously, why
can't a robot fold your laundry yet? I
mean, it seems like it should be simple,
right? But for a robot, it's
unbelievably complex. And you know, the
main reason we don't have robot butlers
buzzing around our homes boils down to
one huge thing, a massive data problem.
See, teaching a humanoid robot to move
and interact just like a person, but out
in the messy real world, that's been
practically impossible to do at any kind
of scale. Well, until now, that is. All
right, so let's break this down. For a
long, long time, robotics researchers
were stuck with this really frustrating
trade-off. On one hand, you had the
mocap lab. Think Hollywood special
effects, right? You get this incredibly
precise full body data, but it's crazy
expensive, super complex, and it's
totally stuck in one room. On the other
hand, you've got a portable VR setup.
This is way more affordable and you can
take it literally anywhere. The catch,
it usually only gives you partial
control, like just the arms and the
head. The legs kind of just follow along
with basic commands. So, you were forced
to choose: amazing high-quality data
that's stuck in a lab, or kind of
mediocre low-quality data that you can
actually take out into the wild. And
that choice, it created this massive
bottleneck. I mean, think about it.
We've seen huge breakthroughs in pretty
much every other corner of AI, you know,
with things like language models and
image generators, and it's all been
fueled by massive amounts of data. But
for humanoid robots, that data
revolution just never happened. There
was just no good way to get enough of
that high-quality real-world data to teach
them to be genuinely useful. So yeah,
the bottom line is that all the old
systems had to make some pretty major
compromises. You had what's called
decoupled control, which is kind of
wild. It's where you might have one
person controlling the robot's arms and
a completely different person driving
the legs. Then there was partial
control, where the legs are just
following these super basic speed
commands. That's not how we move at all.
The only way to get that true full body
control was to go back to those giant,
expensive, and totally non-portable
mocap labs. But what if you
didn't have to choose? What if you could
get the best of both worlds? Well, that
is exactly what this new system called
Twist 2 does. It's a breakthrough that
basically shatters that old trade-off.
And you can sum it up in just three
simple words. First up, it is portable.
The whole setup is designed to get out
of the lab and into the real world, an
office, your house, literally anywhere.
Next, it's scalable. This thing is built
from the ground up for efficient,
massive data collection. The idea is
that tons of different people can
contribute data, which is exactly what's
needed to solve that bottleneck we were
talking about. And finally, and this is
really the magic ingredient, it's
holistic. Twist 2 gives you full unified
whole body control. It's capturing all
the tiny, subtle, coordinated movements
of a person from their feet right up to
their head. Okay, so how on earth does
this actually work? Let's pop the hood
and see what kind of hardware and
software makes Twist 2 tick. What's so
cool about this is how simple and
accessible the parts are. We're talking
about a regular off-the-shelf VR headset
and just two little motion trackers you
strap to your calves. That's it for the
human side. Then on the robot, they've
added a custom-designed neck that can
move up and down and side to side, which
is so important for giving it that
active human-like vision. Then you've
got the software, which is the brain of
the operation, translating your
movements to the robot. And finally, a
smart AI controller, a reinforcement
learning policy, that makes sure the
robot carries out all those moves
smoothly and without falling over. And
the whole process is just really
elegant. It's so simple. A person just
puts on the VR gear and starts doing the
task. That's it. In real time, the
software is watching every single move
you make, walking, bending over,
reaching for something, and translating
it all into commands for the robot. The
robot copies you. And the whole time
this is happening, the system is
recording everything from the robot's
perspective, creating this perfect
high-quality data that can be used later
to train a fully autonomous AI. And get
this, that custom piece of hardware, the
little neck module that makes all that
crucial active vision possible, it costs
about 250 bucks to build. That's it.
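To make the pipeline described above concrete, here's a minimal sketch of that teleop loop: the VR headset and the two calf trackers get retargeted into one whole-body reference, and a learned RL policy tracks that reference while keeping the robot balanced. All class names, function names, and array sizes here are illustrative assumptions, not TWIST2's actual API.

```python
import numpy as np

class WholeBodyTeleop:
    """Sketch of a TWIST2-style teleop loop (hypothetical interface)."""

    def __init__(self, policy):
        # policy: maps an observation vector to joint-position targets
        self.policy = policy

    def retarget(self, headset_pose, left_calf_pose, right_calf_pose):
        # Stack the sparse human measurements into one reference vector;
        # a real retargeter would solve for feasible robot keypoints here.
        return np.concatenate([headset_pose, left_calf_pose, right_calf_pose])

    def step(self, robot_state, headset_pose, left_calf_pose, right_calf_pose):
        reference = self.retarget(headset_pose, left_calf_pose, right_calf_pose)
        observation = np.concatenate([robot_state, reference])
        # The RL policy handles balance, so the operator never has to
        # drive the legs separately from the arms.
        return self.policy(observation)

# Usage with a stand-in policy (zeros for, say, 29 actuated joints):
teleop = WholeBodyTeleop(lambda obs: np.zeros(29))
action = teleop.step(np.zeros(64), np.zeros(7), np.zeros(7), np.zeros(7))
print(action.shape)  # (29,)
```

The point of the structure is the "holistic" part: one observation vector, one policy, one action, instead of separate arm and leg channels.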
That incredibly low cost is what blows
this whole thing wide open. It's the key
that unlocks this technology for
researchers everywhere. And it really
truly democratizes the entire field. All
right, so that's the tech, but what can
you actually do with it? Let's check out
some of the real world results because
they are pretty awesome. So, here's that
scalable idea in action. Look at these
numbers. In just 18 and 12 minutes, one
single person collected 98 successful
demos of a two-handed task. 98. For a
tougher mobile task, they still got 46
demos in less than 20 minutes. And look
at that last column. 100% success rate.
Just wow. This is a game-changing pace
for collecting high-quality data for
humanoids. And that incredible
efficiency means the robot can now
perform these really complex long-term
tasks. Things that need both delicate
hand movements and the ability to move
around. We're talking about folding
multiple towels in a row, which needs
that precise pinching and whole body
movement, or grabbing baskets, walking
through a dorm with them, and setting
them down. It can even do dynamic stuff
like kicking a soccer ball. This user
study is fascinating because it shows
just how much every single piece of the
system matters. Okay, so look at that
first bar on the left. With the full
Twist 2 system, it took people about 68
seconds to collect 10 demos. Not bad.
But now look what happens when you take
away the stereo vision. The time jumps
up to 98 seconds. And if you take away
that active neck module, it takes over
112 seconds. This chart is perfect proof
that those design choices are absolutely
critical for making the system easy and
fast to use. But you know, the impact of
Twist 2 is actually much bigger than just
what this one robot can do. It's really
about empowering the entire research
community. And this quote from the
project just says it all. Humanoid data
is better when universally sharable. I
love that. Their goal isn't just to
build one cool system. It's to create a
foundation that everyone else can build
on top of. And they're really putting
that philosophy into practice with a few
key principles. First, the idea that no
data set is too small. Every little bit
helps. Second, by getting everyone to
use the same standardized affordable
hardware, the whole community can move
forward faster together. And finally,
using a single unified data format means
that an AI model trained by one lab can
easily be used and improved by another.
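As a rough illustration of what "a single unified data format" buys you, here's a sketch of one shared episode record that any lab could write and any other lab could read back. The field names and structure below are assumptions made up for this example, not TWIST2's published schema.

```python
import json
import numpy as np

def make_frame(t, joint_positions, commanded_targets):
    # Per-timestep record; the stereo images from the active neck camera
    # would be stored alongside, omitted here to keep the sketch small.
    return {
        "time": t,
        "joint_positions": joint_positions.tolist(),
        "commanded_targets": commanded_targets.tolist(),
    }

def make_episode(task_name, frames):
    """Bundle per-frame observations and commands into one shared record."""
    return {
        "task": task_name,
        "robot": "unitree_g1",   # standardized hardware id (illustrative)
        "num_frames": len(frames),
        "frames": frames,
    }

# Three dummy frames at 50 Hz for a hypothetical towel-folding demo:
frames = [make_frame(t * 0.02, np.zeros(29), np.zeros(29)) for t in range(3)]
episode = make_episode("fold_towel", frames)

# Any lab can serialize and deserialize the exact same structure:
restored = json.loads(json.dumps(episode))
print(restored["num_frames"])  # 3
```

Once everyone's demos round-trip through one agreed-on structure like this, a model trained on one lab's data can ingest another lab's data with no conversion step.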
It's literally creating a rising tide
that lifts all boats in the world of
robotics. So, this brings us all the way
back to the beginning. For the very
first time, we have a system that is
portable, scalable, and holistic. A way
to finally collect the data we need to
train truly capable humanoid robots.
That bottleneck we talked about, it's
been broken. And that leaves us with one
final and really exciting question to
think about. Now that pretty much anyone
can teach a robot, what's the first
thing we should teach them to do?