TWIST2: Scalable, Portable, Mocap-Free Humanoid Data Collection and Whole-Body Control (Unitree G1)
1L6ffIBrvHk • 2025-12-04
Kind: captions
Language: en
Okay, let's be real. Have you ever been
wrestling with a fitted sheet and just
thought, "Man, there has to be a better
way." Well, that simple frustration
actually gets us to a really big
question in robotics. Seriously, why
can't a robot fold your laundry yet? I
mean, it seems like it should be simple,
right? But for a robot, it's
unbelievably complex. And you know, the
main reason we don't have robot butlers
buzzing around our homes boils down to
one huge thing, a massive data problem.
See, teaching a humanoid robot to move
and interact just like a person, but out
in the messy real world, that's been
practically impossible to do at any kind
of scale. Well, until now, that is. All
right, so let's break this down. For a
long, long time, robotics researchers
were stuck with this really frustrating
trade-off. On one hand, you had the
mocap lab. Think Hollywood special
effects, right? You get this incredibly
precise full body data, but it's crazy
expensive, super complex, and it's
totally stuck in one room. On the other
hand, you've got a portable VR setup.
This is way more affordable and you can
take it literally anywhere. The catch,
it usually only gives you partial
control, like just the arms and the
head. The legs kind of just follow along
with basic commands. So, you were forced
to choose: amazing high-quality data
that's stuck in a lab, or kind of
mediocre low-quality data that you can
actually take out into the wild. And
that choice, it created this massive
bottleneck. I mean, think about it.
We've seen huge breakthroughs in pretty
much every other corner of AI, you know,
with things like language models and
image generators, and it's all been
fueled by massive amounts of data. But
for humanoid robots, that data
revolution just never happened. There
was just no good way to get enough of
that high-quality real-world data to teach
them to be genuinely useful. So yeah,
the bottom line is that all the old
systems had to make some pretty major
compromises. You had what's called
decoupled control, which is kind of
wild. It's where you might have one
person controlling the robot's arms and
a completely different person driving
the legs. Then there was partial
control, where the legs are just
following these super basic speed
commands. That's not how we move at all.
The only way to get that true full body
control was to go back to those giant,
expensive, and totally non-portable
mocap labs. But what if you
didn't have to choose? What if you could
get the best of both worlds? Well, that
is exactly what this new system called
Twist 2 does. It's a breakthrough that
basically shatters that old trade-off.
And you can sum it up in just three
simple words. First up, it is portable.
The whole setup is designed to get out
of the lab and into the real world, an
office, your house, literally anywhere.
Next, it's scalable. This thing is built
from the ground up for efficient,
massive data collection. The idea is
that tons of different people can
contribute data, which is exactly what's
needed to solve that bottleneck we were
talking about. And finally, and this is
really the magic ingredient, it's
holistic. Twist 2 gives you full unified
whole body control. It's capturing all
the tiny, subtle, coordinated movements
of a person from their feet right up to
their head. Okay, so how on earth does
this actually work? Let's pop the hood
and see what kind of hardware and
software makes Twist 2 tick. What's so
cool about this is how simple and
accessible the parts are. We're talking
about a regular off-the-shelf VR headset
and just two little motion trackers you
strap to your calves. That's it for the
human side. Then on the robot, they've
added a custom-designed neck that can
move up and down and side to side, which
is so important for giving it that
active human-like vision. Then you've
got the software, which is the brain of
the operation, translating your
movements to the robot. And finally, a
smart AI controller, a reinforcement
learning policy, that makes sure the
robot carries out all those moves
smoothly and without falling over. And
the whole process is just really
elegant. It's so simple. A person just
puts on the VR gear and starts doing the
task. That's it. In real time, the
software is watching every single move
you make, walking, bending over,
reaching for something, and translating
it all into commands for the robot. The
robot copies you. And the whole time
this is happening, the system is
recording everything from the robot's
perspective, creating this perfect
high-quality data that can be used later
to train a fully autonomous AI. And get
this, that custom piece of hardware, the
little neck module that makes all that
crucial active vision possible, it costs
about 250 bucks to build. That's it.
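To make the pipeline described above concrete, here's a minimal sketch of that teleop loop: the VR headset and the two calf trackers get retargeted into one whole-body reference, and a learned RL policy tracks that reference while keeping the robot balanced. All class names, function names, and array sizes here are illustrative assumptions, not TWIST2's actual API.

```python
import numpy as np

class WholeBodyTeleop:
    """Sketch of a TWIST2-style teleop loop (hypothetical interface)."""

    def __init__(self, policy):
        # policy: maps an observation vector to joint-position targets
        self.policy = policy

    def retarget(self, headset_pose, left_calf_pose, right_calf_pose):
        # Stack the sparse human measurements into one reference vector;
        # a real retargeter would solve for feasible robot keypoints here.
        return np.concatenate([headset_pose, left_calf_pose, right_calf_pose])

    def step(self, robot_state, headset_pose, left_calf_pose, right_calf_pose):
        reference = self.retarget(headset_pose, left_calf_pose, right_calf_pose)
        observation = np.concatenate([robot_state, reference])
        # The RL policy handles balance, so the operator never has to
        # drive the legs separately from the arms.
        return self.policy(observation)

# Usage with a stand-in policy (zeros for, say, 29 actuated joints):
teleop = WholeBodyTeleop(lambda obs: np.zeros(29))
action = teleop.step(np.zeros(64), np.zeros(7), np.zeros(7), np.zeros(7))
print(action.shape)  # (29,)
```

The point of the structure is the "holistic" part: one observation vector, one policy, one action, instead of separate arm and leg channels.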
That incredibly low cost is what blows
this whole thing wide open. It's the key
that unlocks this technology for
researchers everywhere. And it really
truly democratizes the entire field. All
right, so that's the tech, but what can
you actually do with it? Let's check out
some of the real world results because
they are pretty awesome. So, here's that
scalable idea in action. Look at these
numbers. In just 18 and 12 minutes, one
single person collected 98 successful
demos of a two-handed task. 98. For a
tougher mobile task, they still got 46
demos in less than 20 minutes. And look
at that last column. 100% success rate.
Just wow. This is a game-changing pace
for collecting high-quality data for
humanoids. And that incredible
efficiency means the robot can now
perform these really complex long-term
tasks. Things that need both delicate
hand movements and the ability to move
around. We're talking about folding
multiple towels in a row, which needs
that precise pinching and whole body
movement, or grabbing baskets, walking
through a dorm with them, and setting
them down. It can even do dynamic stuff
like kicking a soccer ball. This user
study is fascinating because it shows
just how much every single piece of the
system matters. Okay, so look at that
first bar on the left. With the full
Twist 2 system, it took people about 68
seconds to collect 10 demos. Not bad.
But now look what happens when you take
away the stereo vision. The time jumps
up to 98 seconds. And if you take away
that active neck module, it takes over
112 seconds. This chart is perfect proof
that those design choices are absolutely
critical for making the system easy and
fast to use. But you know, the impact of
Twist 2 is actually much bigger than just
what this one robot can do. It's really
about empowering the entire research
community. And this quote from the
project just says it all. Humanoid data
is better when universally sharable. I
love that. Their goal isn't just to
build one cool system. It's to create a
foundation that everyone else can build
on top of. And they're really putting
that philosophy into practice with a few
key principles. First, the idea that no
data set is too small. Every little bit
helps. Second, by getting everyone to
use the same standardized affordable
hardware, the whole community can move
forward faster together. And finally,
using a single unified data format means
that an AI model trained by one lab can
easily be used and improved by another.
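As a rough illustration of what "a single unified data format" buys you, here's a sketch of one shared episode record that any lab could write and any other lab could read back. The field names and structure below are assumptions made up for this example, not TWIST2's published schema.

```python
import json
import numpy as np

def make_frame(t, joint_positions, commanded_targets):
    # Per-timestep record; the stereo images from the active neck camera
    # would be stored alongside, omitted here to keep the sketch small.
    return {
        "time": t,
        "joint_positions": joint_positions.tolist(),
        "commanded_targets": commanded_targets.tolist(),
    }

def make_episode(task_name, frames):
    """Bundle per-frame observations and commands into one shared record."""
    return {
        "task": task_name,
        "robot": "unitree_g1",   # standardized hardware id (illustrative)
        "num_frames": len(frames),
        "frames": frames,
    }

# Three dummy frames at 50 Hz for a hypothetical towel-folding demo:
frames = [make_frame(t * 0.02, np.zeros(29), np.zeros(29)) for t in range(3)]
episode = make_episode("fold_towel", frames)

# Any lab can serialize and deserialize the exact same structure:
restored = json.loads(json.dumps(episode))
print(restored["num_frames"])  # 3
```

Once everyone's demos round-trip through one agreed-on structure like this, a model trained on one lab's data can ingest another lab's data with no conversion step.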
It's literally creating a rising tide
that lifts all boats in the world of
robotics. So, this brings us all the way
back to the beginning. For the very
first time, we have a system that is
portable, scalable, and holistic. A way
to finally collect the data we need to
train truly capable humanoid robots.
That bottleneck we talked about, it's
been broken. And that leaves us with one
final and really exciting question to
think about. Now that pretty much anyone
can teach a robot, what's the first
thing we should teach them to do?