Transcript
AMxCUBxnMYY • RoboCOIN: The Ultimate Open-Source Multi-Embodiment Bimanual Robotics Dataset
You know, when we think about building smarter robots, our minds usually jump to hardware, right? Better gears, stronger motors, faster chips. But the real revolution, the thing that's actually going to give us the capable, do-anything robots we've been dreaming of, it's not happening in the workshop. It's happening in the data. And to really get why, we need to meet a robot that learns, well, kind of like a kid.

So, this is NICO. It's designed to learn not by having someone code instructions into it, but through actual physical experience, through play. Researchers had this really interesting idea: could NICO figure out what an object is made of just by messing with it? Which of course leads to a fantastic question. I mean, you or I could pick up this duck and we'd know instantly, right? It's soft, it's rubbery, it's light. But how on earth do you teach that kind of intuition to a machine? Well, NICO's creators had it do something super simple, something any toddler would do: it dropped it. Now, this isn't just random, chaotic play. There's a method to the madness, and it's called active exploration. By doing something and then just watching what happens, the robot is literally creating its own data about the world. It's learning cause and effect all on its own. And for NICO, the real trick was to use more than just one of its senses.

Okay, this is where it gets really clever. Look at how NICO is processing this. On the left, it's not just looking at the duck; it's running what's called a difference-of-images calculation to see only the motion. And on the right, it takes the sound of the thud and turns it into a spectrogram, basically a visual fingerprint of that noise. So, in a way, it's seeing motion and it's seeing sound; a short code sketch of both steps appears at the end of this section. And this chart just shows you how powerful that is. Check it out. With sound alone, NICO was right about 60% of the time. Not bad. With vision alone, it jumped to a pretty impressive 84%. But when it combined the two: a whopping 90% accuracy. It's just like us, right? A richer, multisensory experience gives you a way better understanding of things.

So, huge success. But this is exactly where our story gets complicated. See, the problem is that a trick that works for one robot doing one simple thing often fails, and I mean fails spectacularly, when you try to make it bigger and more complex. That leap from learning to drop a toy to, say, cleaning a kitchen? It's just immense. And you can really see that leap here. On the left, you've got NICO and its simple drop test. On the right is a way more advanced humanoid robot called TOCABI, designed for really tricky two-handed jobs. The physical difference is obvious, but the real challenge, that's the data gap. Because this is where the whole approach just hit a wall. When researchers tried to use the same learning method with TOCABI, it just didn't work. The robot couldn't generalize. If they moved an object just a few inches away from its training spot, TOCABI was completely lost. All those extra joints, the moving head, it created a learning problem that was exponentially harder.

This right here, this is the great data bottleneck in robotics. It really boils down to a few things. First, as robots get more complex, the data they need just explodes. Second, a lesson one robot learns doesn't just work for another, because their bodies, their embodiments, are different. And finally, as you're collecting all this data, how do you even know if it's any good?
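To make those two preprocessing tricks concrete, here is a minimal sketch of frame differencing and a sound spectrogram in Python. The function names, array shapes, and toy data are illustrative assumptions on my part, not NICO's actual code; only the two techniques themselves come from the description above.

```python
# Sketch of the two preprocessing steps described above: frame
# differencing to isolate motion, and a spectrogram as a "visual
# fingerprint" of the impact sound. Shapes and names are assumptions.
import numpy as np
from scipy.signal import spectrogram

def motion_from_frames(frames: np.ndarray) -> np.ndarray:
    """Difference-of-images: subtract consecutive grayscale frames so
    the static background cancels out and only the motion remains.

    frames: array of shape (T, H, W), pixel values in [0, 255].
    Returns an array of shape (T-1, H, W) of absolute per-pixel changes.
    """
    frames = frames.astype(np.float32)
    return np.abs(frames[1:] - frames[:-1])

def sound_fingerprint(audio: np.ndarray, sample_rate: int = 16_000):
    """Turn the thud into a log-scaled spectrogram image: time on one
    axis, frequency on the other, energy as intensity."""
    freqs, times, power = spectrogram(audio, fs=sample_rate, nperseg=512)
    return np.log(power + 1e-10)  # log scale, avoiding log(0)

# Toy usage: a random 30-frame clip and a synthetic 150 Hz "thud".
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(30, 64, 64))
thud = np.sin(2 * np.pi * 150 * np.linspace(0, 0.5, 8000))

motion = motion_from_frames(clip)
fingerprint = sound_fingerprint(thud, sample_rate=16_000)
print(motion.shape, fingerprint.shape)
```

Feeding both streams, the motion images and the sound fingerprint, into a single classifier is the kind of multimodal fusion the chart credits with lifting accuracy from 84% with vision alone to 90%.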
So, to get around this massive bottleneck, the entire field is starting to shift its strategy. Instead of thousands of little separate experiments, researchers are now teaming up to build huge, shared libraries of robot experience. This quote from the RoboMIND project really just nails the new mission. The whole point is to create datasets that are so big and so varied that a learning model can finally start to generalize, to actually understand the idea of opening a drawer, not just how to open one specific drawer from one specific angle.

And the scale we're talking about here is just mind-blowing. The RoboMIND project, for instance, has collected over 107,000 examples of a robot doing a task. Each one of those is a demonstration trajectory captured in incredible detail. And it doesn't even stop there. The RoboCOIN dataset is even bigger, with 180,000 demonstrations. But here's the crucial part: it's collected from 15 different types of robots. That's directly attacking the embodiment divide we were just talking about. Just think about what that means. It's like they're building a YouTube or a Wikipedia, but for robots. A gigantic open-source library of skills where any new robot can basically log on, download the collective experience of thousands of others, and get a massive head start on figuring out the world.

But as these datasets started to grow into the hundreds of thousands, a new, more subtle question started to pop up: is just having more data really the answer? And it turns out the answer is no. Not at all. Think about it. A student who reads the same chapter a thousand times isn't going to learn nearly as much as a student who reads ten different books on the subject. The variety of the data is what really matters. And what they found was actually pretty surprising. The biggest improvements in performance didn't come from having more types of objects or different colors. They came from showing the robot the exact same task, but from lots of different camera angles and with the objects arranged in all sorts of different starting positions.

Of course, trying to manage this flood of diverse data, well, it requires a whole new level of organization. You can't just dump everything into one big folder. So researchers are now building formal data quality frameworks. Now, this chart looks super complex, I know, but the goal is actually really simple: create a reliable, repeatable process to plan, manage, and check every single piece of data they collect. It's kind of like they're creating a Rosetta Stone for robot data, a standard process that makes sure every bit of information, whether it comes from a lab in Berlin or a university in Seoul, is collected and documented in a way that makes it useful for everybody. It's this five-step loop: you plan, you figure out the risks, you implement, you check your work, and then you improve the process. That's becoming the new gold standard. A small code sketch below shows what one standardized demonstration record might look like.

So, what does all of this mean for the future? Well, it's really signaling a fundamental shift in how we even think about robotics. The hardest part of building a robot mind isn't a mechanical engineering problem anymore. It isn't even purely a software problem. It is, first and foremost, a data science problem. The real challenge is building and curating the right library of experiences. And that means the ultimate goal has gotten way more ambitious. This isn't about making one clever robot in one lab anymore.
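As one hedged illustration of that "Rosetta Stone" idea, here is a sketch of what a standardized demonstration record could look like. Every field name here is my own invention for illustration; it is not the actual RoboMIND or RoboCOIN schema.

```python
# Hypothetical sketch of one "demonstration trajectory" record in a
# shared, multi-embodiment dataset. Field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Demonstration:
    task: str                      # e.g. "open the drawer"
    embodiment: str                # which robot body produced the data
    num_joints: int                # bodies differ, so record the body
    camera_views: list[str]        # viewpoint diversity mattered most
    object_layout: str             # starting arrangement of the scene
    joint_trajectory: list[list[float]] = field(default_factory=list)
    quality_checked: bool = False  # the plan/implement/check loop above

# Two demos of the *same* task from two different bodies: shared
# metadata is what lets a learner pool them across embodiments.
demo_a = Demonstration("open the drawer", "single-arm gripper", 7,
                       ["wrist_cam", "overhead_cam"], "drawer_left")
demo_b = Demonstration("open the drawer", "bimanual humanoid", 14,
                       ["head_cam"], "drawer_center")
print(demo_a.task == demo_b.task, demo_a.embodiment != demo_b.embodiment)
```

The point of the metadata fields is exactly the one made above: a learner can only pool "open the drawer" demonstrations from fifteen different robot bodies if each record says, in a standard way, which embodiment, which camera views, and which starting layout produced it.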
It's about creating a shared global foundation of physical knowledge, a collective mind that can turbocharge learning for every single robot that comes after it. And this all leads to one last really fascinating question. For all of human history, we've been the ones who create and pass down physical skills. But as we build this enormous library of machine experience, we might just find that robots, by seeing patterns in ways we never could, discover new, better, more efficient ways of doing things. And who knows, they might end up teaching us a thing or two.