MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)
7ROelYvo8f0 • 2018-02-08
Today we have Josh Tenenbaum. He's a professor here at MIT, leading the Computational Cognitive Science group. Among many other topics in cognition and intelligence, he is fascinated with the question of how human beings learn so much from so little, and how these insights can lead us to build AI systems that are much more efficient at learning from data. So please give Josh a warm welcome.

All right, thank you very much. Thanks for having me. I'm excited to be part of what looks like a really impressive lineup, especially starting after today, and I think it's quite a great opportunity to get to see perspectives on artificial intelligence from many of the leaders in industry and other entities working on this great quest. So I'm going to talk to you about some of the work that we do in our group, but I'm also going to try to give a broader perspective, reflective of a number of MIT faculty, especially those who are affiliated with the Center for Brains, Minds and Machines. You can see up there on my affiliation: academically I'm part of Brain and Cognitive Sciences, or Course 9, and I'm also part of CSAIL, but I'm also part of the Center for Brains, Minds and Machines, which is an NSF-funded Science and Technology Center that really stands for the bridge between the science and the engineering of intelligence. It literally straddles Vassar Street, in that we have CSAIL and BCS members. We also have partners at Harvard and other academic institutions. And again, what we stand for, and I want to try to convey some of the specific things we're doing in the center and where we want to go, is a vision that really is about jointly pursuing the basic science of how intelligence arises in the human mind and brain, and also the engineering enterprise of how to build something increasingly like human intelligence in machines. We deeply believe that these two projects have something to do with each other and are best pursued jointly. Now it's a really
exciting time to be doing anything related to intelligence, or certainly to AI, for all the reasons that brought you all here. I don't have to tell you this. We have all these ways in which AI is kind of finally here. We finally live in the era of something like real, practical AI, or, for those who've been around for a while and have seen some of the rises and falls, AI is back in a big way. But from my perspective, and I think maybe this reflects why we distinguish what we might call AGI from AI, we don't really have any real AI. Basically we have what I like to call AI technologies, which are systems that do things we used to think only humans could do. Now we have machines that do them, often quite well, maybe even better than any human who's ever lived, like a machine that plays Go. But none of these systems, I would say, are truly intelligent. None of them have anything like common sense. None of them have anything like the flexible, general-purpose intelligence that each of you might use to learn every one of these skills or tasks. Each of these systems had to be built by large teams of engineers working together, often for a number of years, often at great cost to somebody who's willing to pay for it, and each of them just does one thing. So AlphaGo might beat the world's best, but it can't drive to the match, or even tell you what Go is. It can't even tell you that Go is a game, because it doesn't even know what a game is.

So what's missing? What is it that makes every one of your brains different? Maybe you can't beat the world's best in Go, but any one of you can get behind the wheel of a car. I think of this because my daughter is going to turn 16 tomorrow. If she lived in California she'd have a driver's license; it's a little bit down the line for us here in Massachusetts. But she didn't have to be specially engineered by billion-dollar startups. And she got really into chess recently, and now
she's taught herself chess by playing just a handful of games, basically. I mean, she can do any one of these activities, and any one of us can. So what is it that makes up the difference? Well, there are many things. The focus for us in our research, and for a lot of us in CBMM, is summarized here. What drives the successes right now in AI, especially in industry, in all these AI technologies, is many, many things. But where the progress has been made most recently, and what's getting most of the attention, is of course deep learning, along with other kinds of machine learning technologies, which essentially represent the maturation of a decades-long effort to solve the problem of pattern recognition. That means taking data and finding patterns in the data that tell you something you care about, like how to label a class or how to predict some other signal. And pattern recognition is great. It's an important part of intelligence, and it's reasonable to say that deep learning as a technology has really made great strides on pattern recognition, and maybe is even coming close to solving the problems of pattern recognition.

But intelligence is about many other things. Intelligence is about a lot more. In particular, it's about modeling the world. Think about all the activities that a human does to model the world that go beyond just, say, recognizing patterns in data: actually trying to explain and understand what we see, for instance. Or being able to imagine things that we've never seen, maybe even very different from anything we've ever seen, but might want to see, and then to set those as goals, to make plans and solve the problems needed to make those things real. Or think about learning. Again, some kinds of learning can be thought of as pattern recognition, if you're learning sufficient statistics or weights in a neural net that are used for those purposes,
but many activities of learning are about building new models, either refining, reusing, and improving old models or actually building fundamentally new models as you experience more of the world. And then think about sharing our models, communicating our models to others, modeling their models, learning from them. All these activities of modeling are at the heart of human intelligence, and they require a much broader set of tools. So I want to talk about the ways we're studying these activities of modeling the world, and say something, in a pretty non-technical way, about what kinds of tools allow us to capture these abilities.

Now, I want to be very honest up front and say this is just the beginning of a story. When you look at deep learning's successes, that itself is a story that goes back decades. I'll say a little bit about that history in a minute. But where we are now is just looking forward to a future when we might be able to capture these abilities at a really mature engineering scale, and I would say we are far from being able to capture all the ways in which humans richly, flexibly, and quickly build models of the world at the kind of scale that, say, Silicon Valley wants, either big tech companies like Google or Microsoft or IBM or Facebook, or small startups. We can get there, and what I want to talk to you about here is one route for trying to get there. This is the route that CBMM stands for: the idea that reverse engineering how intelligence works in the human mind and brain will give us a route to engineering these abilities in machines. When we say reverse engineering, we're talking about science, but doing science like engineers. This is our fundamental principle: that if we approach cognitive science and neuroscience like an engineer, so that the output of our science isn't just a description of the brain or the mind in words but is in the same terms that an engineer would use to build an intelligent
system, then that will be both the basis for a much more rigorous and deeply insightful science and also a direct translation of those insights into engineering applications.

Now, I said before I'd talk a little about history. What I mean by that is this. If part of what brought you here is deep learning, and I know even if you've never heard of deep learning before, which I'm sure is unlikely, you saw a good spectrum of that in the overview session last night, it's really interesting and important to look back on the history of where techniques for deep learning came from, or reinforcement learning. Those are the two tools in the current machine learning arsenal that are getting the most attention: things like backpropagation, or end-to-end stochastic gradient descent, or temporal difference learning, or Q-learning. Here are a few papers from the literature. Maybe some of you have read these original papers. Here's the original paper by Rumelhart, Hinton and colleagues in which they introduced the backpropagation algorithm for training multi-layer perceptrons, that is, multi-layer neural networks. Here's the original perceptron paper by Rosenblatt, which introduced the one-layer version of that architecture and the basic perceptron learning algorithm. Here's the first paper on the temporal difference learning method for reinforcement learning, from Sutton and Barto. Here's the original Boltzmann machine paper, also by Hinton and colleagues, which, for those who don't know that architecture, is a kind of probabilistic, undirected multi-layer perceptron. Or, for example, before there were LSTMs, if you know about current recurrent neural network architectures, earlier and much simpler versions of the same idea were proposed by Jeff Elman in his simple recurrent networks. The reason I want to put up the original papers here is for you to look at both when they were published and where they were published. So if you
look at the dates, you'll see papers going back to the '80s, but even the '60s or even the 1950s. And look at where they were published. Most of them were published in psychology journals. The journal Psychological Review, if you don't know it, is like the leading journal of theoretical psychology and mathematical psychology. Or Cognitive Science, the journal of the Cognitive Science Society. Or the backprop paper, which was published in Nature, which is a general-interest science journal, but by people who were mostly affiliated with the Institute for Cognitive Science in San Diego. So what you see here is already a long history of scientists thinking like engineers. These are people who were in psychology or cognitive science departments and publishing in those places, but by formalizing even very basic insights about how humans might learn, or how brains might learn, in the right kind of math, that led of course to progress on the science side, but it also led to all the engineering that we see now. It wasn't sufficient. We needed, of course, lots of innovations and advances in computing hardware and software systems. But this is where the basic math came from, and it came from doing science like an engineer.

So what I want to talk about in our vision is: what does the future of this look like? If we were to look 50 years into the future, what would we be looking back on now, or over this time scale? Well, here's a long-term research roadmap that reflects some of my ambitions and some of our center's goals, and many others' too. We'd like to be able to address basic, fundamental questions of what it is to be and to think like a human: questions, for example, of consciousness, or meaning in language, or real learning; questions even beyond the individual, like questions of culture or creativity. So those are our big ideas up there, and for each of these there are basic scientific questions. How do we become
aware of the world and ourselves in it? It starts with perception, but it really turns into awareness, awareness of yourself and of the world, and what we might call consciousness. Or how does a word start to have a meaning? What really is a meaning, and how does a child grasp it? Or how do children actually learn? What do babies' brains actually start with? Are they blank slates, or do they start with some kind of cognitive structure? And then what does real learning look like? These are just some of the questions that we're interested in working on. Or when we talk about culture, we mean: how do you learn all the things you didn't directly experience, but that somehow you got from the accumulation of knowledge in society over many generations? Or how do you ever think of new ideas, or answers to new questions? How do you think of the new questions themselves? How do you decide what to think about? These are all key activities of human intelligence. When we talk about how we model the world, where our models come from, and what we do with our models, this is what we're talking about. And if we could get machines that could do these things, well, again, on the bottom row, think of all the actual, real engineering payoffs.

Now, in our center, in both my own activities and a lot of what my group does these days, and what a number of other colleagues in the Center for Brains, Minds and Machines do, as well as, very broadly, people in BCS and CSAIL, there is one place where we work on the beginnings of these problems in the near term. This is the long term, like, think 50 years, maybe shorter, maybe longer, I don't know, but think well beyond 10 years. But in the short term, 5 to 10 years, a lot of our focus is around visual intelligence, and there are many reasons for that. Again, we can build on the successes of deep networks and a lot of pattern recognition and machine vision. It's a good way to put these ideas into practice. When we look at the actual brain, the visual system in
the brain, in the human and other mammalian brains for example, is really very clearly the best understood part of the brain, and at a circuit level it's the part of the brain that has most inspired current deep learning and neural network systems. But even there, there are things which we still don't really understand like engineers. So here's an example of a basic problem in visual intelligence that we and others in the center are trying to solve. Look around you, and you feel like there's a whole world around you. And there is a whole world around you, and you feel like your brain captures it. But the actual sense data that's coming in through your eyes looks more like this photograph here, where you can see there's a crowd scene, but it's mostly blurry except for a small region of high resolution in the center. That corresponds biologically to the part of the image that's in your fovea. That's the central region of cells in the retina where you have really high-resolution visual data. The size of your fovea is roughly like if you hold out your thumb at arm's length; it's a little bit bigger than that, but not much bigger. Most of the image, in terms of the actual information coming in, in a bottom-up sense, to your brain, is really quite blurry. But somehow, by looking at just one part and then by saccading around, making a few eye movements, you get a few glimpses, each not much bigger than the size of your thumb at arm's length, and somehow you stitch that information together into what feels like, and really is, a rich representation of the whole world around you.

And when I say around you, I mean literally around you. So here's another kind of demonstration. Without turning around, nobody's allowed to turn around, ask yourself: what's behind you? Now, the answer is going to be different for different people depending on where you're sitting. For most of you, you might think, well, I think there's a person pretty close behind me. You know you're in a crowded auditorium;
although you haven't seen that person, you know that they're there. For people in the very back row, you know there isn't a person behind you, and you're conscious of being in the back row. You might be conscious that there's a wall right behind you. But now, for the people in the room who are not in the very back, think about how far behind you the back is. Like, where's the nearest wall behind you? So maybe we can call out, try a little demonstration. So, I don't know, I'm pointing to someone there. Can you see? Say something if you think I'm pointing at you. Well, I could have been pointing at you, but I'm pointing at someone behind you. Okay, I'll point to you. Yeah, I'm pointing to you. All right, so how far is the nearest wall? No, you can't turn around. You've blown your chance. Without turning around. Okay, so you... (laughs) Okay, do you see I'm pointing to you there, with the tie? Okay, so without turning around, how far is the nearest wall behind you? Sorry, how far? Five meters. Okay, well, I mean, that might be about right. No, other people can't turn around either. How about you? How far is the nearest wall behind you? Ten meters. Okay, that might be right. Yeah, how about here? What do you think? Twenty. Okay. See, yeah, since I didn't grow up in the metric system I barely know, but the point is that each of you is surely not exactly right, but you're certainly within an order of magnitude, and if we actually tried to measure, my guess is you're probably right within fifty percent or less, often maybe just twenty percent error. Okay, so how do you know this? I mean, even if it's not, what did you say, twenty meters, even if it's not twenty meters, it's probably closer to 20 meters than it is to 5 or 10 meters, and than it is to 50 meters. So how do you know this? You haven't turned around in a while. But some part of your brain is tracking the whole world around you. And how many
people are behind you? Yeah, like a few hundred, right? I mean, I don't know if it's 200 or 300, but it's not a thousand, I don't think so, and it's certainly not 10 or 20 or 50. So you track these things, and you use them to plan your actions. So again, think about how instantly, effortlessly, and very reliably your brain computes all these things, the people and objects around you. And it's not just approximations. Certainly when we're talking about what's behind you in space there's a lot of imprecision, but when it comes to reaching for things right in front of you, there are very precise shape and physical property estimates needed to pick up and manipulate objects. And then when it comes to people, it's not just the existence of the people but something about what's in their heads. You track whether someone's paying attention to you when you're talking to them, what they might want from you, what they might be thinking about you, what they might be thinking about other people. So when we talk about visual intelligence, this is the whole stuff we're talking about, and you can start to see how it turns into basic questions, I think, of what we might call the beginnings of consciousness, at least our awareness of ourselves in the world and of ourselves as a self in the world, but also other aspects of higher-level intelligence and cognition that are not just about perception, like symbols. To describe, even to ourselves, what's around us and where we are and what we can do with it, you have to go beyond just what we would normally call the stuff of perception to, say, the thoughts in somebody's head and your own thoughts about that.

So what we've been doing in CBMM is trying to develop an architecture for visual intelligence. I'm not going to go into any of the details of how this works, and this is just notional. This is just a picture. It's just a sketch from a grant proposal of what we say we want to do, but it's based on a lot of
scientific understanding of how the brain works. There are different parts of the brain that correspond to these different modules in our architecture, as well as some kind of emerging engineering way to try to capture, at the software and maybe even hardware levels, how these modules might work. So we talk about an early module, a visual or perceptual stream, which takes bottom-up visual or other perceptual input. That's the kind of thing that is pretty close to what we currently have in, say, deep convolutional neural networks. But then the output of that isn't just pattern class labels but what we call the cognitive core, core cognition. So we get an understanding of space and objects, their physics, other people, their minds. That's the real stuff of cognition. That has to be the output of perception. But somehow, and this is what we call the brain OS in this picture, we have to get there by stitching together the bottom-up inputs from a glimpse here, a glimpse here, a little bit here and there, and accessing prior knowledge that comes from our memory systems to tell us how to stitch these things together into the really core cognitive representations of what's out there in the world. And then if we're going to start to talk about it in language, or to build plans on top of what we have seen and understood, that's where we talk about symbols coming into the picture, the building blocks of language and plans and so on.

So now we might say, well, okay, this is an architecture that is brain-inspired and cognitively inspired, and we're planning to turn it into real engineering. And you can say, well, do we need that? Again, I know this is a question you considered in the first lecture. Maybe the engineering toolkit that's currently been making a lot of progress in, let's say, industry, maybe that's good enough. Let's take deep learning to stand for a broader set of modern pattern-recognition-based and
reinforcement-learning-based tools, and say, okay, well, maybe that can scale up to this. And maybe that's possible. I'm happy, in the question period, if people want to debate this. My sense is no. When I say no, I don't mean it can't happen or it won't happen. What I mean is that the highest-value, the highest-expected-value route right now is to take this more science-based reverse engineering approach, and that, at least if you follow the current trajectory that industry incentives especially optimize for, it's not even really trying to take us to these things.

So think about, for example, a case study of visual intelligence that is in some ways, as pattern recognition, very much a success. It's again been mostly driven by industry. It's something that, if you read about it in the news or even play around with it on certain publicly available datasets, feels like we've made great progress. This is an aspect of visual intelligence which is sometimes called image captioning: basically, mapping images to text. There's been a bunch of systems. Here are a couple of press releases. I guess this one's about Google: "Google's AI can now caption images almost as well as humans." Here's one about Microsoft. A couple of years ago, I think there were something like eight papers all released onto arXiv around the same time from basically all the major industry computer vision groups, as well as a couple of academic partners, which, all driven by basically the same dataset produced by some Microsoft researchers and other collaborators, trained a combination of deep convolutional neural networks, state-of-the-art visual pattern recognition, with recurrent neural networks, which had recently been developed for basically kinds of neural statistical language modeling, glued them together, and produced a system which made very impressive results on a big training set and a held-out test set, where the goal was to
take an image and write a sentence, like a short sentence caption, that would seem like the kind of way a human would describe that image. And these systems surpassed human-level accuracy on the held-out test set from a big training set. But what you can see when you really dig into these things is there's often a lot of what I would call dataset overfitting. It's not overfitting to the training set, but it's overfitting to whatever are the particular characteristics of this dataset, wherever it came from, a certain set of photographs and certain ways of captioning them. Even with a big dataset, it's not about quantity; it's more about the quality, the nature of what people are doing. So one way to test this system is to apply it to what seems like basically the same problem but not within a certain curated or built dataset, and there's a convenient Twitter bot that lets you do this. There's something called the picdescbot, which takes one of the state-of-the-art industry AI captioning systems, a very good one. Again, I'm not trying to critique these systems for what they're trying to do; I'm just trying to point out what they don't really even try to do. So this takes the Microsoft CaptionBot, and every couple of hours takes a random image from the web, captions it, and uploads the results to Twitter. A couple of months ago, when I prepared a first version of this talk, I just took a few days in the life of this Twitter bot. I didn't take every single image, but I took most of the images, in a way that was meant to be representative of the successes and the kinds of failures that such a system will make. So we can go through this, and it's a little bit entertaining and I think quite informative.

So here's just a somewhat random sample of a few days in the life of one of these caption bots. So here we have a picture of a person holding, well, my screen is very small here and I can't read up there, so
maybe you'll have to tell me what it was, but a person holding a cell phone. I guess I'll just read along with you. So we have a person holding a cell phone. Well, it's not a person holding a cell phone, but it's kind of close. It's a person holding some kind of machine. I don't even know what that is, but it's some kind of musical instrument. So that's a mixed success or failure. Here's a pretty good one: a group of people on a field playing football. I would call that an A result, maybe even A+. Here's a group of people standing on top of a mountain. So, less good: there's a mountain, but as far as I can tell there are no people. But these systems like to see people, because in the dataset they were trained on there are a lot of people, and people often talk about people. And the fact that you can appreciate both what I said and why it's funny, there you did some of the cognitive activities that this system is not even trying to do. Here we've got a building with a cake. I'll go through these fast. A building with a cake. A large stone building with a clock tower: I think that's pretty good, I'd give that like a B+. There's no clock, but it's plausibly right; there might be a clock in there. There's definitely something like that. Here's a truck parked on the side of a building: I don't know, maybe a B- there. There is a car on the side of a building, but it's not a truck, and it doesn't seem like the main thing in the image. Here's a necklace made of bananas. Here's a large ship in the water: this is pretty good, I'd give this like an A- or B+, because there is a ship in the water, but it's not very large. It's really more like a tugboat or something. Here's a sign sitting on the grass: in some sense that's great, but in another sense it's really missing what's actually interesting and important and meaningful to humans. Here's a garden in the dirt. A pizza
sitting on top of a building. A small house with a red brick building: that's pretty good, although a kind of weird way of saying it. A vintage photo of a pond: that's good, they like vintage photos. A group of people that are standing in the grass near a bridge: again, there are two people and there's some grass and there's a bridge, but it's really not what's going on. A person in a yard: okay, kind of. A group of people standing on top of a boat: there's a boat, there's a group of people, they're standing, but again, the sentence that you see is more based on a bias of what people have said in the past about images that are only vaguely like this. A clock tower lit up at night: that's really, I think, pretty impressive. A large clock mounted to the side of a building: a little bit less so. A snow-covered field: very good. A building with snow on the ground: a little bit less good; there's no snow, right? Some people, I don't know them, but I bet that's probably right, because identifying faces and recognizing people who are famous because they won medals in the Olympics, I would probably trust current pattern recognition systems to get that. A painting of a vase in front of a mirror: less good. Also a famous person there, but we didn't get him. A person walking in the rain: again, there is sort of a person and there are some puddles, but not really. A group of stuffed animals. A car parked in a parking lot: that's good. A car parked in front of a building: less good. A plate with a fork and knife. A clear blue sky.

Okay, so you get the idea. Again, you can actually go and play with the system yourself, and my friends at Microsoft told me they've improved it some. This is partly for entertainment value; I chose what also would be the funnier examples, so I want to be quite honest about it. And I'm not trying to take away from what are impressive AI technologies. But I think it's clear that there's a sense of understanding
any one of these images that it's important to see: that even when it seems to be correct, if it can make the kind of errors that it makes, then even when it seems to be correct it's probably not doing what you're doing, and it's probably not even trying to scale towards the dimensions of intelligence that we think about when we're talking about human intelligence.

Another way to put this: I'm going to show you a really insightful blog post from one of your other speakers. In a couple of days, I'm not sure exactly when, you're going to have Andrej Karpathy, who's one of the leading people in deep learning. This is a really great blog post he wrote a couple of years ago, when he was I think still at Stanford. He got his PhD from Stanford, he worked at Google a little bit on some early big neural net AI projects there, he was at OpenAI, he was one of the founders of OpenAI, and recently he joined Tesla as their director of AI research. But about five years ago he was looking at the state of computer vision from a human intelligence point of view and lamenting how far away we were. So this is the title of his blog post: "The state of Computer Vision and AI: we are really, really far away." And he took this image, which was a famous image in its own right. It was a popular image of Obama back when he was president, kind of playing around, as he liked to do when he was on tour. So if you take a look at this, you probably all can recognize the previous President of the United States, but you can also get the sense of where he is and what's going on, and you might see people smiling, and you might get the sense that he's playing a joke on someone. Can you see that? So how do you know that he's playing a joke, and what that joke is? Well, as Andrej goes on to talk about in his blog post, if you think about all the things that you have to really deploy in your mind to understand that, it's a huge list. Of course it starts with seeing people and objects and
maybe doing some face recognition but you have to do things like for example notice his foot on the scale and understand enough about how scales work that when a foot presses down it exerts force that the scale is sensitive to it doesn't just magically measure people's weight but it does that somehow through force you have to see who can see that he's doing that and who cannot see that he's doing that right in particular the person on the scale and why some people can see that he's doing that and can see that some other people can't see it why that makes it funny to them okay and someday we should have machines that can understand this but hopefully you can see why the kind of architecture that I'm talking about would be the building blocks or the ingredients to be able to get them to do that now when I again I prepared a version of this talk a few months ago and I wrote to Andrej and I said I was gonna use this and I was curious you know if he had any reflections on this and where he thought we were relative to five years ago because certainly a lot of progress has been made but he said here's his email I hope he doesn't mind me sharing it but I mean again he's a very honest person and that's one of the many reasons why he's such an important person right now in AI okay he's both very technically strong and honest about what we can do what we can't do and as he says well what does he say it's nice to hear from you it's funny you should bring this up I was also thinking about writing a return to this and in short basically I don't believe we've made very much progress right he points out that in his long list of things that you'd need to understand the image we have made progress on some the ability to again detect people and do face recognition for well-known individuals okay but that's kind of about it all right and he wasn't particularly optimistic that the current route that's being pursued in industry is anywhere
close to solving or even really trying to solve these larger questions um if we give this image to that caption bot you know what we see is again represents the same point so here's the caption bot it says I think it's a group of people standing next to a man in a suit and tie right so that's right right as far as it goes it just doesn't go far enough and the current ideas of build a data set train a deep learning algorithm on it and then repeat um aren't really even I would venture trying to get to what we're talking about or I'll just give you one other example of a couple of photographs from my recent vacation in a nice warm tropical locale which I think illustrates ways in which again the gap where we have machines that can say beat the world's best at Go but can't even beat a child at tic-tac-toe now what do I mean by that well you know of course we can build we don't even need reinforcement learning or deep learning to build a machine that can win or tie that can do optimally in tic-tac-toe but think about this this is a real tic-tac-toe game which I saw on the grass outside my hotel right what do you have to do to look at this and recognize that it's a tic-tac-toe game you have to see the objects you have to see you know in some sense there's a three by three grid but it's only abstract right it's only delimited by these ropes or strings okay it's not actually a grid in any simple geometric sense all right but yet a child can look at that and indeed here's an actual child who was looking at it and recognized oh it's a game of tic-tac-toe and even know what they need to do to win put the X and complete it and now they've got three in a row right that's literally child's play okay you show this sort of thing though to one of these you know image understanding caption bots and I think it's a close-up of a sign okay again saying that this is a close-up of a sign is not the same
thing I would venture as a cognitive or computational activity that's going to give us what we need to say recognize the objects to recognize it as a game to understand the goal and how to plan to achieve those goals whereas this kind of architecture is designed to try to do all of these things ultimately right and I bring in these examples of games or jokes to really show where perception goes to cognition you know and all the way up to symbols right so to get objects and forces and mental states that's the cognitive core but to be able to get goals and plans and what do I do or how do I talk about it that's symbols okay here's another way into this and it's one that also motivates I think a lot of really good work on the engineering side and a lot of our interest in the science side is think about robotics and think about what do you have to do you know what does the brain have to be like to control the body so again you're gonna hear from shortly I think maybe it's next week from Marc Raibert who's one of the founders of Boston Dynamics which is one of my favorite companies anywhere they're without doubt the leading maker of humanoid robots legged locomoting robots in industry they have all sorts of other really cool robots robots like dogs robots that have all you know I think you'll even get to see a live demonstration of my new robots this really awesome impressive stuff okay um but what about the minds and brains of these robots well again if you ask Marc ask him how much of human-like cognition do they have in their robots and I think he would say very little in fact we have asked him that and he has said very little he's actually one of the advisors of our Center and I think in many ways we're very much on the same page we both want to know how do you build the kind of intelligence that can control these bodies like the way a human does alright um here's another example of an industry robotics effort this is Google's
arm farm where you know they've got lots of robot arms and they're trying to train them to pick up objects using various kinds of deep learning and reinforcement learning techniques and I think it's one approach I just think it's very very different from the way humans learn to say control their body and manipulate objects and you can see that in terms of things that go back to what you were saying when you're introducing me right think about how quickly we learn things right here you have the arm farm trying to generate you know effectively maybe if not infinite but hundreds of thousands millions of examples of reaches and pickups of objects even with just a single gripper and yet a child who in some ways can't control their body nearly as well as robots can be controlled at the low level is able to do so much more so I'll show you two of my favorite videos from YouTube here which motivate some of the research that we're doing the one on the left is a one and a half year old and the other one's a one year old so just watch this one and a half year old here doing a popular activity for many kids as a playing hmm you see video up there okay there we go okay so he's doing this stacking cup activity alright he's stacking up cups to make a tall tower he's got a stack of three and what you can see for the first part of this video is it looks like he's trying to make a second stack and that he's trying to pick up at once basically he's trying to make a stack of two that'll go on the stack of three and you know he's trying to debug his plan because it got a little bit stuck here but and think about I mean again if you know anything about robots manipulating objects even just what he just did no robot can decide to do that and actually do it right at some point he's almost got it it's a little bit tricky but at some point he's gonna get that stack of two he realizes he has to move that object out of the way look at what he just did move
it out of the way use two hands to pick it up and now he's got a stack of two on a stack of three and suddenly you know subgoal completed he's now got a stack of five and he gives himself a hand because he knows he accomplished a key waypoint along the way to his final goal that's a kind of early symbolic cognition right to understand that I'm trying to build a tall tower but a tower is made up of little towers it's you know it can end and you can take a tower and put it on top of another tower or stack a stack on a stack and you have a bigger stack right so think about how he goes from bottom up perception to the objects to the physics needed to manipulate the objects to the ability to make even those early kinds of symbolic plans at some point he keeps doing this he puts another stack on there I'll just jump to the end oops sorry you missed it so he gets really excited and he gives himself another big hand but falls over okay again Boston Dynamics now has robots that could pick themselves up after that that's really impressive again but all the other stuff to get to that point we don't really know how to do in a robotic setting or think about this baby here this is a younger baby this is one of the Internet's very most popular videos because it features a baby and a cat but the baby's doing something interesting he's got the same cups but he's again decided to try a new thing so think about creativity he's decided that his goal is to stack up cups on the back of a cat I guess he's asking how many cups can I fit on the back of a cat well three let's see can I fit more let's try another one okay well he can't fit more than three it turns out and then it's not working so he changes his goal now his goal appears to be to get the cups on the other side of the cat now watch that part when he reaches back behind him there I'll just pause it there for a moment when he just reached back there that's a particularly
striking moment in the video it shows a very strong form of what we call in cognitive science object permanence okay that's the idea that you represent objects as these permanent enduring entities in the world even when you can't see them in this case he hadn't seen or touched that object behind him for like at least a minute right maybe much longer I don't know and yet he still knew it was there and he was able to incorporate it in his plan right there's a moment before that when he's about to reach for it but then he sees this other one right and it's only when he's now exhausted all the other objects here that he can see he's like okay now time to get this object and bring it into play right so think about what has to be going on in his brain for him to be able to do that right that's like the analog of you understanding what's behind you okay um it's not that these things are impossible to capture in machines far from it it's just that like training a deep neural network or any kind of pattern recognition system we don't think is going to do it but we think by reverse engineering how it works in the brain we might be able to do it I think we can do it okay it's not just humans that do this kind of activity here's a couple of again rather famous videos you can watch all of these on YouTube crows are famous object manipulators and tool users but also orangutans other primates rodents we can watch if we just hey let me pause this one for a second if we watch this orangutan here he's got a bunch of big Legos and over the course of this video he's building up a stack of Legos it's really quite impressive I'm just jumping to the end there's actually some controversy out there of whether this video is a fake but the controversy isn't about you know it's not like whether it was I don't know done with computer animation some people think the video was actually filmed backwards that a human built up the stack and the orangutan just slowly disassembled it piece by piece
and it turns out it's remarkably hard to tell whether it's played forward or backwards in time and people have argued over little details because you know it would be quite impressive if an orangutan actually was able to build up this really impressive stack of Legos but I would submit that it would be almost as impressive if he disassembled it think about the activity I mean if I wanted to disassemble that the easiest thing to do would just be to knock it over that's really all most robots could do but to piece by piece disassemble it even if it's played backwards like this that's still a really impressive act of symbolic planning on physical objects or here you've got this famous mouse this you can find on the internet under the mouse versus cracker video and what you'll see here over the course of this video is a mouse valiantly and mostly hopelessly struggling with a cracker that they're hoping to bring back to their nest I guess it's a very appealing big meal and at some point after just trying to get it over the wall at some point the mouse just gives up because it's just never gonna happen and he just goes away except that because even mouses can dream or mice can dream some point he decides okay I'm just gonna come out for one more try and he tries one more time and this time valiantly gets it over yeah isn't that very impressive congratulations guys okay you don't have to clap for him you can clap for me at the end or clap for whoever later okay but I want to applaud the mouse there every time I see that okay but again think what had to be going on in his brain to be able to do that all right it's a crazy thing and yet he formulated the goal and was able to achieve it I'll just show one more video that is really more about science these other ones are you know some of them actually were from scientific experiments but this is one that motivates a lot of the science that I do and to me it sets up kind of a grand cognitive science challenge for AI
and robotics it's from an experiment with humans again eighteen month olds or one-and-a-half year olds so the kids in this experiment were the same age as the first baby I showed you the one who did the stacking and 18 months is really a very very good age to study if you're interested in intelligence for reasons we can talk about later if you're interested this is from a very famous experiment done by two psychologists Felix Warneken and Michael Tomasello and it was studying the spontaneous helping behavior of young children it also contrasted humans and chimps and the punchline is that chimps sometimes do things that are kind of like what this human did but not nearly as reliably or as flexibly okay and I'll show you a particular kind of unusual situation where human kids had relatively little trouble figuring out kind of what to do or even whether they should do it whereas basically no chimp did what you're gonna see humans sometimes doing here so the experimenter in this movie I'll turn on the sound here if you can hear it the experimenter is the tall guy and the participant is the little kid in the corner there there's sound but no words right and at some point he stops and then the kid just does whatever they want to do so watch what he does he goes over he opens the cabinet looks inside then he steps back and he looks up at Felix and then looks down okay and then the action is completed now I want you to watch it one more time and think about what's gotta be going on inside the kid's head to understand this so what it looks like to us is the kid figured out that this guy needed help and helped him and the paper is full of many other situations like this this is just one okay but the key idea is that the situation is somewhat novel people have seen people holding books and opening cabinets but probably it's very rare to see this kind of situation exactly right it's different in some
important details from what you might have seen before and there's other ones in there that are really truly novel because they just made up a machine right there okay but somehow he has to understand causally from the way the guy's banging the books against the thing that it's sort of both a symbol but it's also somehow he's got to understand what he can do and what he can't do and then what the kid can do to help and I'll show this again but really the main part I want you to see is I'll just sort of skip ahead so watch this part here let's say I'll just jump right watch right now he's about to look up he looks up and makes eye contact and then his eyes look down so again he looks up and then a saccade a sudden rapid eye movement down to his hands up down okay so that's again that's this brain OS in action right he's making one small glance at the big guy's eyes just to make eye contact to get a signal did I understand what you wanted and did you register that joint attention and then he makes a prediction about what the guy's gonna do so he looks right down he doesn't just like look around randomly he looks right down to the guy's hands to track the action that he expects to see happening if I did the right thing to help you then I expect you're gonna put the books there okay so you can see these things happening and we want to know what's going on inside the mind that guides all of that all right so that's the sort of big scientific agenda that we're working on over the next few years where we think some kind of understanding of human intelligence in scientific terms could lead to all sorts of AI payoffs in particular suppose we could build a robot that could do what this kid and many other kids in these experiments do just say help you out around the house without having to be programmed or even really instructed just to kind of get a sense oh yeah you need a hand with that sure
let me help you out okay even 18 month olds will do that sometimes