Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108
kxi-_TT_-Nc • 2020-07-14
The following is a conversation with Sergey Levine, a professor at Berkeley and a world-class researcher in deep learning, reinforcement learning, robotics, and computer vision, including the development of algorithms for end-to-end training of neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and, in general, deep RL algorithms.

Quick summary of the ads: two sponsors, Cash App and ExpressVPN. Please consider supporting the podcast by downloading Cash App and using code LEXPODCAST, and by signing up at expressvpn.com/lexpod. Click the links, buy the stuff; it's the best way to support this podcast and, in general, the journey I'm on. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support it on Patreon, or connect with me on Twitter at lexfridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for taking a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

This show is also sponsored by ExpressVPN. Get it at expressvpn.com/lexpod to support this podcast and to get an
extra three months free on a one-year package. I've been using ExpressVPN for many years; I love it. I think ExpressVPN is the best VPN out there. They told me to say it, but it happens to be true, in my humble opinion. It doesn't log your data, it's crazy fast, and it's easy to use: literally just one big power-on button. Again, it's probably obvious to you, but I should say it again: it's really important that they don't log your data. It works on Linux and every other operating system, but Linux, of course, is the best operating system. Shout-out to my favorite flavor, Ubuntu MATE 20.04. Once again, get it at expressvpn.com/lexpod to support this podcast and to get an extra three months free on a one-year package.

And now, here's my conversation with Sergey Levine.

Lex Fridman: What's the difference between a state-of-the-art human, such as you and I (well, I don't know if we qualify as state-of-the-art humans), and a state-of-the-art robot?

Sergey Levine: It's a very interesting question. Robot capability is a very tricky thing to understand, because there are some things that are difficult that we wouldn't think are difficult, and some things that are easy that we wouldn't think are easy. There's also a really big gap between the capabilities of robots in terms of hardware, their physical capability, and the capabilities of robots in terms of what they can do autonomously. There's a little video that robotics researchers really like to show, especially robotic learning researchers like myself, from 2004 from Stanford, which demonstrates a prototype robot called the PR1. The PR1 was designed as a home assistance robot, and there's this beautiful video showing the PR1 tidying up a living room, putting away toys, and at the end bringing a beer to the person sitting on the couch, which looks really amazing. And then the punch line is that this is entirely controlled by a person. So in some ways the gap between a state-of-the-art human and a state-of-the-art robot, if the robot has a human brain, is actually not that large. Now, obviously human bodies are sophisticated and very robust and resilient in many ways, but on the whole, if we're willing to spend a bit of money and do a bit of engineering, we can almost close the hardware gap. But the intelligence gap: that one is very wide.

Lex Fridman: When you say hardware, you're referring to the physical body, the actuators, the actual body of the robot, as opposed to the hardware on which the cognition, the nervous system, runs?

Sergey Levine: Yes, exactly. I'm referring to the body rather than the mind.

Lex Fridman: So that means our work is cut out for us. While we can still make the body better, we kind of know that the big bottleneck right now is really the mind. And how big is that gap? How big is the difference, in your sense, in the ability to learn, the ability to reason, the ability to perceive the world, between humans and our best robots?

Sergey Levine: The gap is very large, and the gap becomes larger the more unexpected events can happen in the world. Essentially, the spectrum along which you can measure the size of that gap is the spectrum of how open the world is. If you control everything in the world very tightly, if you put the robot in a factory and you tell it where everything is and you rigidly program its motion, then it can do things, one might even say, in a superhuman way. It can move faster, it's stronger, it can lift up a car, and things like that. But as soon as anything starts to vary in the environment, it'll trip up. And if many, many things vary, like they would in your kitchen, for example, then things are pretty much wide open.

Lex Fridman: We're going to stick a bit on the philosophical questions, but how much of the human side of cognitive abilities, in your sense, is nature versus nurture? How much of it is a product of evolution, and how much of it is something we learn, sort of from scratch, from the day we're born?

Sergey Levine: I'm going to read into your question as asking about the implications of this for AI. I'm not a biologist, so I can't really speak authoritatively about the biology itself.

Lex Fridman: Right. If it's all about learning, then there's more hope for AI.

Sergey Levine: So the way that I look at this is that, well, first, of course, biology is very messy, and if you ask the question "how does a person do something?" or "how does a person's mind do something?", you can come up with a bunch of hypotheses, and oftentimes you can find support for many different, often conflicting, hypotheses. One way we can approach the question of what the implications are for AI is to think about what's sufficient. Maybe a person is, from birth, very good at some things, like, for example, recognizing faces. There's a very strong evolutionary pressure to do that: if you can recognize your mother's face, you're more likely to survive, and therefore people are good at this. But we can also ask: what's the minimal sufficient thing? One of the ways we can study the minimal sufficient thing is to see what people do in unusual situations, if you present them with things that evolution couldn't have prepared them for. Our daily lives actually do this to us all the time. We didn't evolve to deal with automobiles and spaceflight and so on, yet there are all these situations we can find ourselves in, and we do very well there. I can give you a joystick to control a robotic arm, which you've never used before, and you might be pretty bad for the first couple of seconds, but if I tell you your life depends on using this robotic arm to open this door, you'll probably manage it, even though you've never seen this device before and have never used a joystick to control a robotic arm. You'll kind of muddle through it. And that's not your evolved natural ability; that's your flexibility, your adaptability.
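Levine's joystick example has a simple computational analog: an agent facing an unknown control mapping can probe it with a small trial, estimate the mapping from the observed outcome, and then invert its estimate to act. A minimal sketch, where the one-dimensional linear mapping, the function name, and all numbers are illustrative assumptions rather than anything from the conversation:

```python
def adapt_to_joystick(target=10.0, true_gain=-2.0):
    """Reach `target` under an unknown joystick-to-motion mapping.

    The agent assumes nothing about the mapping's sign or scale; it
    "muddles through" with one small probe, then acts on its estimate.
    """
    pos = 0.0
    probe = 0.1                      # tiny exploratory command
    moved = true_gain * probe        # the world applies the real mapping
    pos += moved
    est_gain = moved / probe         # belief learned from one trial
    cmd = (target - pos) / est_gain  # invert the estimated mapping
    pos += true_gain * cmd
    return pos
```

With `true_gain=-2.0`, the probe reveals that pushing the stick moves the arm the "wrong" way, and the corrective command still lands exactly on the target: a toy version of fast adaptation from a couple of seconds of trial and error.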
Sergey Levine: And that's exactly why our current robotic systems really kind of fall flat.

Lex Fridman: But I wonder how much general, almost what we think of as common sense, pre-trained models are underneath all that. That ability to adapt to a joystick requires you to have... I'm human, so it's hard for me to introspect on all the knowledge I have about the world, but it seems like there might be an iceberg underneath, in the amount of knowledge you actually bring to the table.

Sergey Levine: That's kind of the open question. There's absolutely an iceberg of knowledge that we bring to the table, but I think it's very likely that iceberg is actually built up over our lifetimes, because we have a lot of prior experience to draw on. And it kind of makes sense that the right way for us to optimize our efficiency, our evolutionary fitness, and so on, is to utilize all that experience to build up the best iceberg we can get. Well, that sounds an awful lot like what machine learning actually does. I think that for modern machine learning, it's actually a really big challenge to take this unstructured mass of experience and distill out something that looks like a common-sense understanding of the world. And perhaps part of that isn't because something about machine learning itself is broken or hard, but because we've been a little too rigid in subscribing to a very supervised, very rigid notion of learning: the input-output, X's-go-to-Y's sort of model. Maybe what we really need to do is view the world more as a mass of experience that is not necessarily providing any rigid supervision, but is providing many, many instances of things that could be, and then you take that and distill it into some sort of common-sense understanding.

Lex Fridman: I see. You're painting an optimistic, beautiful picture, especially from the robotics perspective, because that means we just need to invest in better learning algorithms, figure out how we can get access to more and more data for those learning algorithms to extract signal from, and then accumulate that iceberg of knowledge. It's a beautiful picture; it's a hopeful one.

Sergey Levine: I think it's potentially a little bit more than just that, and this is where we perhaps reach the limits of our current understanding. One thing that I think the research community hasn't really resolved in a satisfactory way is how much it matters where that experience comes from. Do you just download everything on the internet and cram it into, essentially, the 21st-century analog of a giant language model, and then see what happens? Or does it actually matter whether your machine experiences the world, in the sense that it actually attempts things, observes the outcomes of its actions, and augments its experience that way; that it chooses which parts of the world it gets to interact with, observe, and learn from? It may be that the world is so complex that simply obtaining a large mass of IID samples of the world is a very difficult way to go. But if you are actually interacting with the world, essentially performing a sort of hard-negative mining, attempting what you think might work, observing the sometimes happy and sometimes sad outcomes of that, and augmenting your understanding using that experience, and you're doing this continually for many years, maybe that sort of data is, in some sense, actually much more favorable for obtaining a common-sense understanding. One reason we might think this is true is that what we associate with common sense, or a lack of common sense, is often characterized by the ability to reason about counterfactual questions. Here is this bottle of water sitting on the table; everything is fine. If I were to knock it over, which I'm not going to do, what would happen? I know that nothing good would happen from that. But if I have a bad understanding of the world, I might think that it's a good way for me to, you know, gain more utility. If I actually go about my daily life doing the things that my current understanding of the world suggests will give me high utility, in some ways I'll get exactly the right supervision to tell me not to do those bad things and to keep doing the good things.

Lex Fridman: So there's a spectrum between IID random sampling of the space of data and what we humans do, and I don't even know if what we do is optimal; there might be something beyond it. On this open question that you raised: where do you think intelligent systems that would be able to deal with this world fall? Can we do pretty well by reading all of Wikipedia, sort of randomly sampling it like language models do, or do we have to be exceptionally selective and intelligent about which aspects of the world we interact with?

Sergey Levine: I think this is, first, an open scientific problem, and I don't have a clear answer, but I can speculate a little bit. What I would speculate is that you don't need to be super, super careful. I think it's less about being careful to avoid the useless stuff and more about making sure that you hit on the really important stuff. So perhaps it's okay if you spend part of your day just guided by your curiosity, visiting interesting regions of your state space, but it's important for you to, every once in a while, make sure that you really try out the solutions that your current model of the world suggests might be effective, and observe whether those solutions are working as you expect or not. And perhaps some of that is really essential to having a kind of perpetual improvement loop. That perpetual improvement loop is really the key, the key that's going to potentially distinguish the best current methods from the best methods of tomorrow.

Lex Fridman: How important, in a sense, do you think is exploration, total out-of-the-box exploration in this space, where you jump to a totally different domain? You mentioned there's an optimization problem where you explore the specifics of a particular strategy, whatever the thing is you're trying to solve. How important is it to explore totally outside of the strategies that have been working for you so far? What's your intuition there?

Sergey Levine: I think it's a very problem-dependent kind of question, and I think that question actually gets at one of the big differences between the classic formulation of the reinforcement learning problem and some of the more open-ended reformulations of that problem that have been explored in recent years. Classically, reinforcement learning is framed as a problem of maximizing utility, like any kind of rational AI agent, and then anything you do is in service of maximizing that utility. But a very interesting alternative way to look at these problems (I'm not necessarily saying it's the best way, but an interesting one) is as something where you first get to explore the world however you please, and then afterwards you will be tasked with doing something. That might suggest somewhat different solutions. If you don't know what you're going to be tasked with doing, and you just want to prepare yourself optimally for whatever your uncertain future holds, maybe then you will choose to attain some sort of coverage, build up an arsenal of cognitive tools, if you will, such that later, when someone tells you, "now your job is to fetch the coffee for me," you'll be well prepared to undertake that task.

Lex Fridman: And do you see that as the modern formulation of the reinforcement learning problem, the more multi-task, general-intelligence kind of formulation?

Sergey Levine: I think that's one possible vision of where things might be headed. I don't think that's by any means the mainstream or standard way of doing things.
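The "explore first, get tasked later" formulation can be sketched concretely. In the toy below, an agent wanders a gridworld with no reward at all, recording only which action led where; when a goal is finally revealed, it plans over whatever transitions it happened to experience. The grid, the move set, and the function names are illustrative assumptions, not a method from the conversation:

```python
import random
from collections import deque

def explore_then_plan(size=5, explore_steps=2000, seed=0):
    """Phase 1: task-free exploration. Phase 2: plan to a goal revealed later."""
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def step(state, move):
        x, y = state[0] + move[0], state[1] + move[1]
        return (x, y) if 0 <= x < size and 0 <= y < size else state

    # Phase 1: wander with no task, recording experienced transitions only.
    model, state = {}, (0, 0)
    for _ in range(explore_steps):
        move = rng.choice(moves)
        nxt = step(state, move)
        model[(state, move)] = nxt
        state = nxt

    # Phase 2: the goal is revealed; breadth-first search over the learned model.
    goal = (size - 1, size - 1)
    frontier, seen = deque([((0, 0), [])]), {(0, 0)}
    while frontier:
        node, path = frontier.popleft()
        if node == goal:
            return path
        for move in moves:
            nxt = model.get((node, move))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [move]))
    return None  # goal unreachable through experienced transitions
```

If exploration covered the relevant transitions, the plan is a shortest path under the learned model; with too few exploration steps the agent simply cannot plan to the goal, which is the coverage point being made.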
Lex Fridman: I like it; it's a beautiful vision. So maybe let's take a step back: what is the goal of robotics? What's the general problem of robotics we're trying to solve? You actually kind of painted two pictures here, one narrow, one general. What, in your view, is the big problem of robotics? Again, a ridiculously philosophical question.

Sergey Levine: Maybe there are two ways I can answer this question. One is that there's a very pragmatic problem: what would maximize the usefulness of robots? There, the answer might be a system that can perform whatever task a human user sets for it, within its physical constraints, of course. If you tell it to teleport to another planet, it probably can't do that. But if you ask it to do something within its physical capability, then, potentially with a little bit of additional training or a little bit of additional trial and error, it ought to be able to figure it out, in much the same way as a human teleoperator ought to be able to figure out how to drive the robot to do that. That's a very pragmatic view of what it would take to solve the robotics problem, if you will. But I think there's a second answer, and that answer is a lot closer to why I want to work on robotics. It's less about what it would take to do a really good job in the world of robotics, and more the other way around: what robotics can bring to the table to help us understand artificial intelligence.

Lex Fridman: So your dream, fundamentally, is to understand intelligence?

Sergey Levine: Yes. I think that's the dream for many people who work in this space. There's something very pragmatic and very useful about studying robotics, but I do think that a lot of people who go into this field actually draw inspiration from the potential for robots to help us learn about intelligence and about ourselves.

Lex Fridman: That's fascinating, that robotics is basically the space by which you can get closer to understanding the fundamentals of artificial intelligence. So what is it about robotics that's different from some of the other approaches? If we look at some of the early breakthroughs in deep learning, in the computer vision space and in natural language processing, there were really nice, clean benchmarks that a lot of people competed on, and thereby came up with a lot of brilliant ideas. What's the fundamental difference to you between computer vision, purely defined on ImageNet, and the bigger robotics problem?

Sergey Levine: There are a couple of things. One is that with robotics, you kind of have to take away many of the crutches. You have to deal both with the particular problems of perception, control, and so on, and with the integration of those things. Classically, we've always thought of the integration as kind of a separate problem: a modular engineering approach, where we solve the individual subproblems, wire them together, and then the whole thing works. One of the things we've been seeing over the last couple of decades is that studying the thing as a whole might lead to very different solutions than if we study the parts and wire them together. So the integrative nature of robotics research helps us see different perspectives on the problem. Another part of the answer is that robotics casts a certain paradox into very sharp relief. This is sometimes referred to as Moravec's paradox: the idea that, in artificial intelligence, things that are very hard for people can be very easy for machines, and vice versa, things that are very easy for people can be very hard for machines. Integral and differential calculus is pretty difficult for people to learn, but if you program a computer to do it, it can derive derivatives and integrals for you all day long without any trouble, whereas something like drinking from a cup of water is very easy for a person to do and very hard for a robot to deal with. When we see such blatant discrepancies, they give us a really strong hint that we're missing something important. If we really try to zero in on those discrepancies, we might find that little bit that we're missing. It's not that we need to make machines better or worse at math, or better at drinking water; it's that by studying those discrepancies we might find some new insight.

Lex Fridman: That could be in any space; it doesn't have to be robotics. But you're saying it's kind of interesting that robotics seems to have a lot of those discrepancies. Moravec's paradox is probably referring to the space of physical interaction: object manipulation, walking, all the kinds of stuff we do in the physical world. If you were to try to disentangle Moravec's paradox, why is there such a gap in our intuition about it? Why do you think manipulating objects is so hard, from everything you've learned from applying reinforcement learning in this space?

Sergey Levine: I think one reason is maybe that, for many of the other problems we've studied in AI and computer science, the notion of input, output, and supervision is much cleaner. Computer vision, for example, deals with very complex inputs, but it's comparatively a bit easier, at least up to some level of abstraction, to cast it as a very tightly supervised problem. It's comparatively much harder to cast robotic manipulation as a very tightly supervised problem. You can do it; it just doesn't work all that well. You could say, well, maybe we get a labeled dataset where we know exactly which motor commands to send, and then we train on that. But for various reasons, that's not actually such a great solution, and it also doesn't seem to be even remotely similar to how people and animals learn to do things, because we're not told by our parents, "here is how you fire your muscles in order to walk." We do get some guidance, but the really low-level detailed stuff we figure out mostly on our own.

Lex Fridman: And that's what you mean by tightly supervised: that every single little sub-action gets a supervision signal of whether it's a good one or not?

Sergey Levine: Right. While in computer vision you could imagine, up to a level of abstraction, that maybe somebody told you "this is a car, this is a cat, this is a dog," in motor control it's very clear that that was not the case.

Lex Fridman: If we look at the subspaces of robotics, which, as you said, robotics integrates together (and we'll get to see how this beautiful mess fits into place), there's nevertheless still perception: the computer vision problem, broadly speaking, of understanding the environment. Then, and maybe you can correct me on this categorization of the space, there's prediction: trying to anticipate what things are going to do in the future in order for you to be able to act in the world. And then there's also the game-theoretic aspect of how your actions will change the behavior of others. In this kind of space, and this is bigger than reinforcement learning, this is just broadly looking at the problem of robotics: what's the hardest problem here? Or is what you said true, that when you start to look at all of them together, that's a whole other thing, and you can't even say which one individually is harder, because you should only be looking at them all together?

Sergey Levine: When you look at them all together, some things actually become easier, and I think that's actually pretty important. Back in 2014 we had some work, basically our first work on end-to-end reinforcement learning for robotic manipulation skills from vision, which at the time was something that seemed a little inflammatory and controversial in the robotics world. But beyond the inflammatory and controversial part, the point we were actually trying to make in that work was that, for the particular case of combining perception and control, you could actually do better if you treat them together than if you try to separate them. The way we tried to demonstrate this is that we picked a fairly simple motor control task, where a robot had to insert a little red trapezoid into a trapezoidal hole. We had our separated solution, which involved first detecting the hole using a pose detector and then actuating the arm to put it in, and then our end-to-end solution, which just mapped pixels to torques. One of the things we observed is that, with the end-to-end solution, the pressure on the perception part of the model is actually lower. It doesn't have to figure out exactly where the thing is in 3D space; it just needs to figure out where it is in a way that distributes the errors so that the horizontal difference matters more than the vertical difference, because vertically the robot just pushes the peg down until it can't go any further. Perceptual errors along the direction of motion are a lot less harmful, whereas perpendicular to the direction of motion, perceptual errors are much more harmful. The point is that if you combine these two things, you can trade off errors between the components optimally to best accomplish the task, and the components can be weaker while still leading to better overall performance.

Lex Fridman: That's a profound idea. In the space of pegs and things like that, it's quite simple, almost tempting to overlook, but that seems, at least intuitively, to be an idea that should generalize to basically all aspects of perception and control, where one strengthens the other.
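The error-trade-off argument can be illustrated numerically. Below, one scalar "perception" parameter must serve two targets, a horizontal and a vertical offset; training it against a task-weighted loss, where horizontal error is penalized more (as in the peg-insertion example), beats training it against a generic isotropic loss when both are scored on the task loss. The data, the weights, and the names are made up for illustration, not taken from the paper discussed:

```python
import numpy as np

def compare_training_objectives(seed=0, n=1000, w_h=10.0):
    """One shared parameter `a` predicts both offsets as a * f.

    Returns the task loss of the task-weighted fit and of the isotropic fit.
    """
    rng = np.random.default_rng(seed)
    f = rng.normal(size=n)                    # a 1-D perceptual feature
    x = 1.0 * f + 0.1 * rng.normal(size=n)    # horizontal offset (matters a lot)
    z = 3.0 * f + 0.1 * rng.normal(size=n)    # vertical offset (mostly forgiven)

    def task_loss(a):
        return float(np.mean(w_h * (x - a * f) ** 2 + (z - a * f) ** 2))

    # Closed-form minimizers of each quadratic objective.
    a_iso = np.sum(f * (x + z)) / (2.0 * np.sum(f * f))
    a_task = np.sum(f * (w_h * x + z)) / ((w_h + 1.0) * np.sum(f * f))
    return task_loss(a_task), task_loss(a_iso)
```

The task-weighted fit lands near the horizontal target's coefficient and accepts a larger vertical error; the isotropic fit splits the difference between the two coefficients and pays for it on the task.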
Sergey Levine: Yes, and people who have studied perceptual heuristics in humans and animals find things like that all the time. One very well-known example is something called the gaze heuristic, which is a little trick that you can use to intercept a flying object. If you want to catch a ball, for instance, you could try to localize it in 3D space, estimate its velocity, estimate the effect of wind resistance, and solve a complex system of differential equations in your head. Or you can maintain a running speed such that the object stays in the same position in your field of view: if it dips a little bit, you speed up; if it rises a little bit, you slow down. If you follow this simple rule, you'll actually arrive at exactly the place where the object lands, and you'll catch it. Humans use it when they play baseball, human pilots use it when they fly airplanes to figure out if they're about to collide with somebody, frogs use it to catch insects, and so on and so on. This is something that actually happens in nature, and it's probably just one instance that we were able to identify because scientists happened to spot it; it's probably so prevalent that there are many others.

Lex Fridman: If we can zoom in as we talk about robotics: is there a canonical problem, sort of a simple, clean, beautiful, representative problem in robotics that you think about when you're thinking about some of these questions? We talked about robotic manipulation; to me, that seems, intuitively at least, a space the robotics community is converging towards as the canonical problem. If you agree, maybe you can zoom in on some particular aspect of that problem where, if we solved it perfectly, it would unlock a major step towards human-level intelligence.

Sergey Levine: I don't think I have a really great answer to that, and partly the reason I don't have a great answer has to do with the fact that the difficulty is really in the flexibility and adaptability, rather than in doing a particular thing really, really well.
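The gaze heuristic Levine describes a moment ago is simple enough to simulate. In the sketch below, a pursuer chases a projectile using only the ball's elevation angle in its field of view, speeding up when the angle dips and slowing when it rises, with no ballistics solved at all. The scenario, the gain, and the speed cap are invented for illustration:

```python
import math

def gaze_heuristic_catch(v0=20.0, launch_deg=60.0, start_x=45.0,
                         dt=0.002, gain=500.0, max_speed=12.0):
    """Run toward a projectile, holding its elevation angle constant in view."""
    g = 9.81
    th = math.radians(launch_deg)
    vx, vy = v0 * math.cos(th), v0 * math.sin(th)
    bx, by = 0.0, 1e-6        # ball position
    px, speed = start_x, 0.0  # pursuer starts beyond the landing point
    prev = math.atan2(by, px - bx)
    while True:
        bx += vx * dt
        vy -= g * dt
        by += vy * dt
        if by <= 0.0:
            break             # ball has landed
        elev = math.atan2(by, max(px - bx, 1e-3))
        # Ball dips in view -> speed up; rises -> slow down. No physics solved.
        speed = min(max_speed, max(0.0, speed - gain * (elev - prev)))
        prev = elev
        px -= speed * dt      # run toward the ball
    return px, bx             # pursuer position vs. landing point
```

Even though the controller never estimates velocity or predicts the landing point, the pursuer ends up within a few meters of where the ball comes down, limited mostly by its top running speed.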
Sergey Levine: It's hard to just say, well, if you can shuffle a deck of cards as fast as a casino dealer, then you'll be very proficient. It's really the ability to quickly figure out how to do some arbitrary new thing well enough to move on to the next arbitrary thing.

Lex Fridman: On the source of newness and uncertainty: have you found problems in which it's easy to generate new types of newness?

Sergey Levine: If you'd asked me this question around 2016, maybe, I would probably have said that robotic grasping is a really great example of that, because it's a task with great real-world utility. You will get a lot of money if you can do it well.

Lex Fridman: What is robotic grasping? Picking up any object with a robotic hand?

Sergey Levine: Exactly. You'll get a lot of money if you do it well, because lots of people want to run warehouses with robots, and it's highly non-trivial, because very different objects will require very different grasping strategies. But actually, since then, people have gotten really good at building systems to solve this problem, to the point where I'm not actually sure how much more progress we can make with that as the main guiding thing. It's interesting, though, to see the kinds of methods that have actually worked well in that space. Robotic grasping classically used to be regarded very much as almost a geometry problem. People who have studied the history of computer vision will find this very familiar: in the same way that in the early days of computer vision people thought of it very much as an inverse graphics thing, in robotic grasping people thought of it as an inverse physics problem, essentially. You look at what's in front of you, figure out the shapes, then use your best estimate of the laws of physics to figure out where to put your fingers, and you pick up the thing. It turns out that what works really well for robotic grasping, instantiated in many different recent works, including our own but also ones from many other labs, is to use learning methods with some combination of either exhaustive simulation or actual real-world trial and error. Those things actually work really well, and then you don't have to worry about solving geometry problems or physics problems.

Lex Fridman: Just by the way, in grasping, what are the difficulties that have been worked on? One is the materials of things, maybe occlusions on the perception side. Why is picking stuff up such a difficult problem?

Sergey Levine: It's a difficult problem because the variety of things you might have to deal with is extremely large, and oftentimes things that work for one class of objects won't work for another class of objects. If you get really good at picking up boxes and now you have to pick up plastic bags, you need to employ a very different strategy. And there are many properties of objects that are more than just their geometry: the bits that are easier to pick up and the bits that are harder to pick up, the bits that are more flexible, the bits that will cause the thing to pivot and bend and drop out of your hand versus the bits that result in a secure grasp, things that are flexible, things that, if you pick them up the wrong way, will fall upside down and the contents will spill out. There are all these little details that come up, but the task can still be characterized as one task; there's a very clear notion of: you did it, or you didn't do it.

Lex Fridman: In terms of spilling things, there creeps in this notion that starts to sound and feel like common-sense reasoning. Do you think solving the general problem of robotics requires common-sense reasoning, requires general intelligence?
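The trial-and-error grasp learning Levine describes can be caricatured as a contextual bandit: observe the object type, pick a grasp strategy, record success or failure, and over many attempts the best strategy per object type emerges from the statistics. The object types, the strategy names, and the success probabilities below are entirely made up to show the mechanism:

```python
import random

def learn_grasp_policy(trials=3000, eps=0.1, seed=1):
    """Epsilon-greedy trial-and-error over grasp strategies, per object type."""
    rng = random.Random(seed)
    # Hypothetical success probabilities: (strategy, object) -> P(success).
    p_success = {("pinch", "box"): 0.9, ("pinch", "bag"): 0.2,
                 ("scoop", "box"): 0.3, ("scoop", "bag"): 0.8}
    stats = {key: [0, 0] for key in p_success}  # [successes, attempts]

    def estimate(strategy, obj):
        succ, att = stats[(strategy, obj)]
        return succ / att if att else 0.5       # neutral prior before any data

    for _ in range(trials):
        obj = rng.choice(["box", "bag"])
        if rng.random() < eps:                  # explore a random strategy
            strategy = rng.choice(["pinch", "scoop"])
        else:                                   # exploit the current best guess
            strategy = max(["pinch", "scoop"], key=lambda s: estimate(s, obj))
        success = rng.random() < p_success[(strategy, obj)]
        stats[(strategy, obj)][0] += int(success)
        stats[(strategy, obj)][1] += 1

    return {obj: max(["pinch", "scoop"], key=lambda s: estimate(s, obj))
            for obj in ["box", "bag"]}
```

No geometry or physics is modeled anywhere; the policy that boxes want one strategy and bags another falls out of outcome statistics alone, which is the spirit of the learning-based grasping methods described above.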
Requires general intelligence? This kind of human-level capability of, like you said, being robust and dealing with uncertainty, but also being able to reason and assimilate different pieces of knowledge? What are your thoughts on the need for common-sense reasoning in the space of the general robotics problem?

I'm going to slightly dodge that question and say that maybe it's actually the other way around: studying robotics can help us understand how to put common sense into our AI systems. One way to think about common sense, and about why our current systems might lack it, is that common sense is an emergent property of actually having to interact with a particular world, a particular universe, and get things done in that universe. You might think that, for instance, an image captioning system kind of deals with our world: it looks at pictures of the world and types out English sentences. But you can easily construct situations where image captioning systems defy common sense; give one a picture of a person wearing a fur coat and it will say it's a teddy bear. I think what's really happening in those settings is that the system doesn't actually live in our world. It lives in its own world that consists of pixels and English sentences, and doesn't involve having to put on a fur coat in the winter so you don't get cold. So perhaps the reason for the disconnect is that the systems we have now simply inhabit a different universe, and if we build AI systems that are forced to deal with all the messiness and complexity of our universe, maybe they will have to acquire common sense to maximize their utility, whereas the systems we're building now don't have to do that; they can take some shortcut.

That's fascinating. You've a couple of times already reframed the role of
robotics in this whole thing. I don't know if my way of thinking is common, but I thought we need to understand and solve intelligence in order to solve robotics, and you're kind of framing it as, no, robotics is one of the best ways to study artificial intelligence. Robotics is the right space in which you get to explore some of the fundamental learning mechanisms, the fundamental multimodal, multitask aggregation-of-knowledge mechanisms that are required for general intelligence. That's a really interesting way to think about it. But let me ask about learning. Can the general robotics problem, the epitome of the robotics problem, be solved purely through learning, perhaps end-to-end learning, learning from scratch, as opposed to injecting human expertise and rules and heuristics and so on?

In terms of the spirit of the question, I would say yes. In some ways it may be an overly sharp dichotomy; when we build algorithms, at some point a person does something. There's always a person who turned on the computer first, who implemented TensorFlow. But in terms of the point that you're getting at, I do think the answer is yes. I think we can solve many problems that have previously required meticulous manual engineering through automated optimization techniques. And one thing I will say on this topic: I don't think this is actually a very radical or very new idea. People have been thinking about automated optimization techniques as a way to do control for a very, very long time, and in some ways what's changed is really more the name. Today we would say, oh, my robot does machine learning, it does reinforcement learning. Maybe in the 1960s you'd say, oh, my robot is doing optimal control. And maybe the difference between typing out a
system of differential equations and doing feedback linearization versus training a neural net is not such a large difference; it's just pushing the optimization deeper and deeper into the thing.

But with deep learning especially, the accumulation of experience in the form of data, forming deep representations, starts to feel like knowledge, as opposed to optimal control. So it feels like there's an accumulation of knowledge through the learning process.

Yes, that's a good point. One big difference between learning-based systems and classic optimal control systems is that learning-based systems, in principle, should get better and better the more they do something, and I do think that's a very, very powerful difference.

If you look back at the world of expert systems, symbolic AI and so on, of using logic to accumulate human-encoded expertise: do you think that will have a role at some point? Deep learning, machine learning, reinforcement learning have had incredible results and breakthroughs that inspired thousands, maybe millions, of researchers. Symbolic AI is less popular now, but do you think it will have a role?

I think in some ways the descendants of symbolic AI actually already have a role. This is the highly biased history from my perspective. You'd say that initially we thought rational decision-making involves logical manipulation: you have some model of the world expressed in terms of logic, you have some query, like what action do I take in order for X to be true, and then you manipulate your logical symbolic representation to get an answer. What that turned into, somewhere in the 1990s, is: well, instead of building predicates and statements that have true or false values, we'll build probabilistic systems where things have
probabilities of being true or false, and that turned into Bayes nets. That provided a boost to what were really still essentially logical inference systems, just probabilistic logical inference systems. Then people said, well, let's actually learn the individual probabilities inside these models. And then people said, well, let's not even specify the nodes in the models, let's just put a big neural net in there. But in many ways I see these as descendants of the same idea: it's essentially instantiating rational decision-making by means of some inference process, and learning by means of an optimization process. So in a sense I would say yes, it has a place; in many ways it already holds that place, it's already in there, it just looks slightly different than it did before. And there are some things we can think about that make this a little more obvious. If I train a big neural net model to predict what will happen in response to my robot's actions, and then I run probabilistic inference, meaning I invert that model to figure out the actions that lead to some plausible outcome, to me that seems like a kind of logic. You have a model of the world, it just happens to be expressed by a neural net, and you're doing some inference procedure, some sort of manipulation of that model, to figure out the answer to a query that you have.

It's the interpretability, though, the explainability, that seems to be lacking more so, because the nice thing about expert systems is that you can follow the reasoning of the system, and that to us mere humans is somehow compelling. I don't know what to make of the fact that there's a human desire for intelligent systems to be able to convey, in a poetic way, why they made the decisions they did: to tell a convincing story. And perhaps that's a silly human thing.
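The "inverting a learned model" idea Levine describes can be sketched as search over action sequences against a forward model. This is a minimal illustration under stated assumptions: a scalar state, a fixed nonlinear `model` function standing in for a trained neural network, and the cross-entropy method as one common choice of inference procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a learned forward model: predicts the next state given
# the current state and an action. In the setting described above this
# would be a trained neural net; a fixed function plays its role here.
def model(state, action):
    return state + 0.1 * action + 0.01 * np.sin(state)

def rollout(state, actions):
    for a in actions:
        state = model(state, a)
    return state

# "Inverting" the model by search: find an action sequence whose
# predicted outcome lands near a goal (a cross-entropy-method sketch).
def plan(state, goal, horizon=10, iters=20, pop=200, elite=20):
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        seqs = mean + std * rng.normal(size=(pop, horizon))
        costs = np.array([abs(rollout(state, s) - goal) for s in seqs])
        best = seqs[np.argsort(costs)[:elite]]
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-3
    return mean

actions = plan(state=0.0, goal=1.0)
print(abs(rollout(0.0, actions) - 1.0))  # planning error, near zero
```

The model here happens to be a simple function, but nothing in the search procedure depends on that: the same loop runs unchanged if `model` is a neural net, which is the sense in which this is "a kind of logic" over a learned world model.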
Like, we shouldn't expect that of intelligent systems; we should just be super happy that there are intelligent systems out there. But if I were to psychoanalyze the researchers at the time, I would say expert systems connected to that desire of AI researchers for systems to be explainable. Maybe on that topic: do you have a hope that inference-based, learning-based systems will be as explainable as the dream was with expert systems, for example?

I think it's a very complicated question, because in some ways the question of explainability is very closely tied to the question of performance. Why do you want your system to explain itself? Well, so that when it screws up, you can figure out why. That's nice, but in some ways it's a much bigger problem. Your system might screw up, and then it might also screw up in how it explains itself, or you might have some bug somewhere, so that it's not actually doing what it was supposed to do. So maybe a good way to view that problem is as part of a bigger problem of verification and validation, of which explainability is one component.

I see. I just see it differently. You put it beautifully, and I think you actually summarized the field of explainability, but to me there's another aspect of explainability, which is storytelling. That has nothing to do with errors, or rather, it uses errors as elements of its story, as opposed to a fundamental need to be explainable when errors occur. It's just that for intelligent systems to be in our world, we seem to want to tell each other stories, and that's true in the political world and in the academic world. And neural networks are less capable of doing that, or perhaps they're equally capable of storytelling. Maybe it doesn't matter what the
fundamentals of the system are; you just need to be a good storyteller.

Maybe one specific story I can tell you about in that space is about some work done by my former collaborator, who's now a professor at MIT, named Jacob Andreas. Jacob works on natural language processing, but he had this idea to do a little bit of work in reinforcement learning, on how natural language can basically structure the internals of policies trained with RL. One of the things he did was set up a model that attempts to perform some task defined by a reward function, but the model reads in a natural language instruction. This is a pretty common thing to do in instruction following: you tell it, go to the red house, and it's supposed to go to the red house. But one of the things Jacob did is treat that sentence not as a command from a person but as a representation of the internal state of the mind of this policy, essentially. So when it was faced with a new task, it would basically try to think of possible language descriptions, attempt to do them, and see if they led to the right outcome. It would kind of think out loud: I'm faced with this new task, what am I going to do? Let me go to the red house. That didn't work. Let me go to the blue room, or let me go to the green plant. And once it got some reward, it would say, oh, go to the green plant, that's what's working, I'm going to go to the green plant. Then you could look at the string it came up with, and that was a description of how it thought it should solve the problem. So you can basically incorporate language as internal state, and you can start getting some handle on these kinds of things.

What I was trying to get at is: what if you also add to the reward function the convincingness of the story? So I have another reward signal of, like, people who review that story, how
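A hypothetical toy version of the mechanism just described (my construction, not code from Jacob Andreas's actual work): the policy's "internal state" is a natural-language string, and the agent proposes candidate descriptions, executes the behavior each one names, and keeps whichever earns reward.

```python
# The candidate "thoughts" are instruction strings; executing one yields
# the behavior it indexes. Both the instructions and the behaviors here
# are invented stand-ins for illustration.
BEHAVIORS = {
    "go to the red house": "red_house",
    "go to the blue room": "blue_room",
    "go to the green plant": "green_plant",
}

def execute(instruction):
    # Executing an instruction yields the location it names.
    return BEHAVIORS[instruction]

def reward(outcome, goal):
    return 1.0 if outcome == goal else 0.0

def solve(goal):
    # "Think out loud": try descriptions until one earns reward.
    attempts = []
    for description in BEHAVIORS:
        r = reward(execute(description), goal)
        attempts.append((description, r))
        if r > 0:
            return description, attempts
    return None, attempts

best, log = solve("green_plant")
print(best)  # the string describing how the agent solved the task
```

The returned string is exactly the kind of readable artifact the conversation points at: a description of how the policy thinks it should solve the problem, produced as a side effect of solving it.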
much they like it. Initially that could be a hyperparameter, or a sort of hard-coded heuristic type of thing, but it's an interesting notion: the convincingness of the story becoming part of the reward function, the objective function, of explainability. In the world of Twitter and fake news, that might be a scary notion: the nature of truth may not be as important as how convincingly you tell the story around the facts.

Well, let me ask the basic question. You're one of the world-class researchers in reinforcement learning, deep reinforcement learning, certainly in the robotics space. What is reinforcement learning?

I think what reinforcement learning refers to today is really just the modern incarnation of learning-based control. Classically, reinforcement learning has a much narrower definition, which is that it's literally learning from reinforcement: the thing does something and then gets a reward or punishment. But the way the term is used today is more broadly for learning-based control: some kind of system that's supposed to be controlling something, and it uses data to get better.

And what does control mean? Is action the fundamental element?

Yeah, it means making rational decisions. And rational decisions are decisions that maximize a measure of utility, made sequentially, many decisions, time and time and time again.

It's easier to see that kind of idea in the space of, say, games, and in the space of robotics. Do you see it as bigger than that? What are the limits of the applicability of reinforcement learning?

Rational decision-making is essentially the encapsulation of the AI problem, viewed through a particular lens. Any problem that we would want a machine to do, an intelligent machine, can likely be represented as a decision-making problem. Classifying images is a
decision-making problem, although not a sequential one, typically. Controlling a chemical plant is a decision-making problem. Deciding what videos to recommend on YouTube is a decision-making problem. One of the really appealing things about reinforcement learning is that if it does encapsulate the range of all these decision-making problems, perhaps working on reinforcement learning is one of the ways to reach a very broad swath of AI problems.

But what do you see as the fundamental difference between reinforcement learning and maybe supervised machine learning?

Reinforcement learning can be viewed as a generalization of supervised machine learning. You can certainly cast supervised learning as a reinforcement learning problem: you can just say your loss function is the negative of your reward. But you have stronger assumptions: you have the assumption that someone actually told you what the correct answer was, and that your data was i.i.d., and so on. So you could view reinforcement learning as essentially relaxing some of those assumptions. Now, that's not always a very productive way to look at it, because if you actually have a supervised learning problem, you'll probably solve it much more effectively by using supervised learning methods, because it's easier. But you can view reinforcement learning as a generalization.

No, for sure. Fundamentally, that's a mathematical statement that's absolutely correct. But it seems that with reinforcement learning, the kinds of tools we bring to the table today are different. Maybe down the line everything will be a reinforcement learning problem; just like you said, image classification could be mapped to a reinforcement learning problem. But today the tools and ideas, the way we think about them, are different. Supervised learning has been used very effectively to solve basic narrow AI problems. Reinforcement learning kind of represents the dream of AI. It's very much in the research space now, captivating
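The reduction Levine states, loss as negative reward, can be made concrete. Below, classification is treated as a one-step RL problem: the "action" is the predicted label, the reward is the negative 0/1 loss, and a REINFORCE-style update on sampled labels plays the role of supervised training. This is a toy sketch on synthetic data of my construction; as he notes, in practice you would just minimize the supervised loss directly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linearly separable classification data.
X = rng.normal(size=(500, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)

W = np.zeros((2, 2))  # logits = x @ W

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(100):                      # passes over the data
    for x, yi in zip(X, labels):
        p = softmax(x @ W)
        a = rng.choice(2, p=p)            # sample an "action" (a label)
        r = 1.0 if a == yi else 0.0       # reward = negative 0/1 loss
        grad_logp = -p
        grad_logp[a] += 1.0               # d log pi(a|x) / d logits
        W += 0.05 * r * np.outer(x, grad_logp)

preds = np.argmax(X @ W, axis=1)
print((preds == labels).mean())  # accuracy of the RL-trained classifier
```

Note what the relaxed assumptions cost: the learner is never told the correct label, only whether its sampled guess was rewarded, so it needs far more samples than a direct cross-entropy fit would. That is the "easier with supervised methods" point in code.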
the imagination of people about what we can do with intelligent systems, but it hasn't yet had as wide an impact as the supervised learning approaches. So my question comes from a more practical sense. What do you see as the gap between the more general reinforcement learning and the very specific...

Yes, decision-making with one step in the sequence, the supervised learning.

From a practical standpoint, I think one thing that is potentially a little tough now, and this is a gap that we might see closing over the next couple of years, is the ability of reinforcement learning algorithms to effectively utilize large amounts of prior data. One of the reasons why it's a bit difficult today to use reinforcement learning for all the things we might want to use it for is that in most of the settings where we want to do rational decision-making, it's a little tough to just deploy some policy that does crazy stuff and learns purely through trial and error. It's much easier to collect a lot of data, a lot of logs, from some other policy that you've got, and then maybe, if you can get a good policy out of that, you deploy it and let it fine-tune a little bit. But algorithmically it's quite difficult to do that. I think that once we figure out how to get reinforcement learning to bootstrap effectively from large datasets, we'll see very, very rapid growth in applications of these technologies. This is what's referred to as off-policy reinforcement learning, or offline RL, or batch RL, and I think we're seeing a lot of research right now that's bringing us closer and closer to that.

Can you maybe paint a picture of the different methods? You said off-policy. What's value-based reinforcement learning? What's policy-based? What's model-based? What's off-policy and on-policy? What are the different categories of reinforcement learning?

Yeah, so one way we can think about
reinforcement learning is that, in some very fundamental way, it's
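The offline (batch) RL setting Levine describes earlier, learning only from logged data with no further interaction, can be sketched with tabular fitted Q-iteration on a hypothetical toy chain MDP (my construction for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy chain MDP: states 0..4, actions 0 = left / 1 = right, reward 1
# for reaching state 4 (treated as terminal for bootstrapping).
N, GOAL, GAMMA = 5, 4, 0.9

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    return s2, float(s2 == GOAL)

# The "large prior dataset": transitions logged from a random behavior
# policy. The learner never interacts with the environment itself.
data = []
for _ in range(2000):
    s, a = int(rng.integers(N)), int(rng.integers(2))
    s2, r = step(s, a)
    data.append((s, a, r, s2))

# Fitted Q-iteration: repeatedly regress Q toward bootstrapped targets
# computed only from the fixed dataset (tabular, so the "regression"
# step is just assignment).
Q = np.zeros((N, 2))
for _ in range(100):
    target = Q.copy()
    for s, a, r, s2 in data:
        boot = 0.0 if s2 == GOAL else GAMMA * Q[s2].max()
        target[s, a] = r + boot
    Q = target

policy = Q.argmax(axis=1)
print(policy)  # [1 1 1 1 1] -- always move right, learned purely from logs
```

Everything the learner sees is the fixed `data` list, yet the recovered policy heads straight for the goal. Bootstrapping a good policy out of logs like this, at scale and with function approximation, is the capability Levine argues will unlock much wider application of RL.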