Pieter Abbeel: Deep Reinforcement Learning | Lex Fridman Podcast #10
l-mYLq6eZPY • 2018-12-16
Transcript preview
The following is a conversation with Pieter Abbeel. He's a professor at UC Berkeley and the director of the Berkeley Robot Learning Lab. He's one of the top researchers in the world working on how we make robots understand and interact with the world around them, especially using imitation learning and deep reinforcement learning. This conversation is part of the MIT course on artificial general intelligence and the Artificial Intelligence podcast. If you enjoy it, please subscribe on YouTube, iTunes, or your podcast provider of choice, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Pieter Abbeel.

You've mentioned that if there was one person you could meet, it would be Roger Federer. So let me ask: when do you think we will have a robot that can fully autonomously beat Roger Federer at tennis, a Roger Federer-level player at tennis?

Well, first, if you can make it happen for me to meet Roger, let me know. In terms of getting a robot to beat him at tennis, it's an interesting question, because for a lot of the challenges we think about in AI, the software is really the missing piece, but for something like this, the hardware is nowhere near either. To really have a robot that can physically run around: the Boston Dynamics robots are starting to get there, but they're still not really at a human level of ability to run around and then swing a racket.

So that's a hardware problem.

I don't think it's a hardware problem only; I think it's a hardware and a software problem. I think it's both, and they'll have independent progress. So I'd say the hardware maybe in 10 to 15 years.

On clay, not grass? I guess grass is harder, with the sliding?

I'm not sure what's harder, grass or clay.

The clay involves sliding, which might be harder to master.

Actually, yeah. But you're not limited to bipedal robots; I mean, I'm sure you could build a machine. It's a whole different question, of course, if you can say, okay, this
robot can be on wheels, it can move around on wheels, and it can be designed differently; then I think that can probably be done sooner than a full humanoid type of setup.

What do you think about swinging a racket? You've worked on basic manipulation. How hard do you think the task of swinging a racket would be, being able to hit a nice backhand or forehand? Let's say we just set up a stationary, nice robot arm, a standard industrial arm, and it can watch the ball come in and then swing the racket.

It's a good question. I'm not sure it would be super hard to do. If we do it with reinforcement learning, it would require a lot of trial and error; it's not going to swing it right the first time around. But I don't see why it couldn't swing it the right way eventually. I think it's learnable. If you set up a ball machine, let's say, on one side, and a robot with a tennis racket on the other side, I think it's learnable, maybe with a little bit of pre-training in simulation. I think swinging the racket is feasible. It would be very interesting to see how much precision it can get, because some of the human players can hit it on the lines, which is very high precision.

With spin?

The spin is interesting: whether RL can learn to put a spin on the ball.

Well, you got me interested. Maybe someday we'll set this up. Your answer is basically: for this problem it sounds fascinating, but for the general problem of a tennis player we might be a little bit farther away.

What's the most impressive thing you've seen a robot do in the physical world?

Physically, for me, it's the Boston Dynamics videos. They always just hit home, and I'm just super impressed. Recently, the robot running up the stairs, doing the parkour-type thing. Yes, we don't know what's underneath; they don't really write up a lot of detail. But even if it's hard-coded underneath, which it
might or might not be, just the physical ability to do that parkour is very impressive.

So there's a lot going on right there. Have you met Spot Mini, or any of those robots, in person?

I met Spot Mini last year, in April, at the MARS event that Jeff Bezos organizes. They brought it out there, and it was nicely following Jeff around; when Jeff left the room, they had it follow him along, which was pretty impressive.

So there's some comfort in knowing there's no learning going on in those robots, but there's still the psychology of it. While knowing that, if there's any learning going on, it's very limited, I met Spot Mini earlier this year too, and knowing everything that's going on, having a one-on-one interaction, I got to spend some time alone with it, and there's immediately a deep connection on the psychological level. Even though you know the fundamentals of how it works, there's something magical. So do you think about the psychology of interacting with robots in the physical world? You just showed me the PR2 robot, and there was a little bit of something like a face, something that immediately draws you to it. Do you think about that aspect of the robotics problem?

Well, it's very hard with BRETT here, where we gave him a name, Berkeley Robot for the Elimination of Tedious Tasks, not to think of the robot as a person. It seems like everybody calls him a 'he', for whatever reason, and that also makes it more of a person than if it were an 'it'. It seems pretty natural to think of it that way.

This past weekend it really struck me. I've seen Pepper many times in videos, but then I was at an event, organized by Fidelity, and they had scripted Pepper to help moderate some sessions, and they had scripted Pepper to have the personality of a child, a little bit. It was very hard not to think of it as its own person in some sense, because it would just jump into conversation, making it
very interactive. The moderator would be speaking, and Pepper would just jump in: hold on, how about me, can I participate in this? It was just like a person, and it was one hundred percent scripted. Even then, it was hard not to have the sense that somehow there was something there.

So as we have robots interact in this physical world, is that a signal that can be used in reinforcement learning? You've worked a little bit in this direction, but do you think that psychology can somehow be pulled in?

That's a question a lot of people ask, and I think part of why they ask it is that they're thinking about how unique we really are. People, after they see some results, after they see a computer play Go or do some other task, still ask: okay, but can it really have emotion, can it really interact with us in that way? And then, once you're around robots, you already start feeling it. The way I think of it is this: if you run something like reinforcement learning, it's about optimizing some objective, and there's no reason that objective couldn't be tied to how much a person likes interacting with the system. Why couldn't the reinforcement learning system optimize for the robot being fun to be around? And why wouldn't it then naturally become more and more interactive, and more and more, maybe, like a person or like a pet? I don't know exactly what it would be, but it would have more of those features, and acquire them automatically.

As long as you can formalize an objective of what it means to like something, how you exhibit that. What's the ground truth? How do you get the reward from the human? You have to somehow collect that information from the human. But you're saying that if you can formulate it as an objective, it can be learned; there's no reason it couldn't emerge through learning.

Right, and one way to formulate it as an objective, you wouldn't necessarily have to score it explicitly. So standard
rewards are numbers, and numbers are hard to come by. Is this a 1.5 or a 0.7 on some scale? That's very hard for a person to judge. Much easier is for a person to say: okay, what you did the last five minutes was much nicer than what you did the previous five minutes. That gives a comparison. And in fact there have been some results on that. For example, Paul Christiano and collaborators at OpenAI had the MuJoCo Hopper, the one-legged robot, do backflips purely from feedback of the form 'I like this better than that' or 'these are about equally good'. After a bunch of interactions, it figured out what the person was asking for, namely a backflip. The robot wasn't told to do a backflip; it was just getting comparison scores from the person, who had in their own mind that they wanted a backflip. The robot didn't know what it was supposed to be doing; it just knew that sometimes the person said this is better, this is worse, and then it figured out that what the person was actually after was a backflip. I'd imagine the same would be true for things like more interactive robots: the robot would figure out over time, oh, this kind of thing is apparently appreciated more than this other kind of thing.

So, when I first picked up Richard Sutton's reinforcement learning book, before deep learning, before the re-emergence of neural networks as a powerful mechanism for machine learning, RL seemed to me like magic. It was beautiful, and it seemed like what intelligence is. So how do you think we can possibly learn anything about the world when the reward for the actions is so delayed, so sparse? Why do you think RL works? Why do you think you can learn anything under such sparse rewards, whether it's regular reinforcement learning or deep reinforcement learning? What's your intuition?
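The comparison-based feedback Abbeel describes (as in the Christiano et al. backflip result) can be sketched in a few lines. This is a minimal illustration, not the actual OpenAI system: trajectories are stand-in feature vectors, the "human" is a hidden linear preference, and a reward model is fit to pairwise comparisons with the Bradley-Terry logistic loss, so no numeric scores are ever given.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "human" preference: trajectories are summarized as feature vectors,
# and the human secretly prefers higher w_true . x (e.g. "more backflip-like").
w_true = np.array([2.0, -1.0, 0.5])

def human_prefers(xa, xb):
    # True if the human says trajectory a is better than trajectory b.
    return xa @ w_true > xb @ w_true

# Collect pairwise comparisons instead of numeric scores.
pairs = []
for _ in range(500):
    xa, xb = rng.normal(size=3), rng.normal(size=3)
    pairs.append((xa, xb) if human_prefers(xa, xb) else (xb, xa))

# Fit a linear reward model with the Bradley-Terry / logistic loss:
# P(a preferred over b) = sigmoid(r(a) - r(b)).
w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = np.zeros(3)
    for xa, xb in pairs:        # (preferred, other)
        p = 1.0 / (1.0 + np.exp(-(w @ xa - w @ xb)))
        grad += (1.0 - p) * (xa - xb)   # gradient of the log-likelihood
    w += lr * grad / len(pairs)

# The learned reward should agree with the human on held-out comparisons.
correct = 0
for _ in range(200):
    xa, xb = rng.normal(size=3), rng.normal(size=3)
    correct += (w @ xa > w @ xb) == human_prefers(xa, xb)
print(correct / 200)
```

The learned reward function could then be handed to any RL algorithm in place of a hand-written score, which is the structure of the backflip experiment.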
Part of it is: why does RL need so many samples, so many experiences, to learn from? What's really happening is that when you have a sparse reward, you do something, maybe you take a hundred actions, and then you get a reward. Maybe you get a score of three, and you think: okay, three, not sure what that means. You go again, and now you get a two. Now you know that the second sequence of a hundred actions was somehow worse than the first sequence of a hundred actions, but it's tough to know which of those actions were better or worse; some might have been good and some bad in either one. That's why you need so many experiences. But once you have enough experiences, effectively RL is teasing that apart. It starts to say: okay, what is consistently there when you get a higher reward, and what is consistently there when you get a lower reward? And then the magic of, say, the policy gradient update is to update the neural network to make the actions that were present when things went well more likely, and the actions that were present when things went less well less likely.

So that's the counterpoint. But it seems like you would need to run it a lot more than you do. Even though right now people would say RL is very inefficient, it seems way more efficient than one would imagine on paper. That the simple updates to the policy, the policy gradient, that somehow you can learn, exactly as you said, which common actions seem to produce good results, and that this can learn anything, seems counterintuitive, at least. Is there some intuition behind it?

Yeah, there are a few ways to think about this. The way I tended to think about it originally, when we started working on deep reinforcement learning here at Berkeley, which was maybe 2011, 2012, 2013, around that time, John Schulman was a PhD student initially driving it forward here.
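The policy gradient update described above ("make the actions present when things went well more likely") can be shown on a toy sparse-reward problem. This is a hand-rolled REINFORCE sketch on a five-armed bandit with a running-average baseline; the task and all numbers are invented for illustration, not from the conversation.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny sparse-reward task: 5 actions, and the agent only ever sees a
# noisy end-of-episode score, never an explanation of it.
true_reward = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # action 2 is best

logits = np.zeros(5)      # softmax policy parameters
baseline = 0.0
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(5, p=probs)
    r = true_reward[a] + rng.normal(scale=0.1)     # opaque score
    # REINFORCE: raise the log-probability of actions that were present
    # when the score beat the baseline, lower it otherwise.
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * (r - baseline) * grad_logp
    baseline += 0.1 * (r - baseline)               # running-average baseline

print(int(np.argmax(softmax(logits))))
```

After training, the policy has concentrated on the action that was consistently present when scores were high, which is the "teasing apart" Abbeel describes.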
The way we thought about it at the time was: if you think about rectified linear units, ReLU-type neural networks, what do you get? You get something that's piecewise linear feedback control. And if you look at the literature, linear feedback control is extremely successful; it can solve many, many problems surprisingly well. I remember, for example, when we did helicopter flight: if you're in a stationary flight regime, like hover, you can use linear feedback control to stabilize the helicopter, a very complex dynamical system, but the controller is relatively simple. So I think a big part of it is that if you do feedback control, even though the system you control can be very, very complex, often relatively simple control architectures can already do a lot.

But then also, just linear is not good enough. One way you can think of these neural networks is that they tile the space, which people were already trying to do by hand or with finite state machines: this linear controller here, this linear controller there. The neural network learns to tile the space itself: a linear controller here, another linear controller there. But it's more subtle than that. It's benefiting from the linear control aspect, and it's benefiting from the tiling, but it's tiling the space one dimension at a time: if you have, say, a two-layer network, then in the hidden layer a unit makes a transition from active to inactive, or the other way around, and that's essentially one axis, not an axis exactly, but one direction along which things change. So you have a very gradual tiling of the space, with a lot of sharing between the linear controllers that tile it. That was always my intuition for why to expect this might work pretty well: it's essentially leveraging the fact that linear feedback control is so good, but of course not enough on its own, and this is a gradual tiling of the space with linear feedback controllers.
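The "ReLU networks are piecewise linear feedback control" intuition can be checked numerically: inside one activation region a small ReLU network applies one fixed linear map (one "linear controller"), and a different linear map once the active units change. A minimal sketch with made-up random weights:

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny two-layer ReLU network: y = w2 . relu(W1 x + b1)
W1 = rng.normal(size=(8, 2))
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)

def net(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def slope(x, eps=1e-6):
    # Numerical gradient: the "local linear controller" at x.
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2)
        d[i] = eps
        g[i] = (net(x + d) - net(x - d)) / (2 * eps)
    return g

x = rng.normal(size=2)
# Within one activation region the slope is identical...
same = np.allclose(slope(x), slope(x + 1e-4 * rng.normal(size=2)), atol=1e-4)
# ...but far away the active units change, and so does the linear map.
far = np.allclose(slope(x), slope(x + 10.0), atol=1e-4)
print(same, far)
```

Each sign pattern of the hidden layer defines one tile, and neighboring tiles share all the weights of the units that did not flip, which is the weight-sharing across controllers mentioned above.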
These linear feedback controllers share a lot of expertise across one another.

That's a really nice intuition. Do you think it scales to more and more general problems? When you start going up in the number of dimensions to control, when you start going down in how often you get a clean reward signal, does that intuition carry forward to those crazier, weirder worlds that we think of as the real world?

I think where things get really tricky in the real world, compared to the things we've looked at so far with great success in reinforcement learning, is the time scales, which take this to an extreme. When you think about the real world, well, maybe some student decided to do a PhD here, right? That's a very high-level decision. But if you think about their life, any person's life, it's a sequence of muscle fiber contractions and relaxations. That's how you interact with the world, and it's a very high-frequency control problem. But it's ultimately what you do and how you affect the world, until, I guess, we have brain readings and you can maybe do it slightly differently, but typically that's how you affect the world. And the decision to do a PhD is so abstract relative to what you're actually doing in the world. I think that's where credit assignment becomes completely beyond what any current RL algorithm can do, and we need hierarchical reasoning at a level that is just not available at all yet.

Where do you think we can pick up hierarchical reasoning? By which mechanisms?

Maybe let me first highlight what I think the limitations are of what was already done 20 or 30 years ago. Back then you would in fact find reasoning systems that reason over relatively long horizons, but the problem was that they were not grounded in the real world. People had to hand-design some kind of logical, dynamical description of the world, and that didn't tie into perception, didn't tie into real objects, and so forth. And
so that was a big gap. Now, with deep learning, we're starting to have the ability to really see with sensors, to process that and understand what's in the world, so it's a good time to try to bring these things together. I see a few ways of getting there. One way would be to say deep learning can get bolted on, somehow, to some of these more traditional approaches. Bolted on would probably mean you need to do some kind of end-to-end training, where your deep learning processing leads to a representation that in turn works with some kind of traditional underlying dynamical system that can be used for planning. That's, for example, the direction Aviv Tamar and Thanard Kurutach here have been pushing with Causal InfoGAN, and of course other people too. So that's one way: can we somehow force it into a form factor that is amenable to reasoning?

Another direction we had been thinking about for a long time, without making much progress, was more information-theoretic approaches. The idea there was that what it means to take a high-level action is to choose a latent variable now that already tells you a lot about what's going to be the case in the future, because that's what a high-level action is. Say I decide I'm going to navigate to the gas station, because I need to get gas for my car. It'll take five minutes to get there, but the fact that I'll get there you could already tell from the high-level action I took much earlier. We had a very hard time getting success with that. I'm not saying it's a dead end necessarily, but we had a lot of trouble getting it to work. So then we started revisiting the notion of what we're really trying to achieve. What we're trying to achieve is not necessarily hierarchy per se; you can instead think about what hierarchy would give us. What we hope it would give us is better credit assignment. And what does better credit assignment give us? It gives us faster learning, right? And
so faster learning is maybe ultimately what we're after. That's what we ended up with in the RL² paper on learning to reinforcement learn, which Rocky Duan led at the time. That's exactly the meta-learning approach: okay, we don't know how to design hierarchy, but we know what we want to get from it, so let's just end-to-end optimize for what we want to get from it and see if something like hierarchy emerges. And we saw things emerge. In maze navigation there was consistent motion down hallways, which is what you'd want hierarchical control to deliver: it should say, I want to go down this hallway, and then, when there's an option to take a turn, it can take the turn or not, and repeat. It even had a notion of whether it had been in a place before, so as not to revisit places it had already been. It still didn't scale to the real-world kinds of scenarios I think you had in mind, but it was some sign of life that maybe you can meta-learn these hierarchical concepts.

It seems like these meta-learning concepts get at what I think is one of the hardest and most important problems in AI, which is transfer learning, generalization. How far along the journey toward building general systems are we, in terms of being able to do transfer learning well? There are some signs that you can generalize a little bit, but do you think we're on the right path, or are totally different breakthroughs needed to be able to transfer knowledge between different learned models?

I'm pretty torn on this. There are some very impressive results already, right? I would say, even with the initial big breakthrough in 2012 with AlexNet, the initial result was: great, this does better on ImageNet, on image recognition. But then immediately thereafter came the notion that, wow, what was learned on ImageNet, when you now want to solve a new task, you can fine-tune AlexNet for the new task. And that was often found to be the
even bigger deal: that you had learned something reusable, which was not often the case before. Usually in machine learning, you learned something for one scenario, and that was it. And that's really exciting; it's a huge application, probably the biggest success of transfer learning today in terms of scope and impact. That was a huge breakthrough. And then, recently, something similar has happened by scaling things up: it seems that if you train even bigger networks, they might transfer even better. If you look, for example, at some of the OpenAI results on language models, and some of the recent Google results on language models, they are trained just for prediction, and then they get reused for other tasks. So I think there is something there, where, somehow, if you train a big enough model on enough things, it seems to transfer. Some DeepMind results I thought were very impressive too, the UNREAL results, where it learned to navigate mazes in ways where it wasn't just reinforcement learning; there were other auxiliary objectives it was optimizing for. So I think there are a lot of interesting results already. What's maybe hard is to wrap my head around to what extent, or when, we call something generalization, and what levels of generalization are involved in these different tasks.

So, just to frame things, I've heard you say somewhere that there's a difference between learning to master and learning to generalize. It's a nice line to think about, and I guess you're saying there's a gray area between learning to master and learning to generalize.

I think I might have heard that somewhere else; it might have been one of your interviews, maybe the one with Yoshua Bengio, I'm not a hundred percent sure, but I liked the example. I'm not sure who it was from, but the example was essentially this: if you use current deep learning techniques to
predict, let's say, the relative motion of our planets, they would do pretty well. But if a massive new mass entered our solar system, they would fail to predict what would happen, right? And that's a different kind of generalization: a generalization that relies on the ultimate simplest explanation we have available today for the motion of planets, whereas pure pattern recognition could predict our current solar system's motion pretty well, no problem. So I think that's an example of a kind of generalization that is a little different from what we've achieved so far, and it's not clear whether just regularizing more, forcing it to come up with a simpler explanation, would get us there. But that's what physics researchers do, right? They say: can I make this even simpler? How simple can I get it? What's the simplest equation that can explain everything?

Right, the master equation for the entire dynamics of the universe.

We haven't really pushed that direction as hard in deep learning, I would say. I'm not sure whether it should be pushed, but it seems like a kind of generalization you get from that which you don't get from our current methods.

So, I just talked to Vladimir Vapnik, for example, who comes from statistical learning theory, and he dreams of creating an E = mc² of learning, the general theory of learning. Do you think that's a fruitless pursuit in the near term, within the next several decades?

I think it's a really interesting pursuit, in the following sense: there's a lot of evidence that the brain is pretty modular. So I wouldn't maybe think of it as the theory, the single underlying theory, but more as a principle. There have been findings where people who are blind will use the part of the brain usually used for vision for other functions, and even after other kinds of rewiring, people might be able to reuse parts of their
brain for other functions. What that suggests is some kind of modularity, and I think it's a pretty natural thing to strive for, to see whether we can find that modularity. Of course, not every part of the brain is exactly the same, and not everything can be rewired arbitrarily, but if you think of something like the neocortex, which is a pretty big part of the brain, it seems fairly modular from the findings so far. Can you design something equally modular, such that if you just grow it, it becomes more capable? I think that would be an interesting underlying principle to shoot for, and one that's not unrealistic.

Do you prefer math or empirical trial and error for discovering the essence of what it means to do something intelligent? Reinforcement learning embodies both camps, right? You prove that something converges, prove the bounds, and at the same time a lot of the successes are of the form: let's try this and see if it works. Which do you gravitate toward? How do you think about those two parts of your brain?

Maybe I would prefer that we could make the progress with mathematics, and the reason is that often, if you have something you can mathematically formalize, you can leapfrog a lot of experimentation, and experimentation takes a long time to get through. There's a lot of trial and error in the reinforcement-learning research process; you need a lot of trial and error before you get to a success. So if we can leapfrog that, in my mind, that's what the math is for. Hopefully, once you do a bunch of experiments, you start seeing a pattern and can do some derivations that leapfrog some experiments. But I agree with you: in practice, a lot of the progress has been such that we have not been able to find the math that allows us to leapfrog ahead, and we are making gradual progress, one step at a time, a new experiment here, a new experiment there, that gives us new
insights, gradually building things up, but not yet getting to an equation that explains what would otherwise have been two years of experimentation, an equation that tells us what the result is going to be. Unfortunately, not so much yet.

Not so much yet, but the hope is there. In trying to teach robots or systems to do everyday tasks, or even in simulation, what are you more excited about: imitation learning, or self-play? Letting robots learn from humans, or letting robots try to figure things out in their own way and, through play, eventually interact with humans and solve whatever the problem is? Which is more exciting to you, and more promising, as a research direction?

When we look at self-play, what's so beautiful about it goes back to the challenges of reinforcement learning. The challenge in reinforcement learning is getting signal: if you never succeed, you don't get any signal. In self-play, you're on both sides, so one of you succeeds, and, the beauty of it, one of you also fails. You see the contrast: you see the version of you that did better than the other version. So every time you play yourself, you get signal. Whenever you can turn something into self-play, you're in a beautiful situation where you can naturally learn much more quickly than in most other reinforcement-learning environments. So I think that if we could somehow turn more reinforcement learning problems into self-play formulations, that would go really far. So far, self-play has largely been applied to games, where there are natural opponents. But if we could do self-play for other things, let's say a robot learning to build a house, which is a pretty advanced thing to try to do, or maybe it tries to build a hut or something, if that could be done through self-play, it would learn a lot more quickly, if somebody could figure out how.
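The point that every self-play game yields signal, because one copy of the agent wins and the other loses, can be shown on a toy symmetric game. The game here (guess closest to a hidden target) and all parameters are invented for illustration: one shared policy plays both sides, and after each game the winner's move is reinforced and the loser's discouraged, so learning never waits for an external reward.

```python
import numpy as np

rng = np.random.default_rng(3)

TARGET = 7              # hidden optimum; neither player is told this
logits = np.zeros(10)   # one shared policy plays both sides (self-play)
lr = 0.3

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(3000):
    probs = softmax(logits)
    a, b = rng.choice(10, p=probs), rng.choice(10, p=probs)
    if a == b:
        continue                          # a draw carries no contrast
    winner = a if abs(a - TARGET) < abs(b - TARGET) else b
    loser = b if winner == a else a
    # Every game produces signal: the winner's move goes up, the loser's down.
    for act, sign in ((winner, 1.0), (loser, -1.0)):
        g = -probs.copy()
        g[act] += 1.0
        logits += lr * sign * g

print(int(np.argmax(softmax(logits))))
```

Even with no reward function at all, only win/lose contrast against itself, the policy concentrates on the hidden optimum, which is the mechanism Abbeel is pointing at.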
And I think that would be something that gets closer to the mathematical leapfrogging: somebody figures out a formalism and says, take any RL problem, apply this and this idea, and you can turn it into a self-play problem where you get signal a lot more easily. The reality is that for many problems we don't know how to turn them into self-play. So either we need to provide a detailed reward, one that doesn't just reward achieving the goal but rewards making progress, and that becomes time-consuming. And once you're doing that, say you want a robot to do something and you need to give all this detailed reward, well, why not just give a demonstration? Why not just show the robot? Then the question is how you show the robot. One way is to teleoperate the robot; the robot then really experiences things, which is nice because that's very high signal-to-noise-ratio data. We've done a lot of that, and in just ten minutes you can teach a robot a new basic skill: okay, pick up the bottle, place it somewhere else. That's a skill, no matter where the bottle starts; maybe it always goes onto a target or something. That's fairly easy to teach a robot with teleoperation. What's even more interesting is if you can teach a robot through third-person learning, where the robot watches you do something, doesn't experience it, but just watches and says: okay, if you're showing me that, it means I should be doing this, and I'm not going to use your hand, because I don't get to control your hand, but I'm going to use my hand and do that mapping. And that's where I think one of the big breakthroughs happened this year, led by Chelsea Finn here. It's almost like machine translation for demonstrations: you have a human demonstration, and the robot learns to translate it into what it means for the robot to do the same thing. That was done with meta-learning, learning from one to get the other, and I think it opens up a lot of opportunities to learn a lot more quickly.
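Teaching a skill from teleoperated demonstrations, as described above, is at its core supervised learning on logged (state, action) pairs. A minimal behavioral-cloning sketch, with a made-up proportional controller standing in for the human teleoperator:

```python
import numpy as np

rng = np.random.default_rng(4)

# Pretend teleoperation: the "human" drives toward a goal with a simple
# proportional controller, and we log (state, action) pairs.
goal = np.array([1.0, -0.5])

def expert(state):
    return 0.8 * (goal - state)   # action = move toward the goal

states = rng.normal(size=(200, 2))
actions = np.array([expert(s) for s in states])

# Behavioral cloning = plain supervised learning on the demonstrations.
# Fit action ~ A @ state + b by least squares.
X = np.hstack([states, np.ones((200, 1))])
theta, *_ = np.linalg.lstsq(X, actions, rcond=None)

def clone(state):
    return np.append(state, 1.0) @ theta

test_state = rng.normal(size=2)
print(np.allclose(clone(test_state), expert(test_state), atol=1e-6))
```

This is the first-person, teleoperated case; the third-person work mentioned (Finn et al.) adds the harder step of translating someone else's viewpoint and embodiment into the robot's own actions, which this sketch does not attempt.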
So, my focus is on autonomous vehicles. Do you think autonomous driving is amenable to this kind of third-person-watching approach?

For autonomous driving, I would say third-person is slightly easier, and the reason I say only slightly easier than first-person is that the car dynamics are very well understood. The distinction between third-person and first-person is not a very important one for autonomous driving; they're very similar, because the distinction is really about who turns the steering wheel. Or let me put it differently: how to get from the point where you are now to a point, say, a couple of meters in front of you is a problem that's very well understood, and that's the only place the third- versus first-person distinction shows up. Whereas with robot manipulation, the interaction forces are very complex, and it's still a very different thing. For autonomous driving, I think there's still the question of imitation versus RL. Imitation gives you a lot more signal, but where imitation is lacking, and needs some extra machinery, is that in its normal format it doesn't think about goals or objectives. Of course, there are versions of imitation learning, inverse-reinforcement-learning-type imitation, which do think about goals, and I think then we're getting much closer. But I think it's very hard to imagine a fully reactive car generalizing well if it really doesn't have a notion of objectives. To generalize the way you would want, you'd want more than just the reactivity you get from behavioral cloning, from plain supervised learning.

So a lot of this work, whether it's self-play or imitation learning, would benefit significantly from simulation, from effective simulation, and you're doing a lot of work both in the physical world and in simulation. Do you have hope for greater and greater power of
simulation, of it being boundless eventually, to where most of what we need to operate in the physical world could be simulated to a degree that's directly transferable? Or are we still very far away from that?

I think we could even rephrase that question in some sense.

Please.

There's the power of simulation: simulators get better and better, and as they become stronger we can learn more in simulation. But there's also another version, where you say the simulator doesn't even have to be that precise, as long as it's somewhat representative. Instead of trying to get one simulator that is sufficiently precise to learn in and transfer really well to the real world, I'm going to build many simulators.

An ensemble of simulators.

An ensemble of simulators. No single one of them is sufficiently representative of the real world that training in it alone would work, but if you train in all of them, then there is something that's good in all of them, and the real world will just be, you know, another one: not identical to any one of them, but just another one of them.

Another sample from the distribution of simulators.

Exactly. We do live in a simulation, so this is just one other one. Well, I'm not sure about that, but it's definitely a very advanced simulator if it is.

Yeah, it's a pretty good one. I've talked about this with Stuart Russell, and it's something you think about a little bit too. Of course you're really trying to build these systems, but do you think about the future of AI? A lot of people are concerned about safety. How do you think about AI safety as you build robots that operate in the physical world? How do you approach that problem in an engineering, systematic way?

So when a robot is doing things, you have a few notions of safety to worry about. One is that the robot is physically strong and of course could do a lot of damage, same for cars, which we can think of as robots in some way, and this
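The ensemble-of-simulators idea (often called domain randomization) can be sketched in a toy setting: tune a controller so it works across many simulators with randomized physics, then treat the real world as just another sample from that distribution. Everything here, the 1-D point-mass dynamics, the mass range, and the gain grid, is invented for illustration:

```python
import numpy as np

# Domain-randomization sketch: no single simulator matches the real world,
# so tune across many randomized ones and hope reality looks like
# "just another sample". Toy 1-D point mass with a P-controller.
rng = np.random.default_rng(1)

def rollout_error(gain, mass, steps=200, dt=0.05):
    """Distance from target after running a P-controller (plus fixed damping)."""
    pos, vel, target = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -gain * (pos - target) - 2.0 * vel
        vel += (force / mass) * dt
        pos += vel * dt
    return abs(pos - target)

# Ensemble of simulators: same physics, randomized mass parameter.
masses = rng.uniform(0.5, 2.0, size=20)

# Pick the gain whose WORST-case error across the ensemble is smallest.
gains = np.linspace(0.5, 5.0, 10)
worst = [max(rollout_error(g, m) for m in masses) for g in gains]
best_gain = gains[int(np.argmin(worst))]

# "Real world" = another draw from the same distribution of simulators.
real_mass = rng.uniform(0.5, 2.0)
print(rollout_error(best_gain, real_mass) < 0.1)
```

Optimizing the worst case over the ensemble is one simple design choice; training a single policy jointly on all the randomized simulators, as in the work discussed here, is the learned analogue.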
could be completely unintentional. So it might not be the kind of long-term AI-safety concern where, okay, AI is smarter than us and now what do we do, but something very practical: okay, if this robot makes a mistake, what is the result going to be? Of course simulation comes in a lot there too, to test in simulation.

It's a difficult question, and I'm always wondering. Let's go back to driving; a lot of people know driving well. What do we do to test somebody for driving, to get a driver's license? What do they really do? You fill out some test and then you drive, and, at least in suburban California, the driving test is just: you drive around the block, pull over, do a stop sign successfully, then you pull over again, and you're pretty much done. And you're like, okay, if a self-driving car did that, would you trust that it can drive? And you'd say no, that's not enough for me to trust it. But somehow for humans we've figured out that somebody being able to do that is representative of them being able to do a lot of other things. So I think for humans we've figured out representative tests: if you can do this, here is what you can really do.

Of course, humans aren't tested at all times, while self-driving cars and robots can be tested much more often; you can have replicas that get tested and are known to be identical because they use the same neural net, and so forth. But still, I feel like we don't have the kind of unit tests, or proper tests, for robots, and I think there's something very interesting to be thought about there, especially as you update things. Your software improves, you have a better self-driving car suite, you update it: how do you know it's indeed more capable on everything than what you had before, and that no bad behavior crept into it? I think that's a very interesting direction of research where there is no real solution yet, except that somehow for
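The "unit tests for robots" worry about software updates can be made concrete as a regression gate: replay a fixed scenario suite and accept a new policy only if it scores at least as well as the old one on every scenario. The policies, scenarios, and score function below are entirely made up for illustration:

```python
# Regression-gate sketch for policy updates: require no per-scenario
# regressions before shipping. All names here are hypothetical.

def no_regressions(old_policy, new_policy, scenarios, score):
    """Accept the update only if new >= old on every recorded scenario."""
    return all(score(new_policy, s) >= score(old_policy, s) for s in scenarios)

# Toy example: a "policy" is a braking distance; score rewards stopping
# before the obstacle that appears at the scenario's distance.
scenarios = [{"obstacle_at": d} for d in (5.0, 10.0, 20.0)]

def score(policy, scenario):
    return 1.0 if policy["brake_threshold"] <= scenario["obstacle_at"] else 0.0

old = {"brake_threshold": 4.0}
good_update = {"brake_threshold": 3.0}   # still stops in time everywhere
bad_update = {"brake_threshold": 12.0}   # regresses on the 5 m scenario

print(no_regressions(old, good_update, scenarios, score))  # True
print(no_regressions(old, bad_update, scenarios, score))   # False
```

The open research question raised here is exactly what the scenario suite and score should be so that passing it is as representative as a human driving test apparently is.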
humans we do it, because we say: okay, you have a driving test, you passed, you can go on the road now, and humans then have an accident every, what, million or ten million miles, something pretty phenomenal compared to that short test.

Yeah. So let me ask: you've mentioned that Andrew Ng, by example, showed you the value of kindness. Do you think the space of policies, good policies for humans and for AI, is populated by policies of kindness, or by ones that are the opposite: exploitation, even evil? If you just look at the sea of policies we operate under as human beings, or that an AI system would have to operate under in the real world, do you think it's really easy to find policies that are full of kindness, like we naturally fall into them, or is it a very hard optimization problem?

I mean, there are kind of two optimizations happening for humans, right? For humans there was the very long-term optimization, which evolution has done for us, and we're predisposed to like certain things. That's sometimes what makes our learning easier: we know things like pain and hunger and thirst, and the fact that we know about those is not something we were taught; that's innate. When we're hungry we're unhappy, when we're thirsty we're unhappy, when we have pain we're unhappy, and ultimately evolution built that into us. So I think there is a notion that humans somehow evolved, in general, to prefer to get along in some ways, but at the same time also to be very territorial and centric to their own tribe. It seems like that's the kind of space we've converged to. I'm not an expert in anthropology, but it seems like we're very good within our own tribe but need to be taught to be nice to other tribes.

Well, Steven Pinker highlights this pretty nicely in The Better Angels of Our Nature, where he talks about violence decreasing over time
consistently. So whatever tensions arise, whatever teams we pick, it seems that the long arc of history goes towards us getting along more and more.

I hope so. So do you think it's possible to teach RL-based robots this kind of kindness, this kind of ability to interact with humans, this kind of policy? Even, let me ask a fun one: do you think it's possible to teach an RL-based robot to love a human being, and to inspire that human to love the robot back? So, an RL-based algorithm that leads to a happy marriage?

That's an interesting question. Maybe I'll answer it with another question, and then come back to it. Another question you can ask is: how much happiness do some people get from interacting with just a really nice dog? I mean, dogs: you come home, and that's what dogs do, they greet you, they're excited, and it makes you happy when you come home to your dog. You're like, okay, this is exciting, they're always happy when I'm here. And if they don't greet you, because maybe your partner took them on a trip or something, you might not be nearly as happy when you get home, right? And so it seems like the level of reasoning a dog has is pretty sophisticated, but still not at the level of human reasoning. So it seems we don't even need to achieve human-level reasoning to get very strong affection with humans. And so my thinking is, why not? Why couldn't we, with an AI, achieve the kind of affection that humans feel among each other, or with friendly animals, and so forth? Whether that's a good thing for us or not is another question, but I don't see why not.

Why not, yeah. So it's almost as if love is the answer; or maybe love is the objective function, and then RL is the answer.

Maybe. Pieter, thank you so much. I don't want to take up more of your time. Thank you so much for talking today.

Well, thanks for
coming by. Great to have you visit.