Transcript
MrIFte_rOh0 • What is Deep Reinforcement Learning? (David Silver, DeepMind) | AI Podcast Clips
Kind: captions Language: en

If it's okay, can we take a step back and ask the basic question: what is, to you, reinforcement learning?

So reinforcement learning is the study, and the science, and the problem of intelligence in the form of an agent that interacts with an environment. The problem it's trying to solve is represented by some environment, like the world in which that agent is situated, and the goal of RL is clear: the agent gets to take actions, those actions have some effect on the environment, and the environment gives back an observation to the agent saying, you know, this is what you see or sense. And one special thing which it gives back is called the reward signal: how well it's doing in the environment. And the reinforcement learning problem is to simply take actions over time so as to maximize that reward signal.

So, a couple of basic questions. What types of RL approaches are there? I don't know if there's a nice, brief, in-words way to paint a picture of, sort of, value-based, model-based, policy-based reinforcement learning.

Yeah. So now, if we think about it, there's this ambitious problem definition of RL. It's truly ambitious: it's trying to capture and encircle all of the ways in which an agent interacts with an environment, and say, well, how can we formalize and understand what it means to crack that? Now let's think about the solution method. Well, how do you solve a really hard problem like that? One approach you can take is to decompose that very hard problem into pieces that work together to solve it. And so you can look at the decomposition that's inside the agent's head, if you like, and ask, well, what form does that decomposition take? And some of the most common pieces that people use, when they're putting this solution method together, are: whether or not that solution has a value function, that means, is it
trying to predict, explicitly trying to predict, how much reward it will get in the future? Does it have a representation of a policy, that means something which is deciding how to pick actions: is that decision-making process explicitly represented? And is there a model in the system, is there something which is explicitly trying to predict what will happen in the environment? And so those three pieces are, to me, some of the most common building blocks, and I understand the different choices in RL as choices of whether or not to use those building blocks when you're trying to decompose the solution. Should I have a value function represented, should I have a policy represented, should I have a model represented? And there are combinations of those pieces, and of course other things you could add into the picture as well, but those three fundamental choices give rise to some of the branches of RL with which we're very familiar.

So, as you mentioned, there is the choice of what's specified or modeled explicitly, and the idea is that all of these are somehow implicitly learned within the system. So it's almost the choice of how you approach a problem. Do you see those as fundamental differences, or are these almost like small specifics, like the details of how you solve the problem, but they're not fundamentally different from each other?

I think the fundamental idea is maybe at the higher level. The fundamental idea is that the first step of the decomposition is really to say, well, how are we going to solve any kind of problem where you're trying to figure out how to take actions, just from this stream of observations? You know, you've got some agent situated in its sensorimotor stream, getting all these observations and getting to take these actions, and what should it do? How can you even broach that problem? Maybe the complexity of the world is so great that you can't even imagine how to build a system that would understand how
to deal with that. And so the first step of this decomposition is to say, well, you have to learn. The system has to learn for itself. And note that the reinforcement learning problem doesn't actually stipulate that you have to learn: you could maximize your rewards without learning, it just wouldn't do a very good job of it. So learning is required because it's the only way to achieve good performance in any sufficiently large and complex environment.

So that's the first step, and that's the deep commonality to all of the other pieces. Because now you might ask, well, what should you be learning? What does learning even mean in this sense? Learning might mean, well, you're trying to update the parameters of some system which is then the thing that actually picks the actions, and those parameters could be representing anything. They could be parameterizing a value function, or a model, or a policy. And so in that sense there's a lot of commonality, in that whatever is being represented there is the thing which is being learned, and it's being learned with the ultimate goal of maximizing rewards. But the way in which you decompose the problem is really what gives the semantics to the whole system. Like, are you trying to learn something to predict well, like a value function or a model? Are you learning something to perform well, like a policy? The form of that objective is kind of giving the semantics to the system. And so it really is, at the next level down, a fundamental choice, and we have to make those fundamental choices as system designers, or enable our algorithms to be able to learn how to make those choices for themselves.

So then the next step. You mentioned the very first thing you have to deal with is: can you even take in this huge stream of observations and do anything with it? So the natural next basic question is: what is deep reinforcement learning, and what is this idea of using neural
networks to deal with this huge incoming stream?

So, amongst all the approaches for reinforcement learning, deep reinforcement learning is one family of solution methods that tries to utilize the powerful representations offered by neural networks to represent any of these different components of the solution, of the agent: whether it's the value function, or the model, or the policy. The idea of deep learning is to say, well, here's a powerful toolkit that's so powerful that it's universal, in the sense that it can represent any function and it can learn any function. And so if we can leverage that universality, that means that whatever we need to represent, for a policy, or a value function, or a model, deep learning can do it. So deep learning is one approach that offers us a toolkit that has no ceiling to its performance. As we start to put more resources into the system, more memory and more computation and more data, more experience, more interactions with the environment, these are systems that can just get better and better and better at doing whatever job we've asked them to do. Whatever we've asked that function to represent, it can learn a function that does a better and better job of representing that knowledge: whether that knowledge be estimating how well you're going to do in the world, the value function; whether it's choosing what to do in the world, the policy; or whether it's understanding the world itself, what's going to happen next, the model.

Nevertheless, the fact that neural networks are able to learn incredibly complex representations that allow you to do the policy, the model, or the value function is, at least to my mind, exceptionally beautiful and surprising. Like, was it surprising to you? Can you still believe it works as well as it does? Do you have good intuition about why it works at all, and works as well as it does?

Let me take two parts to
that question. I think it's not surprising to me that the idea of reinforcement learning works, because in some sense I feel it's the only thing which ultimately can, and so I feel we have to address it. And success must be possible, because we have examples of intelligence: it must, at some level, be possible to acquire experience and use that experience to do better, in a way which is meaningful to environments of the complexity that humans can deal with.

Am I surprised that our current systems can do as well as they can do? I think one of the big surprises for me, and for a lot of the community, is really the fact that deep learning can continue to perform so well despite the fact that the functions these neural networks are representing have these incredibly nonlinear, kind of bumpy surfaces, which to our low-dimensional intuitions make it feel like surely you're just going to get stuck. Learning will get stuck, because you won't be able to make any further progress. And yet the big surprise is that learning continues, and these things which appear to be local optima turn out not to be, because in high dimensions, when we make really big neural nets, there's always a way out, and there's a way to go even lower. And then that's still not another local optimum, because there's some other pathway that will take you out and take you lower still. And so no matter where you are, learning can proceed and do better and better and better without bound. And so that is a surprising and beautiful property of neural nets which I find elegant, and somewhat shocking that it turns out to be the case.

As you said, to our low-dimensional intuitions, that's surprising.

Yeah, we're very tuned to working within a three-dimensional environment, and so to start to visualize what a billion-dimensional neural network surface that you're trying to optimize over even looks like is very hard for us. And
so I think that if you try to account for, essentially, the AI winter, where people gave up on neural networks, I think it really comes down to that lack of ability to generalize from low dimensions to high dimensions. Because back then we were in the low-dimensional case: people could only build neural nets with, you know, 50 nodes in them or something, and to imagine that it might be possible to build a billion-dimensional neural net, and that it might have a completely different, qualitatively different property, was very hard to anticipate. And I think even now we're starting to build the theory to support that. It's incomplete at the moment, but all of the theory seems to be pointing in the direction that indeed this is an approach which truly is universal, both in its representational capacity, which was known, but also in its learning ability, which is surprising.

And it makes one wonder what else we're missing. Yes, to our low-dimensional intuitions, yet it will seem obvious once it's discovered.

I often wonder, you know, when we one day do have AIs which are superhuman in their abilities to understand the world, what will they think of the algorithms that we developed back now? Will they be looking back at these days and thinking, well, will we look back and feel that these algorithms were naive first steps, or will they still be the fundamental ideas which are used even in a hundred thousand, ten thousand years?

Yeah, they'll watch back to this conversation, I would say, with a smile, maybe a little bit of a laugh. I mean, my sense is, just like we used to think that the Sun revolved around the Earth, they'll see our systems of today, reinforcement learning, as too complicated, and that the answer was simple all along. There's something, like you said, in the game of Go. I mean, I love systems like cellular automata, where there are simple rules from which incredible complexity
emerges. So it feels like there might be some very simple approaches, just like Rich Sutton says, right: these simple methods, with compute, over time seem to prove to be the most effective.

I 100% agree. I think that if we try to anticipate what will generalize well into the future, it's likely to be the case that it's the simple, clear ideas which will have the longest legs and which will carry us furthest into the future. Nevertheless, we're in a situation where we need to make things work today, and sometimes that requires putting together more complex systems where we don't have the full answers yet as to what those minimal ingredients might be.
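The agent-environment loop and the building blocks discussed above (a value function from which a policy is derived) can be sketched in a few lines. This is a minimal toy illustration, not anything from DeepMind's systems: the Corridor environment, the hyperparameters, and the use of tabular Q-learning in place of a neural network are all invented here for brevity.

```python
import random

class Corridor:
    """Toy environment: states 0..4; the agent starts at state 0 and
    receives reward +1 only upon reaching state 4, which ends the episode."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at the walls)
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + delta))
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done

def q_learning(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Value-based agent: it learns Q(s, a), an explicit prediction of
    future reward, and its behaviour (an epsilon-greedy policy) is
    derived from that value function rather than represented directly."""
    q = {(s, a): 0.0 for s in range(env.length) for a in (0, 1)}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice((0, 1))  # explore
            else:
                # exploit: pick the highest-valued action, ties broken randomly
                a = max((0, 1), key=lambda act: (q[(s, act)], random.random()))
            s2, r, done = env.step(a)
            # TD update: nudge Q(s, a) toward reward + discounted next value
            target = r + (0.0 if done else gamma * max(q[(s2, 0)], q[(s2, 1)]))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

random.seed(0)
q = q_learning(Corridor())
print(q[(0, 1)] > q[(0, 0)])  # moving toward the goal should look better
```

A policy-based method would instead represent the action-picking rule directly (e.g. a parameterized policy trained by policy gradient), and a model-based method would additionally learn something that predicts the next state and reward. Deep RL, as discussed above, replaces the table `q` with a neural network so the same scheme scales to huge observation streams.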