Transcript
MrIFte_rOh0 • What is Deep Reinforcement Learning? (David Silver, DeepMind) | AI Podcast Clips
/home/itcorpmy/itcorp.my.id/harry/yt_channel/out/lexfridman/.shards/text-0001.zst#text/0380_MrIFte_rOh0.txt
Kind: captions
Language: en
If it's okay, can we take a step back and kind of ask the basic question: what is, to you, reinforcement learning?

So, reinforcement learning is the study and the science and the problem of intelligence in the form of an agent that interacts with an environment. The problem it's trying to solve is represented by some environment, like the world in which that agent is situated. The way RL works is that the agent gets to take actions; those actions have some effect on the environment, and the environment gives back an observation to the agent, saying, you know, this is what you see, this is what you sense. And one special thing it gives back is called the reward signal: how well it's doing in the environment. The reinforcement learning problem is simply to take actions over time so as to maximize that reward signal.
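The loop just described — act, observe, collect reward, repeat — can be sketched in a few lines of Python. The toy two-action environment and the random (non-learning) agent below are hypothetical illustrations, not anything from the conversation:

```python
import random

class ToyEnvironment:
    """A hypothetical two-action environment: action 1 yields reward."""
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0   # the special reward signal
        observation = random.random()          # what the agent gets to see
        return observation, reward

class RandomAgent:
    """An agent with no learning yet: it just picks actions at random."""
    def act(self, observation):
        return random.choice([0, 1])

# The RL interaction loop: take actions over time, accumulate reward.
env, agent = ToyEnvironment(), RandomAgent()
observation, total_reward = 0.0, 0.0
for t in range(100):
    action = agent.act(observation)
    observation, reward = env.step(action)
    total_reward += reward

print(total_reward)  # the quantity an RL agent tries to maximize
```

A learning agent would differ only in what happens inside `act`: it would use the stream of observations and rewards to pick better actions over time.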
So, a couple of basic questions: what types of RL approaches are there? I don't know if there's a nice, brief, in-words way to paint a picture of, sort of, value-based, model-based, policy-based reinforcement learning.

Yeah. So if we think about it — okay, there's this ambitious problem definition of RL. It's really, you know, truly ambitious: it's trying to capture and encircle all of the ways in which an agent interacts with an environment, and to say, well, how can we formalize and understand what it means to crack that? Now let's think about the solution method. Well, how do you solve a really hard problem like that?
Well, one approach you can take is to decompose that very hard problem into pieces that work together to solve it. And so you can kind of look at the decomposition that's inside the agent's head, if you like, and ask: well, what form does that decomposition take? Some of the most common pieces that people use when they're putting this solution method together are: whether or not that solution has a value function — that means, is it explicitly trying to predict how much reward it will get in the future? Does it have a representation of a policy — that means, something which is deciding how to pick actions: is that decision-making process explicitly represented? And is there a model in the system — is there something which is explicitly trying to predict what will happen in the environment? Those three pieces are, to me, some of the most common building blocks, and I understand the different choices in RL as choices of whether or not to use those building blocks when you're trying to decompose the solution: you know, should I have a value function represented, should I have a policy represented, should I have a model represented? And there are combinations of those pieces, and of course other things that you could add into the picture as well, but those three fundamental choices give rise to some of the branches of RL with which we are very familiar.

And so those, as you
mentioned, there is the choice of what's specified or modeled explicitly, and the idea is that all of these are somehow implicitly learned within the system. So it's almost a choice of how you approach the problem. Do you see those as fundamental differences, or are these almost like small specifics — like the details of how you solve the problem — but not fundamentally different from each other?

I think the fundamental idea
is maybe this, at a higher level: the first step of the decomposition is really to say, well, how are we going to solve any kind of problem where you're trying to figure out how to take actions just from this stream of observations? You know, you've got some agent situated in its sensorimotor stream, getting all these observations and getting to take these actions — what should it do? How can it even broach that problem? You know, maybe the complexity of the world is so great that you can't even imagine how to build a system that would understand how to deal with that. And so the first step of the decomposition is to say: you have to learn — the system has to learn for itself. And note that the reinforcement learning problem doesn't actually stipulate that you have to learn: you could maximize your rewards without learning; it just wouldn't do a very good job of it. So learning is required, because it's the only way to achieve good performance in any sufficiently large and complex environment. That's the first step, and that's the commonality across all of the other pieces.

Because now you might ask, well, what should you be learning? What does learning even mean in this sense? You know, learning might mean, well, you're trying to update the parameters of some system which is then the thing that actually picks the actions. And those parameters could be representing anything — they could be parameterizing a value function or a model or a policy. So in that sense there's a lot of commonality, in that whatever is being represented there is the thing which is being learned, and it's being learned with the ultimate goal of maximizing rewards. But the way in which you decompose the problem is really what gives the semantics to the whole system: are you trying to learn something to predict well, like a value function or a model, or are you learning something to perform well, like a policy? The form of that objective is kind of giving the semantics to the system. So it really is, at the next level down, a fundamental choice, and we have to make those fundamental choices as system designers — or enable our algorithms to be able to learn how to make those choices for themselves.
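The idea that "learning" means updating the parameters of whatever picks the actions, with the sole goal of more reward, can be sketched as a minimal hill-climbing update on a one-parameter policy. Everything here (the toy reward, the single parameter `theta`, the perturb-and-compare rule) is a hypothetical illustration; real RL algorithms use much cleverer updates, but the shape is the same:

```python
import random

def reward(action):
    """A toy environment: action 1 is the rewarded action."""
    return 1.0 if action == 1 else 0.0

def act(theta):
    """A one-parameter policy: theta = probability of picking action 1."""
    return 1 if random.random() < theta else 0

# Learning = nudging the parameter in whichever direction earns more reward.
random.seed(0)
theta = 0.5
for _ in range(200):
    candidate = min(1.0, max(0.0, theta + random.choice([-0.05, 0.05])))
    # Estimate average reward of the current vs. the perturbed parameter...
    r_old = sum(reward(act(theta)) for _ in range(30)) / 30
    r_new = sum(reward(act(candidate)) for _ in range(30)) / 30
    if r_new >= r_old:            # ...and keep whichever did better.
        theta = candidate

print(theta)  # the learned parameter after training
```

The same scheme would apply unchanged if `theta` instead parameterized a value function or a model: whatever is represented is the thing being learned, and the reward supplies the objective.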
So then, the next step. You mentioned that the very first thing you have to deal with is: can you even take in this huge stream of observations and do anything with it? So the natural next basic question is: what is deep reinforcement learning, and what is this idea of using neural networks to deal with this huge incoming stream?

So, amongst all the approaches to reinforcement learning, deep reinforcement learning is one family of solution methods that tries to utilize the powerful representations offered by neural networks to represent any of these different components of the agent's solution — whether it's the value function or the model or the policy. The idea of deep learning is to say: here's a powerful toolkit that's so powerful it's universal, in the sense that it can represent any function and it can learn any function. And if we can leverage that universality, that means that whatever we need to represent for our policy or our value function or our model, deep learning can do it. So deep learning is one approach that offers us a toolkit with no ceiling to its performance: as we start to put more resources into the system — more memory, more computation, more data, more experience, more interactions with the environment — these are systems that can just get better and better at doing whatever job we've asked them to do. Whatever we've asked that function to represent, it can learn a function that does a better and better job of representing that knowledge — whether that knowledge is estimating how well you're going to do in the world (the value function), choosing what to do in the world (the policy), or understanding the world itself, what's going to happen next (the model).
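As a toy illustration of a neural network representing one of those components, here is a hypothetical two-layer network used as a policy: it maps an observation vector to a probability distribution over actions. The sizes, random weights, and names are invented for illustration; a real deep RL system would train such a network with gradient methods rather than leave it random:

```python
import math
import random

def policy_network(observation, w1, w2):
    """A tiny two-layer neural net: observation -> action probabilities."""
    # Hidden layer with ReLU activations.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, observation)))
              for row in w1]
    # Output layer produces one logit per action.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    # Softmax turns logits into a probability distribution over actions.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
obs_dim, hidden_dim, n_actions = 4, 8, 3
w1 = [[random.gauss(0, 1) for _ in range(obs_dim)] for _ in range(hidden_dim)]
w2 = [[random.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(n_actions)]

probs = policy_network([0.1, -0.2, 0.3, 0.5], w1, w2)
print(probs)  # three action probabilities summing to 1
```

The same network shape could instead output a single scalar (a value function) or a predicted next observation (a model); which component it represents is the design choice discussed above, while training adjusts `w1` and `w2`.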
Nevertheless, the fact that neural networks are able to learn incredibly complex representations that allow you to do the policy, the model, or the value function is, at least to my mind, exceptionally beautiful and surprising. Was it surprising to you? Can you still believe it works as well as it does? Do you have good intuition about why it works at all, and works as well as it does?

I think — let me take two parts
that question I think it's not
surprising to me that the idea of
reinforcement learning works because in
some sense I think it's the I feel it's
the only thing which can ultimately and
so I feel we have to we have to address
it and there must be success is possible
because we have examples of intelligence
and it must at some level be able to
possible to acquire experience and use
that experience to do better in a way
which is meaningful to environments at
the complexity that humans can deal with
it must be am I surprised that our
current systems can do as well as they
can do I think one of the big surprises
for me and a lot of the community is
really the fact that deep learning can
continue to perform so well despite than
the fact that these neural networks that
they're representing have these
incredibly nonlinear kind of bumpy
surfaces which to our kind of low
dimensional intuitions make it feel like
surely you're just gonna get stuck and
learning will get stuck because you
won't be able to make any further
progress and yet the big surprise is
that learning continues and and these
what appear to be local Optima turned
out not to be because in high dimensions
when we make really big neural nets
there's always a way out
and there's a way to go even lower and
then he's still not another local Optima
because there's some other pathway that
will take you out and take you lower
still and so no matter where you are
learning can proceed and do better and
better and breath better without bound
and so that is a surprising and
beautiful property of neural nets which
I find elegant and beautiful and and
somewhat shocking that it turns out to
be the case as you said surely like to
our low-dimensional intuitions, that's surprising.

Yeah — we're very tuned to working within a three-dimensional environment, so to start to visualize what a billion-dimensional neural network surface that you're trying to optimize over even looks like is very hard for us. And so I think that if you try to account for, essentially, the AI winter, where people gave up on neural networks, it's really down to that lack of ability to generalize from low dimensions to high dimensions. Back then we were in the low-dimensional case — people could only build neural nets with, you know, 50 nodes in them or something — and to imagine that it might be possible to build a billion-dimensional neural net, and that it might have a qualitatively different property, was very hard to anticipate. And I think even now we're only starting to build the theory to support that. It's incomplete at the moment, but all of the theory seems to be pointing in the direction that, indeed, this is an approach which truly is universal — both in its representational capacity, which was known, but also in its learning ability, which is surprising. And it makes one wonder what else we're missing.

Yes — to our low-dimensional intuitions; yet it will seem obvious once it's
discovered.

I often wonder, you know, when we one day do have AIs which are superhuman in their ability to understand the world, what will they think of the algorithms that we developed back now? Will they look back at these days and feel that these algorithms were naive first steps, or will they still be the fundamental ideas which are used even in a hundred, a thousand, ten thousand years?

Yeah — they'll watch back to this conversation and, I would say, smile, maybe with a little bit of a laugh. I mean, my sense is that, just like we used to think that the Sun revolved around the Earth, they'll see our systems of today — reinforcement learning — as too complicated, and that the answer was simple all along. There's something — I think you said it about the game of Go — I mean, I love systems like cellular automata, where simple rules give rise to incredible complexity. So it feels like there might be some very simple approaches, just as Sutton says, right: these simple methods, with compute, over time seem to prove to be the most effective.

I one hundred percent agree. I think that if we try to anticipate what will generalize well into the future, it's likely to be the case that it's the simple, clear ideas which will have the longest legs and which will carry furthest into the future. Nevertheless, we're in a situation where we need to make things work here and now, and sometimes that requires putting together more complex systems where we don't have the full answers yet as to what those minimal ingredients might be.