Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108
kxi-_TT_-Nc • 2020-07-14
The following is a conversation with Sergey Levine, a professor at Berkeley and a world-class researcher in deep learning, reinforcement learning, robotics, and computer vision, including the development of algorithms for end-to-end training of neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and, in general, deep RL algorithms.

Quick summary of the ads: two sponsors, Cash App and ExpressVPN. Please consider supporting the podcast by downloading Cash App and using code LEXPODCAST, and signing up at expressvpn.com/lexpod. Click the links, buy the stuff. It's the best way to support this podcast and, in general, the journey I'm on. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support it on Patreon, or connect with me on Twitter at lexfridman. As usual, I'll do a few minutes of ads now, and never any ads in the middle that can break the flow of the conversation.

This show is presented by
Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for taking a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

This show
is also sponsored by ExpressVPN. Get it at expressvpn.com/lexpod to support this podcast and to get an extra three months free on a one-year package. I've been using ExpressVPN for many years; I love it. I think ExpressVPN is the best VPN out there. They told me to say it, but it happens to be true, in my humble opinion. It doesn't log your data, it's crazy fast, and it's easy to use: literally just one big power-on button. Again, it's probably obvious to you, but I should say it again: it's really important that they don't log your data. It works on Linux and every other operating system, but Linux, of course, is the best operating system. Shout-out to my favorite flavor, Ubuntu MATE 20.04. Once again, get it at expressvpn.com/lexpod to support this podcast and to get an extra three months free on a one-year package.

And now, here's my conversation with Sergey
Levine.

What's the difference between a state-of-the-art human, such as you and I...

Well, I don't know if we qualify as state-of-the-art humans.

...but a state-of-the-art human and a state-of-the-art robot?

It's a very interesting question. Robot capability is, I think, a very tricky thing to understand, because there are some things that are difficult that we wouldn't think are difficult, and some things that are easy that we wouldn't think are easy. And there's also a really big gap between the capabilities of robots in terms of hardware, their physical capability, and the capabilities of robots in terms of what they can do autonomously.
There is a little video that I think robotics researchers really like to show, especially robotics learning researchers like myself, from 2004, from Stanford, which demonstrates a prototype robot called the PR1. The PR1 was a robot designed as a home assistance robot, and there's this beautiful video showing the PR1 tidying up a living room, putting away toys, and at the end bringing a beer to the person sitting on the couch, which looks really amazing. And then the punch line is that this is entirely controlled by a person. So in some ways the gap between a state-of-the-art human and a state-of-the-art robot, if the robot has a human brain, is actually not that large.
Now, obviously, human bodies are sophisticated and very robust and resilient in many ways, but on the whole, if we're willing to spend a bit of money and do a bit of engineering, we can kind of close the hardware gap, almost. But the intelligence gap, that one is very wide.

And when you say hardware, you're referring to the physical side, the actuators, the actual body of the robot, as opposed to the hardware on which the cognition, the nervous system, runs?

Yes, exactly. I'm referring to the body rather than the mind.

So that means the work is cut out for us: while we can still make the body better, we kind of know that the big bottleneck right now is really the mind.

And how big
is that gap? How big is the difference, in your sense, in the ability to learn, the ability to reason, the ability to perceive the world, between humans and our best robots?

The gap is very large, and the gap becomes larger the more unexpected events can happen in the world. Essentially, the spectrum along which you can measure the size of that gap is the spectrum of how open the world is. If you control everything in the world very tightly, if you put the robot in a factory and you tell it where everything is and you rigidly program its motion, then it can do things, one might even say, in a superhuman way: it can move faster, it's stronger, it can lift up a car, and things like that. But as soon as anything starts to vary in the environment, it'll trip up. And if many, many things vary, like they would in your kitchen, for example, then things are pretty much wide open.

Now, again, to stick a
bit longer on the philosophical questions: how much, on the human side, of our cognitive abilities, in your sense, is nature versus nurture? How much of it is a product of evolution, and how much of it is something we learn, sort of from scratch, from the day we're born?

I'm going to read into your question as asking about the implications of this for AI; I'm not a biologist, so I can't really speak authoritatively about the biology itself. But if it's all about learning, then there's more hope for AI. So, the way that
I look at this is that, well, first of course, biology is very messy, and if you ask the question, how does a person do something, or how does a person's mind do something, you come up with a bunch of hypotheses, and oftentimes you can find support for many different, often conflicting, hypotheses. One way that we can approach the question of what the implications of this are for AI is to think about what's sufficient. Maybe a person is, from birth, very, very good at some things, like, for example, recognizing faces. There's a very strong evolutionary pressure to do that: if you can recognize your mother's face, then you're more likely to survive, and therefore people are good at this. But we can also ask, what's the minimum sufficient thing? And one of the ways we can study the minimal sufficient thing is to see what people do in unusual situations, if you present them with things that evolution couldn't have prepared them for. Our daily lives actually do this to us all the time: we didn't
evolve to deal with automobiles and spaceflight and whatever, so there are all these situations we can find ourselves in, and we do very well there. Like, I can give you a joystick to control a robotic arm which you've never used before, and you might be pretty bad for the first couple of seconds, but if I tell you your life depends on using this robotic arm to open this door, you'll probably manage it, even though you've never seen this device before and have never even used a joystick to control a robotic arm. You'll kind of muddle through it, and that's not your evolved natural ability; that's your flexibility, your adaptability. And that's exactly where our current robotic systems really kind of fall flat.

But I wonder
how much general, almost what we think of as common sense, pre-trained models are underneath all that. That ability to adapt to a joystick requires you to have a kind of... I'm human, so it's hard for me to introspect on all the knowledge I have about the world, but it seems like there might be an iceberg underneath, in the amount of knowledge you actually bring to the table.

Now, that's kind of the open question. There's
absolutely an iceberg of knowledge that we bring to the table, but I think it's very likely that that iceberg of knowledge is actually built up over our lifetimes, because we have a lot of prior experience to draw on, and it kind of makes sense that the right way for us to optimize our efficiency, our evolutionary fitness, and so on, is to utilize all that experience to build up the best iceberg we can get. And that actually sounds an awful lot like what machine learning does. I think that for modern machine learning, it's actually a really big challenge to take this unstructured mass of experience and distill out something that looks like a common-sense understanding of the world. And perhaps part of that isn't because something about machine learning itself is broken or hard, but because we've been a little too rigid in subscribing to a very supervised, very rigid notion of learning, kind of the input-output, X-goes-to-Y sort of model. Maybe what we really need to do is view the world more as a mass of experience that is not necessarily providing any rigid supervision, but is providing many, many instances of things that could be, and then you take that and distill it into some sort of common-sense understanding.

I see.
You're painting an optimistic, beautiful picture, especially from the robotics perspective, because that means we just need to invest in both better learning algorithms and figuring out how to get access to more and more data for those learning algorithms to extract signal from, and then accumulate that iceberg of knowledge. It's a beautiful picture, a hopeful one.

I think it's potentially a little bit more than just that, and this is where we perhaps reach the limits of our current understanding. But
one thing that I think the research community hasn't really resolved in a satisfactory way is how much it matters where that experience comes from. Do you just download everything on the internet and cram it into, essentially, the 21st-century analog of the giant language model, and then see what happens? Or does it actually matter whether your machine experiences the world, in the sense that it actually attempts things, observes the outcomes of its actions, and kind of augments its experience that way; that it chooses which parts of the world it gets to interact with and observe and learn from? It may be that the world is so complex that simply obtaining a large mass of sort of i.i.d. samples of the world is a very difficult way to go. But if you are actually interacting with the world, and essentially performing this sort of hard negative mining by attempting what you think might work, observing the sometimes happy and sometimes sad outcomes of that, and augmenting your understanding using that experience, and you're just doing this continually for many years, maybe that sort of data in some sense is actually much more favorable to obtaining a common-sense understanding. Well, one reason we might
think that this is true is that what we associate with common sense, or a lack of common sense, is often characterized by the ability to reason about kind of counterfactual questions. Like, here's this bottle of water sitting on the table; everything is fine; if I knock it over, which I'm not going to do, but if I were to do that, what would happen? And I know that nothing good would happen from that. But if I have a bad understanding of the world, I might think that that's a good way for me to gain more utility. If I actually go about my daily life doing the things that my current understanding of the world suggests will give me high utility, in some ways I'll get exactly the right supervision to tell me not to do those bad things and to keep doing the good things.

So there's a spectrum
between an i.i.d. random walk through the space of data, and then what we humans do, and I don't even know if what we do is optimal; there might be something beyond it. So, on this open question that you raised: where do you think intelligent systems that would be able to deal with this world fall? Can we do pretty well by reading all of Wikipedia, sort of randomly sampling it, like language models do, or do we have to be exceptionally selective and intelligent about which aspects of the world we ingest?

So I think this is, first, an
open scientific problem, and I don't have a clear answer, but I can speculate a little bit. What I would speculate is that you don't need to be super, super careful. I think it's less about being careful to avoid the useless stuff, and more about making sure that you hit on the really important stuff. So perhaps it's okay if you spend part of your day just guided by your curiosity, visiting interesting regions of your state space, but it's important for you to, every once in a while, make sure that you really try out the solutions that your current model of the world suggests might be effective, and observe whether those solutions are working as you expect or not. Perhaps some of that is really essential to having a kind of perpetual improvement loop. That perpetual improvement loop is really the key; the key that's going to potentially distinguish the best current methods from the best methods of tomorrow, in a sense.

How important do you think is
exploration, or totally out-of-the-box thinking? Exploration in this space, as you jump to a totally different domain: you kind of mentioned there's an optimization problem, where you explore the specifics of a particular strategy, whatever the thing is you're trying to solve. How important is it to explore totally outside of the strategies that have been working for you so far? What's your intuition there?

Yeah,
I think it's a very problem-dependent kind of question, and I think that, in some ways, that question gets at one of the big differences between the classic formulation of the reinforcement learning problem and some of the more open-ended reformulations of that problem that have been explored in recent years. Classically, reinforcement learning is framed as a problem of maximizing utility, like any kind of rational AI agent, and then anything you do is in service of maximizing that utility. But a very interesting alternative way to look at these problems (I'm not necessarily saying it's the best way) is as something where you first get to explore the world however you please, and then afterwards you will be tasked with doing something. And that might suggest somewhat different solutions. If you don't know what you're going to be tasked with doing, and you just want to prepare yourself optimally for whatever your uncertain future holds, maybe then you will choose to attain some sort of coverage, build up sort of an arsenal of cognitive tools, if you will, such that later on, when someone tells you, now your job is to fetch the coffee for me, you'll be well prepared to undertake that task.
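The classic utility-maximizing formulation Levine contrasts here can be sketched as tabular Q-learning on a toy problem. The "corridor" environment and every constant below are invented for illustration, not taken from the conversation:

```python
import random

random.seed(0)

# A toy "corridor" MDP: states 0..4, reward only for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)   # step left, step right

def step(s, a):
    """Deterministic transition; returns (next state, reward, done)."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):  # episodes of epsilon-greedy Q-learning
    s, done = 0, False
    while not done:
        if random.random() < eps:
            a = random.choice(ACTIONS)                    # explore
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])     # exploit
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The utility-maximizing agent ends up always stepping toward the reward.
policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)}
```

Everything the agent does here, including its epsilon-greedy exploration, is in service of one fixed reward; the "explore first, get tasked later" alternative Levine describes would instead build coverage before any reward is revealed.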
And you see that as the modern formulation of the reinforcement learning problem, as the kind of more multi-task, general-intelligence formulation?

I think that's one possible vision of where things might be headed. I don't think that's by any means the mainstream or standard way of doing things right now.

But I like it; it's a beautiful vision. So,
to actually take a step back: what is the goal of robotics? What's the general problem of robotics we're trying to solve? You actually kind of painted two pictures here, one narrow, one general. What, in your view, is the big problem of robotics? Again, a ridiculously philosophical question.

I think maybe there are two ways I can answer this question. One is, there's a very pragmatic problem, which is something like: what would maximize the usefulness of robots? And there the answer might be something like a system that can perform whatever task a human user sets for it, within physical constraints, of course. If you tell it to teleport to another planet, it probably can't do that. But if you ask it to do
something that's within its physical capability, then, potentially with a little bit of additional training or a little bit of additional trial and error, it ought to be able to figure it out, in much the same way as a human teleoperator ought to be able to figure out how to drive the robot to do it. That's kind of a very pragmatic view of what it would take to solve the robotics problem, if you will. But I think there is a second answer, and that answer is a lot closer to why I want to work on robotics: I think it's less about what it would take to do a really good job in the world of robotics, and more the other way around, what robotics can bring to the table to help us understand artificial intelligence.

So your dream, fundamentally,
is to understand intelligence?

Yes. I think that's the dream for many people who work in this space. There is something very pragmatic and very useful about studying robotics, but I do think that for a lot of people who go into this field, the thing they draw inspiration from is the potential for robots to help us learn about intelligence and about ourselves.

That's fascinating, that robotics is basically the space by which you can get closer to understanding the fundamentals of artificial intelligence. So what is it
about robotics that's different from some of the other approaches? If we look at some of the early breakthroughs in deep learning, in the computer vision space and in natural language processing, there were really nice, clean benchmarks that a lot of people competed on, and thereby came up with a lot of brilliant ideas. What's the fundamental difference, to you, between computer vision purely defined on ImageNet and the kind of bigger robotics problem?

So there are a couple of things. One is that
with robotics, you kind of have to take away many of the crutches. You have to deal with both the particular problems of perception, control, and so on, but you also have to deal with the integration of those things. Classically, we've always thought of the integration as kind of a separate problem: a kind of modular engineering approach where we solve the individual subproblems, then wire them together, and then the whole thing works. One of the things we've been seeing over the last couple of decades is that studying the thing as a whole might lead to very different solutions than if we were to study the parts and wire them together. So the integrative nature of robotics research helps us see different perspectives on the problem. Another part of the answer is that robotics casts a certain paradox into very sharp relief. This is sometimes referred to as Moravec's paradox: the idea that, in artificial
intelligence, things that are very hard for people can be very easy for machines, and vice versa: things that are very easy for people can be very hard for machines. Integral and differential calculus is pretty difficult for people to learn, but if you program a computer to do it, it can derive derivatives and integrals for you all day long without any trouble.
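The calculus half of the paradox really is a few lines of code. As a toy illustration (the coefficient-list representation of a polynomial is just a convenient choice here), here is a complete symbolic differentiator for polynomials:

```python
def differentiate(coeffs):
    """d/dx of a polynomial given as [a0, a1, a2, ...] = a0 + a1*x + a2*x^2 + ...

    The power rule: the derivative of a_i * x^i is (i * a_i) * x^(i-1),
    so multiply each coefficient by its index and shift the list down.
    """
    return [i * c for i, c in enumerate(coeffs)][1:]

# d/dx (5 + 3x + 2x^2) = 3 + 4x
print(differentiate([5, 3, 2]))
```

Meanwhile the other half of the paradox, reliably picking up the cup the water is in, remains an open research problem.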
Whereas some things, like drinking from a cup of water, are very easy for a person to do and very hard for a robot to deal with. And sometimes when we see such blatant discrepancies, that gives us a really strong hint that we're missing something important. So if we really try to zero in on those discrepancies, we might find that little bit that we're missing. It's not that we need to make machines better or worse at math and better at drinking water, but just that by studying those discrepancies we might find some new insight.

So that could be in any space; it doesn't have to be robotics. But you're saying, and I get it, it's kind of interesting that robotics seems to have a lot of those discrepancies. The Hans Moravec paradox is probably
referring to the space of physical interaction: I think you said object manipulation, walking, all the kinds of stuff we do in the physical world. How do you make sense of it, if you were to try to disentangle the Moravec paradox? Why is there such a gap in our intuition about it? Why do you think manipulating objects is so hard, from everything you've learned from applying reinforcement learning in this space?

Yeah, I think that one reason is
maybe that for many of the other problems we've studied in AI, computer science, and so on, the notion of input, output, and supervision is much, much cleaner. Computer vision, for example, deals with very complex inputs, but it's comparatively a bit easier, at least up to some level of abstraction, to cast it as a very tightly supervised problem. It's comparatively much, much harder to cast robotic manipulation as a very tightly supervised problem. You can do it; it just doesn't work all that well. You could say that, well, maybe we get a labeled dataset where we know exactly which motor commands to send, and then we train on that, but for various reasons that's not actually such a great solution. It also doesn't seem to be even remotely similar to how people and animals learn to do things, because we're not told by our parents, here is how you fire your muscles in order to walk. We do get some guidance, but the really low-level detailed stuff we figure out mostly on our own.

And that's what you mean by tightly coupled: that every single little sub-action gets a supervision signal of whether it's a good one or not?

Right. So while in computer vision you could imagine, up to a level of abstraction, that maybe somebody told you, this is a car, and this is a cat, and this is a dog, in motor control it's very clear that that was not the case.

If we look at the
subspaces of robotics, which, again, as you said, robotics integrates all together, and we'll get to see how this beautiful mess fits together: there's nevertheless still perception, the computer vision problem broadly speaking, understanding the environment. Then, and maybe you can correct me on this categorization of the space, there's prediction, trying to anticipate what things are going to do in the future in order for you to be able to act in that world. And then there's also this game-theoretic aspect of how your actions will change the behavior of others. In this kind of space, and this is bigger than reinforcement learning, this is just broadly looking at the problem of robotics: what's the hardest problem here? Or is what you said true, that when you start to look at all of them together, that's a whole other thing, and you can't even say which one individually is harder, because you should only be looking at them all together?

I think when you
look at them all together, some things actually become easier, and I think that's actually pretty important. Back in 2014 we had some work, basically our first work on end-to-end reinforcement learning for robotic manipulation skills from vision, which at the time was something that seemed a little inflammatory and controversial in the robotics world. But aside from the inflammatory and controversial part of it, the point we were actually trying to make in that work is that for the particular case of combining perception and control, you could actually do better if you treat them together than if you try to separate them. And the way we
tried to demonstrate this is, we picked a fairly simple motor control task where a robot had to insert a little red trapezoid into a trapezoidal hole. We had our separated solution, which involved first detecting the hole using a pose detector and then actuating the arm to put it in, and then our end-to-end solution, which just mapped pixels to torques. And one of the things we observed is that if you use the end-to-end solution, the pressure on the perception part of the model is actually lower. It doesn't have to figure out exactly where the thing is in 3D space; it just needs to figure out where it is while distributing the errors in such a way that the horizontal difference matters more than the vertical difference, because vertically it just pushes the peg down all the way until it can't go any further, and there perceptual errors are a lot less harmful; whereas perpendicular to the direction of motion, perceptual errors are much
more harmful. So the point is that if you combine these two things, you can trade off errors between the components optimally, to best accomplish the task, and the components can actually be weaker while still leading to better overall performance.

That's a profound idea. I mean, in the space of pegs and things like that, it's quite simple, and almost tempting to overlook, but that seems to be, at least intuitively, an idea that should generalize to basically all aspects of perception and control, where one strengthens the other.

Yeah.
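The error trade-off Levine describes can be sketched numerically. The tolerance, noise level, and success model below are all invented purely to illustrate the point that forgiving errors along the direction of motion raises the success rate:

```python
import random

random.seed(1)

# Toy model of the trapezoid-insertion example.  The perception module
# estimates the hole position with Gaussian error on two axes.
TOL = 0.5     # how far off the peg can be and still slide into the hole
NOISE = 0.4   # standard deviation of the perception error, per axis

def insert(compliant_in_z):
    """One insertion attempt; returns True on success."""
    ex = random.gauss(0, NOISE)   # error perpendicular to the motion
    ez = random.gauss(0, NOISE)   # error along the direction of motion
    if compliant_in_z:
        # End-to-end style: the policy just pushes down until contact,
        # so errors along the direction of motion are forgiven.
        return abs(ex) < TOL
    # Separated pipeline: the pose estimate must be right on both axes.
    return abs(ex) < TOL and abs(ez) < TOL

n = 10_000
separated = sum(insert(False) for _ in range(n)) / n
end_to_end = sum(insert(True) for _ in range(n)) / n
```

Under these made-up numbers the per-axis hit rate is about 0.79, so the separated pipeline succeeds roughly 0.79 squared, about 62% of the time, while the compliant policy succeeds about 79% of the time: the same perception errors, just spent where they are cheap.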
And people who have studied perceptual heuristics in humans and animals find things like that all the time. One very well-known example is something called the gaze heuristic, which is a little trick you can use to intercept a flying object. If you want to catch a ball, for instance, you could try to localize it in 3D space, estimate its velocity, estimate the effect of wind resistance, and solve a complex system of differential equations in your head. Or you can adjust your running speed so that the object stays in the same position in your field of view: if it dips a little bit, you speed up; if it rises a little bit, you slow down. And if you follow this simple rule, you'll actually arrive at exactly the place where the object lands, and you'll catch it. Humans use this when they play baseball; human pilots use it when they fly airplanes, to figure out if they're about to collide with somebody; frogs use it to catch insects; and so on and so on. So this is something that actually happens in nature, and I'm sure it's just one instance that scientists were able to identify because it's so prevalent; there are probably many others.
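The gaze heuristic can be written down directly. A minimal sketch, assuming a ball hit straight at the fielder with no air resistance; in that idealized case, moving so that the tangent of the gaze elevation angle grows at a constant rate provably walks the fielder to the landing point:

```python
G = 9.8             # gravity, m/s^2
VX, VZ = 8.0, 14.7  # ball velocity components at the moment it is hit

def ball(t):
    """Ball hit straight toward the fielder, no air resistance."""
    return VX * t, VZ * t - 0.5 * G * t * t   # (distance, height)

def fielder(t, c):
    """Fielder position that keeps tan(gaze elevation angle) = c * t.

    tan(angle) = height / (fielder_x - ball_x), set equal to c*t and
    solved for fielder_x.  No trajectory prediction anywhere.
    """
    x, z = ball(t)
    return x + z / (c * t)

t_land = 2 * VZ / G     # ball hits the ground at t = 3.0 s ...
x_land = VX * t_land    # ... 24 m from the batter

# Pick c so the fielder starts 30 m out, shortly after the hit.
t0 = 0.1
c = (VZ - 0.5 * G * t0) / (30.0 - VX * t0)

# Following the gaze rule, the fielder arrives exactly at the landing
# spot at landing time: the ball's height is zero there, so the
# fielder's position collapses to the ball's position.
x_catch = fielder(t_land, c)
```

In this drag-free setting the required running speed happens to come out constant; roughly speaking, it is air resistance that produces the speed-up-and-slow-down behavior described above, but the rule delivers the catch either way.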
If we can zoom in, as we talk about robotics: do you have a canonical problem, sort of a simple, clean, beautiful, representative problem in robotics that you think about when you're thinking about some of these questions? We talked about robotic manipulation; to me that seems, intuitively at least, a space the robotics community is converging towards as the canonical problem. If you agree, then maybe you can zoom in on some particular aspect of that problem that you just like, such that if we solved it perfectly, it would unlock a major step towards human-level intelligence.
I don't think I have a really great answer to that, and I think partly the reason is that the difficulty is really in the flexibility and adaptability, rather than in doing a particular thing really, really well. It's hard to just say, oh, if you can, I don't know, shuffle a deck of cards as fast as a Vegas casino dealer, then you'll be very proficient. It's really the ability to quickly figure out how to do some arbitrary new thing well enough to move on to the next arbitrary thing.

But on the source of newness and uncertainty: have you found problems in which it's easy to generate newness?

New types of newness? Yeah. So,
if you'd asked me this question a few years ago, around 2016 maybe, I would probably have said that robotic grasping is a really great example of that, because it's a task with great real-world utility: you will get a lot of money if you can do it well.

Robotic grasping being picking up any object with a robotic hand?

Exactly. You'll get a lot of money if you do it well, because lots of people want to run warehouses with robots, and it's highly non-trivial, because very different objects will require very different grasping strategies. But actually, since then, people have gotten really good at building systems to solve this problem, to the point where I'm not actually sure how much more progress we can make with that as the main guiding thing.
But it's kind of interesting to see the kinds of methods that have actually worked well in that space, because robotic grasping classically used to be regarded very much as almost a geometry problem. People who have studied the history of computer vision will find this very familiar: in the same way that in the early days of computer vision people thought of it very much as an inverse graphics problem, in robotic grasping people thought of it as an inverse physics problem, essentially. You look at what's in front of you, figure out the shapes, then use your best estimate of the laws of physics to figure out where to put your fingers, and you pick up the thing. And it turns out that what works really well for robotic grasping, instantiated in many different recent works, including our own but also ones from many other labs, is to use learning methods with some combination of either exhaustive simulation or actual real-world trial and error. It turns out that those things work really well, and then you don't have to worry about solving geometry problems or physics problems.

So, what are, just by
the way, in grasping, the difficulties that have been worked on? One is the materials of things, maybe occlusions on the perception side. Why is picking stuff up such a difficult problem?

It's a difficult problem because the number of things you might have to deal with, the variety of things you have to deal with, is extremely large.
And oftentimes things that work for one class of objects won't work for another class of objects. If you get really good at picking up boxes and now you have to pick up plastic bags, you just need to employ a very different strategy. And there are many properties of objects beyond just their geometry: the bits that are easier to pick up, the bits that are harder to pick up, the bits that are more flexible, the bits that will cause the thing to pivot and bend and drop out of your hand versus the bits that result in a secure grasp, things that are flexible, things that if you pick them up the wrong way will fall upside down and the contents will spill out. There are all these little details that come up, but the task can still be characterized as one task: there's a very clear notion of, you did it, or you didn't do it.
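The real-world trial-and-error recipe Levine credits can be sketched as a simple bandit: try grasp strategies on incoming objects, record the binary did-it-or-didn't signal, and favor what works. The strategies and their success rates below are entirely hypothetical:

```python
import random

random.seed(2)

# Hypothetical grasp strategies with unknown true success rates;
# the robot only ever sees the binary outcome of each attempt.
TRUE_RATE = {"pinch": 0.55, "top-down": 0.75, "scoop": 0.35}

counts = {s: 0 for s in TRUE_RATE}   # attempts per strategy
wins = {s: 0 for s in TRUE_RATE}     # successes ("you did it") per strategy

def rate(s):
    """Empirical success rate observed so far for strategy s."""
    return wins[s] / counts[s] if counts[s] else 0.0

def pick(eps=0.1):
    """Epsilon-greedy: mostly use the best strategy so far, sometimes explore."""
    if random.random() < eps:
        return random.choice(list(TRUE_RATE))
    return max(TRUE_RATE, key=rate)

for _ in range(2000):                # each trial: grasp, observe success/failure
    s = pick()
    counts[s] += 1
    wins[s] += random.random() < TRUE_RATE[s]
```

After a couple thousand trials the empirical rates single out the best strategy, with no geometry or physics model anywhere in the loop; real systems replace the three named arms with a learned, image-conditioned success predictor, but the feedback signal is the same binary one.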
things there creeps in this notion that
starts the sound and feel like common
sense reasoning do you think solving the
general problem of Robotics requires
common sense reasoning requires general
intelligence this kind of human level
capability of you know like you said be
robust and deal with uncertainty but
also be able to sort of reason and
assimilate different pieces of knowledge
that you have? yeah, what are
your thoughts on the need for common
sense reasoning in the space of the
general robotics problem so I'm gonna
slightly dodge that question and say
that I think maybe actually it's
the other way around is that studying
robotics can help us understand how to
put common sense into our AI systems one
way to think about common sense is that
and why our current systems might
lack common sense is that common sense
is an emergent property of
actually having to interact with a
particular world a particular universe
and get things done in that universe so
you might think that for instance
an image captioning system maybe it
looks at pictures of the world and it
types out English sentences so it
kind of deals with our world
and then you can easily construct
situations where image captioning
systems do things that defy common sense
like give it a picture of a person
wearing a fur coat and it'll say it's a
teddy bear but I think what's really
happening in those settings is that the
system doesn't actually live in our
world it lives in its own world that
consists of pixels and English sentences
and doesn't actually consist of like you
know having to put on a fur coat in the
winter so you don't get cold so perhaps
the reason for the disconnect is
that the systems that we have now
simply inhabit a different universe and
if we build AI systems that are forced
to deal with all of the messiness and
complexity of our universe maybe they
will have to acquire our common sense to
essentially maximize their utility
whereas the systems we're building now
don't have to do that they can take some
shortcut that's fascinating
you've a couple of times already sort of
reframed the role of robotics in this
whole thing and for some reason I don't
know if my way of thinking is common but
I thought like we need to understand and
solve intelligence in order to solve
robotics and you're kind of framing it
as no robotics is one of the best ways
to just study artificial intelligence
and build sort of like robotics is like
the right space in which you get to
explore some of the fundamental learning
mechanisms fundamental sort of
multimodal multitask aggregation of
knowledge mechanisms that are required
for general intelligence it's a really
interesting way to think about it but
let me ask about learning can the
general sort of robotics problem the
epitome of the robotics problem be
solved purely through learning perhaps
end-to-end learning sort of learning from
scratch as opposed to injecting human
expertise and rules and heuristics and so on I
think that in terms of the spirit of the
question I would say yes I mean I
think that in some ways it may
be like an overly sharp dichotomy like
you know when
we build algorithms you know at some
point a person does something like yeah
there's always a person who turned on the
computer first
you know who implemented TensorFlow but
yeah I think that in terms of
the point that you're getting at I do
think the answer is yes I think that
we can solve many problems
that have previously required meticulous
manual engineering through automated
optimization techniques and actually one
thing I will say on this topic is I
don't think this is actually a very
radical or very new idea I think people
have been thinking about automated
optimization techniques as a way to do
control for a very very long time and in
some ways what's changed is really more
the name so you know today we would say
that oh my robot does machine learning
it does reinforcement learning maybe in
the 1960s you'd say oh my robot is doing
optimal control and maybe the difference
between typing out a system of
differential equations and doing
feedback linearization versus training
a neural net is not such a large
difference it's just you know pushing
the optimization deeper and deeper into
the thing but do you think that,
especially with deep learning, the
accumulation of experience in data
form, to form deep representations, starts
to feel like knowledge as opposed to
optimal control? so it feels like
there's an accumulation of knowledge through
the learning process yes yeah so I think
that is a good point that one big
difference between learning based
systems and classic optimal control
systems is that learning based systems
in principle should get better and
better
the more they do something right and I
do think that that's actually a very
very powerful difference so if you look
back at the world of expert systems of
symbolic AI and so on of using logic to
accumulate expertise human expertise
human-encoded expertise do you think
that will have a role at some point you
know deep learning machine
learning reinforcement learning has had
incredible results and breakthroughs
that have inspired thousands maybe
millions of researchers but you know
it's less popular now but it
used to be part of the idea of symbolic
AI do you think that will have a role
I think in some ways the kind of
descendants of symbolic AI actually
already have a role so you know this is
the highly biased history from my
perspective you'd say that well initially
we thought that rational decision-making
involves logical manipulation so you
have some model of the world expressed
in terms of logic you have some
query like what action do I take in
order for X to be true and then you
manipulate your logical symbolic
representation to get an answer what
that turned into somewhere in the 1990s
is well instead of building kind of
predicates and statements that have true
or false values we'll build probabilistic
systems where things have probabilities
associated with them probabilities of being
true and false that turned into Bayes
nets and that provided sort of a boost
to what were really you know still
essentially logical inference systems
just probabilistic logical inference
systems and then people said well let's
actually learn the individual
probabilities inside these models and
then people said well let's not even
specify the nodes and the models let's
just put a big neural net in there but
in many ways I see these as actually
kind of descendants of the same idea it's
essentially instantiating rational
decision-making by means of some
inference process and learning by means
of an optimization process so so in a
sense I would say yes that it has a
place and in many ways it
already holds that place
it's already in there yeah it just
looks slightly different
than it did before yeah but at
some level there are some things that we
can think about that make this a little
bit more obvious like if I train a big
bit more obvious like if I train a big
neural net model to predict what will
happen in response to my robots actions
and then I run probabilistic inference
meaning I invert that model to figure
out the actions that lead to some
plausible outcome like to me that seems
like a kind of logic you have a model of
the world it just happens to be
expressed by a neural net and you are
doing some inference procedure some sort
of manipulation on that model to figure
out you know the answer to a query that
you have. it's the interpretability it's
the explainability though that seems
to be lacking more so because the nice
thing about sort of expert
systems is you can follow the reasoning
of the system which to us mere humans is
somehow compelling it's just
I don't know what to make of this fact
that there's a human desire for
intelligent systems to be able to
convey in a poetic way to us why it made
the decisions it did like tell a
convincing story and perhaps that's like
a silly human thing like we shouldn't
expect that of intelligent systems like
we should be super happy that there are
intelligent systems out there but if I
were to sort of psychoanalyze the
researchers at the time I would say
expert systems connected to that part
that desire for AI researchers for
systems to be explainable
I mean maybe on that topic do you have a
hope that sort of inference-based
learning systems will be as
explainable as the dream was with expert
systems for example I think it's a very
complicated question because I think
that in some ways the question of
explainability is kind of very closely
tied to the question of like
performance like you know why do you
want your system to explain itself well
so that when it screws up
you can kind of figure out why it did it
right that's nice but in some ways
that's a much bigger problem actually
like your system might screw up and then
it might screw up at how it explains
itself or you might have some bugs
somewhere so that it's not actually
doing what it was supposed to do so you
know maybe a good way to view that
problem is really as a
bigger problem of verification and
validation of which explainability is
sort of one component I see I just
see it differently I see explainability
you put it beautifully I think you
actually summarized the field of
explainability but to me there's
another aspect of explainability
which is like storytelling that has
nothing to do with errors or with
like it doesn't it uses
errors as elements of its story as
opposed to a fundamental need to be
explainable when errors occur it's just
that for other intelligence systems to
be in our world we seem to want to tell
each other stories and that that's true
in the political world is true in the
academic world and you know
neural networks are less capable of
doing that or perhaps they're equally
capable at storytelling. storytelling, maybe
it doesn't matter what the
fundamentals of the system are you just
need to be a good storyteller. maybe one
specific story I can tell you about in
that space is actually about some work
that was done by my former
collaborator who's now a professor at
MIT named Jacob Andreas Jacob actually
works on natural language processing but
he had this idea to do a little bit of
work in reinforcement learning
on how natural language can basically
structure the internals of policies
trained with RL and one of the things he
did is he set up a model that attempts
to perform some task that's defined by
a reward function but the model reads in
a natural language instruction so this
is a pretty common thing to do in
instruction following so you tell it
like you know go to the Red House and
then it's supposed to go to the Red House but
then one of the things that Jacob did is
he treated that sentence not as a
command from a person but as a
representation of the internal kind of
state of the mind of this
policy essentially so that when it was
faced with a new task what it would do
is it would basically try to think of
possible language descriptions attempt
to do them and see if they led to the
right outcome so it would kind of think
out loud like you know I'm faced with
this new task what am I gonna do let me
go to the red house now that didn't work
let me go to the Blue Room or something
let me go to the green plant and once it
got some reward it would say oh go to
the green plant that's what's working
I'm gonna go to the green plant and then
you could look at the string that it
came up with and that was a description
of how it thought it should solve the
problem so you could
basically incorporate language as
internal state and you can start getting
some handle on these kinds of things and
then what I was kind of trying to get to
is that also if you add to the reward
function
the convincingness of the story hmm so I have
another reward signal of like people who
review that story how much they like it
you know and initially
that could be a hyperparameter or sort
of hard-coded heuristic type of thing
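the idea floated here, adding a convincingness term to the reward, can be sketched in a few lines; note that the story scorer, the weight lam, and all the function names below are invented stand-ins for illustration, not components of any actual system:

```python
# Hypothetical sketch of augmenting a task reward with a "convincingness"
# score for the story the agent tells about itself. The scorer and the
# weight lam are invented stand-ins, not any real system's components.

def task_reward(outcome: bool) -> float:
    # 1.0 if the agent actually accomplished the task, else 0.0.
    return 1.0 if outcome else 0.0

def story_score(story: str) -> float:
    # Stand-in for human reviewers rating how convincing the explanation is.
    # Crude proxy: longer stories that mention the goal score higher.
    return min(len(story) / 100.0, 1.0) * (1.0 if "goal" in story else 0.5)

def total_reward(outcome: bool, story: str, lam: float = 0.1) -> float:
    # lam is the hand-coded hyperparameter mentioned above: how much
    # convincingness counts relative to actually doing the task.
    return task_reward(outcome) + lam * story_score(story)
```

one design caveat, echoing the worry raised next in the conversation: if lam is set too high, a policy can earn more by telling convincing stories than by actually succeeding at the task.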
but it's an interesting notion of the
convincingness of the story becoming part of
the reward function the objective
function of the explainability it's
in the world of sort of twitter and fake
news that might be a scary notion that
the nature of truth may not be as
important as the convincingness the
how convincing you are in telling the
story around the facts well let me ask
the basic question you're one of the
world-class researchers in reinforcement
learning, deep reinforcement learning,
certainly in the robotics space
what is reinforcement learning? i think
what reinforcement learning refers to
today is really just the kind of the
modern incarnation of learning based
control so classically reinforcement
learning has a much more narrow
definition which is that it's you know
literally learning from reinforcement
like the thing does something and then
it gets a reward or punishment but
really i think the way the term is used
today, it's used more broadly for
learning based control so some kind
of system that's supposed to be
controlling something and it uses data
to get better. and what does control mean,
is action the fundamental element?
yeah it means making rational decisions
now and rational decisions are decisions
that maximize a measure of utility and
sequentially, so many decisions, time and
time and time again. now, so it's
easier to see that kind of idea in the
space of maybe games in the space of
robotics
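the notion of sequentially maximizing a measure of utility can be made concrete with a toy example; the chain of states, the reward, and the discount below are invented purely for illustration:

```python
import numpy as np

# Toy MDP: 4 states in a line; action 0 moves left, action 1 moves right.
# Moving into (or staying at) state 3 yields reward 1, otherwise 0.
n_states, n_actions, gamma = 4, 2, 0.9

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

# Value iteration: repeatedly back up each state's utility under the best
# action; the values converge to the maximal expected discounted return.
V = np.zeros(n_states)
for _ in range(100):
    V = np.array([max(step(s, a)[1] + gamma * V[step(s, a)[0]]
                      for a in range(n_actions))
                  for s in range(n_states)])

# The greedy policy maximizes long-run utility, not just the next reward.
policy = [int(np.argmax([step(s, a)[1] + gamma * V[step(s, a)[0]]
                         for a in range(n_actions)]))
          for s in range(n_states)]
print(policy)  # [1, 1, 1, 1]: every state moves right, toward the reward
```

the point of the toy is the "sequential" part: states 0 and 1 earn nothing immediately, yet the backed-up values still steer them toward the distant reward.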
do you see it as bigger than that? is it
applicable... like where are the limits of
the applicability of reinforcement
learning? yeah so rational
decision-making is essentially the
encapsulation of the AI problem viewed
through a particular lens so any
problem that we would want an
intelligent machine to do can likely be
represented as a decision-making problem
you know classifying images is a
decision-making problem although not a
sequential one typically you know
controlling a chemical plant is a
decision-making problem deciding what
videos to recommend on YouTube is a
decision-making problem and one of the
really appealing things about
reinforcement learning is if it does
encapsulate the range of all these
decision-making problems perhaps working
on reinforcement learning is you know
one of the ways to reach a very broad
swath of AI problems but what do
you see as the fundamental difference
between reinforcement learning and maybe
supervised machine learning so
reinforcement learning can be viewed as
a generalization of supervised machine
learning you can certainly cast
supervised learning as a reinforcement
learning problem you can just say your
loss function is the negative of your
reward but you have stronger assumptions
you have the assumption that someone
actually told you what the correct
answer was that your data was iid and so
on so you could view reinforcement
learning as essentially relaxing some of
those assumptions now that's not always
a very productive way to look at it
because if you actually have a
supervised learning problem you'll
probably solve it much more effectively
by using supervised learning methods
because it's easier but you can view
reinforcement learning as a generalization
no for sure but they're fundamentally...
that's a mathematical statement that's
absolutely correct but it seems that
reinforcement learning, the kind of tools
we bring to the table, are of today
so maybe down the line everything will
be a reinforcement learning problem just
like you said
image classification should be mapped to
a reinforcement learning problem but
today the tools and ideas the way we
think about them are different sort of
supervised learning has been used very
effectively to solve basic narrow AI
problems the reinforcement learning kind
of represents the dream of AI it's very
much in the research space now in terms of
captivating the imagination of people
about what we can do with intelligent systems
but it hasn't yet had as wide of an
impact as the supervised learning
approaches. so my question
comes from a more practical sense, like
what do you see as the gap between the
more general reinforcement learning
and the very specific, yes, it's a
question of decision-making with one
step in the sequence, of
supervised learning. so from a practical
standpoint I think that one thing
that is you know potentially a little
tough now and this is I think something
that we'll see this is a gap that we
might see closing over the next couple
of years is the ability of reinforcement
learning algorithms to effectively
utilize large amounts of prior data so
one of the reasons why it's a bit
difficult today to use reinforcement
learning for all the things that we
might want to use it for is that in most
of the settings where we want to do
rational decision-making it's a little
bit tough to just deploy some policy
that does crazy stuff and learns purely
through trial and error it's much easier
to collect a lot of data a lot of logs
of some other policy that you've got and
then maybe you know if you can get a
good policy out of that then you deploy
it and let it kind of fine-tune a little
bit but algorithmically it's quite
difficult to do that so I think that
once we figure out how to get
reinforcement learning to bootstrap
effectively from large data sets then
we'll see very very rapid growth and
applications of these technologies so
this is what's referred to as off policy
reinforcement learning or offline RL or
batch RL and I think we're seeing a lot
of research right now that's
bringing us closer and closer to that
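a minimal sketch of that batch idea: the learner sees only a fixed log of transitions collected by some other (here random) policy and never interacts with the environment while learning; the toy chain environment and the tabular fitted Q-iteration below are illustrative assumptions, not a specific published method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain: states 0..4, action 1 moves right, action 0 moves left,
# reward 1 only for arriving at (or staying in) state 4.
def env_step(s, a):
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return s2, float(s2 == 4)

# A fixed log of transitions (s, a, r, s') from a uniformly random
# logging policy -- the learner only ever sees this batch.
batch = []
for _ in range(2000):
    s = int(rng.integers(0, 5))
    a = int(rng.integers(0, 2))
    s2, r = env_step(s, a)
    batch.append((s, a, r, s2))

# Fitted Q-iteration: repeatedly regress Q(s, a) onto the Bellman target
# r + gamma * max_a' Q(s', a'), using only the logged data.
gamma = 0.9
Q = np.zeros((5, 2))
for _ in range(50):
    targets = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    for s, a, r, s2 in batch:
        targets[s, a] += r + gamma * Q[s2].max()
        counts[s, a] += 1
    Q = targets / np.maximum(counts, 1)  # tabular "regression": mean target

greedy = Q.argmax(axis=1)
print(greedy)  # greedy action is "right" in every state
```

in this tiny tabular case the random logs cover every state-action pair, so the recovered policy is optimal; the hard part discussed here is doing the same from large, narrow, real-world logs without the crazy exploratory behavior.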
can you maybe paint a picture of the
different methods you said
off-policy. what's value-based
reinforcement learning, what's policy
based, what's model-based, what's
off-policy, on-policy, what are the different
categories of reinforcement learning? yeah so one
way we can think about reinforcement
learning is that, in some
very fundamental way, it's