Leslie Kaelbling: Reinforcement Learning, Planning, and Robotics | Lex Fridman Podcast #15
Er7Dy8rvqOc • 2019-03-12
Kind: captions
Language: en
The following is a conversation with Leslie Kaelbling. She's a roboticist and professor at MIT, recognized for her work in reinforcement learning, planning, robot navigation, and several other topics in AI. She won the IJCAI Computers and Thought Award and was the editor-in-chief of the prestigious Journal of Machine Learning Research. This conversation is part of the Artificial Intelligence podcast at MIT and beyond. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N. And now, here's my conversation with Leslie Kaelbling.

What made me get excited about AI? I
can say that it was reading Gödel, Escher, Bach when I was in high school. That was pretty formative for me, because it exposed the interestingness of primitives and combination, how you can make complex things out of simple parts, and ideas of AI and what kinds of programs might generate intelligent behavior.

So you first fell in love with AI reasoning, logic, versus robots?

Yeah, the robots came because of my first job. I finished an undergraduate degree in philosophy at Stanford and was about to finish a master's in computer science, and I got hired at SRI, in their AI lab, and they were building a robot. It was kind of a follow-on to Shakey, but all the Shakey people were not there anymore, and so my job was to try to get this robot to do stuff. That's really what got me interested in robots. So maybe taking
a small step back: your bachelor's at Stanford was in philosophy, and you did your master's and PhD in computer science. So what was that journey like? What elements of philosophy do you think you bring to your work in computer science?

It's surprisingly relevant. Part of the reason that I didn't do a computer science undergraduate degree was that there wasn't one at Stanford at the time, but there's a part of philosophy, and in fact Stanford has a special sub-major in something now called symbolic systems, which is logic, model theory, formal semantics of natural language. That's actually a perfect preparation for work in AI and computer science.

That's kind of interesting. So if you were interested in artificial intelligence, what kind of majors were people even thinking about taking? Maybe neuroscience? Besides philosophy, what were you supposed to do if you were fascinated by the idea of creating intelligence?

There weren't enough people who did that for that even to be a conversation. I think probably philosophy. It's interesting: in my graduating class of undergraduate philosophers, probably slightly less than half went on in computer science, slightly less than half went on in law, and like one or two went on in philosophy. So it was a common kind of
connection.

Do you think AI researchers have a role to be part-time philosophers, or should they stick to the solid science and engineering, without taking the philosophizing tangents? I mean, you work with robots, you think about what it takes to create intelligent beings. Aren't you the perfect person to think about the big-picture philosophy of it all?

The parts of philosophy that are closest to AI, or at least the closest to AI that I think about, are stuff like belief and knowledge and denotation, and that kind of stuff is quite formal, and it's just one step away from the kinds of computer science work that we do routinely. I think there are still important questions about what you can do with a machine and what you can't, and so on, although at least my personal view is that I'm completely a materialist, and I don't think there's any reason why we can't make a robot be behaviorally indistinguishable from a human. The question of whether it's indistinguishable internally, whether it's a zombie or not in philosophy terms, I actually don't know, and I don't know if I care too much about that.

Right, but there are philosophical notions. They're mathematical and philosophical, because we don't know how difficult so much of it is. How difficult is the perception problem? How difficult is the planning problem? How difficult is it to operate in this world successfully? Because our robots are not currently as successful as human beings in many tasks, the question about the gap between current robots and human beings borders a little bit on philosophy: the expanse of knowledge that's required to operate in this world, the ability to form common-sense knowledge, the ability to reason about uncertainty, much of the work you've been doing. There are open questions there that, I don't know, seem to require a certain big-picture view.

To me that doesn't seem like a philosophical gap at all. There is a big technical gap, a huge technical gap, but I don't see any reason why it's more than a technical gap.
Perfect. So you mentioned SRI. Can you describe when you first fell in love with robotics, with robots, or were inspired? You mentioned Shakey, or Flakey. What was the robot that first captured your imagination of what's possible?

Well, the first robot I worked with was Flakey. Shakey was a robot that the SRI people had built, but by the time I arrived it was sitting in a corner of somebody's office, dripping hydraulic fluid into a pan. But it's iconic, and really everybody should read the Shakey tech report, because it has so many good ideas in it. I mean, they invented A* search and symbolic planning and learning macro-operators. They had low-level, kind of configuration-space planning for their robot, they had vision; they had the basic ideas of a ton of things.

Can you take a step back: did Shakey have arms? What was the job, what were the goals?

Shakey was a mobile robot, but it could push objects, and so it would move things around, actuated with its base. And they had painted the baseboards black, so it used vision to localize itself in a map. It detected objects, and it could detect objects that were surprising to it. It would plan and replan based on what it saw. It reasoned about whether to look and take pictures. It really had the basics of so many of the things that we think about now.

How did it represent the space around it?

It had representations at a bunch of different levels of abstraction. So it had, I think, a kind of an occupancy grid of some sort at the lowest level, and at the high level it was abstract, symbolic kinds of rooms and connectivity. So where does
Flakey come in?

Yeah, okay. So I showed up at SRI, and we were building a brand-new robot. As I said, none of the people from the previous project were there or involved anymore, so we were kind of starting from scratch. My advisor was Stan Rosenschein; he ended up being my thesis adviser, and he was motivated by this idea of situated computation, or situated automata. The idea was that the tools of logical reasoning were important, but possibly only for the engineers or designers to use in the analysis of a system, not necessarily to be manipulated in the head of the system itself. So I might use logic to prove a theorem about the behavior of my robot, even if the robot's not using logic in its head to prove theorems. That was kind of the distinction. And so the idea was to use those principles to make a robot do stuff. But a lot of the basic things we had to learn for ourselves, because I had zero background in robotics. I didn't know anything about control, I didn't know anything about sensors, so we reinvented a lot of wheels on the way to getting that robot to do stuff.

Do you think that was an advantage or a hindrance?

Oh no, I'm big in favor of wheel reinvention, actually. I think you learn a lot by doing it. It's important, though, to eventually have the pointers, so that you can see what's really going on, but I think you can appreciate the good solutions much better once you've messed around a little bit on your own and found a bad one.

Yeah. I think you mentioned
reinventing reinforcement learning, and referring to rewards as pleasures.

Pleasures, yeah.

Which I think is a nice name for it.

Yeah, it seems good to me. It's more fun, almost.

Do you think you could tell the history of AI, machine learning, reinforcement learning, and how you think about it, from the fifties to now?

One thing is that it oscillates. Things become fashionable and then they go out, and then something else becomes cool and that goes out, and so on, and I think there's some interesting sociological process that actually drives a lot of what's going on. The early days were kind of cybernetics and control, and the idea of homeostasis: people made these robots that could, I don't know, try to plug into the wall when they needed power, and then come loose and roll around and do stuff. And then I think over time the thought was, well, that was inspiring, but people said, no, no, we want to get maybe closer to what feels like real intelligence, or human intelligence. And then maybe the expert systems people tried to do that, but maybe a little too superficially. So, oh, we get this surface understanding of what intelligence is like, because I understand how a steel mill works, and I can try to explain it to you, and you can write it down in logic, and then we can make a computer infer with that. And then that didn't work out. But what's interesting, I think, is that when a thing starts to not be working very well, it's not only that we change methods, we change problems. It's not like we have better ways of doing the problem the expert systems people were trying to do; we have no ways of trying to do that problem. Or maybe a few, but we kind of give up on that problem and we switch to a different problem, and we work that for a while, and we make progress as a community.

And there's a
lot of people who would argue you don't give up on the problem; it's just that you decrease the number of people working on it. You almost kind of put it on a shelf and say, we'll come back to this twenty years later.

Yeah, I think that's right. Or you might decide that it's malformed. Like, you might say it's wrong to just try to make something that does superficial symbolic reasoning behave like a doctor; you can't do that until you've had the sensory-motor experience of being a doctor, or something. So there are arguments that say that that problem was not well formed, or it could be that it is well formed but we just weren't approaching it well.

So you mentioned that your favorite part of logic and symbolic systems is that they give short names for large sets. So there is some use to symbolic reasoning. Looking at expert systems and symbolic computing, what do you think are the roadblocks that were hit in the eighties and nineties?

Ah, okay. So the fact that I'm not a fan of expert systems doesn't mean that I'm not a fan of some kinds of symbolic reasoning, right?
So, let's see, roadblocks. The main roadblock, I think, was the idea that humans could articulate their knowledge effectively into some kind of logical statements. So it's not just the cost, the effort, but the capability of doing it. Because we're all experts in vision, but we totally don't have introspective access into how we do that. And it's true that, I mean, I think the idea was, well, of course, even people then knew: of course I wouldn't ask you to please write down the rules that you use for recognizing a water bottle. That's crazy, and everyone understood that. But we might ask you to please write down the rules you use for deciding, I don't know, what tie to put on, or how to set up a microphone, or something like that. But even for those things, I think what they found, I'm not sure about this, but I think what they found was that the so-called experts could give explanations, sort of post-hoc explanations, for how and why they did things, but they weren't necessarily very good, and they depended on maybe some kinds of perceptual things, which, again, they couldn't really define very well. So I think, fundamentally, the underlying problem was the assumption that people could articulate how and why they make their decisions.

Right, so it's almost encoding the knowledge, converting from expert to something that a machine could understand and reason with.

No, no, not even just encoding, but getting it out of you. I mean, yes, it's hard also to write it down for the computer, but I don't think that people can produce it. You can tell me a story about why you do stuff, but I'm not so sure that's the why.

Great. So there are still, on the
hierarchical planning side, places where symbolic reasoning is very useful, as you've talked about. So where's the gap?

Okay, good. Saying that humans can't provide a description of their reasoning processes, that's okay, fine. But that doesn't mean that it's not good to do reasoning of various styles inside a computer. Those are just two orthogonal points. So then the question is, what kind of reasoning should you do inside a computer? And the answer is, I think you need to do all different kinds of reasoning inside a computer, depending on what kinds of problems you face.

I guess the question is, what kind of things can you encode symbolically, so you can reason about them?

I don't even like that terminology, "symbolic," because I don't know what it means, technically and formally. I do believe in abstractions. Abstractions are critical. You cannot reason at a completely fine grain about everything in your life. You can't make a plan at the level of images and torques for getting a PhD. So you have to reduce the size of the state space, and you have to reduce the horizon, if you're going to reason about getting a PhD, or even buying the ingredients to make dinner. So how can you reduce the spaces and the horizon of the reasoning you have to do? And the answer is abstraction: spatial abstraction, temporal abstraction. I think abstraction along the lines of goals is also interesting,
or, well, abstraction and decomposition; goals is maybe more of a decomposition thing. So I think that's where these kinds of, if you want to call them symbolic or discrete, models come in. You talk about a room of your house instead of your pose. You talk about doing something during the afternoon instead of at 2:54. And you do that because it makes your reasoning problem easier, and also because you don't have enough information to reason in high fidelity about the pose of your elbow at 2:35 this afternoon anyway.

Right, when you're trying to get a PhD. When you're doing anything, really.

Yeah, except at that moment. At that moment you do have to reason about the pose of your elbow, but then maybe you do that in some continuous joint-space kind of model. Again, my biggest point about all of this is that dogma is not the thing. It shouldn't be that I'm for or against symbolic reasoning and you're for or against neural networks. It should be that computer science just tells us what the right answer to all these questions is, if we were smart enough to figure it out.

Well, yeah, when you try to actually solve the problem with computers, the right answer comes out. But you mentioned abstractions. I mean, neural networks form abstractions, or rather, there are automated ways to form abstractions and there are expert-human-driven ways to form abstractions, and humans just seem to be way better at forming abstractions currently, on certain problems. So when you're referring to 2:45 p.m. versus the afternoon, how do we construct that taxonomy? Is there any room for automated construction of such abstractions?

Oh, I think eventually, yeah. I mean, I think when we get to be better machine learning engineers, we'll build algorithms that build awesome abstractions that are useful in this kind of way that you're describing.

Yeah. So let's then step
from the abstraction discussion and talk about POMDPs, partially observable Markov decision processes. So, uncertainty. First, what are Markov decision processes, and how much of our world can be modeled as MDPs? When you wake up in the morning and you're making breakfast, do you think of yourself as an MDP? How do you think about MDPs and how they relate to our world?

Well, there's a stance question. A stance is a position that I take with respect to a problem. So I, as a researcher or a person who designs systems, can decide to make a model of the world around me in some terms. I take this messy world and I say, I'm going to treat it as if it were a problem of this formal kind, and then I can apply solution concepts or algorithms or whatever to solve that formal thing. Of course the world is not anything; it's not an MDP or a POMDP. I don't know what it is, but I can model aspects of it in some way or some other way, and when I model some aspect of it in a certain way, that gives me some set of algorithms I can use.

You can model the world in all kinds of ways. Some are more accepting of uncertainty, more easily modeling uncertainty of the world; some really force the world to be deterministic. And so certainly MDPs model the uncertainty of the world?

Yes, they model some uncertainty. They model not present-state uncertainty, but uncertainty in the way the future will unfold.

Right. So what are Markov decision processes?

A Markov decision process is a model, a kind of model that you could make, that says: I know completely the current state of my system, and what it means to be a state is that I have all the information right now that will let me make predictions about the future as well as I can, so that remembering anything about my history wouldn't make my predictions any better. But then it also says that I can take some actions that might change the state of the world, and that I don't have a deterministic model of those changes; I have a probabilistic model of how the world might change. It's a useful model for some kinds of systems, I think.
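The model she's describing can be sketched very concretely. This is a minimal illustration only, not her formulation: the kitchen-themed states, actions, and probabilities below are invented for the example.

```python
import random

# A toy MDP: the state captures everything needed to predict the future,
# and actions change the state stochastically rather than deterministically.
# States, actions, and probabilities here are made up for illustration.
transitions = {
    # (state, action) -> list of (next_state, probability)
    ("at_counter", "grasp_mug"): [("holding_mug", 0.8), ("at_counter", 0.2)],
    ("holding_mug", "pour_coffee"): [("mug_full", 0.9), ("holding_mug", 0.1)],
}

def step(state, action):
    """Sample a next state from the probabilistic transition model."""
    outcomes = transitions[(state, action)]
    states, probs = zip(*outcomes)
    return random.choices(states, weights=probs)[0]

state = "at_counter"
state = step(state, "grasp_mug")  # either "holding_mug" or "at_counter"
```

The Markov property is exactly the claim that `step` needs only the current state and action, never the history.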
I mean, it's certainly not a good model for most problems, I think, because for most problems you don't actually know the state. For most problems it's partially observed.

So that's now a different problem class. Okay, that's where POMDPs, the partially observable Markov decision processes, step in. So how do they address the fact that you can't observe, that you have incomplete information about, most of the world around you?

Right, so now the idea is we still kind of postulate that there exists a state. We think that there is some information about the world out there such that, if we knew it, we could make good predictions. But we don't know the state. But we do get observations: maybe I get images, or I hear things, or I feel things, and those might be local or noisy, and so they don't tell me everything about what's going on. And then I have to reason about, given the history of actions I've taken and observations I've gotten, what do I think is going on in the world? And then, given my own kind of uncertainty about what's going on in the world, I can decide what actions to take.

And so how difficult is
this problem of planning under uncertainty, in your view, in your long experience with modeling the world and trying to deal with this uncertainty, especially in real-world systems?

Optimal planning for even discrete POMDPs can be undecidable, depending on how you set it up. And so lots of people say, I don't use POMDPs because they are intractable, and I think that's a kind of funny thing to say, because the problem you have to solve is the problem you have to solve. If the problem you have to solve is intractable, well, that's what makes us AI people. We understand that the problem we're solving is completely, wildly intractable, that we will never be able to solve it optimally. Later we can come back to an idea about bounded optimality, but anyway, we can't come up with optimal solutions to these problems, so we have to make approximations: approximations in modeling, approximations in solution algorithms, and so on. And so I don't have a problem with saying, yeah, my problem actually is a POMDP in continuous space with continuous observations, and it's so computationally complex I can't even think about its big-O whatever. That doesn't prevent me; it helps me, it gives me some clarity, to think about it that way, and to then take steps to make approximation after approximation, to get down to something that's computable in some reasonable time.

When you think about
optimality, you know, the community broadly has shifted a little bit on that, on how much they value the idea of chasing an optimal solution. How have your views on chasing an optimal solution changed over the years, as you work with robots?

That's interesting. I think we have a little bit of a methodological crisis, actually, from the theoretical side. I mean, I do think that theory is important, and right now we're not doing much of it. There's lots of empirical hacking around, training this and doing that and reporting numbers, but is it good, is it bad? We don't know; it's very hard to say things. If you look at computer science theory, for a while everyone talked about solving problems optimally or completely, and then there were interesting relaxations: people looked at, oh, are there regret bounds, or can I do some kind of approximation, can I prove that I can approximately solve this problem, or that I get closer to the solution as I spend more time, and so on. What's interesting, I think, is that we don't have good approximate solution concepts for very difficult problems. I like to say that I'm interested in doing a very bad job of very big problems, quote. A very bad job of very big problems, I like to do that, but I wish I could say something more. I wish I had some kind of formal solution concept that I could use to say, oh, this algorithm actually gives me something; I know what I'm going to get; I can do something other than just run it and see.
That notion is still somewhere deeply compelling to you, the notion that you can drop a thing on the table and say, you can expect that this algorithm will give me some good results?

I hope so; I hope science will. I mean, there's engineering and there's science, and I think they're not exactly the same. Right now we're making huge engineering leaps and bounds, so the engineering is running way ahead of the science, which is cool, and often how it goes: we're making things, and nobody knows, roughly, how and why they work. But we need to turn that into science.

I think there's some room for formalizing.

We need to know what the principles are: why does this work, why does that not work? For a while people built bridges by trying, but now we can often predict whether a bridge is going to work or not without building it. Can we do that for learning systems, or for robots?

So your hope is, from a materialistic perspective, that intelligence, artificial intelligence systems, robots, are kind of just fancier bridges.
Belief space. What's the difference between belief space and state space? You mentioned MDPs and POMDPs, reasoning about how you sense the world; there's a state. What's this belief space idea? That sounds so good.

It sounds good. So belief space: instead of thinking about what's the state of the world and trying to control that, as a robot I think about what is the space of beliefs that I could have about the world. If I think of a belief as a probability distribution over the ways the world could be, a belief state is a distribution. And then my control problem, if I'm reasoning about how to move through a world I'm uncertain about, is actually the problem of controlling my beliefs. So I think about taking actions, not just in terms of what effect they'll have on the world outside, but what effect they'll have on my own understanding of the world outside. And so that might compel me to ask a question, or look somewhere, to gather information, which may not really change the world state, but it changes my own belief about the world.

That's a powerful way to empower the agent to reason about the world, to explore the world. What kind of problems does it allow you to solve, to consider belief space versus just state space?

Well, any problem that requires deliberate information gathering. In some problems, like chess, there's no uncertainty, or maybe there's uncertainty about the opponent, but there's no uncertainty about the state. And in some problems there's uncertainty, but you gather information as you go. You might say, oh, I'm driving my autonomous car down the road, and it doesn't know perfectly where it is, but the lidars are all going all the time, so I don't have to think about whether to gather information. But if you're a human driving down the road, you sometimes look over your shoulder to see what's going on behind you in the lane, and you have to decide whether you should do that now, and you have to trade off the fact that you're not seeing in front of you when you're looking behind you, and how valuable that information is, and so on.
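The kind of reasoning just described, glancing over your shoulder because of what it does to your belief rather than to the world, rests on a simple belief update. This is a minimal sketch with a made-up two-state world and observation model, not anything from her work:

```python
# A minimal discrete belief update, the core of reasoning in belief space:
# a belief is a probability distribution over hidden states, and an
# observation re-weights it by Bayes' rule. The states, observation model,
# and numbers below are invented for illustration.

def update_belief(belief, likelihood, observation):
    """belief: {state: prob}; likelihood: {(obs, state): P(obs | state)}."""
    posterior = {s: belief[s] * likelihood[(observation, s)] for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Hidden state: is there a car in the lane behind me?
belief = {"car_behind": 0.5, "clear": 0.5}
likelihood = {
    ("see_car", "car_behind"): 0.9, ("see_car", "clear"): 0.1,
    ("see_nothing", "car_behind"): 0.1, ("see_nothing", "clear"): 0.9,
}

# Looking over your shoulder doesn't change the world, only your belief.
belief = update_belief(belief, likelihood, "see_nothing")
# belief["clear"] is now 0.9: the glance made the lane change safer to plan.
```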
And so to make choices about information gathering, you have to reason in belief space. Also to take into account your own uncertainty before trying to do things. You might say, if I understand where I'm standing relative to the door jamb pretty accurately, then it's okay for me to go through the door; but if I'm really not sure where the door is, then it might be better to not do that right now.

The degree of your uncertainty about the world is actually part of the thing you're trying to optimize in forming the plan?

That's right.

So this idea of
a long horizon of planning, for a PhD, or just even how to get out of the house, or how to make breakfast: you show this presentation of the WTF, "where's the fork," of a robot looking at a sink. Can you describe how we plan in this world, with this idea of hierarchical planning we've mentioned? How can a robot hope to plan about something with such a long horizon, where the goal is quite far away?

People, since probably reasoning began, have thought about hierarchical reasoning. There's spatial hierarchy, but let's talk about temporal hierarchy. You might say, oh, I have this long execution I have to do, but I can divide it into some segments abstractly. Maybe I have to get out of the house, I have to get in the car, I have to drive, and so on. So you can plan if you can build abstractions; we started out by talking about abstractions, and we're back to that now. If you can build abstractions in your state space, and temporal abstractions, then you can make plans at a high level. You can say, I'm going to go to town, and then I'll have to get gas, and then I can go here and do this other thing, and you can reason about the dependencies and constraints among these actions, again without thinking about the complete details.

What we do in our hierarchical planning work is then say, all right, I make a plan at a high level of abstraction. I have to have some reason to think that it's feasible without working it out in complete detail, and that's actually the interesting step. I always like to talk about walking through an airport. You can plan to go to New York, and arrive at the airport, and then find yourself in an office building later, but you can't even tell me in advance what your plan is for walking through the airport, partly because you're too lazy to think about it, maybe, but partly also because you just don't have the information. You don't know what gate you're landing at, or what people are going to be in front of you, or anything. So there's no point in planning in detail. But you have to make a leap of faith that you can figure it out once you get there, and it's really interesting to me how you arrive at that. You have learned, over your lifetime, to be able to make some kinds of predictions about how hard it is to achieve some kinds of subgoals, and that's critical. You would never plan to fly somewhere if you didn't have a model of how hard it was to do some of the intermediate steps. So one of the things we're thinking about now is, how do you do this kind of very aggressive generalization, to situations that you haven't been in, and so on, to predict, say, how long it will take to walk through the Kuala Lumpur airport? You could give me an estimate, and it wouldn't be crazy, and you have to have an estimate of that in order to make plans that involve walking through the Kuala Lumpur airport, even if you don't need to know it in detail. So I'm really interested in these kinds of abstract models, and how we acquire them. But once we have them, we can use them to do hierarchical reasoning, which I think is very important.
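The airport story can be sketched as a toy hierarchical planner: commit to abstract steps using rough learned cost estimates, and refine each step only once you reach it. All task names and numbers below are invented for illustration; her actual hierarchical planning work is far richer than this.

```python
# Plan at a high level of abstraction, checking only rough feasibility;
# leave the details ("which gate, which crowd") until execution time.
abstract_plan = ["leave_house", "drive_to_airport",
                 "walk_through_airport", "board_flight"]

# Learned rough estimates of how hard each subgoal is (minutes, made up).
estimated_cost = {
    "leave_house": 10, "drive_to_airport": 45,
    "walk_through_airport": 30, "board_flight": 15,
}

def plan_is_plausible(plan, deadline):
    """Commit to an abstract plan only if rough estimates fit the deadline."""
    return sum(estimated_cost[step] for step in plan) <= deadline

def refine(step):
    """Placeholder: detailed planning happens only when the step is current,
    because that's when the needed information becomes available."""
    return [f"<detailed actions for {step}>"]

detailed_plan = []
if plan_is_plausible(abstract_plan, deadline=120):  # the leap of faith
    for current in abstract_plan:
        detailed_plan.extend(refine(current))  # refine just in time
```

The point of `plan_is_plausible` is exactly her "you would never plan to fly somewhere without a model of how hard the intermediate steps are."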
There's this notion of goal regression and pre-image backchaining, this idea of starting at the goal and forming these big clouds of states. I mean, it's almost like saying, once you show up to the airport, you're a few steps away from the goal. So thinking of it this way is kind of interesting. I don't know if you have further comments on that, of starting at the goal.

Yeah, it's interesting that Herb Simon, back in the early days of AI, talked a lot about means-ends reasoning, reasoning back from the goal. There's a kind of an intuition that people have: the state space is big, the number of actions you could take is really big, so if you say, here I sit and I want to search forward from where I am, what are all the things I could do, that's just overwhelming. If you can reason at this other level and say, here's what I'm hoping to achieve, what could I do to make that true, then somehow the branching is smaller.
what's interesting is that like in the
AI planning community that hasn't worked
out in the class of problems that they
at and the methods that they tend to use
it hasn't turned out that it's better to
go backward um it's still kind of my
intuition that it is but I can't prove
that to you right now right I share your
intuition at least for us mere
humans speaking of which uh when you uh
maybe now we take a take a take a little
step into that philosophy Circle uh how
hard would it when you think about human
life you you give those examples often
how hard do you think it is to formulate
human life as a planning problem or
aspects of human life so when you look
at robots you're often trying to think
about object
manipulation uh tasks about moving a
thing when you take a slight step
outside the room let the robot leave and
go get lunch uh or maybe try to uh
pursue more fuzzy goals how hard do you
think is that problem if you were to try
to maybe put another way try to
formulate human life as as a planning
problem well that would be a mistake I
mean it's not all a planning problem
right I think it's really really
important that we understand that you
have to put together pieces and parts
that have different styles of reasoning
and representation and learning I think
it seems probably clear to anybody that you can't all be
this or all be that brains aren't all
like this or all like that right they
have different pieces and parts and
substructure and so on so I don't think
that there's any good reason to think
that there's going to be like one true
algorithmic thing that's going to do the
whole job so it's a bunch of pieces
together uh designed to solve a bunch of specific problems or maybe styles of problems I mean there's
probably some reasoning that needs to go
on in image space I think
again there's this model base versus
model free idea right so in
reinforcement learning people talk about
oh should I learn a policy just straight up a way of behaving or I could learn a value function which is popular that's some kind of weird intermediate ground uh or I could learn
a transition model which tells me
something about the Dynamics of the
world imagine that I
learn a transition model and I couple it
with a planner and I draw a box around
that I have a policy again it's just
stored a different
way right right but it's just as much of a policy as the other policy I think the way I see it is it's a time space tradeoff in
computation right a more overt policy
representation maybe it takes more space
but maybe I can compute quickly what
action I should take on the other hand
maybe a very compact model of the world
Dynamics plus a planner lets me compute
what action to take too just more slowly
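The point about a transition model plus a planner being a policy "stored a different way" can be sketched concretely. The chain world, the `model`, and the one-step `plan` here are invented toys chosen only to make the space/time tradeoff visible.

```python
# Sketch of the idea above: a transition model plus a planner,
# wrapped in a function, *is* a policy -- just stored differently.
# The tiny deterministic chain world is invented for illustration.

N = 10                          # states 0..9, goal is state 9
ACTIONS = (-1, +1)              # step left or step right

def model(s, a):
    """Transition model (here: a known deterministic chain)."""
    return max(0, min(N - 1, s + a))

def plan(s, goal=N - 1):
    """Trivial planner: one-step lookahead toward the goal."""
    return min(ACTIONS, key=lambda a: abs(goal - model(s, a)))

# Overt policy: more space (one entry per state), instant lookup.
policy_table = {s: plan(s) for s in range(N)}

# Compact policy: model + planner, computed on demand each step.
def policy_fn(s):
    return plan(s)

# Same behavior either way; only the space/time tradeoff differs.
assert all(policy_table[s] == policy_fn(s) for s in range(N))
```

In a real problem the table would be huge and the planner expensive, which is exactly the tradeoff being described: compute the action quickly from a big stored object, or slowly from a compact model.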
I mean I don't think there's an argument to be had it's just like a
question of what form of computation is
best for us for the various sub problems
right so and and so like learning to do
algebra manipulations for some reason is
I mean that's probably going to want
naturally a sort of a different
representation than riding a unicycle
right the time constraints on the
unicycle are serious the state space is
maybe smaller I don't know but so I
there could be the more human sides of
falling in love having a relationship
that might be
another style I have no idea how
to model that yeah let's let's first
solve the algebra and the object
manipulation uh what do you think is
harder perception or planning perception
that's way harder understanding
that's uh so what do you think is so
hard about perception about
understanding the world around you well
I mean I think the big question is representation hugely the question is representation right
so perception has made great strides
lately right and we can classify images
and we
can play certain kinds of games and
predict how to steer the car and all
that sort of stuff
um I don't think we have a very good
idea
of what perception should deliver right
so if you if you believe in modularity
okay there's there's a very strong view
which
says we shouldn't build in any
modularity we should make a gigantic neural network train it end to
end to do the thing and that's the best
way forward and it's hard to argue with
that except on a sample complexity basis
right so you might say oh well if I want
to do end-to-end reinforcement learning on this giant neural network it's
going to take a lot of data and a lot of
like broken robots and
stuff
so then the only answer is to say okay
we have to build something in build in
some structure or some bias we know from
theory of machine learning the only way
to cut down the sample complexity is to somehow cut down the hypothesis space you can do that by
building in bias there's all kinds of
reason to think that nature built bias
into
humans um
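A back-of-the-envelope calculation shows how a built-in bias like weight sharing shrinks the hypothesis space, and so the sample complexity. The numbers are illustrative assumptions (one channel, biases omitted), not anything from the conversation.

```python
# Rough parameter count: why convolution is such a strong bias.
# A 32x32 grayscale image mapped to a same-size feature map.
H = W = 32

# Fully connected layer: every output pixel sees every input pixel.
dense_params = (H * W) * (H * W)     # full weight matrix, biases omitted

# Convolutional layer: one shared 3x3 kernel slides over the image.
conv_params = 3 * 3                  # weight sharing + locality

print(dense_params, conv_params)     # 1048576 vs 9
```

Five orders of magnitude fewer free parameters is a drastic restriction of the hypothesis space, which is the sense in which convolution is a very strong and very useful bias.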
convolution is a bias right it's a very
strong bias and it's a very critical
bias so my own view is that we should
look for more things that are like
convolution but that address other
aspects of reasoning right so
convolution helps us a lot with a
certain kind of spatial reasoning that's
quite close to the
imaging I think there's other ideas like that maybe some of them out of forward search maybe some notions of abstraction
maybe the notion that objects exist
actually I think that's pretty important
and a lot of people won't give you that
to start with right so almost like a
convolution in the semantic object space of some kind some kind of ideas in
there that's right and people are starting like graph convolutions are an idea that is related to relational representations and so I think there
are so you know I've come far afield from
perception but I think um I think the
thing that's going to make perception
that kind of the next step is actually
understanding better what it should
produce right so what are we going to do
with the output of it right it's fine
when what we're going to do with the
output is steer it's less clear when
we're just trying to make a one
integrated intelligent agent what should
the output of perception be we have no
idea and how should that hook up to the
other stuff we don't know right so I
think the primary question is what kinds of
structure can we build in that are like
the moral equivalent of convolution that
will make a really awesome super
structure that then learning can kind of
progress on efficiently I agree very
compelling description of actually where
we stand with the perception problem uh
you're teaching a course on embodied
intelligence what do you think it takes
to build a robot with human level
intelligence I don't know if we knew we
would do it
if you were to I mean okay so do you
think a robot needs to have a uh
self-awareness uh
Consciousness fear of mortality or is it
is it simpler than that or is
consciousness a simple thing like do you
do you think about these Notions I don't
think much about Consciousness even most
philosophers who care about it will give
you that you could have robots that are
zombies right that behave like humans
but are not conscious and I at this
moment would be happy enough with that
so I'm not really worried one way or the
other so then the technical side you're
not thinking of the use of
self-awareness um well but I okay but
then what does self-awareness mean I
mean that you need to have some part of
the system that can observe other parts
of the system and tell whether they're
working well or not that seems critical
so does that count as I mean does that
count as self-awareness or not well it
depends on whether you think that
there's somebody at home who can
articulate whether they're self-aware
but clearly if I have like you know some
piece of code that's counting how many
times this procedure gets
executed that's a kind of self-awareness
right so there's a big Spectrum it's
clear you have to have some of it right
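The call-counting example she mentions can be sketched as a minimal Python decorator. This is a hypothetical illustration of a system observing one of its own parts, not code from anyone in the conversation.

```python
# Minimal "self-observation": a piece of code counting how many
# times a procedure gets executed, per the example above.
import functools

def counted(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # the self-monitoring step
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def procedure():
    pass

for _ in range(3):
    procedure()

print(procedure.calls)                # 3
```

A monitor like this sits at the trivial end of the spectrum she describes; richer versions would watch whether a component is working well, not just how often it runs.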
you know we're quite far away on many
dimensions but is there a direction of
research that's most compelling to you
for you know trying to achieve human
level intelligence in in our robots well
to me I guess the thing that seems most
compelling to me at the moment is this
question of what to build in and what to
learn um I
think we're we don't we're missing a
bunch of ideas and and we you know
people you know don't you dare ask me
how many years it's going to be till
that happens because I won't even
participate in the conversation because
I think we're missing ideas and I don't
know how long it's going to take to find
them so I won't ask you how many years
but uh maybe I'll ask
you when you'll be sufficiently
impressed that we've achieved it so
what's what's uh a good test of
intelligence do you like the Turing
test the natural language in the robotic
space is there something where you would
sit back and think oh that's that's
pretty impressive uh as a test as a
benchmark do you you think about these
kinds of problems no I I resist I mean I
think all the time that we spend arguing
about those kinds of things could be
better spent just making the robots work
better uh so you don't value competition
so I mean there's the nature of
benchmarks and data sets or Turing test challenges where everybody
kind of gets together and tries to build
a better robot cuz they want to out
compete each other like the DARPA
challenge with the autonomous
vehicles do you see the value of
that or can it get in the way I think it
can get in the way I mean some people
many people find it motivating and so
that's good I find it anti motivating
personally yeah uh but I think what I
mean I think you get an interesting
cycle where for a contest a bunch of
smart people get super motivated and
they hack their brains out and much of
what gets done as just hacks but
sometimes really cool ideas emerge and
then that gives us something to chew on
after that so I'm I it's not a thing for
me but I don't I don't regret that other
people do it yeah it's like you said
with everything else the mix is good so
jumping topics a little bit you started the Journal of Machine Learning Research and served as its editor-in-chief
uh how did the publication come
about and uh what do you think about the
current publishing model space in
machine learning artificial intelligence
okay good so it came about because there
was a journal called machine learning
which still exists which was owned by
Kluwer and
there was I was on the editorial board
and we used to have these meetings
annually where we would complain to Kluwer
that it was too expensive for the
libraries and that people couldn't
publish and we would really like to have
some kind of relief on those fronts and
they would always sympathize but not do
anything so uh we just decided to make a
new journal and uh there was the Journal
of AI Research which was on the same
model which had been in existence for
maybe five years or so and it was going
along pretty well
so uh we just made a new Journal it
wasn't I mean it um I don't know I guess
it was work but it wasn't that hard so
basically the editorial board probably
75% of the editorial board of uh machine
learning resigned and we founded the new
Journal but it was sort of it was more
open yeah right so it's completely open
it's open access actually uh I had a postdoc George Konidaris who wanted to call these journals free-for-all uh because I mean it both
has no page charges and has
no access restrictions and so lots of people I mean there were people who were mad
about the existence of this journal who
thought it was a fraud or something it
would be impossible they said to run a
journal like this with basically I mean
for a long time I didn't even have a bank account uh I paid for the lawyer to incorporate and the IP address and it cost just a couple hundred dollars a year to run it's a little bit
more now but not that much more but it's
because I think computer scientists are
competent and autonomous in a way that
many scientists in other fields aren't I
mean at doing these kinds of things we already typeset our own papers we all have students and people who can hack a website together in an afternoon so
the infrastructure for us was like not a
problem but for other people in other
fields it's a harder thing to do yeah
and this kind of Open Access Journal is
nevertheless one of the most prestigious
journals so it's not like um prestige it can be achieved without any of that a paywall is not required for prestige turns out yeah so on the review process
side I've actually a long time ago I
don't remember when I reviewed a paper
where you were also a reviewer and I
remember reading your review and being
influenced by it it was really well
written it influenced how I write
future reviews uh you disagreed with me actually uh and you made my review much better so but nevertheless the
review process you know has its uh flaws
and how do you think what do you think
works well how how can it be improved so
actually when I started JMLR I wanted to
do something completely
different and I didn't because it felt
like we needed a traditional Journal of
record and so we just made JMLR be
almost like a normal Journal except for
the Open Access parts of it basically
um increasingly of course publication is
not even a sensible word you can publish
something by putting it in arXiv so I
can publish everything tomorrow so
making stuff public is there's no
barrier
we still need
curation and evaluation I don't have
time to read all of arXiv and you could argue
that kind of social thumbs-upping of articles suffices right you might say oh
heck with this we don't need journals at
all we'll put everything on arXiv and
people will upvote and downvote the articles and then your CV will say oh man he got a lot of upvotes so uh
that's good um but I think there's
still value
in careful reading and commentary of
things and it's hard to tell when people
are up voting and down voting or arguing
about your paper on Twitter and Reddit
whether they know what they're talking
about right so then I have the second
order problem of trying to decide whose
opinions I should value and such so I
don't know if I had infinite
time which I don't and I'm not going to
do this because I really want to make
robots work but if I felt inclined to do
something more in the publication
Direction I would do this other thing
which I thought about doing the first
time which is to get together some set
of people whose opinions I value and who
are pretty articulate and I guess we
would be public although we could be
private I'm not sure and we would review
papers we wouldn't publish them and you
wouldn't submit them we would just find
papers and we would write reviews MH and
we would make those reviews public and
maybe if you you know so we're Leslie's
friends who review papers and maybe
eventually if our opinion was sufficiently valued like the opinion of JMLR is valued then you'd say on your CV that Leslie's friends gave my paper a
five-star reading and that would be just
as good as saying I got it you know
accepted into this journal um so I think
I think we should have good public
commentary uh and organize it in some
way but I don't really know how to do it
it's interesting times the way the the
way you describe it actually is is
really interesting I mean we do it for
movies
imdb.com there's experts critics come in they write reviews but there's also regular non-critics humans write reviews and they're separated I like OpenReview the ICLR process I think is interesting
it's a step in the right direction but
it's still not as compelling as uh
reviewing movies or video games I mean
it sometimes almost it might be silly at
least from my perspective to say but it
boils down to the user interface how fun
and easy it is to actually perform the
reviews how efficient how much you as a
reviewer get uh street cred for being a
good reviewer those ele those human
elements come into play
no it's a big investment to do a good
review of a paper and the flood of
papers is out of control right so you
know there aren't 3,000 new I don't know
how many new movies are there in a year
I don't know but that's probably going
to be less than how many machine
learning papers there are in a year now
and I'm worried you know right so I'm like an old person so of course I'm going to say rar rar rar things are moving too fast I'm a stick in the mud
uh so I can say that but my particular
flavor of that
is I think the Horizon for researchers
has gotten very short that students want
to publish a lot of papers and there's a
huge there's value it's exciting and
there's value in that and you get patted
on the head for it and so
on but and some of that is fine but I'm
worried that we're driving out
people who would spend two years
thinking about
something back in my day when we worked
on our thesis we did not publish papers
you did your thesis for years you picked
a hard problem and then you worked and
chewed on it and did stuff and wasted
time and for a long time and when it was
roughly when it was done you would write
papers and so I I don't know how to in
and I don't think that everybody has to
work in that mode but I think there's
some problems that are hard enough that
it's important to have a longer research
Horizon and I'm worried that we don't
incentivize that at all at this point in
this current structure yeah so what do
you
see as uh what are your hopes and fears
about the future of AI and continuing on
this theme so AI has gone through a few
Winters ups and downs do you see another
winter of AI
coming or are you more hopeful uh about making robots work as you said
think the Cycles are inevitable but I
think each time we we get higher right I
mean so you know it's it's like climbing
some kind of landscape with a noisy uh
Optimizer yeah so it's clear that the
the you know the Deep learning stuff
has made deep and important improvements
and so the high water mark is now higher
there's no question but of course I
think people are overselling and
eventually uh investors I guess and
other people look around and say well
you're not quite delivering on this
Grand claim and that wild
hypothesis so probably it's going to
crash some amount and
then it's okay I mean but I
can't imagine that there's like some
awesome monotonic improvement from here
to human level AI
so in uh you know I have to ask this
question I can probably anticipate the answers but uh do you have a worry
short-term or long-term about the existential threats of AI and uh maybe short-term less existential but more uh
robots taking away
jobs well actually let let me talk a
little bit about
utility actually I had an interesting
conversation with some military