Transcript

c9AbECvRt20 • Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144
Kind: captions
Language: en
the following is a conversation with
michael littman a computer science
professor at brown university
doing research on and teaching machine
learning
reinforcement learning and artificial
intelligence
he enjoys being silly and lighthearted
in conversation
so this was definitely a fun one quick
mention of each sponsor
followed by some thoughts related to the
episode thank you to
simplisafe a home security company i
use to monitor
and protect my apartment expressvpn the
vpn i've used for many years to protect
my privacy on the internet
masterclass online courses that i enjoy
from some of the most amazing humans in
history
and betterhelp online therapy with a
licensed professional
please check out the sponsors in the
description to get a discount and
to support this podcast as a side note
let me say that i may experiment with
doing some solo episodes in the coming
month
or two the three ideas i have floating
in my head
currently are to use one a particular
moment in history
two a particular movie or three a book
to uh drive a conversation about a set
of uh related concepts
for example i could use 2001 a space
odyssey or ex machina
to talk about agi for one two three
hours
or i could do an episode on the yes
rise and fall of hitler and stalin
each in a separate episode using
relevant books and historical moments
for reference i find the format of a
solo episode
very uncomfortable and challenging but
that just tells me
that it's something i definitely need to
do and learn from the experience
of course i hope you come along for the
ride also
since we have all this momentum built up
on announcements
i'm giving a few lectures on machine
learning at mit this january
in general if you have ideas for the
episodes
for the lectures or for just short
videos on youtube
let me know in the comments that i
still definitely read despite my better
judgment
and the wise sage advice of the great
joe rogan if you enjoy this thing
subscribe on youtube
review it with five stars on apple
podcast follow on spotify
support on patreon or connect with me on
twitter
lex fridman and now here's my
conversation
with michael littman i saw a video of
you talking to
charles isbell about westworld the tv
series
you guys were doing a kind of thing
where you're watching new things
together but let's
rewind back is there a sci-fi
movie or book or shows
that you that was profound that had an
impact on you philosophically or just
like
specifically something you enjoyed
nerding out about
yeah interesting i think a lot of us
have been inspired by robots in movies
the one that i really like is uh
there's a movie called robot and frank
which i think is really interesting
because it's very near-term
future where uh robots are being
deployed as
uh helpers in people's homes and it was
it was
and we don't know how to make robots
like that at this point but it seemed
very plausible it seemed very
realistic or imaginable and i thought
that was really cool because
they did they're awkward they do funny
things it raised some interesting issues
but
it seemed like something that would
ultimately be helpful and good if we
could do it right
yeah he was an older cranky gentleman
right he was an older cranky
uh jewel thief yeah it's kind of funny
little
thing which is you know he's a jewel
thief and so he
pulls the robot into his life which is
like which is something you could
imagine
taking a home robotics thing
and pulling into whatever quirky thing
that's involved in your this is
meaningful to you exactly so yeah and i
think i think from that perspective i
mean not all of us are jewel thieves and
so when we bring our robots into
it for yourself uh explains a lot about
this apartment actually
but no the idea that that people should
have the ability to
you know make this technology their own
that that it becomes part of their lives
and and i think that's
it's hard for us as technologists to
make that kind of technology it's easier
to mold people into what we need them to
be
and um just that opposite vision i think
is really inspiring
and then there's an anthropomorphization
where we project
certain things on them because i think
the robot was kind of dumb
but i have a bunch of roombas that i play
with and they you immediately project
stuff onto them much greater level of
intelligence we'll probably do that with
each other too
much much greater degree of competence
that's right one of the things we're
learning from ai is
where we are smart and where we are not
smart yeah
you also enjoy as people can see
and i enjoyed myself uh watching you
sing
and even dance a little bit a little bit
a little bit a little bit of dancing
a little bit of dancing that's not quite
my thing as a as a method of education
or just in life you know in general
so easy question what's the
definitive objectively speaking top
three songs of all time
maybe something that you know uh
to walk that back a little bit maybe
something that
others might be surprised by the three
three songs that you kind of enjoy
that is a great question that i cannot
answer but instead let me tell you a
story so
pick a question you do want it that's
right i've been watching the
presidential debates and vice presidential
debates and turns out yeah it's really
you can just answer any question you
want
so so it's a related question
[Laughter]
yeah well said i really like pop music
i've enjoyed pop music ever since i was
very young so 60s music 70s music
80s music this is all awesome and then i
had kids and i think i stopped listening
to music and i was starting to realize
that the
like my musical taste had sort of frozen
out and so i decided
in 2011 i think to start listening to
the top 10
billboard songs each week so i'd be on
the on the treadmill and i would listen
to that week's top 10 songs
so i could find out what was popular now
and what i discovered
is that i have no musical taste
whatsoever i like what i'm familiar
with and so yeah the first time i'd hear
a song it's the first week that was on
the charts i'd be like
and then the second week i was into it a
little bit and the third week
i was loving it and by the fourth week
is like just part of me and so
i'm afraid that i can't tell you the
most my favorite song of all time
because it's whatever i heard most
recently yeah
that's interesting people have told me
that um there's an art to listening to
music as well
you can start to if you listen to a song
just carefully like
explicitly just force yourself to really
listen you start to
uh i did this when i was part of jazz
band and fusion band in college is
there's they you you start to hear the
layers
of the instruments you start to hear the
individual instruments and you start to
uh
you can listen to classical music or to
orchestra this way you can listen to
jazz this way i mean
uh it's funny to imagine you now to
walk in that forward to listening to pop
hits now as like a scholar
listening to like cardi b or something
like that or justin timberlake is he
no not timberlake bieber i guess
they've both
been in the top 10 since i've been
listening they're still still up there
oh my god i'm so clueless
if you haven't heard justin timberlake's
top 10 in the last few years
there was one song that he did where the
music video was set
at essentially neurips oh wow oh
the one with the robotics yeah yeah yeah
yeah yeah yeah he's like at an academic
conference and he and he's doing it he
was
presenting it was sort of a cross
between the apple
like steve jobs kind of talk and neurips
um so i you know it's always fun when ai
shows up in pop culture i wonder if he
consulted somebody for that
that's very that's really interesting so
maybe on that topic i've seen your
um your celebrity multiple dimensions
but one of them is you've done
cameos in different places i've seen you
in a turbotax commercial as like
i guess the the brilliant einstein
character
and the the point is that turbotax
doesn't need
somebody like you doesn't need a
brilliant
very few things need someone like me but
yes they were specifically
emphasizing the idea that you don't need
to be a like a computer expert to be
able to use their software
how did you end up in that world i think
it's an interesting story so i was
teaching my class it was an intro
computer science class for
non-concentrators non-majors
and sometimes when people would visit
campus they would check in to say hey we
want to see what a class is like can we
sit in on your class
so a person came to my class
who was the daughter of the brother
of the
husband of the best friend of my wife
anyway basically a family friend came to
campus to
to check out brown and asked to come to
my class and
and came with her dad her dad is uh
who i've known from various kinds of
family events and so forth but he also
does
advertising and he said that he was
recruiting
scientists for this this this ad this
this turbotax
set of ads and he said we wrote the ad
with the idea that we get like
the most brilliant researchers um but
they all said no
so can you help us find the like b
level scientists i'm like sure
that's that's who i hang out with so
that should be fine
so i put together a list and i did what
some people call the dick cheney so i
included myself on the list
of possible candidates uh you know with
a little blurb about each one and why i
thought
it would make sense for them to to do it
and they reached out to a handful of
them but then they ultimately they
youtube stalked me a little bit and they
thought
oh i think he could do this and um they
said okay we're gonna offer you the
commercial
i'm like what so um it was it was such
an interesting experience because it's
it's they have another world the people
who do
like nationwide kind of ad campaigns and
and television shows and movies and so
forth it's quite
a a remarkable system that they have
going because like a set
yeah so i went to uh it was just
somebody's house that they rented in new
jersey
um but it in the in the commercial it's
just me and this other woman
in reality there were 50 people in that
room and another
i don't know half a dozen kind of spread
out around the house in various ways
there were people whose job it was to
control the sun
they were in the backyard on ladders
putting
filters up to try to make sure that the
sun didn't glare off the window in a way
that would wreck the shot
so there was like six people out there
doing that there was three people out
there giving
snacks the craft table there was another
three people giving
healthy snacks because that was a
separate craft table there was one
person whose job it was
to keep me from getting lost and
the i think the reason for all this is
because so many people are in one place
at one time they have to be time
efficient they have to get it done
this the morning they were going to do
my commercial in the afternoon they were
going to do a commercial of a
mathematics professor from princeton
they had to get it done no you know no
wasted time or energy and so there's
just a fleet of people
all working as an organism and it was
fascinating i was just the whole time
just looking around like
this is so neat like one person whose
job it was to take the camera off of the
camera man
so that someone else whose job it was to
remove the film canister because every
couple of takes they had to replace the
film because you know
film gets used up it was just i don't
know i was
geeking out the whole time it was so fun
how many takes did it take it looked the
opposite like there was
more than two people there it was very
relaxing right yeah the super
i mean the person who i was in the scene
with um is a professional
she's a you know uh she's an actor
improv comedian okay in your community
and when i got there they had given me a
script as such as it was and then i got
there and they said
we're gonna do this as improv i'm like i
don't know how to improv like this is
not
i don't know what this i don't know what
you're telling me to do here
don't worry she knows okay okay we'll
see how this goes
i get i guess i got pulled into the
story because like where the heck did
you come from
i guess in the scene like how did you
show up in this random person's house
i don't know yeah well i mean the
reality of it is i stood outside in the
blazing sun there was someone whose job
it was to keep an umbrella over me
because i started to schvitz i started
to sweat
and so i would wreck the shot because my
face was all shiny with sweat so there
was one person who would dab me off
had an umbrella um but yeah like the
reality of it like
why is this strange stalkery person
hanging around outside somebody's house
yeah we're not we're not sure when you
have to look in we'll have to wait for
the book
but are you uh so you make you make like
you said youtube you make videos
yourself
you make awesome parody sort of uh
parody songs that kind of focus in on
particular aspects of computer science
how much those seem really natural
how much production value goes into that
do you also have a team of
50 people videos almost all the videos
except for the ones that people would
have actually seen
were just me i write the lyrics i sing
the song i i generally
find a um like a backing track online
because i unlike you can't really play
an instrument and then i do
in some cases i'll do visuals using just
like powerpoint
lots and lots of powerpoint to make it
sort of like an animation
the the most produced one is the one
that people might have seen which is the
overfitting video that i did with
charles isbell um
and that was produced by the georgia
tech and udacity people because we were
doing a class together it was kind of i
usually do parody songs
kind of to cap off a class at the end of
a class so that one
you're wearing so it's a this the
thriller yeah you're wearing the michael
jackson the red
leather jacket the interesting thing
with podcasting that you're also
uh into is that
i really enjoy is that there's not a
team of people
it's kind of more because you know the
the there's something that happens
when there's more people involved than
just one person
that just the way you start acting i
don't know
there's a censorship you're not given
especially for like slow thinkers like
me you're not
and i think most of us are if we're
trying to actually think
we're a little bit slow and and careful
it it kind of large teams get in the way
of that
and i don't know what to do with ice
like that's the to me
like if you know this it's very popular
to criticize quote unquote mainstream
media
i but there is legitimacy to criticizing
them the same i
love listening to npr for example but
every
it's clear that there's a team behind it
there's a commercial there's constant
commercial breaks there's this kind of
like rush of like
uh okay i have to interrupt you now
because we have to go to commercial just
this whole
it creates it destroys the possibility
of nuanced conversation
yeah exactly evian uh which
charles uh isbell who i i talked to
yesterday told me that
evian is naive backwards which the fact
that his mind thinks this way is just
uh it's quite brilliant anyway there's a
freedom to this podcast he's dr awkward
which by the way is a palindrome that's
a palindrome that i happen to know
for from other parts of my life and i
just you just throw it out
well you know use it against charles dr
awkward
so what uh what was the most challenging
parody song to make
was it the thriller one hmm no that was
really fun i
wrote the lyrics really quickly um and
then i gave it over to the product
production team they recruited a
a cappella group to to sing that went it
went really smoothly it's great having a
team because then you can just focus on
the part that you really love which in
my case is writing the lyrics
uh for me the most challenging one not
challenging in a bad way but challenging
in a really fun way
was i did one of this one of the parody
songs i did
is is about the halting problem in
computer science the the fact that
you can't create a program that can tell
for any other arbitrary program whether
it's actually going to get stuck in an
infinite loop or whether it's going to
eventually stop
and so i i did it to an 80s song
because that's i hadn't started my new
thing of learning current songs
and it was billy joel's the piano man
nice which is a great song great song
yeah yeah
and sing us a song you're the piano man
yeah yeah so the lyrics are great
because first of all it rhymes uh not
all songs rhyme i did i've done
rolling stones songs which turn out to
have no rhyme scheme whatsoever they're
just
sort of yelling and having a good time
which makes it not fun from a parody
perspective because like you can say
anything
but this you know the lines rhymed and
there was a lot of internal rhymes as
well
and so figuring out how to sing with
internal rhymes
a proof of the halting problem was
really challenging and
it was i really enjoyed that process
what about uh
last question on this topic what about
the dancing in the thriller video how
many takes did that take
so i wasn't planning to dance they they
had me in the studio and they gave me
the jacket and it's like well you can't
if you have the jacket and the glove
like there's not much you can do yeah
so i um i think i just danced around
and then they said why don't you dance a
little bit we there was a scene with me
and charles dancing together
they did not use it in the video but we
recorded it um yeah yeah no it was
it was pretty funny and charles who has
this
beautiful wonderful voice doesn't really
sing he's not really a singer and so
that was why i designed the song with
him doing a spoken section and me doing
things very like barry white yeah it's a
smooth baritone
yeah yeah it's great that was awesome so
one of the other things charles said is
that you know
everyone knows you as like a super nice
guy super passionate about
teaching and so on uh what he said
i don't know if it's true that despite
the fact that you're
you are cold like okay
i will admit this finally for the first
time that was that was me
it's the johnny cash song shot a man in
reno just to watch him die
uh that you actually do have uh some
strong opinions on some topics
so if this in fact is true what
uh strong opinions would you say you
have is there ideas
you think maybe in artificial
intelligence machine learning
maybe in life that you believe is true
that others might
you know some number of people might
disagree with you on
so i try very hard to see things from
multiple perspectives
there's there's this great calvin and
hobbes cartoon where
cal do you know okay so calvin's dad is
always kind of a bit of a foil and he
he was he talked to calvin and just
calvin had done something wrong
the dad talks him into like seeing it
from another perspective and calvin like
this breaks calvin because he's like oh
my gosh now i can see the opposite sides
of things and so the
it's it becomes like a cubist cartoon
where there is no front and back
everything's just exposed
and it really freaks him out and finally
he settles back down it's like oh good
no i can make that go away
but like i'm that i'm that i live in
that world where i'm trying to see
everything from every perspective all
the time so there are some things that
i've formed opinions about that i
would be harder i think to disabuse me of
one is um the super intelligence
argument and the existential
threat of ai is one where i feel pretty
confident
in my feeling about that one like i'm
willing to hear other arguments but like
i am not particularly moved by the idea
that
if we're not careful we will
accidentally create a super intelligence
that will destroy
human life let's talk about that let's
get you in trouble and record your video
it's like bill gates uh i think he said
like
some quote about the internet that
that's just gonna be a small thing it's
not gonna really go anywhere
and i think uh steve ballmer said uh
i don't know why i'm sticking on
microsoft uh that's something
that like smartphones are useless
there's no reason why microsoft should
get into smartphones that kind of
so let's get let's talk about agi as agi
is destroying the world we'll look back
at this video and see
no uh i think it's really interesting to
actually talk about because nobody
really
knows the future so you have to use your
best intuition it's very
difficult to predict it but you have
spoken about agi
and the existential risks around it and
sort of
based on your intuition that we're
quite far away from that being a serious
concern relative to the other concerns
we have
can you maybe uh unpack that a little
bit yeah sure so
so as as i understand it that
uh for example i read bostrom's book and
a bunch of other
reading material about this sort of
general way of thinking about the world
and i think
the story goes something like this that
we will at
some point create computers that
are smart enough that they can help
design
the next version of themselves which
itself will be smarter than the previous
version of themselves and eventually
bootstrapped up to being smarter than
us at which point we are essentially at
the mercy of this sort of
more powerful intellect which in
principle
uh we don't have any control over what
its goals are and so if its goals
are at all out of sync with our goals
like the ex for example the continued
existence of humanity
we won't be able to stop it it'll be way
more powerful
than us and we will be toast so
there's some i don't know very smart
people who have signed on to that story
and it's a
it's a compelling story i once
now i can really get myself in trouble i
once wrote an op-ed about this
specifically responding to some quotes
from elon musk who has been
you know on this very podcast uh more
than once
and well the the ai is summoning the
demon that you get
i think he said but then he came to
providence rhode island which is where i
live
and said uh to the governors of all the
states
uh you know you're worried about
entirely the wrong thing you need to be
worried about ai you need to be very
very worried about ai so uh and peop
journalists kind of reacted to that and
they wanted to get people's people's
take and
i was like okay my my my belief
is that one of the things that makes
elon musk so successful and so
remarkable as an individual
is that he believes in the power of
ideas he believes that you can
have you can if you know if you have a
really good idea for getting into space
you can get into space if you have a
really good idea for a company or for
how to change the way that people drive
you just have to do it and
and it can happen it's really natural to
apply that same idea to ai you see
these systems that are doing some pretty
remarkable computational
tricks uh demonstrations and then to
take that idea and just push it
all the way to the limit and think okay
where does this go where is this going
to take us next
and if you're a deep believer in the
power of ideas
then it's really natural to believe that
those ideas could
be taken to the extreme and kill us
so i think you know his strength is also
his undoing because
that doesn't mean it's true like it
doesn't mean that that has to happen
but it's natural for him to think that
so
another way to phrase the way he thinks
and
i find it very difficult to argue with
that
line of thinking uh so sam harris is
another person
from a neuroscience perspective that
thinks like that is
saying well is there something
fundamental
in the physics of the universe that
prevents this from eventually happening
and nick bostrom thinks in the same
way they're kind of zooming out
yeah okay we humans now uh are existing
in this
like time scale of minutes and days and
so our
intuition is in this time scale of
minutes hours and days
but if you look at the span of human
history
is there any reason we you
can't see this in in 100 years and
like is there is there something
fundamental about the laws of physics
that prevent this
and if it doesn't then it eventually
will happen or will
we will destroy ourselves in some other
way it's very difficult
i find to actually argue against that
yeah
me too and not sound like
not sound like you're just like rolling
your eyes uh i'm like i have
like science fiction we don't have to
think about it but even even
worse than that which is like i don't
know kids but like i gotta pick up my
kids now like this okay i see there's
more pressing shortcuts yeah there's
more pressing short-term things that
like
uh stop over this existential crisis
where much much shorter things like
now especially this year there's covid so
like any kind of discussion like that is
like there's there's p you know there's
pressing things
uh today it's it's and then so the sam
harris argument well like
any day the exponential singularity
can can occur it's very difficult to
argue against i mean i don't know but
part of his story is also
he's he's not going to put a date on it
it could be in a thousand years it could
be in 100 years it could be in two years
it's just that as long as we keep making
this kind of progress
it ultimately has to become a concern
i i kind of am on board with that but
the thing that the the piece that i feel
like is missing from that
that way of extrapolating from the
moment that we're in
is that i believe that in the process of
actually developing technology that can
really get around in the world and
really process and and
and do things in the world in a
sophisticated way we're going to learn a
lot about
what that means which that we don't know
now because we don't know how to do this
right now
if you believe that you can just turn on
a deep learning network and eventually
give it enough compute and it'll
eventually get there well sure that
seems really scary because we won't we
won't be in the loop at all we won't we
won't be helping to design or or target
these kinds of systems but i don't i
don't see
that that feels like it is against the
laws of physics because these systems
need help right they need
they need to surpass the the
the difficulty the wall of complexity
that happens in arranging something in
the form that
that will happen in yeah like i believe
in evolution like i believe that the
that that there's an argument right so
there's another argument just to look at
it from a different
perspective that people say well i don't
believe in evolution how could evolution
it's it's sort of like a random set of
parts
assemble themselves into a 747 and that
could just never happen
yeah so it's like okay that's maybe hard
to argue against but clearly
747s do get assembled they get assembled
by us basically the idea being that
there's a process by which we will get
to the par the point of making
technology that has that kind of
awareness and
in that process we're going to learn a
lot about that process and we'll have
more
ability to control it or to shape it or
to build it in our own image
it's not something that is going to
spring into existence like that 747
and we're just gonna have to contend
with it completely unprepared
it's very possible that in the context
of the long arc of human history it will
in fact spring into existence
but that springing might take like if
you look at nuclear weapons
like even 20 years is a springing
in in the context of human history and
it's very possible just like with
nuclear weapons that we could have
i don't know what percentage you want to
put at it but the the possibility
could have knocked ourselves out yeah
the possibility of human beings
destroying themselves in the 20th
century
with nuclear weapons i don't know you
can if you really think through it
you could really put it close to like i
don't know 30 40 percent
given like the certain moments of crisis
that happen
so like i think one
like fear in the shadows that's not
being acknowledged
is it's not so much the ai will run away
is
is that as it's running away we won't
have enough time to uh think through how
to stop it
right fast takeoff or foom yeah i mean
my
much bigger concern i wonder what you
think about it which is
we won't know it's happening
so i kind of that argument i think that
there is an
agi situation already happening with
social media
that our minds our collective
intelligence of human civilization is
already being controlled by an algorithm
and like we're we're already super
like the the level of a collective
intelligence thanks to wikipedia people
should donate to wikipedia
to feed the agi man if we had a super
intelligence that
that was in line with wikipedia's values
that it's a lot better than a lot of
other things i can imagine i've i trust
wikipedia more than i trust facebook or
youtube
as far as trying to do the right thing
from a rational perspective
yeah now that's not where you were going
i understand that but it it it does
strike me that there's sort of
smarter and less smart ways of of
exposing ourselves to each other on the
internet yeah the interesting thing is
that wikipedia
and social media have very different
forces you're right i mean wikipedia if
if agi was wikipedia it'd be just like
this
cranky overly competent editor
of uh articles uh you know there's
there's something to that but the social
media aspect is is is not
so the vision of agi is as a separate
system
that's super intelligent that's super
intelligent that's one key little thing
i mean there's the paper clip argument
that's super dumb
but super powerful systems but with
social media you have
a relatively like algorithms we may talk
about today
very simple algorithms that when
uh something charles talks a lot about
which is interactive ai when they start
like
having at scale like tiny little
interactions with human beings
they can start controlling these human
beings so a single algorithm
can control the minds of human beings
slowly to what we might not
realize it could start wars it could
start it can change the way we
think about things it feels like in the
long arc of history
if i were to sort of zoom out from all
the outrage and all the tension on
social media
that it's progressing us towards uh
better and better things
it feels like chaos and toxic and all
that kind of stuff but it's chaos and
toxic
yeah but it feels like actually the
chaos and toxic is similar to the kind
of debates we had
from the founding of this country you
know there was a civil war that happened
over that over that period and
ultimately it was all about
this tension of like something doesn't
feel right about
our implementation of the core values we
hold as human beings and they're
constantly struggling with this
and that results in people calling each
other
uh like just just being shitty to each
other on twitter
but i ultimately the algorithm is
managing all that and it feels like
there's a possible future in which that
algorithm
controls us to into the direction of
self-destruction
whatever that looks like yeah so so all
right i do believe in the power of
social media to
screw us up royally i do believe in the
power of social media to benefit us too
i do think that we're in a
yeah it's sort of almost got dropped on
top of us and now we're trying to as a
culture figure out how to cope with it
there's a sense in which i don't know
there's there's some arguments that say
that for example
i guess college-age students now late
college-age students now people who were
in middle school when when social media
started to really take off
maybe maybe really damaged like me this
may have really hurt their development
in a way that we don't
have all the implications of quite yet
that's the generation who
if and i hate to make it somebody else's
responsibility but like they're the ones
who can fix it they're the ones who can
who can figure out
how do we keep the good of this kind of
technology without
letting it eat us alive and
if they're successful we move on to the
next phase the next level of the game
if they're not successful then yeah then
we're going to wreck each other we're
going to
destroy society so you're going to in
your old age sit on the porch and watch
the world burn
because the tiktok generation that uh
i believe well so my this is my kids age
right and that's certainly my daughter's
age and she's very tapped in
to social stuff but she's also she's
trying to find that balance right of
participating in it and then getting the
positives of it but without letting it
eat her alive um and i think sometimes
she ventures
hope she's not watching this sometimes i
think she ventures a little too far and
is
in and is consumed by it and other times
she gets a little distance
um and if you know if there's enough
people like her out there they're gonna
they're gonna navigate these choppy
waters that's that's an
interesting uh skill actually to develop
i talked to my dad about it
you know i've uh now somehow
this podcast in particular but other
reasons
has received a little bit of attention
and with that apparently in this world
even though
i don't shut up about love and i'm just
all about kindness
i i have now a little mini army of
trolls
oh it's kind of hilarious actually but
it also doesn't feel good
but it's a skill to learn
to not look at that like to moderate
actually how much you look at that
the discussion i have with my dad is
similar to uh it doesn't have to be
about trolls it could be about checking
email
which is like if you're anticipating you
know there's uh my dad
runs a large institute at drexel
university and
there could be stressful like emails
you're waiting like there's drama of
some kind
and so like there's a temptation to
check the email if you send an email you
cut it
and that pulls you in into it doesn't
feel good
and it's a skill that he actually
complains that he hasn't learned i mean
he
grew up without it so he hasn't learned
the skill of how to
shut off the internet and walk away and
i think young people
while they're also being quote-unquote
damaged by like
uh you know being bullied online all
those stories which are very
like horrific you basically can't escape
your bullies
these days when you're growing up but at
the same time they're also learning that
skill of how to
be able to shut off uh the like
disconnect with it be able to laugh at
it not take it too seriously
it's fascinating like we're all trying
to figure this out just like you said
it's
been dropped on us and we're trying to
figure it out yeah i think that's really
interesting and i
i guess i've become a believer in the
human design
which i feel like i don't completely
understand like how do you make
something as robust as us like we're
so flawed in so many ways and yet and
yet
you know we dominate the planet and we
do seem to manage to get ourselves out
of scrapes
eventually not necessarily the most
elegant possible way but somehow we get
we get to the next step and i don't know
how i'd make a
machine do that i i i
generally speaking like if i train one
of my reinforcement learning agents to
play a video game and it works really
hard on that first stage over and over
and over again and it makes it through
it succeeds on that first level
and then the new level comes and it's
just like okay i'm back to the drawing
board and somehow humanity we keep
leveling up
and then somehow managing to put
together the skills necessary to
achieve success some semblance of
success in that next level too
and you know i hope we can keep doing
that
you mentioned reinforcement learning so
you've had uh
a couple years in the field no quite
you know quite a few quite a long career
in artificial intelligence broadly but
reinforcement learning specifically
can you maybe give a hint about your
sense
of the history of the field and in some
ways it has changed with the
advent of deep learning but has long
roots like how has it
weaved in and out of your own life how
have you seen the community change or
maybe the ideas that it's playing with
change
i've had the privilege the pleasure of
being
of having almost a front row seat to a
lot of this stuff and it's been really
really fun and interesting so uh when i
was in college in the 80s
early 80s uh the neural net
thing was starting to happen and i was
taking a lot of psychology classes a lot
of computer science classes
as a college student and i thought you
know something that can play tic-tac-toe
and just like learn to get better at it
that ought to be a really easy thing so
i spent almost
almost all of my what would have been
vacations during college
like hacking on my home computer trying
to teach it how to play tic-tac-toe and
programming language
basic oh yeah that's that's i was i
that's my first language that's my
native language
is that when you first fell in love with
computer science just like programming
basic on that
uh what was the computer do you remember
i had i had a trs-80
model one before they were called model
ones because there was nothing else uh
i got my computer in 1979
uh instead so i was i was i would have
been bar mitzvahed
but instead of having a big party that
my parents threw on my behalf
they just got me a computer because
that's what i really really really
wanted i saw them in the in the
in the mall in radio shack and i thought
what how are they doing that i would try
to stump them i would give them math
problems like
one plus and then in parentheses two
plus one yeah and it would always get it
right i'm like
how do you know so much math like
i've had to go to algebra class for the
last few years to learn this stuff and
you just seem to know
so i was i was i was smitten and i got a
computer and i think ages
13 to 15
i have no memory of those years i think
i just was in my room with the computer
listening to billy joel communing
possibly listening to the radio
listening to billy joel
that was the one album i had uh on vinyl
at that time
and um and then i got it on cassette
tape and that was really helpful
because then i could play it i didn't
have to go down to my parents wi-fi or
hi-fi
sorry uh and at age 15 i remember kind
of walking out and like okay
i'm ready to talk to people again like
i've learned what i need to learn here
and um so yeah so so that was that was
my home computer and so i went to
college and i was like oh i'm totally
going to study computer science
i opted the college i chose specifically
had a computer science major
the one that i really wanted the college
i really wanted to go to didn't so
bye-bye to them which college did you go
to so i went to yale
uh princeton would have been way more
convenient and it was just beautiful
campus and it was close enough to home
and i was really excited about princeton
and i visited
i said so computer science major like
well we have computer engineering i'm
like oh i don't like that word
engineering
i like computer science i really i want
to do like you're saying hardware and
software they're like yeah like i just
want to do software i
i couldn't care less about hardware you
grew up in philadelphia i grew up
outside philly yeah yeah okay
uh so the you know local schools were
like penn and drexel
and uh temple like everyone in my family
went to temple at least at one point in
their lives except for me
so yeah philly philly family yale had a
computer science department and that's
when you
it's kind of interesting you said 80s
and neural networks that's when
you know that was a hot new thing
or a hot
thing period uh so what is that in
college when you first learned about
neural networks yeah
yeah i learned like it was in a
psychology class not in a cs wow
yeah was it psychology or cognitive
science or like do you remember like
what context it was yeah yeah yeah so so
i was a
i've always been a bit of a cognitive
psychology groupie
so like i studied computer science but i
like i like to hang around where the
cognitive
scientists are because i don't know
brains man they're like
they're wacky cool and they have a
bigger picture view of things they're a
little less
engineery i would say they're more
they're more interested in the
nature of cognition and intelligence and
perception how does like the vision
system work
they're asking always bigger questions
now with
the deep learning community there i
think more there's a lot of
intersections but i do find in
that the neuroscience folks actually
and uh cognitive psychology cognitive
science folks
are starting to learn how to program how
to use neural artificial neural
networks
and they are actually approaching
problems in like totally new interesting
ways
it's fun to watch that grad students
from those departments
like approach the problem of machine
learning right they come in with a
different perspective yeah they don't
care about like your
imagenet data set or whatever they
they want like to understand the
the like the basic mechanisms
at the at the neuronal level and the
functional level of intelligence it's
kind of
it's kind of cool to see them work but
yeah okay so
you always you're always a groupie of
cognitive psychology
yeah yeah and so uh so it was in a class
by richard gerrig he was kind of my
my favorite uh psych professor in
college and i took uh like three
different classes with him
and yeah so that we they were talking
specifically the class i think was kind
of a
there was a big paper that was written
by steven pinker and
uh prince i don't i'm blanking on
prince's first name but prince and
pinker and prince
they wrote kind of a they were at that
time
kind of like ah i'm blanking on the
names of the current people
um the cognitive scientists who are
complaining a lot about deep networks
oh uh gary gary marcus sorry marcus
and who else i mean there's a few but
gary gary's the most feisty
sure gary's very feisty and with this
with his co-author they they you know
they're kind of
doing these kind of takedowns where they
say okay well yeah it does all these
amazing amazing things but
here's a shortcoming here's a
shortcoming here's your shortcoming and
so the pinker prince paper
is kind of like the that generation's
version of
marcus and davis right where they're
they're trained as cognitive scientists
but they're looking skeptically at the
results in the
in the artificial intelligence neural
net kind of world and saying
yeah it can do this and this and this
but like it can't do that and it can't
do that and it can't do that
maybe in principle or maybe just in
practice at this point but but the fact
of the matter is
you're you've narrowed your focus too
far
to be impressed you know you're
impressed with the things within that
circle
but you need to broaden that circle a
little bit you need to look at a wider
set of problems
and so um so we have so i was in this
seminar in college
that was basically a close reading of
the pinker prince paper
which was like really thick there was a
lot going on in there
and um and and it talked about
the reinforcement learning idea a little
bit i'm like oh that sounds really cool
because behavior is what is really
interesting to me about
psychology anyway so making programs
that i mean programs are things that
behave
people are things that behave like i
want to make learning that learns to
behave
in which way was reinforcement learning
presented is this uh talking about
human and animal behavior or are we
talking about actual mathematical
constructs ah that's
right so that's a good question right so
this is i think
it wasn't actually talked about as
behavior in the paper that i was reading
i think that it just talked about
learning and to me learning is about
learning to behave but really
neural nets at that point were about
learning like supervised learning so
learning to produce outputs from inputs
so i kind of tried to invent
reinforcement learning
i uh when i graduated i joined a
research group at
bellcore which had spun out of bell labs
recently at that time because of the
divestiture of the
of long distance and local phone service
in the 1980s 1984
and i was in a group uh with dave ackley
who
was the first author of the boltzmann
machine paper so the very first neural
net paper that could handle
xor right so xor sort of killed neural
nets the very first the zeroth the
first winter
yeah um the the perceptrons paper
and hinton along with his student dave
ackley and and i think there was other
authors as well
showed that no no with boltzmann machines we
can actually learn
non-linear concepts and so everything's
back on the table again and that kind of
started that second wave of neural
networks
so dave ackley was he became my mentor
at bellcore and we
talked a lot about learning and life and
computation and how all these things fit
together
now dave and i have a podcast together
so um so i get to
kind of enjoy that sort of
his his perspective uh once again even
even all these years later
and so i said so i said i was really
interested in learning but
in the concept of behavior and he's like
oh well that's reinforcement learning
here and he gave me rich sutton's 1984
td paper
so i read that paper i honestly didn't
get all of it
but i got the idea i got that they were
using that he was using ideas
that i was familiar with in the context
of neural nets and and
like sort of backprop uh but with this
idea of making predictions over time i'm
like this is so interesting but i don't
really get all the details i said
to dave and dave said oh well why don't
we have him come and give a talk
and i was like wait what you can do that
like these are real people
i thought they were just words i thought
it was just like ideas that somehow
magically seeped into paper he's like no
i
i i know rich like we'll just have him
come down and and he'll give a talk
and so i was you know my mind was blown
and uh so rich came and he gave a talk
at bellcore
and he talked about what he was super
excited about which was they had just figured
out at the time
uh q learning so uh watkins had visited
the rich sutton's lab at umass or
it's andy barto's lab that rich was a
part of
and um he was really excited about this
because it resolved a whole bunch of
problems that he didn't know how to
resolve in the
in the earlier paper and so uh
for people who don't know td temporal
difference these are all just algorithms
for reinforcement learning
right and td temporal difference in
particular is about making predictions
over time
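[editor's illustration: a minimal sketch of the "making predictions over time" idea, not code from the conversation. The random-walk task, the function name td0_random_walk, and every parameter value are assumptions chosen only for the example. Each state's value estimate is nudged toward the reward plus the estimate of the state that follows it.]

```python
import random


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values on a small random walk with TD(0).

    Hypothetical task: states 0..num_states-1 sit in a row. Each step moves
    left or right with equal probability. Walking off the right end pays
    reward 1, walking off the left end pays reward 0, and both end the episode.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guess for every state
    for _ in range(episodes):
        state = num_states // 2  # start each episode in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # temporal difference error: how much the prediction made at
            # `state` disagrees with the one-step-later prediction
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td0_random_walk())
```

[for this particular walk the true values rise linearly from left to right, so the learned estimates should roughly follow that shape.]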
and you can try to use it for making
decisions right because if you can
predict how good a future action
and action outcomes will be in the
future you can choose one that has
better and or
but the theory didn't really support
changing your behavior like the
predictions had to be of a consistent
process if you really
wanted it to work and one of the things
that was really cool about
q-learning algorithm for reinforcement
learning is it was off policy which
meant that you could actually be
learning about the environment and what
the value of different actions would be
while actually figuring out how to
behave
optimally yeah so that was a revelation
yeah and the proof of that is kind of
interesting
i mean that's really surprising to me
when i first read that and then in
rich sutton's book on the matter it's
it's kind of
beautiful that a single equation can
capture an equation one line of code and
like you can learn anything
yeah like enough time so equation and
code you're right like
you can the code that
you can arguably at least if you like
squint your eyes
can say this is all of intelligence
is that you can implement that in a
single well i think i started with lisp
which is uh
shout out to lisp uh like a single line
of code
key piece of code maybe a couple that
you could do that it's kind of magical
it's uh feels too good to be true
well and it sort of is yeah it's kind of
kind of
it seems to require an awful lot of
extra stuff supporting it but
yeah but nonetheless the ideas the the
idea is really good and as far as we
know it is it is
a very reasonable way of trying to
create adaptive behavior
behavior that gets better at something
over time
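[editor's illustration: a hedged sketch of the "one line of code" being discussed, the tabular q-learning update. The corridor environment, the function name q_learning_corridor, and the parameter values are assumptions made up for the example. The agent behaves completely at random, yet the values it learns describe the best behavior, which is the off-policy property mentioned above.]

```python
import random


def q_learning_corridor(num_states=6, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical one-dimensional corridor.

    The agent starts in state 0 and the rightmost state is the goal.
    Action 0 moves left, action 1 moves right, and walls clamp the move.
    Reaching the goal pays reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    moves = (-1, 1)
    goal = num_states - 1
    q = [[0.0, 0.0] for _ in range(num_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # behavior policy: uniformly random
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # the goal row is never updated, so it stays at zero and the
            # bootstrap term vanishes on the terminal transition
            target = reward + gamma * max(q[next_state])
            # the q-learning update itself
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    table = q_learning_corridor()
    greedy = [max((0, 1), key=lambda a: table[s][a]) for s in range(len(table) - 1)]
    print(greedy)
```

[the extra stuff supporting it is visible even here: the loop, the exploration rule, and the table all have to be built around that single update line.]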
did you find the idea of optimal uh at
all compelling that
you could prove that it's optimal so
like one part of computer science
that it makes people feel warm and fuzzy
inside
is when you can prove something like
that a sorting algorithm worst case
runs in n log n and it makes
everybody feel so good
even though in reality it doesn't really
matter what the worst case is what
matters is like
does this thing actually work in
practice on this particular actual set
of data that i
that i enjoy did you so here's that
here's a place where i have maybe a
strong opinion uh-oh which is like
you're right of course but no no like so
so the what makes worst case so great
right if you have a worst case analysis
so great is that you get modularity
you can take that thing and plug it into
another thing and still
have some understanding of what's going
to happen when you click them together
right if it just works well in practice
in other words with respect to some
distribution that you care about
when you go plug it into another thing
that distribution can shift
it can change and your thing may not
work well anymore and you want it to
and you wish it does and you hope that
it will but it might not and then
ah so you're so so you're saying you
don't like
machine learning
but we have some positive theoretical
results for these things
you know you can come back at me with
yeah but they're really weak and yeah
they're really weak and and you can even
say that
you know sorting algorithms like if you
do the optimal sorting algorithm it's
not really the one that you want
and that might be true as well but but
it is the modularity is a really
powerful statement
really as an engineer you can then
assemble different things you can count
on them to be
i mean it's interesting it's it's a
balance
like with everything else in life you
don't want to get too obsessed i mean
this is what computer scientists do
which they
potentially get obsessed they over
optimize things
or they start by optimizing them they
over optimize yeah so it's
it's easy to like get really granular
about this thing
but like the step from an n squared
to an n log n sorting algorithm is a big
leap
for most real-world systems no matter
what the actual
behavior of the system is that's a big
leap and the same can probably be
said for other kind of first
leaps that you would take on a
particular problem like it's the
picking the low hanging fruit or
whatever the equivalent of
doing the not the dumbest thing but the
next to the dumbest thing
is picking the most delicious reachable
fruit yeah
most delicious reachable fruit i don't
know why that's not a saying and
yeah okay so uh so you
then this is the 80s and this kind of
idea starts to percolate of uh
yeah at that point i got to re i got to
meet rich sutton so everything was sort
of downhill from there and that was that
was really the pinnacle of everything
um but then i you know then i felt like
i was kind of on the inside so then as
interesting results were happening i
could like check in with with
rich or with jerry tesauro who had a
huge impact on
uh kind of early thinking in in temporal
difference learning and reinforcement
learning and showed that you could do
you could solve problems that we didn't
know how to solve any other way
and so that was really cool so as good
things were happening i would hear about
it from
either the people who were doing it or
the people who were talking to the
people who are doing it
and so i was able to track things pretty
well through through the 90s
so what uh wasn't most of the excitement
on reinforcement learning in the 90s
era with what is it td gammon like
what's the role of these kind of little
like
fun game playing things and
breakthroughs about uh
get you know exciting the community was
that like
what were your because uh you've also
built or
were part of building a crossword
puzzle
uh solver program yeah solving program
uh called proverb so
so you were interested in this as as a
problem like in forming
using games to understand how to build
uh intelligent systems so like what did
you think about td gammon like what did
you think about that whole thing in the
90s yeah i mean i found the td gammon
result really just remarkable so i had
known about some of jerry's stuff before
he did td gammon he did a system
just more vanilla well not not entirely
vanilla but a more classical backproppy
kind of
uh network for playing backgammon where
he was training it on
expert moves so it was kind of
supervised but the way that it worked
was not
to mimic the actions but to learn
internally an evaluation function so to
learn
well if the expert chose this over this
that must mean that the expert values
this
more than this and so let me adjust my
weights to make it so that
the network evaluates this as being
better than this
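[editor's illustration: a rough sketch of learning an evaluation function from expert choices, as just described. This is not the actual network from that work. It swaps in a plain linear evaluation trained with a logistic pairwise loss, and the feature vectors, function names, and learning rate are all hypothetical.]

```python
import math


def evaluate(weights, features):
    """Score a position as a weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, features))


def train_from_preferences(preferences, num_features, epochs=200, lr=0.1):
    """Fit weights so that chosen positions score higher than rejected ones.

    `preferences` is a list of (chosen_features, rejected_features) pairs,
    one pair for each time the expert picked one position over another.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = evaluate(weights, diff)
            # probability the current weights give to the expert's choice
            p_choice = 1.0 / (1.0 + math.exp(-margin))
            # gradient step that raises the log-probability of that choice
            for i, d in enumerate(diff):
                weights[i] += lr * (1.0 - p_choice) * d
    return weights


if __name__ == "__main__":
    # toy data: each pair is (position the expert chose, position it passed over)
    toy_pairs = [
        ([1.0, 0.0], [0.0, 1.0]),
        ([2.0, 1.0], [1.0, 1.0]),
        ([1.0, 0.5], [0.5, 1.0]),
    ]
    learned = train_from_preferences(toy_pairs, num_features=2)
    print(learned)
```

[the point of the design is that the learner never copies a move directly. It only adjusts its weights until its own ranking of positions agrees with the expert's.]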
so it could learn from from human
preferences
it could learn its own preferences and
then when he took the step from that to
actually
doing it as a full-on reinforcement
learning problem where you didn't need
a trainer you could just let it play
that was that was remarkable right and
so i think
as as humans often do as we've done
in the recent past as well people
extrapolate it's like oh well if you can
do that which is obviously very hard
then obviously you could do all these
other problems that we
that we want to solve that we know are
also really hard
and it turned out very few of them ended
up being practical
partly because i think neural nets
certainly at the time were struggling to
be
consistent and reliable and so training
them in a reinforcement learning setting
was a bit of a mess
i had i don't know generation after
generation of
like master's students who wanted to do
value function approximation basically
learn reinforcement learning
with neural nets and
over and over and over again we were
failing we couldn't get
the good results that jerry tesauro got
i now believe that jerry is a neural net
whisperer
he has a particular ability to get
neural networks to do things
that other people would find impossible
and it's not
the technology it's the technology and
jerry together
yeah and which i think speaks to
the role of the human expert in the
process of machine learning
right it's so easy we're so drawn to the
idea that that it's the technology that
is that
is where the power is coming from that i
think we lose sight of the fact that
sometimes you need a really good
just like i mean no one would think hey
here's this great piece of software
here's like i don't know gnu emacs or
whatever
um doesn't that prove that computers are
super powerful and
basically going to take over the world
it's like no stallman is a hell of a
hacker right so he was able to make
the code do these amazing things he
couldn't have done it without the
computer but the computer couldn't have
done it without him
and so i think people discount the role
of people like jerry who
who um who have just a particular
particular set of skills on that topic
by the way
as a small side note i tweeted emacs is
greater than vim yesterday
and deleted deleted the tweet 10 minutes
later when i realized
you're you were honest i started a war
yeah i was like oh
i was just kidding i i was just being
um walk so people still feel
passionately
about that particular piece of uh i
don't get that because emacs is clearly
so much better
i i don't understand but you know why do
i say that because i cause
like i spent a block of time in the 80s
um making my fingers know the emacs
keys and now like that's part of the
thought process for me like i need to
express and if you take that if you take
my emacs key bindings away
i become little
i can't express myself i'm the same way
with the i don't know if you know what
what it is but it's a kinesis keyboard
which is uh this
butt shaped keyboard yes i've seen them
yeah and
they're very uh i don't know sexy
elegant yeah they're
just beautiful yeah they're they're
gorgeous uh way too expensive
but uh the the problem with them similar
with emacs
is when once you learn to use it
it's harder to use other things it's
hard to use other things there's this
absurd thing where i have like
small elegant lightweight beautiful
little laptops
and i'm sitting there in a coffee shop
with a giant kinesis keyboard
and a sexy little laptop it's absurd but
it you know like i used to feel bad
about it but at the same time you just
kind of have to
sometimes it's back to the billy joel
thing you just have to throw that billy
joel record and
throw taylor swift and justin bieber to
the wind
so see but i like them now because i
cause again i have no musical taste like
like now that i've heard justin bieber
enough i'm like i really like his songs
and
taylor swift not only do i like her
songs but my daughter's convinced that
she's a genius and so now i basically
have i'm signed on to that so
so yeah that that speaks to the back to
the robustness of the human brain that
speaks to the neuroplasticity that you
can just
you can you can just like a mouse teach
yourself to or pavlov's
dog teach yourself to enjoy taylor swift
i'll try it out
i don't know i try you know what it
has to do with just like acclimation
right just like you said a couple weeks
yeah that's an interesting experiment
i'll actually try that like i'll listen
that wasn't the intent of the experiment
just like social media it wasn't
intended as an experiment
to see what we can take as a society but
it turned out that way
i don't think i'll be the same person on
the other side of the week listening to
taylor swift but
let's try it it's more compartmental
don't be so worried like it's
like i get that you can be worried but
don't be so worried because we
compartmentalize really well and so
it won't bleed into other parts of your
life you won't start i don't know
wearing red lipstick or whatever like
it's it's fine it's changed fashion and
everything
but you know what the the thing you have
to watch out for is you'll walk into a
coffee shop once we can do that again
and recognize the song and you'll be no
you won't know that you're singing along
until everybody in the coffee shop is
looking at you and then you're like
that wasn't me yeah that's the
you know people are afraid of agi i'm
afraid of the taylor uh
the tail taylor swift takeover yeah and
i mean people should know that td gammon
was i get would you call it do you like
the terminology of self
play by any chance so like systems that
learn
by playing themselves just i don't know
if it's the best word
but uh so what's what's the problem with
that term
okay so it's like the big bang like
it's it's like talking to serious
physicists do you like the term big bang
and
when when it was early i feel like it's
the early days of self-play i don't know
maybe it was just previously but
i think it's been used by only a small
group of people
uh and so like i think we're still
deciding is this
ridiculously silly name a good name for
the
cons potentially one of the most
important concepts in artificial
intelligence
okay it depends how broadly you apply
the term so i used the term in my 1996
phd dissertation
wow the actual terms of yeah because
because
tesauro's paper was something like um
training up an expert
backgammon player through self-play so i
think it was in the title of his paper
okay if not in the title it was
definitely a term that he used
there's another term that we got from
that work is rollout so i don't know if
you do you ever hear the term rollout
that's a backgammon term that has now
applied
generally in computers well at least in
ai
because of td gammon yeah that's
fascinating so how is self-play being
used now and like why is it does it does
it feel like a more general powerful
concept
sort of the idea of well the machine's
just going to teach itself to be smart
yeah so that's that's where maybe you
can correct me
but that's where you know the
continuation of the spirit and actually
like
literally the exact algorithms of td
gammon are applied
by deepmind and openai to learn games
that are a little bit more complex
that when i was learning artificial
intelligence go was presented to me
with artificial intelligence a modern
approach i don't know if they explicitly
pointed to go
in those books as like unsolvable kind
of thing
like implying that these approaches hit
their limit
in this with these particular kind of
games so something
i don't remember if the book said it or
not but something in my head
or was the professors instilled in me
the idea like this is the limits of
artificial intelligence
of the field like it instilled in me the
idea that
if we can create a system that can solve
the game of go
we've achieved agi that was kind of i
didn't explicitly like
say this but it that was the feeling and
so from i was one of the people that it
seemed
magical when a learning system was able
to
to beat a uh a human
world champion at the game of go and
even more so
from that that was alphago even more so
with alphago zero then
kind of renamed and advanced into alpha
zero
beating a world champion or world-class
player
without any supervised learning on
expert games
doing it only through playing
itself
so that is
i don't know what to make of it i think
it would be interesting to hear
what your opinions are on just how
exciting
surprising profound
interesting or boring the
breakthrough performance of alpha zero
was
okay so alphago knocked my socks off
that was that was so remarkable which
aspect of it
that they they got it to work that they
actually were able to
leverage a whole bunch of different
ideas integrate them into one
giant system just the software
engineering aspect of it is mind-blowing
i don't
i i've never been a part of a program as
complicated as the program that they
built for that
and um and just the you know like like
jerry tesauro is a neural net whisperer
like you know david silver is a kind of
neural net whisperer too he was able to
coax these networks
and these new way out there
architectures to do
these you know solve these problems that
um as you said
you know when we were learning from uh
ai
no one had an idea how to make it work
it was it was remarkable
that um these you know these these
techniques that were so good at playing
chess and they could beat the world
champion in chess
couldn't beat you know your typical go
playing teenager in go
so the fact that that you know in a very
short number of years we kind of ramped
up to
uh trouncing people in go just
blew me away so you're kind of focusing
on the engineering
aspect which is also very surprising i
mean there's something
different about large well-funded
companies i mean there's a compute
aspect to it too
sure like that of course
i mean that's similar to deep blue right
with uh with ibm
like there's something important to be
learned and remembered about
a large company taking the ideas that
are
already out there and investing a few
million dollars into it
or or more and
so you're kind of saying the engineering
is kind of fascinating both on the
with alphago is probably just gathering
all the data
right of the expert games like
organizing everything
actually doing distributed supervised
learning
and to me
see the engineering i kind of took for
granted
to me philosophically being able to
persist in the
in the face of like long odds because it
feels like
for me i'll be one of the skeptical
people in the room thinking that you can
learn your way to
to beat go like it sounded like
especially with david silver it sounded
like david was not
confident at all it's like it was like
not it's funny how confidence works
yeah it's like you're not like cocky
about it
like but right because if you're cocky
about it you
kind of stop and stall and don't get
anywhere yeah but there's like a
hope that's unbreakable maybe that's
better than confidence it's a kind of
wishful
hope and a little dream and you almost
don't want to do anything else you kind
of
keep doing it that's that seems to be
the story
and but with enough skepticism that
you're looking for where the problems
are and fighting
through them yeah because you know
there's got to be a way out of this
thing yeah
and for him it was probably there's
there's a bunch of little factors that
come into play it's funny how these
stories just all come together like
everything he did in his life
came into play which is like a love for
video games
and also a connection to so the the 90s
had to happen with td-gammon and so on
yeah
in some ways it's surprising maybe you
can provide some intuition to it
that not much more than td-gammon was
done for quite a long time
on the reinforcement learning front yeah
is that weird to you
i mean like i said the the students who
i worked with we tried to get
basically apply that architecture to
other problems and
we consistently failed there were a
couple
a couple really nice demonstrations that
ended up being in the literature there
was a
paper about controlling elevators right
where it's it's like okay
can we modify the heuristic that
elevators use for deciding like a bank
of elevators for deciding
which floors we should be stopping on to
maximize throughput essentially and you
can set that up as a reinforcement
learning problem and you can
you know have a neural net represent the
value function so that it's taking
where all the elevators where the button
pushes you know this high dimensional
well at the time high dimensional input
um you know a couple dozen dimensions
and turn that into a prediction as to oh
is it going to be better if i stop at
this floor or not
and ultimately it appeared as though for
the
standard simulation distribution for
people trying to leave the building at
the end of the day
that the neural net learned a better
strategy than the standard one that's
implemented in
elevator controllers so that that was
nice
there was some work that satinder singh
et al did on
uh handoffs with cell phones
uh you know deciding when when should
you hand off from this cell tower to
this cell
okay communication networks yeah yeah
and so
a couple things seemed like they were
really promising none of them made it
into production that i'm aware of
and neural nets as a whole started to
kind of implode around then
and so there just wasn't a lot of air in
the room for people to try to figure out
okay how do we get this to work
in the rl setting and then they they
found their way back in
in 10 in 10 plus years so you said
alphago was impressive like it's a big
spectacle is there right so then alpha
zero so i think i may have a slightly
different opinion
on this than some people so um i talked
to satinder singh in particular about
this so satinder was
uh like rich sutton a student of
andy barto so they came out of the same
lab
very influential machine learning
reinforcement learning researcher
uh now at deepmind uh as just as is rich
though different sites the two of them
he's in alberta
rich is in alberta and uh satinder would
be in england but i think he's in
england from michigan at the moment
uh but the but he was yes he was much
more impressed with
uh alphago zero
which is didn't didn't get a kind of a
bootstrap in the beginning with human
trained games yes just was purely
self-play
though the first one alphago was also a
tremendous amount of self-play
right they started off they kick-started
the the action network that was making
decisions
but then they trained it for a really
long time using more traditional
temporal difference methods
um so so as a result i didn't it didn't
seem
that different to me like it seems like
yeah
why wouldn't that work like once once it
works it works so
but he he found that that removal of
that extra information to be
breathtaking like that that's a game
changer to me the first thing was more
of a game changer
but the open question i mean i guess
that's the assumption
is the expert games might contain with
them
within them a
humongous amount of information but we
know that it went beyond that
right we know that it somehow got away
from that information because it was
learning strategies i don't think it
i don't think alphago is just better at
implementing human strategies i think it
actually developed its own strategies
that were
that was more effective and so from that
perspective
okay well so it made at least one
quantum leap in terms of strategic
knowledge
okay so now maybe it makes three like
okay but that first one is the doozy
right
getting it to to to work reliably and
and for the networks to to hold on to
the value well enough like that was
that was a big step well isn't maybe you
could speak to this on the reinforcement
learning front so the
starting from scratch and learning to do
something like the first like
like random behavior to like
crappy behavior to like somewhat okay
behavior
it's not obvious to me that that's not
like impossible to take those steps like
if you just think about the intuition
like how the heck does
random behavior become somewhat
basic intelligent behavior not not human
level not super human level
but just basic but you're saying to you
kind of the intuition is like if
if you can go from human to superhuman
level intelligence on the
uh on this particular task of game
playing then
so you're good at taking leaps so you
can take many of them
that the system i believe that the
system can take that kind of leap
yeah no and also i think that that
beginner knowledge
in go like you can start to get a feel
really quickly
for the idea that um you know certain
parts of the being in certain parts of
the board seems to be
more associated with winning right
because it's not
it's not stumbling upon the concept of
winning it's told that it wins
or that it loses well it's self-play so
it both wins and loses it's told
which which side won and the information
is kind of there to start
percolating around to make a difference
as to
um well these things have a better
chance of helping you win and these
things have a worse chance of helping
you win and so
you know it can get to basic play i
think pretty quickly
then once it has basic play well now
it's kind of forced to do some search to
actually experiment with okay well what
gets me that next
increment of of improvement how far do
you think
okay this is where you kind of bring up
the the elon musks and the sam harrises
right
how far is your intuition about these
kinds of self-playing mechanisms being
able to take us
because it feels one of the
ominous but stated
calmly things that when i talked to
david silver he said
is that they have not yet discovered a
ceiling
for alpha zero for example in the game
of go or chess
it's it keeps no matter how much the
compute they throw at it it keeps
improving
so it's possible it's very possible that
you
if you throw you know some like 10x
compute that it will improve by 5x or
something like that
and when stated calmly it's so like
oh yeah i guess so but like and then you
think like well
can we potentially have like uh
continuations of moore's law in totally
different way like broadly defined
moore's law right not the constitutional
improvement exponential improvement like
are we going to have an alpha zero that
swallows the world
uh but notice it's not getting better at
other things it's getting better at
go yeah and i think it's a that's a big
leap to say
okay well therefore it's better at other
things
well i mean the the question is how much
of the game of life
can be turned into right so that's of
that i think is a really good question
and i think that we don't i don't think
we as a
i don't know community really know that
the answer to this but
um so okay so so i went i went to a talk
uh by some experts on
computer chess so in particular computer
chess is really interesting because
for you know for of course for a
thousand years humans were the best
chess playing
things on the planet um and then
computers like edged ahead of the best
person and they've been ahead ever since
it's not like people have
have overtaken computers but um
but computers and people together have
overtaken computers
right so at least last time i checked i
don't know what the very latest is but
last time i checked
that there were teams of people who
could work with computer programs to
defeat the best computer programs
in the game of go in the game of chess
in the game of chess right and so
using the information about how
these things called elo scores this sort
of notion of how strong a player are you
there's a there's kind of a range of
possible scores and the
you you increment in score basically if
you can
beat another player of that lower score
62 percent of the time or something like
that like there's some threshold of
if you can somewhat consistently beat
someone then you are
of a higher score than that person and
there's a question as to how many times
can you do that in chess
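A minimal sketch of the rating idea described above, assuming the standard Elo logistic formula with its conventional 400-point scale (the conversation only says "62 percent of the time or something like that"). The function names and the rating range in the usage lines are illustrative, not taken from the talk. It shows how a win rate maps to a rating gap, and how many such steps fit in a given range of ratings.

```python
import math


def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for(win_prob):
    """Rating difference at which the stronger player scores win_prob."""
    return 400.0 * math.log10(win_prob / (1.0 - win_prob))


def levels_between(low, high, win_prob):
    """How many 'beats the level below win_prob of the time' steps fit
    between two ratings. A rough proxy for strategic depth."""
    return (high - low) / rating_gap_for(win_prob)


# Hypothetical numbers, chosen only to exercise the functions.
print(round(rating_gap_for(0.62)))          # gap for a 62% expected score
print(levels_between(1000.0, 2800.0, 0.62))  # steps in an assumed range
```

Under this model a 62 percent expected score corresponds to a gap of roughly 85 rating points, so the number of steps is just the width of the rating range divided by that gap. Note that Elo's expected score counts a draw as half a win, which is a slightly different quantity from the raw win rate mentioned in the conversation.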
right and so we know that there's a
range of human ability levels that
cap out with the best playing humans and
the computers went a step beyond that
and computers and people together have
not gone i think a full step beyond that
it feels the estimates that they have is
that it's starting to asymptote
that we've reached kind of the maximum
the best possible
chess playing and so that means that
there's kind of a
finite strategic depth right at some
point you just can't get any better at
this game
yeah i mean i i don't uh so i like to
check that
uh i think it's interesting because if
you have
somebody like uh magnus carlsen who's
using these chess programs to train his
mind like to learn
to become a better chess player yeah and
so like that's a very
interesting thing because we're not
static creatures we're learning together
i mean just like we're talking about
social networks those algorithms are
teaching us just like we're teaching
those algorithms
so that's a fascinating thing but i
think
the best chess playing programs are now
better than the pairs like they have
competition between
pairs but the it's still even if they
weren't
it's an interesting question where's the
ceiling so the the david the ominous
david silver
kind of statement is like we have not
found the ceiling
right but so the question is okay so i
don't i don't know
his analysis on that my from talking to
go experts the depth the strategic depth
of go seems to be
substantially greater than that of chess
that there's more kind of
steps of improvement that you can make
get getting better and better and better
but there's no reason to think that it's
infinite yeah
and so it could be that it's that the
what david is seeing is a kind of
asymptoting that you can keep getting
better
but with diminishing returns and at some
point you hit
optimal play like in theory all these
finite games
they're finite they have an optimal
strategy there's a strategy that is the
minimax optimal strategy
and so at that point you can't get any
better you can't beat that that strategy
now that strategy may be
from an information processing
perspective
intractable right the you need
the the all the situations are
sufficiently different that you can't
compress it at all it's this
giant mess of hard-coded rules
and we can never achieve that but but
that still puts a cap on how many levels
of improvement that we can actually make
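The point that a finite game has a minimax optimal strategy can be illustrated on a game small enough to solve exactly. The sketch below is an assumption for illustration only: it uses a simple subtraction game (take 1 to 3 stones, whoever takes the last stone wins), which is not discussed in the conversation, and it writes minimax in its compact negamax form. Go and chess have the same kind of solution in principle, but their state spaces are far too large to enumerate this way.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # a player may remove 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def value(stones):
    """Game value for the player to move: +1 forced win, -1 forced loss."""
    if stones == 0:
        # The opponent just took the last stone, so the player to move lost.
        return -1
    # Negamax: my value is the best of the negated values I can hand over.
    return max(-value(stones - m) for m in MOVES if m <= stones)


def best_move(stones):
    """A move that achieves the minimax value from this position."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -value(stones - m))


for n in range(1, 9):
    print(n, value(n), best_move(n))
```

Once `value` is known for every position, no opponent can do better against `best_move` than the value predicts, which is the sense in which optimal play puts a cap on improvement.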
but the the thing about self-play is if
you
if you put it although i don't like
doing that in the broader category of
self-supervised learning is that it
doesn't require
too much or any human human labeling
yeah yeah human label or
just human effort the human involvement
past a certain point
and the same thing you could argue is
true for
the recent breakthroughs in natural
language processing with language models
oh this is how you get to gpt-3
yeah see how that did the uh that was a
good good transition yeah yeah
i practiced that for days uh leading up
to this guy now
uh but like that's one of the questions
is
can we find ways to formulate problems
in this world that are important to us
humans
like more important than the game of
chess
that uh to which self-supervised kinds
of approaches could be applied whether
it's self-play
for example for like maybe you could
think of like autonomous vehicles
in simulation that kind of stuff
or just robotics applications in
simulation
or in the self-supervised learning
where unannotated
data or data that's generated by humans
naturally without extra cost like the
wikipedia or like all of the internet
can be used
to learn something about to create
intelligent systems that do something uh
really powerful that pass the turing
test or that
do some kind of superhuman level
performance
so what's your intuition like trying to
stitch all of it together
about our discussion of agi
the limits of self-play and your
thoughts about maybe the limits of
neural networks in the context of
language models is there some intuition
in there that might be
useful to think about yeah yeah yeah so
so
first of all the the whole transformer
network
family of things um is really cool
it's really really cool i mean for you
know if you've ever
back in the day you played with i don't
know markov models for generating text
and you've seen the kind of text that
they spit out
and you compare it to what's happening
now it's
it's amazing it's so amazing now it
doesn't take very long
interacting with one of these systems
before you find the holes
right it's it's not smart in any
kind of general way
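For readers who have not played with them, a word-level Markov model of the kind mentioned above can be sketched in a few lines. The toy corpus, function names, and seed are assumptions made for illustration. Each next word is sampled using only the current word, which is why the output tends to be locally plausible and globally aimless.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, sampling each next word at random.

    Stops early if the current word never had a successor in the corpus.
    """
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)


corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=1))
```

Every adjacent word pair in the output was seen in the corpus, but nothing ties the start of a sentence to its end, which is the gap the transformer comparison in the conversation is pointing at.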
it's really good at a bunch of things
and it does seem to understand a lot of
the statistics of language extremely
well
and that turns out to be very powerful
you can answer many questions with that
but it doesn't make it a good
conversationalist
right and doesn't make it a good
storyteller it just makes it good at
imitating of things it has seen
in the past the exact same thing could
be said by
people voting for donald trump about
joe biden supporters
and people voting for joe biden about
donald trump supporters
is uh you know that they're not
intelligent they're just following
the yeah they're following things
they've seen in the past and uh
so it's very it doesn't take long to
find the flaws
in their uh in their like
natural language generation abilities
yes yeah so we're being very
that's interesting critical of ais
right so so i've had a similar thought
which was that
the stories that gpt-3 spits out
are amazing and very human-like
and it doesn't mean that computers are
smarter than we realize necessarily it
partly means that people are dumber than
we realize
or that much of what we do day to day
is not that deep like we're just we're
just kind of going with the flow we're
saying
whatever feels like the natural thing to
say next not a lot of it
is is is creative or meaningful or or
intentional but enough is that we
actually get
we get by right we we do come up with
new ideas sometimes and we do
manage to talk each other into things
sometimes and we do sometimes
vote for reasonable people sometimes
but um but it's really hard to see in
the statistics because so much of what
we're saying is kind of rote
and so our metrics that we use to
measure how these systems are doing
don't reveal that because it's it's it's
in the interstices
that that is very hard to detect but is
your
do you have an intuition that with these
language models
if they grow in size it's already
surprising that when you go from gpt-2 to
gpt-3
that there is a noticeable improvement
so the question now goes back to the
ominous david silver and the ceiling
right so maybe there's just no ceiling
we just need more compute now
i mean okay so now i'm speculating yes
as opposed to before when i was
completely on firm ground yeah all right um
i don't believe that you can get
something that really
can do language and use language as a
thing that doesn't interact with people
like i think that it's not enough to
just take everything that we've said
written down and just say
that's enough you can just learn from
that and you can be intelligent i think
you really need to
be pushed back at i think that
conversations
even people who are pretty smart maybe
the smartest thing that we know
not maybe not the smartest thing we can
imagine but we get
so much benefit out of talking to each
other and interacting
that's presumably why you have
conversations live with guests is that
that there's
something in that interaction that would
not be exposed by
oh i'll just write your story and then
you can read it later and i think
i think because these systems are just
learning from our stories they're not
learning from
being pushed back at by us that they're
fundamentally limited in what they
could actually become
on this route they have to they have to
get
you know shut down like we like we have
to have an argument that
they have to have an argument with us
and lose a couple times before they
start to realize
oh okay wait there's some nuance here
that actually matters
yeah that's actually subtle sounding but
quite profound that the interaction with
humans is essential
and the limitation within that is
profound as well
because the time scale like the
bandwidth at which you can really
interact with humans is very low
so it's costly so you can't one of the
underlying things about self-play is
it has to do you know a very large
number of interactions
and so you can't really deploy
reinforcement learning systems
into the real world to interact like you
couldn't deploy a language model
into the real world to interact with
humans because it would just not get
enough data
relative to the cost it takes to
interact like the time of humans is is
expensive
which is really interesting that's that
go that takes us back to reinforcement
learning and trying to figure out
if there's ways to make algorithms that
are more efficient at learning
keep the spirit in reinforcement
learning and become more efficient
in some sense this seems to be the goal
i'd love to hear what your
thoughts are i don't know if you got a
chance to see a
blog post called bitter lesson oh yes
by rich sutton that makes an argument
hopefully i can
summarize it perhaps perhaps you can
yeah but
okay so i i mean i could try and you can
correct me which is uh
he makes an argument that it seems if we
look at the long arc
of the history of the artificial
intelligence field call it you know 70
years
that the algorithms from which we've
seen
the biggest improvements in practice are
the very simple
like dumb algorithms that are able to
leverage computation
and you just wait for the computation to
improve like all the academics and so on
have fun by
finding little tricks and and
congratulate themselves on those tricks
and sometimes those tricks can be like
big that feel in the moment like big
spikes and breakthroughs but in reality
over the decades it's still the same
dumb algorithm that just
waits for the compute to get faster and
faster do you find
that to be an interesting argument
against the entirety of the field of
machine learning
as an academic discipline that we're
really just a subfield of computer
architecture yeah
we're just kind of waiting around for
them to do we really don't want to do
hardware work so like that's right i
really don't want to
we're procrastinating yes that's right
just waiting for them to do their job so
that we can pretend to have done ours so
uh yeah i mean the argument reminds me a
lot of
i think it was a fred jelinek quote uh
early computational linguist who said
you know we're building these
computational linguistic systems and
every time we fire a linguist
performance goes up by ten percent
something like that and so the idea of
us building the knowledge in
in that in that case um was much less he
was finding to be much less successful
than get rid of the people who know
about language as a
you know from a kind of scholastic
academic kind of perspective and replace
them with more compute
and so i think this is kind of a modern
version of that story which is okay we
want to do better on
machine vision you could build in all
these you know
motivated part-based models that you
know that just feel like
obviously the right thing that you have
to have or we can throw a lot of data at
it and guess what we're doing better
with it with a lot of data
so i i hadn't thought about it until
this moment in this way
but what i believe well i've thought
about what i believe
what i believe is that you know
compositionality and
what's the right way to say it the
complexity grows
rapidly as you consider more and more
possibilities
like explosively and so far moore's law
has also
been growing explosively exponentially
and so so it really does seem like well
we don't have to
think really hard about the algorithm
design or the way that we build the
systems
because the best benefit we could get is
exponential and the best benefit that we
can get from waiting is exponential
so we can just wait it's got that's
gotta end right and there's hints now
that that moore's law is
is starting to feel some friction uh
starting to
the world is pushing back a little bit
um one thing i
i don't know do lots of people know this
i didn't know this i was i was trying to
write an essay and yeah moore's law has
been amazing and it's been it's enabled
all sorts of things but there's a
there's also a kind of counter moore's
law which is that the development cost
for each
successive generation of chips also is
doubling
so it's costing twice as much money so
the amount of development money per
cycle or whatever is actually sort of
constant and at some point
we run out of money uh so or we have to
come up with an entirely different way
of
of doing the development process so like
i
i guess i always always a bit skeptical
of the look it's an exponential curve
therefore it has
no end soon the number of people going
to neurips will be greater than the
population of the earth
that means we're going to discover life
on other planets no it doesn't it means
that we're in a
in a sigmoid curve on the front half
which looks a lot like an exponential
the second half is going to look a lot
like diminishing returns
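The sigmoid point above can be checked numerically. This is a minimal sketch assuming a logistic curve with a ceiling of 1 and a growth rate `k`, both chosen for illustration rather than taken from the conversation. Early on the logistic and the pure exponential are nearly indistinguishable, and only later does the logistic flatten into diminishing returns.

```python
import math


def exponential(t, k=1.0):
    """Unbounded exponential growth."""
    return math.exp(k * t)


def logistic(t, k=1.0):
    """Logistic (sigmoid) growth with ceiling 1 and midpoint at t = 0.

    For very negative t this is approximately exp(k * t).
    """
    return 1.0 / (1.0 + math.exp(-k * t))


for t in range(-6, 7, 2):
    e, s = exponential(t), logistic(t)
    print(f"t={t:+d}  exponential={e:10.4f}  logistic={s:.4f}  ratio={s / e:.4f}")
```

On the front half the ratio stays close to 1, so data from that region alone cannot tell the two curves apart. On the back half the exponential keeps multiplying while each further step of the logistic adds less than the one before.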
yeah the i mean but the interesting
thing about moore's law if you actually
like look at the technologies involved
it's
hundreds if not thousands of s-curves
stacked on top of each other it's not
actually an exponential curve
it's constant breakthroughs and and
then what becomes useful to think about
which is exactly what you're saying
the cost of development like the size of
teams the amount of resources that are
invested
in continuing to find new s-curves new
breakthroughs
and yeah it's uh it's an interesting
idea
you know if we live in the moment if we
sit here today
it seems to be the reasonable thing to
say that
exponentials end and yet in the software
realm
they just keep appearing to be happening
anyway
and it's so i mean it's so
hard to disagree with elon musk on this
because
it it like i i've
you know i used to be one of those folks
i'm still one of those folks i've
studied autonomous vehicles that's what
i worked on and
and it's it's like you look what elon
musk is saying about autonomous vehicles
well obviously in a couple years
or in a year or next month we'll have
fully autonomous vehicles like there's
no reason why we can't driving is pretty
simple
like it's just a learning problem and
you just need to convert
uh all the driving that we're doing into
data and just having a neural network
that trains on that data
and uh like we use only our eyes so you
can use cameras and
you can train on it and it's like yeah
that's that what that should work
and then you put that hat on like the
philosophical hat
and but then you put the pragmatic hat
and it's like this is what the flaws of
computer vision are like
this is what it means to train at scale and
then you you put the
human factors the psychology hat on
which is like
it's actually driving is a lot the
cognitive science or cognitive whatever
the heck you call it
is it's really hard it's much harder to
drive than
than we realize there's much larger
number of edge cases
so building up an intuition around this
is uh
around exponentials is really difficult
and on top of that
the pandemic is making us think about
exponentials
making us realize that like we don't
understand anything about it we're not
able to intuit exponentials we're either
that's true ultra terrified some part of
the population and some part is like
uh the opposite of whatever the
carefree and we're not managing
everything
blase well wow that's that french
uh it seems so it's got so it's uh
it's fascinating to think what what
the limits of this exponential
growth of technology not just moore's
law
it's technology how that rubs up against
the bitter lesson and gpt-3
and self-play mechanisms like it's not
obvious
i used to be much more skeptical about
neural networks now
at least give a sliver of possibility that
we'll be
all though will be very much surprised
and
also you know uh
caught in a way that like
we uh are not prepared for
like in applications of um
social networks for example sure because
it feels like
really good transformer models that are
able to do
some kind of like very good
uh natural language generation are the
same kind of models that could be used
to learn human behavior and then
manipulate that human behavior to
gain advertiser dollars and all those
kinds of things sure uh
feed the capitalist system and and right
so they arguably already are
manipulating human behavior
yeah yeah so but not for
self-preservation
which i think is a big that would be a
big step like if they were trying to
manipulate us
to convince us not to shut them off i
would be very freaked out
but i don't see a path to that from
where we are now
they they don't have any of those
abilities that's not what they're trying
to do they're trying to keep people on
on the site but see the thing is this
this is the thing about life on earth
is they might be borrowing our
consciousness and sentience like
so like in a sense they do because the
creators of the algorithms
have like they're not you know if you
look at our body
okay we're not a single organism we're a
huge number of organisms with like tiny
little motivations we're built on top of
each other
in the same sense the ai algorithms that
are they're not
it's a system that includes human
companies and corporations right because
corporations are
funny organisms in and of themselves
that really do seem to have self
preservation built in and i think that's
at the at the design level i think
they're designed to have
self-preservation
be a focus so you're right in that in
that broader
system that we're also a part of
and can have some influence on uh
it's it's it is much more complicated
much more powerful yeah i agree with
that
uh so people really love it when i ask
what three books
technical philosophical fiction had a
big impact in your life maybe you
could recommend
we went with movies we went uh
with uh billy joel and i forgot what you
uh
what music you recommended but i didn't
i just said i have no taste in music i
just like pop music that was actually
really uh skillful the way you avoided thank you
that question i'm going to try to do the
same with the books
so do you have a skillful way to avoid
answering the question about three books
you would recommend
i'd like to tell you a story
so um my first job out of college was at
bellcore i mentioned that before
where i worked with dave ackley the head
of the group was a guy named tom
landauer and i don't know
how well known he's known now but
arguably he's the
he's the inventor and the first
proselytizer of word embeddings
so they they developed a system shortly
before i got to the group
yeah um that that uh called latent
semantic analysis that would take
words of english and embed them in you
know multi-hundred dimensional space
and then used that as a way of uh you
know assessing similarity and basically
doing reinforcement learning
not sorry not reinforcement information
retrieval you know sort of pre-google
information retrieval
and he was trained as an anthropologist
but then
became a cognitive scientist so i was in
the cognitive science research group
it's you know like i said
i'm a cognitive science groupie um at
the time i thought i'd become a
cognitive scientist but then i realized
in that group
no i'm a computer scientist but i'm a
computer scientist who really loves to
hang out with cognitive scientists
and he said he studied language
acquisition in particular he said you
know humans have about
this number of words of vocabulary and
most of that is learned from reading
and i said that can't be true because i
have a really big vocabulary
and i don't read he's like you must i'm
like i don't think i do i mean like stop
signs i definitely read stop signs
but like reading books is not it's not a
thing that i do
really though it might be just no i
might be the red color
do i read stop signs yeah no it's just
pattern recognition at this point i
don't sound it out
um so now i do
i wonder what that oh yeah stop the guns
so um that's fascinating so you don't uh
so i don't read very i mean obviously i
read and i've read
i've read plenty of books um but like
some people like
charles my friend charles and and and
others like a lot of people in my field
a lot of
academics like reading was really a
central topic to them
in development and i'm not that guy in
fact i used to joke
that um when i got into college that it
was on
kind of a help out the illiterate kind
of program because i got to
like in my house i wasn't a particularly
bad or good reader but when i got to
college i was surrounded by these people
that were just
voracious in their reading appetite and
they were like have you read this have
you read this have you read this
and i'd be like no i'm clearly not
qualified to be at this school like
there's no way i should be here
now i've discovered books on tape like
audiobooks
um and so i'm i'm much better uh i'm
more caught up i read a lot of books
a small tangent on that it is a
fascinating open question to me
on the topic of driving whether
you know supervised learning people
machine learning people think you have
to like
drive to learn how to drive to me it's
very possible that just by
us humans by first of all walking but
also by
watching other people drive not even being
inside cars as a passenger
but let's say being inside the car as a
passenger but even just
like being a pedestrian and crossing the
road you learn
so much about driving from that it's
very possible
that you can without ever being inside
of a car
be okay at driving once you get in it uh
or like watching a movie for example
yeah i don't know
something like that it's have you have
you taught anyone to drive
no so i have myself i have two children
and um i learned a lot about car driving
because my wife doesn't want to be the
one in the car while they're learning so
that's my job
yeah so i sit in the passenger seat and
it's really scary
um you know i have wishes to live um
and they're you know they're figuring
things out now they start off
very very much better than i imagine
uh like a neural network would right
they get that they're seeing the world
they get that there's a road that
they're trying to be on they get that
there's a relationship between the angle
the steering
but it takes a while to not be very
jerky
and so that happens pretty quickly like
the ability to stay in lane
at speed that happens relatively fast
it's not zero shot learning but
it's pretty fast the thing that's
remarkably hard and this is i think
partly why self-driving cars are really
hard
is the degree to which driving is a
social interaction activity yes
and that blew me away i was completely
unaware of it until i watched my son
learning to drive
and i was realizing that he was sending
signals to all the cars around him
and those in his case he's he's always
had
social communication challenges
he was sending very mixed confusing
signals to the other cars and that was
causing the other cars to drive
weirdly and erratically and there was no
question in my mind
that he would he would have an accident
because
they didn't know how to read him there's
things you do with the the speed that
you drive the positioning of your car
that you're constantly like in the head
of the other drivers
and seeing him not knowing how to do
that and having to be taught explicitly
okay you have to be thinking about what
the other driver is thinking
was a revelation to me yeah i was
supposed to be really
so so creating kind of uh theories of
mind of the other theories of mind of
the other cars
yeah yeah which i just hadn't heard
discussed in the self-driving car
talks that i've been to since then
there's some people who do do
consider those kinds of issues but it's
way more subtle than i think
there's a little bit of work involved
with that when you realize
like when you especially focus not on
other cars but on pedestrians for
example
it's it's a literally staring you in the
face yeah yeah yeah so that when you're
just like how do i interact with
pedestrians
um yeah like pedestrians you're
practically talking to an octopus at
that point they've got all these
weird degrees of freedom you don't know
what they're going to do they can turn
around any second but the point is
we humans know what they're going to do
like we have a good
theory of mind we have a good mental
model of what they're doing
and we have a good model of the model
that they have of you
and the model of the model of the model
like they're we're able to kind of
reason about this kind of
uh the social like game of it
uh all the hope is that it's quite
simple actually
that it could be learned that's what i
just talked to the waymo i don't know if
you know that company it's
google self-driving car they i talked to
their cto
about this podcast and they like i rode
in their car
and it's quite aggressive and it's quite
fast and it's good and it feels
great it make it also just like tesla
waymo made me change my mind about like
maybe driving is easier than i thought
maybe i'm just being
speciesist human
maybe uh it's a speciesist argument yes
i don't know
but it it's fascinating to think about
like the same
as with reading which i think you just
said you avoided the question
but i still hope you answered in some
way we avoided it brilliantly
it is there's blind spots there's
artificial intelligence
that artificial intelligence researchers
have about what it actually
takes to learn to solve a problem have
you had anca dragan
on yeah okay one of my favorites so much
energy
she's right oh she yeah she's amazing
fantastic and and in particular she
thinks a lot about this kind of
i know that you know that i know kind of
planning and
the last time i spoke with her she was
very articulate about the
ways in which self-driving cars are not
solved
like what's still really really hard but
even her intuition is limited
like we're all like new to this uh so in
some sense the elon musk approach of
being ultra confident and just like
put it out there putting it out there
like some people say it's reckless and
dangerous and so on
but like partly it's like it seems to be
one of the only ways to make progress in
artificial intelligence so it's uh
it's you know these these are difficult
things you know democracy
is messy uh uh implementation of
artificial
intelligence systems in the real world
is messy so
many years ago before self-driving cars
were an actual thing you could have a
discussion about
somebody asked me like what if what if
the what if we could use that robotic
technology and use it to drive cars
around
like isn't that aren't people going to
be killed and then it's not you know
blah blah blah
i'm like that's not what's gonna happen
i said with confidence
incorrectly obviously uh what i think is
gonna happen is we're gonna have a lot
more
like a very gradual kind of rollout
where people
have these cars in like closed
communities
right where it's somewhat realistic but
it's still
in a box right so that we can really get
a sense of what
what are the weird things that can
happen how do we
how do we have to change the way we
behave around these vehicles like
it obviously requires a kind of
co-evolution
that you can't just plop them in and see
what happens
but of course we're basically plopping
them in to see what happens so i was
wrong but i
do think that would have been a better
plan so that's but your intuition that's
funny
just zooming out and looking at the
forces of capitalism
and it seems that capitalism rewards
risk takers
and rewards and punishes risk takers
like it
and like try it out
the academic approach
to let's try a small thing
and try to understand slowly the
fundamentals of the problem
and let's start with one and do two and
then see that
and then do the three uh you know uh the
the capitalist like startup
entrepreneurial dream is
let's build a thousand and let's right
and 500 of them fail but whatever the
other 500 we
learned from them but if you're good
enough i mean one thing it's like your
intuition would say like
that's going to be hugely destructive to
everything but
actually it's kind of the the the forces
of capitalism
people are quite it's easy to be
critical but if you actually look at the
data
at the way our world has progressed in
terms of the quality of life
it seems like the competent good people
rise to the top
this is coming from me from the soviet
union and
so on it's like it's interesting
that somebody like elon musk is the way
you uh you push progress in artificial
intelligence like it's forcing waymo
to step
their stuff up uh and waymo is forcing
uh
elon musk to step up
it's fascinating i because i have this
tension in in my
heart and just being upset by
the lack of progress in autonomous
vehicles and within academia
so there's a huge progress in the early
days
of the darpa challenges and then it just
kind of stopped
like at mit but it's true everywhere
else
with an exception of a few sponsors here
and there
is is like it's not seen as a sexy
problem
uh thomas like the moment artificial
intelligence starts
approaching the problems of the real
world
like academics kind of like ah all right
let let the couple get really hard in a
different way in a different way and
that's right i think yeah right some of
us are not
excited about that other way but i still
think there's fundamental problems to
be
solved in those difficult things it's
not it's still
publishable i think like we just need to
it's the same criticism you could have
of all these conferences neurips
cvpr where application papers
are often as powerful and as important
as like
uh theory paper even like theory just
seems
much more respectable and so on i mean
machine learning community is changing
that a little bit
i mean at least in statements but it's
it's still
not seen as the sexiest of uh pursuits
which is like how do i actually make
this thing work in practice
as opposed to on this toy data set
all that to say are you still avoiding
the three books question is there
something on audiobook that you can uh
recommend
oh i've yeah i mean um i yeah i've read
a lot of really fun stuff
uh in terms of books that i find myself
thinking back on that i read a while ago
like that have stood the test of time to
some degree i find myself thinking of
program or be programmed a lot by
douglas
rushkoff um which
was it basically put out the premise
that we all need to become
programmers in one form or another
and it was an analogy to once upon a
time we all had to become
readers we had to become literate and
there was a time before that when not
everybody was literate but once literacy
was possible
the people who were literate had more of
a say
in society than the people who weren't
and so we made a big effort to get
everybody up to speed and now it's
it's not 100 percent universal but it's quite
widespread
like the assumption is generally that
people can read
the analogy that he makes is that
programming is a similar kind of thing
that uh
that we need to have a say
in right so being a reader being
literate being a reader means you can
receive all this information
but you don't get to put it out there
and programming is the way that we get
to put it out there
that was the argument he made i think he
specifically has now
backed away from this idea he doesn't
think it's happening quite this way
and that might be true that it didn't
society
didn't sort of play forward quite that
way i still believe in the premise i
still believe that at some point
we have the relationship that we have to
these machines and these networks
has to be one of each individual can has
the wherewithal
to make the machines help them
do do the things that that person wants
done and as so you know as software
people we know how to do that and we
have a problem we're like okay i'll just
i'll hack up a perl script or something
and make it so
if we lived in a world where everybody
could do that that would be a
better world and computers would be have
i think
less sway over us and other people's
software would have less sway over us
as a group yeah in some sense software
engineering programming's power
it's programming is power right it's
it's yeah it's like magic it's like
magic spells and
and it's not out of reach of everyone
but at the moment it's just a sliver of
the population who can
who can commune with machines in this
way so i don't know so that book had a
big
big impact on me currently i'm i'm
reading uh the alignment problem
actually by brian christian so i don't
know if you've seen this out there yet
is this similar to stuart russell's work
with the control problem
it's in in that same general
neighborhood i mean they take they have
different
emphases that they're they're
concentrating on i think i think
stuart's book
did a remarkably good job like a just a
celebratory good job at describing
ai technology and sort of how it works i
thought that was great it was really
cool to see that in a book
yeah i think he has some experience
writing some books
you know that's probably a possible
thing he's maybe thought a thing or two
about how to explain
ai to people yeah yeah that's a really
good point um this book so far
has been remarkably good at telling the
story
of the sort of the history the recent
history
of some of the things that have happened
uh this i'm in the first third he said
this book is in three thirds the first
third is
essentially ai fairness and you know
implications of ai on society that we're
seeing right now
and that's been great i mean he's
telling the stories really well he's he
went out and talked to the frontline
people who
whose names are associated with some of
these ideas and and it's been terrific
he says the second half of the book is
on reinforcement learning so
maybe that'll be fun um and then the
third half
third third is on uh this is super
intelligence alignment problem
and i i suspect that that part will be
less fun for me to read
yeah it's yeah it's
it's an interesting problem to talk
about i find it to be the most
interesting just like
thinking about whether we live in a
simulation or not as a
as a thought experiment to think about
our own existence
so in the same way talking about
alignment problem with agi
is a good way to think similarly like
the trolley problem with autonomous
vehicles
it's a useless thing for engineering but
it's a it's a nice little thought
experiment for
actually thinking about what are like
our own
human ethical systems our moral systems
to
to to uh by thinking how we engineer
these things you start to understand
yourself
so sci-fi can be good at that too so one
sci-fi book to recommend is
exhalation by ted chiang a bunch of
short stories
um this ted chiang is the guy who wrote
the short story that became the movie
arrival
um and all his stories just from a
he's he was a computer scientist
actually he studied at brown
they all have this sort of really
insightful bit of
science or computer science that drives
them and so it's just
a romp right to just like he creates
these artificial worlds with these
by extrapolating on these ideas that
that we know about
but hadn't really thought through to
this kind of conclusion and so his stuff
is
it's really fun to read it's mind
warping
so i'm not sure if you're familiar i
seem to mention this every other word
uh is i'm from the soviet union and i'm
russian
uh read way too much my roots are
russian too
but a couple generations back well it's
probably in there somewhere so maybe we
can uh
we can pull up that thread a little bit
of the existential dread
that we all feel you mentioned that you
i think somewhere in the conversation
you mentioned that you don't
really pretty much like dying i forget
in which context
it might have been a reinforcement
learning perspective i don't know i know
you know what it was it was
in teaching my kids to drive
that's that's how you face your
mortality yes uh
from a human being's perspective or from
a reinforcement learning researcher's
perspective let me ask you the most
absurd question
what's uh what do you think is the
meaning of this whole thing
the meaning of life on this spinning
rock
i mean i think reinforcement learning
researchers maybe think about this from
a science perspective more often
than a lot of other people right as a
supervised learning person you're
probably not thinking about the sweep of
a lifetime but reinforcement learning
agents are
having little lifetimes little weird
little lifetimes and it's
it's hard not to project yourself into
their world sometimes
but you know as far as the meaning of
life so i when i turned 42
you may know from that's a that is a
book i read um
the the historical hitchhiker's guide to
the galaxy
that that is the meaning of life so when
i turned 42 i had a meaning of life
party
where i invited people over and um
everyone shared their meaning of life we
they
we had slides made up and so we had we
all sat down and
did a slide presentation to each other
about the meaning of life
and mine mine was balance
i think that life is balance and um
so the activity at the party for a 42
year old maybe this is a little bit
non-standard
but i i found all the little toys and
devices that i had that where you had to
balance on them you had to
like stand on it and balance or pogo
stick i brought
a ripstik which is like a weird
two-wheeled skateboard
um i got a unicycle but i didn't know
how to do it i didn't know how to do it
i now can do it i love watching you try
yeah i'll send you a video
i'm not great but i put but but i
managed
um and so uh so balanced yeah so so
my my wife has a really good one that
she
sticks to and is probably pretty
accurate and it has to do with
healthy relationships with people that
you love
and working hard for good causes but to
me yeah balance balance
in a word that's that that works for me
not too much of anything because too
much of anything is
iffy that feels like uh rolling stones
song i feel like they must be
you can't always get what you want but
if you try sometimes
you can strike a balance yeah i think
that's how it goes
uh michael i'll write your parody it's a
huge honor to talk to you this
been a big fan of yours so um uh
can't uh can't wait to see what you do
next in the world of uh education the
world of parody in the world of
reinforcement learning thanks for
talking today my pleasure
thank you for listening to this
conversation with michael littman and
thank you to our sponsors
simplisafe a home security company i use
to monitor
and protect my apartment expressvpn
the vpn i've used for many years to
protect my privacy on the internet
masterclass online courses that i enjoy
from some of the most amazing humans in
history
and betterhelp online therapy with a
licensed professional
please check out the sponsors in the
description to get a discount and to
support this podcast
if you enjoy this thing subscribe on
youtube review
five stars on apple podcast follow on
spotify
support it on patreon or connect with me
on twitter
at lex fridman and now let me leave you
some
words from groucho marx if you're not
having fun
you're doing something wrong thank you
for listening and hope to see you
next time