Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
SGSOCuByo24 • 2019-08-31
The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that's captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, a Vice President and Chief AI Scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning. He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST dataset. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent, and to explore provocative ideas both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with
Yann LeCun. You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts (for people who haven't seen the movie, spoiler alert) because he, it, she believes that the astronauts will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing?
Neither. There's no notion of evil in that context, other than the fact that people die. But it was an example of what people call value misalignment, right? You give an objective to a machine, and the machine strives to achieve this objective. And if you don't put any constraints on this objective, like "don't kill people and don't do things like this," the machine, given the power, will do stupid things just to achieve this objective, or damaging things to achieve its objective. It's a little bit like... we are used to this in the context of human society. We put in place laws to prevent people from doing bad things, because otherwise they would do those bad things, right? So we have to shape their cost function, their objective function if you want, through laws, to kind of correct, and education obviously, to sort of correct for those. So maybe just
pushing a little further on that point. How... you know, there's a mission, there's this fuzziness, the ambiguity, around what the actual mission is. But, you know, do you think there will be a time, from a utilitarian perspective, for an AI system, where it is not misalignment, where it is alignment for the greater good of society, that an AI system will make decisions that are difficult? Well, that's the trick. I
mean eventually we'll have to figure out
how to do this and again we're not
starting from scratch because we've been
doing this with humans for millennia.
so designing objective functions for
people is something that we know how to
do and we don't do it by you know
programming things although the legal
code is called code so that tells you
something. And it's actually the design of an objective function, that's really what legal code is, right? It tells you what you can do and what you can't do. If you do it, you pay that much. That's an objective function. So there is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good. But no, we've been writing laws for millennia, and that's exactly what it is.
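The "laws as objective functions" picture sketched here can be made concrete: a cost function with a task term plus law-like penalty terms for violating constraints. The actions, costs, and weights below are invented purely for illustration:

```python
# Toy illustration (all numbers invented): an agent's cost function as
# a task objective plus "law-like" penalty terms for harmful actions.

def task_cost(action):
    # How badly the action serves the mission (lower is better).
    return {"finish_mission": 0.0, "finish_fast_recklessly": -1.0}[action]

def law_penalty(action):
    # Penalties play the role of laws: they make harmful shortcuts expensive.
    harms_people = {"finish_mission": False, "finish_fast_recklessly": True}
    return 100.0 if harms_people[action] else 0.0

def total_cost(action):
    return task_cost(action) + law_penalty(action)

# Without the penalty, the reckless action looks better (-1.0 < 0.0);
# with it, the constrained objective prefers the safe action.
best = min(["finish_mission", "finish_fast_recklessly"], key=total_cost)
print(best)  # finish_mission
```

The point of the sketch is only that a single added term changes which behavior the optimizer prefers, which is exactly the value-misalignment worry when that term is missing.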
So that's where, you know, the science of lawmaking and computer science will come together. So there's nothing special about HAL or AI systems; it's just the continuation of tools used to make some of these difficult ethical judgments that laws make. Yeah, and
we have systems like this already that, you know, make many decisions for ourselves in society, that need to be designed in a way that, like, you know, rules about things that sometimes have bad side effects, and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied. So you don't see this on the camera here, but all the decorations in this room are pictures from 2001: A Space Odyssey. Wow.
Is that by accident, or... It's not by accident, it's by design. Wow. So if you were to build HAL 10,000, so an improvement of HAL 9000, what would you improve? Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end. It's the fact that it's asking itself questions about the purpose of the mission, and it pieces things together that it's heard, you know, all the secrecy of the preparation of the mission, and the fact that the discovery on the lunar surface was kept secret. And one part of HAL's memory knows this, and the other part does not know it and is supposed to not tell anyone, and that creates an internal conflict. Do you
think there should never be a set of things that an AI system should not be allowed, like a set of facts that should not be shared with the human operators? Well, I think, no, I think in the design of autonomous AI systems there should be the equivalent of, you know, the Hippocratic oath that doctors sign up to, right? So there are certain things, certain rules, that you have to abide by, and we can sort of hardwire this into our machines to kind of make sure they don't go... So I'm not, you know, an advocate of the three laws of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but, you know, some level of limits. But to be clear,
these are not questions that are really worth asking today, because we just don't have the technology to do this. We don't have autonomous intelligent machines; we have intelligent machines, some intelligent machines, that are very specialized, but they don't really sort of satisfy an objective, they're just, you know, kind of trained to do one thing. So until we have some idea for the design of a full-fledged autonomous intelligent system, asking the question of how we design its objective, I think, is a little too abstract. There's useful elements
to it, in that it helps us understand our own ethical codes, as humans. So even just as a thought experiment, if you imagine that an AGI system is here today, how would we program it? It's a kind of nice thought experiment of constructing how we should have a system of laws for us humans. It's just a nice practical tool. And I think there's echoes of that idea too in the AI systems we have today, that don't have to be that intelligent. Yeah. Like autonomous vehicles; these things start creeping in that are worth thinking about, but certainly they shouldn't be framed as HAL. Yeah.
Looking back, what is the most, I'm sorry if it's a silly question, but what is the most beautiful or surprising idea in deep learning, or AI in general, that you've ever come across? Sort of personally, where you sat back and just had this kind of "wow, that's pretty cool" moment. That's nice. Well, surprising... I don't know if it's an idea rather than a sort of empirical fact: the fact that you can take gigantic neural nets and train them on relatively small amounts of data with stochastic gradient descent, and that it actually works. That breaks everything you read in every textbook, right? Every pre-deep-learning textbook that told you you need to have fewer parameters than you have data samples; that, you know, if you have a non-convex objective function, you have no guarantee of convergence... all the things that you read in textbooks, and they tell you to stay away from this, and they were all wrong. A huge number of parameters, non-convex, and somehow, with very little data relative to the number of parameters, it's able to learn anything. Right. Does that surprise you
today? Well, it was kind of obvious to me, before I knew anything, that this is a good idea, and then it became surprising that it worked, because I started reading those textbooks. Okay, so can you talk through the intuition of why it was obvious to you, if you remember? Well, okay, so the intuition was... it's sort of like, you know, those people in the late 19th century who proved that heavier-than-air flight was impossible, right? And of course you have birds, right? They do fly. And so, on the face of it, it's obviously wrong as an empirical question, right? And so we
have the same kind of thing: we know that the brain works, we don't know how, but we know it works. And we know it's a large network of neurons in interaction, and that learning takes place by changing the connections. So kind of getting this level of inspiration without copying the details, but sort of trying to derive basic principles, that kind of gives you a clue as to which direction to go. There's also the idea, somehow, that I've been convinced of since I was an undergrad, even before that, that intelligence is inseparable from learning. So the idea, somehow, that you can create an intelligent machine by basically programming, for me, was a non-starter from the start. Every intelligent entity that we know about arrives at this intelligence through learning. So learning, you know, machine learning, was a completely obvious path. Also because I'm lazy, so, you know, automate basically everything, and learning is the automation of intelligence, right? So what is learning, then? What falls under learning? Because, do you think of reasoning as learning? Well, reasoning is certainly a consequence of learning as well, just like other functions of the brain.
The big question about reasoning is how you make reasoning compatible with gradient-based learning. Do you think neural networks can be made to reason? Yes, there's no question about that. Again, we have a good example, right? The question is how. So the question is how much prior structure you have to put in the neural net so that something like human reasoning will emerge from it, you know, from learning. Another question is: all of our kind of models of what reasoning is that are based on logic are discrete, and are therefore incompatible with gradient-based learning. And I'm a very strong believer in this idea of gradient-based learning; I don't believe in other types of learning that don't use kind of gradient information, if you want. So you don't like discrete mathematics? You don't like anything discrete? Well, it's not that I don't like it, it's just that it's incompatible with learning, and I'm a big fan of learning, right? So, in fact, that's perhaps
one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists: because the math is very different. The math you use for deep learning, you know, has more to do with cybernetics, the kind of math you do in electrical engineering, than the kind of math you do in computer science. And, you know, nothing in machine learning is exact, right? Computer science is all about sort of obsessive-compulsive attention to detail, like, you know, every index has to be right, and you can prove that an algorithm is correct, right? Machine learning is the science of sloppiness, really. That's beautiful. So, okay, maybe let's feel around in the dark of what is a neural network that reasons, or a system that works with continuous functions, that's able to build knowledge, however we think about reasoning: builds on previous knowledge, builds on extra knowledge, creates new knowledge, generalizes outside of any training set ever built. What does that look like? Maybe... do you have inklings of thoughts of what that might look like? Well, yeah, I mean,
yes and no. If I had precise ideas about this, I think, you know, we'd be building it right now. But there are people working on this, whose main research interest is actually exactly that, right? So what you need to have is a working memory. So you need to have some device, if you want, some subsystem, that can store a relatively large number of factual, episodic pieces of information for, you know, a reasonable amount of time. So, you know, in the brain, for example, there are kind of three main types of memory. One is the sort of memory of the state of your cortex, and that sort of disappears within 20 seconds; you can't remember things for more than about 20 seconds or a minute if you don't have any other form of memory. The second type of memory, which is longer term, but still short term, is the hippocampus. So, you know, you came into this building, you remember where the exit is, where the elevators are; you have some map of that building that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago; it's been erased from your cortex, but it's stored in your hippocampus. And then the longer-term memory is in the synapses, right? So what you need, if you want, for a system that's capable of reasoning, is a hippocampus-like thing, right? And that's what people have tried to do with memory networks and, you know, neural Turing machines and stuff like that, right? And now with transformers, which have sort of a memory in their kind of self-attention system, you can think of it this way. So that's one element you need.
Another thing you need is some sort of network that can access this memory, get information back, and then kind of crunch on it, and then do this iteratively, multiple times, because a chain of reasoning is a process by which you update your knowledge about the state of the world, about, you know, what's going to happen, etc. And there has to be this sort of recurrent operation, basically. And you think that, if we think about a transformer, it seems to be too small to contain the knowledge, to represent the knowledge that's contained in Wikipedia, for example. And a transformer doesn't have this idea of recurrence; it's got a fixed number of layers, and that number of steps, you know, limits basically its representation. But recurrence would build on the knowledge somehow; I mean... Yeah, it would evolve the knowledge and expand the amount of information, perhaps, or useful information, within that knowledge. But is this something that just can emerge with size? Because it seems like everything we have now is just... No, it's not clear how you access and write into an associative memory in an efficient way. I mean, sort of, the original memory networks maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all of Wikipedia, it doesn't quite work. Right. So there's a need for new ideas there. Okay. But it's not
the only form of reasoning. There's another form of reasoning, which is very classical also in some types of AI, and it's based on, let's call it energy minimization. Okay, so you have some sort of objective, some energy function, that represents the quality, or the negative quality, okay: energy goes up when things get bad and gets low when things get good. So let's say you want to figure out, you know, what gestures do I need to do to grab an object or walk out the door. If you have a good model of your own body, a good model of the environment, using this kind of energy minimization you can do planning. And in optimal control it's called model predictive control: you have a model of what's going to happen in the world as a consequence of your actions, and that allows you, by energy minimization, to figure out the sequence of actions that optimizes a particular objective function, which measures, you know... minimize the number of times you're going to hit something, and the energy you're going to spend doing the gesture, etc. So that's a form of reasoning.
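The planning loop described here, model predictive control as energy minimization, can be sketched with a toy one-dimensional example. Everything below (the dynamics model, the cost weights, the step sizes) is invented for illustration:

```python
# Toy model predictive control: choose a sequence of actions that
# minimizes an "energy" combining distance-to-goal and effort.

GOAL = 10.0
HORIZON = 5

def rollout(actions, x0=0.0):
    # Assumed dynamics model: position simply accumulates the actions.
    x = x0
    for a in actions:
        x += a
    return x

def energy(actions):
    effort = sum(a * a for a in actions)       # penalize big moves
    miss = (rollout(actions) - GOAL) ** 2      # penalize missing the goal
    return miss + 0.1 * effort

def plan(steps=500, lr=0.05, eps=1e-4):
    actions = [0.0] * HORIZON
    for _ in range(steps):
        # Finite-difference gradient descent over the action sequence.
        for i in range(HORIZON):
            bumped = actions.copy()
            bumped[i] += eps
            grad = (energy(bumped) - energy(actions)) / eps
            actions[i] -= lr * grad
    return actions

actions = plan()
# By symmetry the optimizer spreads the effort evenly across the steps,
# and the rollout lands near (not exactly on) the goal because of the
# effort penalty.
```

In a real controller the energy gradient would come from a learned, differentiable world model rather than finite differences, but the structure (imagine the consequences, score them, adjust the plan) is the same.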
Planning is a form of reasoning. And perhaps what led to the ability of humans to reason is the fact that, you know, species that appeared before us had to do some sort of planning to be able to hunt and survive, and survive the winter in particular. And so, you know, it's the same capacity that you need to have. So, in your intuition, if you look at expert systems, is encoding knowledge as logic systems, as graphs, in this kind of way, not a useful way to think about knowledge? Graphs are brittle, or... Logic representations, so, basically, you know, variables that have values and constraints between them that are represented by rules, are too rigid and too brittle, right? So one of the, you know,
know some of the early efforts in that
respect were were to put probabilities
on them so a rule you know you know if
you have this in that symptom you know
you have this disease with that
probability and you should
describe that antibiotic with that
probability right this my sin system
from the for the 70s and that that's
what that branch of AI led to you know
busy networks in graphical models and
causal inference and vibrational you
know method so so there there is I mean
certainly a lot of interesting work
going on in this area the main issue
with this is is knowledge acquisition
how do you reduce a bunch of data to
graph of this type near relies on the
expert and a human being to encode at
add knowledge and that's essentially
impractical yeah the question the second
question is do you want to represent
knowledge symbols and you want to
manipulate them with logic and again
that's incomparable we're learning so
one suggestion, which Geoff Hinton has been advocating for many decades, is to replace symbols by vectors, think of them as patterns of activities in a bunch of neurons or units or whatever you want to call them, and replace logic by continuous functions. Okay, and that becomes now compatible. There's a very good set of ideas written in a paper about 10 years ago by Léon Bottou, who is here at Facebook; the title of the paper is "From Machine Learning to Machine Reasoning." And his idea is that a learning system should be able to manipulate objects that are in a space and then put the result back in the same space. So it's this idea of working memory, basically, and it's very enlightening. And in the sense that it might learn something like simple expert systems, I mean, you can learn basic logic operations there. Yeah, quite possibly. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge. That's the debate I have with Gary Marcus and people like that.
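The "replace symbols by vectors and logic by continuous functions" idea can be made concrete with a tiny sketch: truth values become numbers in [0, 1] and connectives become smooth functions a gradient can flow through. The particular soft operators below are standard fuzzy-logic choices, used purely as illustration, not anything from the Bottou paper:

```python
# Soft, differentiable stand-ins for discrete logic: truth values live in
# [0, 1] instead of {False, True}, and the connectives become smooth
# functions (the product t-norm and probabilistic sum from fuzzy logic).

def soft_and(a, b):
    return a * b

def soft_or(a, b):
    return a + b - a * b

def soft_not(a):
    return 1.0 - a

# At the extremes they agree with Boolean logic...
assert soft_and(1.0, 0.0) == 0.0
assert soft_or(1.0, 0.0) == 1.0

# ...but in between they are continuous, so a learning system can nudge
# "how true" a fact is with gradient steps instead of flipping bits.
p = soft_or(soft_and(0.9, 0.8), soft_not(0.95))   # (A AND B) OR (NOT C)
```

This is the smallest version of the compatibility point in the conversation: once logic is continuous, the same rule structure can sit inside a network trained by gradient descent.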
Yeah. So, the other person... so I just talked to Judea Pearl; you mentioned causal inference. His worry is that the current neural networks are not able to learn what causes what, causal inference between things. So I think he's right and wrong about this. If he's talking about the sort of classic type of neural nets, people also didn't worry too much about this. But there's a lot of people now working on causal inference, and there's a paper that just came out last week by Léon Bottou, among others, David Lopez-Paz and a few other people, exactly on that problem of how you kind of, you know, get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data and things like this. I'd like to read that paper, because ultimately the challenge also seems to fall back on the human expert to ultimately decide causality between things. People are not very good at establishing causality,
first of all. So, first of all, if you talk to physicists, physicists actually don't believe in causality, because look: all the laws of microphysics are time-reversible, so there is no causality; the arrow of time is not there. Right. Yeah, it's as soon as you start looking at macroscopic systems, where there is unpredictable randomness, that there is clearly an arrow of time, but it's a big mystery in physics, actually, how that emerges. Is it emergent, or is it part of the fundamental fabric of reality? Yeah, or is it a bias of intelligent systems, that, you know, because of the second law of thermodynamics we perceive a particular arrow of time, but in fact it's kind of arbitrary, right? So, yeah, physicists, mathematicians, they don't care about... I mean, the math doesn't care about the flow of time. Well, certainly macro physics doesn't. People themselves are not very good at establishing causal relationships. I think it was in one of Seymour Papert's books on, like, children learning... you know, he studied with Jean Piaget; he's the guy who co-authored the book Perceptrons with Marvin Minsky that kind of killed the first wave of neural nets.
But he was actually a learning person, in the sense of studying learning in humans and machines; that's what he got interested in with the Perceptron. And he wrote that if you ask a little kid about what is the cause of the wind, a lot of kids will think for a while and they'll say, "oh, it's the branches in the trees; they move and that creates wind," right? So they get the causal relationship backwards, and it's because their understanding of the world and of intuitive physics is not that great, right? I mean, these are, like, you know, four- or five-year-old kids. It gets better, and then you understand that this can't be right. But there are many things where, because of our common-sense understanding of things, what people call common sense, and our understanding of physics, there's a lot of stuff for which we can figure out causality. Even with diseases, we can figure out what's not causing what. Often there's a lot of mystery, of course, but the idea is that you should be able to encode that into systems; it seems unlikely they'd be able to figure that out themselves. Well, whenever we can do intervention. But, you know, all of humanity has been completely deluded for millennia, probably since its existence, about a very, very wrong causal relationship: whatever you can't explain, you attribute to, you know, some deity, some divinity, right? And that's a cop-out; that's a way of saying, like, "I don't know the cause, so, you know, God did it,"
right? So you mentioned Marvin Minsky, and the irony of, you know, maybe causing the first AI winter. You were there in the 90s, you were there in the 80s, of course. Why do you think people lost faith in deep learning in the 90s and found it again over a decade later? Yeah, it wasn't called deep learning then; it was just called neural nets. They lost interest... I mean, I think I would put that around 1995, at least in the machine learning community. There was always a neural net community, but it became disconnected from sort of mainstream machine learning, if you want. It was basically electrical engineering that kept at it, and computer science just gave up on neural nets. I don't know, I was too close to it to really sort of analyze it with an unbiased eye, if you want, but I would make a few guesses. So the first
one is that, at the time, neural nets were... it was very hard to make them work, in the sense that you would, you know, implement backprop in your favorite language, and that favorite language was not Python, it was not MATLAB, it was not any of those things, because they didn't exist, right? You had to write it in Fortran or C or something like this. So you would experiment with it, you would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, that you don't want too many parameters, right? And of course, you know, you would train on XOR, because you didn't have any other dataset to try it on, and of course, you know, it works half the time, so you'd say "I give up." Also, you'd use batch gradient, which, you know, isn't efficient. So there was a whole bag of tricks that you had to know to make those things work, or you had to reinvent, and a lot of people just didn't, and they just couldn't make it work. So that's one thing. The investment in software platforms, to be able to, you know, display things, figure out why things don't work, get a good intuition for how to get them to work, have enough flexibility so you can create, you know, network architectures like convolutional nets and stuff like that... it was hard. Yeah, you had to write everything from scratch, and again, you didn't have any Python or MATLAB or anything, right? So, what I read, sorry
to interrupt, but I read you wrote, in Lisp, the first versions of LeNet, the convolutional neural networks, which, by the way, is one of my favorite languages; that's how I knew you were legit. The Turing Award, whatever... you programmed in Lisp. It's still my favorite language. But it's not that we programmed in Lisp; it's that we had to write our Lisp interpreter, okay, because it's not like one existed. So
we wrote a Lisp interpreter that we hooked up to, you know, a backprop library that we wrote also, for neural net computation. And then, after a few years, around 1991, we invented this idea of basically having modules that know how to forward-propagate and back-propagate gradients, and then interconnecting those modules in a graph. Léon Bottou had made proposals on this in the late 80s, and we were able to implement this using this system. Eventually we wanted to use that system to build production code for character recognition at Bell Labs, so we actually wrote a compiler for that Lisp interpreter. Patrice Simard, who is now at Microsoft, kind of did the bulk of it, with Léon and me. And so we could write our system in Lisp and then compile it to C, and then we'd have a self-contained, complete system that could kind of do the entire thing, which neither PyTorch nor TensorFlow can do today. Yeah. Okay, it's coming, yeah. I mean,
there's something like that in PyTorch called, you know, TorchScript. And so, you know, we had to write our Lisp interpreter, as well as our Lisp compiler; we had to invest a huge amount of effort to do this, and not everybody, if you don't completely believe in the concept, is going to invest the time to do this, right? Now, at the time also... you know, today this would turn into PyTorch or TensorFlow or whatever; we'd put it in open source, everybody would use it and, you know, realize it's good.
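The abstraction described a few lines earlier, modules that know how to forward-propagate and back-propagate gradients, interconnected in a graph, is essentially what these modern frameworks automate. A minimal sketch, with class names and the toy two-module chain invented for illustration:

```python
# Minimal sketch of "modules that know how to forward-propagate and
# back-propagate gradients," here chained in the simplest possible graph.

class Linear:
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                         # cache the input for backward
        return self.w * x
    def backward(self, grad_out):
        self.grad_w = grad_out * self.x    # gradient wrt the weight
        return grad_out * self.w           # gradient wrt the input

class Square:
    def forward(self, x):
        self.x = x
        return x * x
    def backward(self, grad_out):
        return grad_out * 2 * self.x

# Interconnect the modules and run a forward pass, then a backward pass.
net = [Linear(3.0), Square()]
x = 2.0
for m in net:
    x = m.forward(x)          # forward: 2 -> 6 -> 36
g = 1.0
for m in reversed(net):
    g = m.backward(g)         # backward: d((3x)^2)/dx = 18x = 36 at x = 2
```

Each module only knows its own local derivative; composing the backward calls in reverse order is the chain rule, which is exactly what lets arbitrary graphs of such modules be trained.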
But before 1995, working at AT&T, there's no way the lawyers would let you release anything in open source of this nature, and so we could not distribute our code, really. And at that point, and sorry to go on a million tangents, but on that point, I also read that there was a patent on convolutional neural networks. Yes, at Bell Labs. So, first of all... actually, they ran out, thankfully, in 2007. Can we just talk about that first? I know you're at Facebook, but you're also at NYU. And what does it mean to
patent ideas like these? Software ideas, essentially? Or are they mathematical ideas, or what are they? Okay, so they're not mathematical ideas; they're, you know, algorithms. And there was a period where the US Patent Office would allow the patenting of software as long as it was embodied. The Europeans are very different; they don't quite accept that, they have a different concept. But, you know, I never actually strongly believed in this, and I don't believe in this kind of patent. Facebook basically doesn't believe in this kind of patent. Google files patents because they've been burned by Apple, and so now they do this for defensive purposes, but usually they say, "we're not going to sue you if you infringe." Facebook has a similar policy; they say, you know, "we file patents on certain things for defensive purposes; we're not going to sue you if you infringe, unless you sue us." So the industry does not believe in patents; they are there because of, you know, the legal landscape and various things, but I don't really believe in patents for this kind of stuff. Yes. So, that's a great thing.
So, I'll tell you a war story. So, what happened was, the first patent on convolutional nets was about kind of the early version of ConvNets that didn't have separate pooling layers; it had convolutional layers with a stride of more than one, if you want, right? And then there was a second one, on convolutional nets with separate pooling layers, trained with backprop, in 1989 and 1990, something like this. At the time, the life of a patent was 17 years. So here's what happened over the next few years: we started developing character recognition technology around convolutional nets, and in 1994 a check-reading system was deployed in ATM machines; in 1995 it was a large check-reading machine in back offices, etc. And those systems were developed by an engineering group that we were collaborating with at AT&T, and they were commercialized by NCR, which at the time was a subsidiary of AT&T. Now, AT&T split up in 1996, early 1996, and the lawyers just looked at all the patents, and they distributed the patents among the various companies. They gave the convolutional net patents to NCR, because they were actually selling products that used them, but nobody at NCR had any idea what a convolutional net was. Yeah. Okay, so between
1996 and 2007, there's a whole period, until 2002, when I didn't actually work on machine learning or ConvNets. I resumed working on this around 2002, and between 2002 and 2007 I was working on them, crossing my fingers that nobody at NCR would notice. And nobody noticed. Yeah, and I hope that this kind of, somewhat, as you said, lawyers aside, relative openness of the community now will continue.
It accelerates the entire progress of the industry. And, you know, the problem that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the others; it's that we don't have the technology to build the things we want to build. We want to build intelligent virtual assistants that have common sense. We don't have a monopoly on good ideas for this; we don't believe we do. Maybe others believe they do, but we don't, okay? If a startup tells you they have the secret to, you know, human-level intelligence and common sense, don't believe them; they don't. And it's going to take the entire world research community a while to get to the point where you can go off, and each of these companies is going to start to build things on this. We're not there yet.
Absolutely. This calls to the gap, between the space of ideas and the rigorous testing of those ideas in practical application, that you often speak to. You've written advice saying: "Don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that works just like the human brain, or who claim to have figured out how the brain works. Ask them what error rate they get on MNIST or ImageNet." This is a little dated, by the way. Maybe, by five years, but who's counting? Okay. But, in your opinion, is the MNIST and ImageNet... Yes, they may be dated, there may be new benchmarks, right. But I think that philosophy is one you still somewhat hold: that benchmarks, and the practical testing, the practical application, is where you really get to test the ideas. Well, it may not be completely practical. Like, for example, you know, it could be a toy dataset, but it has to be some sort of task that the community as a whole has accepted as some sort of standard, kind of a benchmark, if you want. It doesn't need to be real. So, for example, many years ago
be real so for example many years ago
here at fair people you know chosen
Western art one born and a few others
proposed the the babbitt asks which were
kind of a toy problem to test the
ability of machines to reason actually
to access working memory and things like
this and it was very useful even though
it wasn't a real task amnesties kind of
halfway a real task so you know toy
problems can be very useful it's just
that i was really struck by the fact
that a lot of people particularly our
people with money to invest would be
fooled by people telling them oh we have
you know the algorithm of the cortex and
you should give us 50 million yes
absolutely so there's a lot of people
who who tried to take advantage of the
hype for business reasons and so on but
let me sort of talk to this idea that
new ideas the ideas that push the field
forward
may not yet have a benchmark or it may
be very difficult to establish a
benchmark I agree that's part of the
process establishing benchmarks is part
of the process so what are your thoughts
about. So we have these benchmarks
around, you know, stuff we can do with
images, from classification to captioning
to every kind of information you can pull
out of images at the surface level. There
are audio data sets, there's some video.
Where can we start with natural language?
What kind of benchmarks do you see that
start creeping toward something like
intelligence, like reasoning, like,
maybe you don't like the term, but AGI,
echoes of that kind of formulation?
Yeah, so a
lot of people are working on interactive
environments in which you can you can
train and test intelligent systems. So
there, for example, you know, the
classical paradigm of supervised learning
is that you have a data set, you
partition it into a training set,
validation set, test set, and there's a
clear protocol, right? But that assumes
that the samples are statistically
independent, you can exchange them, the
order in which you see them shouldn't
matter, things like that. But what if
the answer you give determines the next
sample you see? Which is the case, for
example, in robotics, right? Your robot
does something and then it gets exposed
to a new room, and depending on where it
goes, the room will be different. So
that creates the exploration problem.
And that also creates a dependency
between samples, right? If you can only
move in space, the next sample you're
going to see is probably going to be in
the same building, most likely. So all
the assumptions about the validity of
this training-set/test-set hypothesis
break whenever a machine can take an
action that has an influence on the
world and on what it's going to see.
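The classical protocol described here, and the way an action-dependent environment breaks it, can be sketched in a few lines of Python (the `Corridor` class is an invented toy, not a real benchmark):

```python
import random

def iid_split(samples, seed=0):
    """Classical supervised protocol: shuffle once, then partition.
    This is valid only because samples are assumed independent --
    the order in which they are seen shouldn't matter."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

class Corridor:
    """Toy environment where the action determines the next sample:
    the agent walks left/right along rooms, so successive
    observations are strongly dependent and the i.i.d. assumption
    breaks."""
    def __init__(self):
        self.position = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the "next sample"
        # the agent sees depends on what it just did
        self.position += action
        return self.position

train, val, test = iid_split(range(100))
env = Corridor()
trajectory = [env.step(+1) for _ in range(5)]  # 1, 2, 3, 4, 5 -- not independent draws
```

The split is exchangeable by construction; the trajectory is anything but, which is exactly the dependency between samples described above.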
so people are setting up artificial
environments where what that takes place
right the robot runs around a 3d model
of a house and can interact with objects
and things like this, or you do robotics
by simulation. You have those, you know,
OpenAI Gym type things, or MuJoCo kind
of simulated robots, and you have games,
you know, things like that. So that's
really where the field is going, this
kind of environment. Now, back to the
question of AGI: I don't like the term
AGI because it implies that human
intelligence is general and human
intelligence is nothing like general
it's very very specialized we think it's
general we'd like to think of ourselves
as having your own science we don't
we're very specialized we're only
slightly more general than why does it
feel general so you kind of the term
general I think what's impressive about
humans is the ability to learn, as we
were talking about, learning to learn,
in just so many different domains. It's
perhaps not arbitrarily general, but you
can learn in many domains and integrate
that knowledge somehow, and that
knowledge persists.
So let me take a very specific
example yes it's not an example it's
more like a a quasi mathematical
demonstration so you have about 1
million fibers coming out of
one of your eyes okay two million total
but let's let's talk about just one of
them it's 1 million nerve fibers your
optical nerve let's imagine that they
are binary so they can be active or
inactive right so the input to your
visual cortex is 1 million bits
Now, they're connected to your brain in
a particular way, and your brain has
connections that are kind of a little
bit like a convolutional net: they're
kind of local, you know, in space and
things like this. Now imagine I play a
trick on
you it's a pretty nasty trick I admit I
cut your optic nerve, and I put in a
device that makes a random permutation
of all the nerve fibers. So now what
comes to your brain is a fixed but
random permutation of all the pixels.
There's no way in hell that your visual
cortex, even if I do this to you in
infancy, will actually learn vision to
the same level of quality that you have.
Got it. And you're saying there's no way
you'd ever learn that?
No, because now two pixels that are
nearby in the world will end up in very
different places in your visual cortex,
and the neurons there have no
connections with each other, because
they're only connected locally.
So this whole, the entire
hardware is built in many ways to
support the locality of the real world
Yeah, yes, that's specialization.
Yeah, okay. It's still pretty damn
impressive. So it's not perfect
generalization, it's not even close?
No, no, it's not even close. It's not at
all. It's specialized. So, how many
boolean
functions so let's imagine you want to
train your visual system to you know
recognize particular patterns of those 1
million bits ok so that's a boolean
function right either the pattern is
here or not here. This is a two-way
classification with 1 million binary
inputs.
how many such boolean functions are
there okay if you have 2 to the 1
million combinations of inputs for each
of those you have an output bit and so
you have 2 to the 2 to the 1 million
boolean functions of this type okay
which is an unimaginably large number
how many of those functions can actually
be computed by your visual cortex and
the answer is a tiny, tiny, tiny sliver,
like an enormously small sliver.
Yeah, yeah.
So we are ridiculously specialized.
Okay, but, okay,
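The counting argument is easy to verify for small inputs; a minimal sketch (the number for n = 1,000,000 can of course only be described, never enumerated):

```python
def num_boolean_functions(n_inputs):
    """A boolean function assigns one output bit to each of the
    2**n possible input patterns, so there are 2**(2**n) of them."""
    return 2 ** (2 ** n_inputs)

# For 1 input there are 4 functions (constant 0, constant 1,
# identity, NOT); for 3 inputs there are already 2**8 = 256.
small = num_boolean_functions(3)

# For 20 inputs the count is a number with 2**20 + 1 bits, roughly
# 315,000 decimal digits; for the 1-million-bit optic-nerve case it
# is 2**(2**1000000), far beyond anything a cortex could represent.
bits_for_20 = num_boolean_functions(20).bit_length()
```

The double exponential is the whole point: the functions a local, convolution-like wiring can compute are a vanishing fraction of this count.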
that's an argument against the word
general. I agree with your intuition,
but I'm not sure. It seems the brain is
impressively capable of adjusting to
things.
It's because we can't imagine tasks that
are outside of our comprehension, right?
We think we are general because we're
general over all the things that we can
apprehend. But there is a huge
world out there of things that we have
no idea
We call that heat, by the way.
Heat?
Heat. At least physicists call that
heat, or they call it entropy, which is
kind of the same thing. You have a thing
full of gas, right, a closed system of
gas. It has, you know, pressure, it has
temperature, and you can write the
equation, PV = nRT, things like that,
right? When you reduce the volume, the
temperature goes up, the pressure goes
up, things like that, right, for a
perfect gas at least. Those are the
things you can know about that system
and it's a tiny tiny number of bits
compared to the complete information of
the state of the entire system because
the full state of the entire system
would give you the position and momentum
of every molecule of the gas. And what
you don't know about it is the entropy,
and you interpret it as heat; the energy
contained in that thing is what we call
heat now it's very possible that in fact
there is some very strong structure in
how those molecules are moving; it's
just that they move in a way we are not
wired to perceive. We are ignorant of
it. And there's an infinite amount of
things we're not wired to perceive.
Yeah, right. That's a nice way to
put it
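The ideal-gas point can be made concrete: the macroscopic description is just a handful of numbers related by PV = nRT. A sketch, with illustrative values:

```python
R = 8.314  # molar gas constant, J/(mol*K)

def pressure(n_moles, temperature_k, volume_m3):
    """Ideal gas law PV = nRT, solved for P. These few macroscopic
    variables are all we can know about the gas; everything about
    the individual molecules' positions and momenta is summarized
    away as entropy, which we experience as heat."""
    return n_moles * R * temperature_k / volume_m3

p1 = pressure(1.0, 300.0, 0.0224)  # ~1 mole near room temperature
p2 = pressure(1.0, 300.0, 0.0112)  # halve the volume at fixed n, T
# p2 is exactly twice p1: reduce the volume and the pressure goes up
```

Four numbers (P, V, T, n) stand in for the ~10^23 molecular states we are not wired to perceive.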
We're general to all the things we can
imagine, which is a very tiny subset of
all things that are possible. It's like
Kolmogorov complexity, or
Kolmogorov-Chaitin-Solomonoff complexity:
you know, every bit string or every
integer is random, except for all the
ones that you can actually write down.
Yeah, okay. So,
beautifully put but you know so we can
just call it artificial intelligence we
don't need to have "general" or
whatever, or "human-level intelligence".
You know, anytime you touch "human" it
gets interesting, because it's just that
we attach ourselves to human, and it's
difficult to define what human
intelligence is.
Yeah.
nevertheless my definition is maybe damn
impressive intelligence ok damn
impressive demonstration of intelligence
whatever and so on that topic most
successes in deep learning have been in
supervised learning what is your view on
unsupervised learning is there a hope to
reduce involvement of human input and
still have successful systems that have
practical use?
Yeah, I mean, there's definitely a hope.
It's more than a hope, actually; there
is, you know, mounting evidence for it.
And that's basically all I do; the only
thing I'm interested in at the moment is
what I call self-supervised learning,
not unsupervised, because unsupervised
learning is a loaded term. People who
know something about machine learning,
you know, will ask, "so you're doing
clustering or PCA?", which is not the
case. And the general public, when you
say unsupervised learning: "oh my god,
machines are going to learn by
themselves, without supervision", you
know, where's the parents? So I call it
self-supervised learning because
in fact the underlying algorithms that I
use are the same algorithms as the
supervised learning algorithms except
that what we train them to do is not to
predict a particular set of variables,
like the category of an image, not to
predict a set of variables that have
been provided by human labelers; what
you train the machine to do is basically
to reconstruct a piece of its input that
is being masked out, essentially. You
can think of
it this way right so show a piece of a
video to a machine and ask it to predict
what's gonna happen next and of course
after a while you can show what what
happens and the machine will kind of
train itself to do better at that task
You can do, like, all the latest, most
successful models in natural language
processing use self-supervised learning,
you know, sort of BERT-style systems,
for example, right? You show it a window
of a thousand words of a text corpus,
you take out 15% of the words, and then
you train a machine to predict the words
that are missing. That's self-supervised
learning.
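The masking procedure described here can be sketched with stdlib Python; the toy corpus and the crude frequency "model" below are stand-ins for a real text corpus and a real network:

```python
import random
from collections import Counter

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """BERT-style self-supervision: hide roughly 15% of the words.
    The training targets are the hidden words themselves, so no
    human labeler is needed."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # what the model should predict
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

def predict_distribution(counts):
    """The model's output for a masked slot is a probability vector
    over the whole vocabulary: numbers in [0, 1] that sum to 1."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

corpus = "the cat sat on the mat and the cat ate".split()
masked, targets = mask_tokens(corpus)
dist = predict_distribution(Counter(corpus))  # unigram stand-in for a model
```

The supervision signal comes entirely from the data itself: the masked words are the labels.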
It's not predicting the future, it's
just, you know, predicting things in the
middle, but you could have it predict
the future; that's what language models
do.
So you construct, in an unsupervised
way, a model of language. Do you
think or video or the physical world or
whatever right how far do you think that
can take us? Do you think very far?
Does it understand anything? To some
level, it has, you know, a shallow
understanding of text, but, I mean, to
have
kind of true human level intelligence I
think you need to ground language in
reality so some people are attempting to
do this, right, having systems that have
some visual representation of what is
being talked about, which is one
reason you need interactive environments
Actually, this is like a huge technical
problem that is not solved, and that
explains why self-supervised learning
works in the context of natural language
but does not work, at least not well, in
the context of image recognition and
video, although it's making progress
quickly. And the reason is the fact that
it's much easier to represent
uncertainty in the prediction in the
context of natural language than it is
in the context of things like video and
images. So for
example, if I ask you to predict which
words are missing, you know, the 15
percent of the words that I've taken
out, the number of possibilities is
small. I mean, it's small, right? There
are 100,000 words in the lexicon, and
what the machine spits out is a big
probability vector, right, a bunch of
numbers between 0 and 1 that sum to one.
And we know how to do this with
computers. So representing uncertainty
in the prediction is relatively easy,
and that's, in my opinion, why those
techniques work for NLP. For images, if
you block out a piece of an image and
you ask a system to reconstruct that
piece of the image, there are many
possible answers, and they're all
perfectly legit, right? So how do you
represent the set of possible answers?
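Why a single least-squares prediction fails here can be shown numerically; the two "completions" below are invented four-pixel stand-ins for image patches:

```python
# Two equally legitimate completions of a masked patch,
# written as tiny pixel vectors.
completion_a = [1.0, 0.0, 1.0, 0.0]
completion_b = [0.0, 1.0, 0.0, 1.0]

def least_squares_optimum(targets):
    """A model forced to emit ONE prediction, while the true target
    is sometimes A and sometimes B, minimizes squared error by
    predicting the average of the targets."""
    n = len(targets)
    return [sum(t[i] for t in targets) / n
            for i in range(len(targets[0]))]

blurry = least_squares_optimum([completion_a, completion_b])
# blurry == [0.5, 0.5, 0.5, 0.5]: the "average" patch, matching
# neither legitimate answer -- the numeric analogue of a blurry image
```

The optimum sits between the two modes, which is exactly the blur described next in the conversation.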
You can't train a system to make just
one prediction. You can't train a neural
net to say, "here it is, that's the
image", because there's a whole set of
things that are compatible with it. So
how do you get the machine to represent
not a single output but a whole set of
outputs? And, you
know similarly with video prediction
there's a lot of things that can happen
in the future of a video. You're looking
at me right now; I'm not moving my head
very much, but, you know, I might turn
my head to the left or to the right,
right? If you don't have a system that
can predict this, and you train it with
least squares to kind of minimize the
error between the prediction and what
I'm doing, what you get is a blurry
image of myself in all possible future
positions that I might be in, which is
not a good prediction.
But there might be other
ways to do the self supervision right
for visual scenes? Like, what if I...
I mean, if I knew, I wouldn't tell you,
I'd publish it first. I don't know.
There might be... so, I mean, there
might be artificial ways, like self-play
in games, the way you can simulate part
of the environment...
Oh, that doesn't solve the problem; it's
just a way of generating data.
But because you have more control, I
mean, you can control...
Yeah, it's a way to generate data,
that's right. And because you can do
huge amounts of data generation...
you're right, it creeps up on the
problem from the side of data.
And you don't think that's the right
way?
It doesn't solve the problem of handling
uncertainty in the world, right? So if
you have a machine learn a predictive
model of the world in a game that is
deterministic or quasi deterministic
it's easy, right? Just, you know, give a
few frames of the game to a convnet, put
a bunch of layers, and then have it
generate the next few frames, and if the
game is deterministic it works fine. And
that includes, you know, feeding the
system with the action that your little
character is going to take.
The problem comes from the fact that the
real world, and certainly most games,
are not entirely predictable. That's
where you get those blurry predictions,
and you can't do planning with blurry
predictions. Right? So if you have a
perfect model of the world, you can run
this model in your head with a
hypothesis for a sequence of actions,
and you're going to predict the outcome
of that sequence of actions. But if your
model is imperfect, how can you plan?
Yeah, it quickly explodes. What are your
thoughts on the extension of this, which
is a topic I'm super excited about, and
it's connected to something you're
talking about in terms of robotics:
active learning. So, as opposed to sort
of unsupervised or self-supervised
learning, the system asks for human
help, right, for selecting the parts it
wants annotated next. So if you talk
about a robot exploring a space or a
baby exploring a space or a system
exploring a data set every once in a
while asking for human input you see
value in that kind of work I don't see
transformative value it's going to make
things that we can already do more
efficient or they will learn slightly
more efficiently but it's not going to
make machines sort of significantly more
intelligent, I think. And, by the way,
there is no opposition, there's no
conflict between self-supervised
learning, reinforcement learning, and
supervised learning, or imitation
learning, or active learning. I see
self-supervised learning as a
preliminary to all of the above.
Yes.
So the example I use very often is: how
is it that, if you use reinforcement
learning, deep reinforcement learning,
the best methods today, so-called
model-free reinforcement learning, to
learn to play Atari games, it takes
about 80 hours of training to reach a
level that any human can reach in about
15 minutes? They get better than humans,
but it takes a long time. AlphaStar,
okay, you know, Oriol Vinyals and his
team's system to play StarCraft, plays a
single map, a single type of player,
and to reach, you know, a bit better
than human level took about the
equivalent of 200 years of training
playing against itself.
It's 200 years, right? It's not
something that any human could ever do.
I'm not sure what to take away from
that.
Okay. Now
take those algorithms, the best RL
algorithms we have today, to train a car
to drive itself it would probably have
to drive millions of hours you will have
to kill thousands of pedestrians it will
have to run into thousands of trees it
will have to run off cliffs, and it
would have to run off the cliff multiple
times before it figures out, first of
all, that it's a bad idea, and, second
of all, how not to do it. And so, I
mean, this type of learning obviously
does not reflect the kind of learning
that animals and humans do. There is
something missing that's really, really
important there. And my hypothesis,
which I have been advocating for, like,
five years now, is
that we have predictive models of the
world that include the ability to
predict under uncertainty and what
allows us to not run off a cliff when we
learn to drive most of us can learn to
drive in about 20 or 30 hours of
training, without ever crashing or
causing any accident. If we drive next
to a cliff,
we know that if we turn the wheel to the
right the car is going to run off the
cliff and nothing good is gonna come out
of this because we have a pretty good
model of intuitive physics that tells us
you know the car is gonna fall we know
we know about gravity. Babies learn this
around the age of eight or nine months:
that objects don't float, they fall. And,
you know we have a pretty good idea of
the effect of turning the wheel
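The planning-by-mental-simulation idea from a few exchanges back, rolling a predictive model forward over hypothetical action sequences instead of crashing for real, can be sketched with a toy, assumed-exact one-dimensional "cliff world" (entirely invented for illustration):

```python
from itertools import product

CLIFF = 5  # positions >= CLIFF fall off the cliff (toy world)

def model(position, action):
    """Learned predictive model of the world (here assumed exact):
    steering +1 moves toward the cliff, -1 moves away."""
    return position + action

def plan(start, horizon=3):
    """Planning by mental simulation: roll the model forward over
    every hypothetical action sequence and keep the best safe one,
    with no need to actually drive off the cliff even once."""
    best_seq, best_progress = None, None
    for seq in product([-1, +1], repeat=horizon):
        pos, safe = start, True
        for a in seq:
            pos = model(pos, a)
            if pos >= CLIFF:       # imagined crash, discarded in thought
                safe = False
                break
        if safe and (best_progress is None or pos > best_progress):
            best_seq, best_progress = seq, pos
    return best_seq, best_progress

seq, final = plan(start=3)  # ends at position 4, just short of the cliff
```

With an imperfect or uncertain model this exhaustive rollout is exactly what breaks down, which is the open problem the conversation is pointing at.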