Transcript
PUAdj3w3wO4 • François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
The following is a conversation with François Chollet, his second time on the podcast. He's both a world-class engineer and a philosopher in the realm of deep learning and artificial intelligence. This time we talk a lot about his paper titled "On the Measure of Intelligence," which discusses how we might define and measure general intelligence in our computing machinery. Quick summary of the sponsors: Babbel, MasterClass, and Cash App. Click the sponsor links in the description to get a discount and to support this podcast.

As a side note, let me say that the serious, rigorous, scientific study of artificial general intelligence is a rare thing. The mainstream machine learning community works on very narrow AI with very narrow benchmarks. This is very good for incremental, and sometimes big incremental, progress. On the other hand, the outside-the-mainstream, renegade, you could say, AGI community works on approaches that verge on the philosophical and even the literary, without big public benchmarks. Walking the line between the two worlds is a rare breed, but it doesn't have to be. I ran the AGI series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI for a time, and still on occasion, walk this line. François Chollet does as well. I hope to also. It's a beautiful dream to work towards and to make real one day.

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman. As usual, I'll do a few minutes of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps so you can skip. But still, please do check out the sponsors by clicking the links in the description; it's the best way to support this podcast.

This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three months free. They offer 14 languages, including Spanish,
French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from the Russian poem by Alexander Blok that you'll start to understand if you sign up to Babbel. Now, I say that you'll start to understand this poem, because Russian starts with the language and ends with the vodka. The latter part is definitely not endorsed or provided by Babbel, and will probably lose me this sponsorship, although it hasn't yet. But once you graduate with Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. No app for that yet. So get started by visiting babbel.com and use code LEX to get three months free.

This show is also sponsored by MasterClass. Sign up at masterclass.com/lex to get a discount and to support this podcast. When I first heard about MasterClass, I thought it was too good to be true. I still think it's too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration (I hope to have him on this podcast one day), Neil deGrasse Tyson on scientific thinking and communication, Will Wright, creator of SimCity and The Sims, on game design, Carlos Santana on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.

This show, finally, is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to send and receive money digitally, let me mention a surprising fact related to physical money: of all the currency in the world, roughly eight
percent of it is actually physical money; the other 92 percent only exists digitally, and that's only going to increase. So again, if you get Cash App from the App Store or Google Play and use code LEXPODCAST, you get ten bucks, and Cash App will also donate ten dollars to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with François Chollet.

What philosophers, thinkers, or ideas had a big impact on you, growing up and today?

So one author that had a big impact on me when I read his books as a teenager was Jean Piaget, who is a Swiss psychologist considered to be the father of developmental psychology, and he has a large body of work about, basically, how intelligence develops in children. It's really old work; most of it is from the 1930s and 1940s, so it's not quite up to date. It's actually been superseded by many newer developments in developmental psychology. But to me, it was very interesting, very striking, and it actually shaped the early ways in which I started thinking about the mind and the development of intelligence as a teenager.

His actual ideas, or the way he thought about it, or just the fact that you could think about the developing mind at all?

I guess both. Jean Piaget is the author that introduced me to the notion that intelligence and the mind is something that you construct throughout your life, and that children construct it in stages. I thought that was a very interesting idea, which is, of course, very relevant to AI, to building artificial minds. Another book that I read around the same time that had a big impact on me, and there was actually a little bit of overlap with Piaget as well, and I read it around the same time, is Jeff Hawkins's "On Intelligence," which is a classic. He has this vision of the mind as a multi-scale hierarchy of temporal prediction modules, and these ideas really resonated with
me, like the notion of a modular hierarchy of, potentially, compression functions or prediction functions. I thought it was really interesting, and it reshaped the way I started thinking about how to build minds.

The hierarchical nature, which aspect? Also, he's a neuroscientist, so he was thinking about the actual brain.

Yes, he's basically talking about how our mind works. The notion that cognition is prediction was an idea that was kind of new to me at the time, and that I really loved at the time. And the notion that there are multiple scales of processing in the brain, the hierarchy.

Yes. This is before deep learning.

These ideas of hierarchy in AI have been around for a long time, even before "On Intelligence." I mean, they've been around since the 1980s. And yes, that was before deep learning, but of course, I think these ideas really found their practical implementation in deep learning.

What about the memory side of things? I think he was talking about knowledge representation. Do you think about memory a lot? One way you can think of neural networks is as a kind of memory, you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains; it doesn't have the same rich complexity, the long-term nature that's in our brains.

Yes, the brain is more of a sparse-access memory, so that you can retrieve very precisely, like, bits of your experience.

The retrieval aspect, you can introspect, you can ask yourself questions.

Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences, specific concepts, specific thoughts. Language is the way you store thoughts, not just in writing, in the physical world, but also in your own mind, and it's also how you reason with them. Like, imagine
if you didn't have language. Then you would not really have a self-internally-triggered way of retrieving past thoughts. You would have to rely on external experiences. For instance, you see a specific sight, you smell a specific smell, and it brings up memories, but you would not naturally have a way to deliberately access these memories without language.

Well, the interesting thing you mentioned is you can also program the memory. You can change it, probably, with language.

Yeah, using language, yes.

Well, let me ask you a Chomsky question, which is: first of all, do you think language is fundamental? Like, there's turtles, what's at the bottom of the turtles? It can't be turtles all the way down. Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?

No, I don't think so.

You disagree with Noam Chomsky?

Yes. Language is a layer on top of cognition. So it is fundamental to cognition in the sense that, to use a computing metaphor, I see language as the operating system of the brain, of the human mind. And the operating system, you know, is a layer on top of the computer. The computer exists before the operating system, but the operating system is how you make it truly useful.

And the operating system is most likely Windows, not Linux, because language is messy.

Yeah, it's messy, and it's pretty difficult to inspect it, introspect it.

How do you think about language? We use human-interpretable language, but is there something deeper, closer to, like, logical types of statements? What is the nature of language, do you think? Is there something deeper than the syntactic rules we construct? Is there something that doesn't require utterances or writing and so on?

Are you asking about the possibility that there could exist languages for thinking that are not
made of words?

Yeah, exactly.

I think so, I think so. The mind is layers, right? And language is almost like the outermost, the uppermost layer. But before we think in words, I think we think in terms of emotion and space, and we think in terms of physical actions. And I think babies, in particular, probably express their thoughts in terms of the actions that they've seen or that they can perform, and in terms of the motions of objects in their environment, before they start thinking in terms of words.

It's amazing to think about that as the building blocks of language, like the kind of actions and ways the babies see the world as more fundamental than the beautiful Shakespearean language you construct on top of it. And we probably don't have any idea what that looks like, right? Which is important for trying to engineer it into AI systems.

I think visual analogies and motion are a fundamental building block of the mind, and you actually see it reflected in language. Language is full of spatial metaphors. And when you think about things, I consider myself very much a visual thinker, you often express your thoughts by doing things like visualizing concepts in 2D space, or you solve problems by imagining yourself navigating a concept space. I don't know if you have this sort of experience.

You said visualizing concept space. So I certainly visualize mathematical concepts, but you mean in concept space, visually, you're embedding ideas into some three-dimensional space you can explore with your mind, essentially?

Yeah, 2D.

You're a flatlander, okay. No, I do not. Before I jump from concept to concept, I have to put it back down on paper, and it has to be on paper. I can only travel on 2D paper, not inside my mind. You're able to move inside your mind?

But even if you're writing, like, a paper, for instance, don't you
have a spatial representation of your paper? Like, you visualize where ideas lie topologically in relationship to other ideas, kind of like a subway map of the ideas in your paper?

Yeah, that's true. I mean, in papers, I don't know about you, but it feels like there's a destination. There's a key idea that you want to arrive at, and a lot of it is in the fog, and you're trying to, it's almost like, what's that called, when you do a path-planning search from both directions, from the start and from the end, and then you find where they join? You do, like, shortest path. In game playing, you do this with A* from both sides, and you see where they meet. So you kind of do that, at least for me. First of all, just exploring from the start, from first principles: what do I know, what can I start proving from that, right? And then from the destination, you start backtracking: if I want to show some set of ideas, what would it take to show them? And you kind of backtrack. But yeah, I don't think I'm doing all that in my mind, though. I'm putting it down on paper.

Do you use mind maps to organize your ideas?

Yeah, I like mind maps.

Let's get into this. I've been so jealous of people, I haven't really tried it. I've been jealous of people that seem to get this fire of passion in their eyes, because everything starts making sense. It's like Tom Cruise in the movie, moving stuff around. Some of the most brilliant people I know use mind maps. I haven't tried, really. Can you explain what the hell a mind map is?

I guess a mind map is a way to take the connected mess inside your mind and just put it on paper, so that you gain more control over it. It's a way to organize things on paper, and as kind of a consequence of organizing things on paper, it starts being more organized inside your own mind.

What does that look like? Do you have an example?
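The path-planning idea mentioned above, searching from both the start and the goal and stopping where the two frontiers meet, is bidirectional search. Here is a minimal sketch in Python; the graph representation, function name, and example graph are all illustrative, not anything from the conversation:

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Breadth-first search from both ends; stop when the frontiers meet.

    `graph` maps each node to a list of neighbors (assumed undirected:
    every edge appears in both adjacency lists). Returns the node where
    the two searches meet, or None if start and goal are disconnected.
    """
    if start == goal:
        return start
    seen_fwd, seen_bwd = {start}, {goal}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])
    while frontier_fwd and frontier_bwd:
        # Expand the smaller frontier first, a common optimization.
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier, seen, other_seen = frontier_fwd, seen_fwd, seen_bwd
        else:
            frontier, seen, other_seen = frontier_bwd, seen_bwd, seen_fwd
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, []):
                if neighbor in other_seen:
                    return neighbor  # the two searches have met
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
    return None
```

The appeal for the paper-writing analogy is that each frontier only has to cover half the distance, which is why working from both first principles and the destination can feel faster than either alone.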
What's the first thing you write on paper? What's the second thing you write?

Typically, you draw a mind map to organize the way you think about a topic, so you would start by writing down the key concept for that topic. Like, you would write "intelligence" or something, and then you would start adding associative connections: what do you think about when you think about intelligence? What do you think are the key elements of intelligence? So maybe you would have "language," for instance, and "motion," and so you would start drawing nodes with these things. And then you would see, what do you think about when you think about motion, and so on, and you would go like that, like a tree.

It's a tree, mostly? Or is it a graph, too, not just a tree?

Oh, it's more of a graph than a tree, and it's not limited to just writing down words; you can also draw things. And it's not supposed to be purely hierarchical, right? The point is that, once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.

See, but I'm so OCD that you just mentioned intelligence, language, and motion, and I would start becoming paranoid that the categorization isn't perfect, that I'll become paralyzed with the mind map. Even though you're just doing associative kinds of connections, there's an implied hierarchy that's emerging, and I would start becoming paranoid that it's not the proper hierarchy. So one way to see mind maps is that you're putting thoughts on paper, like a stream of consciousness, but then you can also start getting paranoid: is this the right hierarchy?

Sure. It's a mind map, it's your mind map. You're free to draw anything you want, you're free to draw any connection you want, and you can just make a different mind map if you think the central node is not the
right node.

Yeah. So I suppose there's a fear of being wrong.

If you want to organize your ideas by writing down what you think, which I think is very effective (like, how do you know what you think about something if you don't write it down, right?), the thing is that it imposes a much more syntactic structure over your ideas, which is not required with a mind map. A mind map is kind of a lower-level, more freehand way of organizing your thoughts, and once you've drawn it, then you can start actually voicing your thoughts in terms of, you know, paragraphs.

There's a two-dimensional aspect of the layout, too, right?

Yeah. It's kind of a flower, I guess; you usually want to start with a central concept.

Yes. Typically it ends up more like a subway map, so it ends up more like a graph, a topological graph, without a root node. Like in a subway map, there are some nodes that are more connected than others, and there are some nodes that are more important than others, right? So there are destinations, but it's not going to be purely like a tree, for instance.

Yeah. It's fascinating to think that there might be something to that about the way our mind thinks. By the way, I just remembered an obvious thing: I have probably thousands of documents in Google Docs at this point that are bullet-point lists, which, you can probably map a mind map to a bullet-point list. It's the same?

No, it's not. It's a tree.

It's a tree, yeah. So I create trees, but also they don't have the visual element. I guess I'm comfortable with the structure; the narrowness, the constraints feel more comforting.

If you have thousands of documents with your own thoughts in Google Docs, why don't you write some kind of search engine, maybe a mind-mapping piece of software, where you write down a concept and then it gives you sentences or paragraphs from your thousands of Google Docs documents that match that concept?
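The tool described here, type in a concept and get back matching passages from your notes, can be roughed out as naive keyword matching. This is only a sketch (every name in it is hypothetical), and it runs straight into the limitation raised in the conversation: it matches words, not meaning, so a paragraph that poetically evokes motion without ever saying "motion" scores zero:

```python
import re

def score(paragraph, query):
    """Count how many distinct query words appear in the paragraph."""
    words = set(re.findall(r"[a-z']+", paragraph.lower()))
    return sum(1 for w in set(re.findall(r"[a-z']+", query.lower())) if w in words)

def search(documents, query, top_k=3):
    """Return up to top_k paragraphs across all documents that best match `query`.

    `documents` is a list of strings; paragraphs are split on blank lines.
    Purely lexical, so it has no notion of semantic similarity.
    """
    paragraphs = [p for doc in documents for p in doc.split("\n\n") if p.strip()]
    ranked = sorted(paragraphs, key=lambda p: score(p, query), reverse=True)
    return [p for p in ranked[:top_k] if score(p, query) > 0]
```

A real version of this idea would swap the lexical `score` for embedding-based similarity, which is exactly the semantic-search problem discussed next.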
The problem is, unlike mind maps, it's so deeply rooted in natural language, so it's not semantically searchable, I would say. Because the categories, you kind of mentioned intelligence, language, and motion, are semantically very strong. It feels like the mind map forces you to be semantically clear and specific. The bullet-point lists I have are sparse, disparate thoughts that poetically represent a category like "motion," as opposed to saying "motion." So, unfortunately, that's the same problem as with the internet. That's why the idea of the semantic web is difficult to realize: most language on the internet is a giant mess of natural language that's hard to interpret.

So do you think there's something to mind maps, you actually originally brought this up as we were talking about cognition and language, do you think there's something to mind maps about how our brain actually thinks, reasons about things?

It's possible. I think it's reasonable to assume that there is some level of topological processing in the brain, that the brain is very associative in nature. And I also believe that a topological space is a better medium to encode thoughts than a geometric space.

So what's the difference between a topological and a geometric space?

Well, if you're talking about topology, then points are either connected or not, so a topology is more like a subway map, and geometry is when you're interested in the distance between things. In subway maps, you don't really have the concept of distance; you only have the concept of whether there is a train going from station A to station B. And what we do in deep learning is that we're actually dealing with geometric spaces. We are dealing with concept vectors, word vectors, that have a distance between them, expressed in terms of dot products. We are not really building topological models, usually.

I think you're absolutely right.
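The geometric-versus-topological distinction can be made concrete in a few lines of Python: in a geometric space, similarity is graded (a dot product or cosine between vectors), while in a topological space, the only question is whether two points are connected. The toy word vectors and subway graph below are made up purely for illustration:

```python
import math

def cosine(u, v):
    """Geometric view: similarity is a graded quantity, an angle between vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def connected(graph, a, b):
    """Topological view: only reachability matters, not distance."""
    seen, stack = set(), [a]
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

# Toy 2D "word vectors" (made up): cat is more similar to dog than to car.
vectors = {"cat": [1.0, 0.9], "dog": [0.9, 1.0], "car": [-1.0, 0.2]}
# Subway-map style graph: a station is either reachable or it isn't.
subway = {"a": ["b"], "b": ["c"], "c": [], "d": []}
```

Note that `cosine` is differentiable in its inputs while `connected` is not, which is exactly the point made next about why deep learning lives in geometric spaces.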
Distance is of fundamental importance in deep learning. I mean, it's the continuous aspect of it.

Yes, because everything is a vector, and everything has to be a vector because everything has to be differentiable. If your space is discrete, it's no longer differentiable; you cannot do deep learning in it anymore. Well, you could, but you could only do it by embedding it in a bigger continuous space. So if you do topology in the context of deep learning, you have to do it by embedding your topology in a geometry.

Right. Yeah. Well, let me zoom out for a second. Let's get into your paper, "On the Measure of Intelligence," that you put out in 2019.

Yes. November 2019, yeah. That was a different time.

Yeah, I remember. It feels like a different world. You could travel, you could actually go outside and see friends.

Yeah. Let me ask the most absurd question. I think there's some non-zero probability there'll be a textbook one day, like 200 years from now, on artificial intelligence, or it'll be called just "intelligence," because humans will already be gone. It'll be your picture with a quote, you know, one of the early biological systems to consider the nature of intelligence, and there'll be a definition of how they thought about intelligence, which is one of the things you do in your paper, "On the Measure of Intelligence": to ask, well, what is intelligence, and how do we test for intelligence, and so on. So is there a spiffy quote about what intelligence is? What is the definition of intelligence, according to François Chollet?

Yes. So do you think the superintelligent AIs of the future will want to remember us, the way we remember humans from the past? And do you think they would be, you know, ashamed of having a biological origin?

No, I think it would be a niche topic. It won't be that interesting, but it'll be like the people that study, in certain contexts, historical civilizations that no longer exist.
The Aztecs and so on; that's how it'll be seen. And it'll be studied also in the context of social media. There will be hashtags about the atrocities committed to human beings when the robots finally got rid of them. It was a mistake; it'll be seen as a giant mistake, but ultimately in the name of progress, and it created a better world, because humans were overconsuming the resources, they were not very rational, and they were destructive in the end, in terms of productivity and putting more love in the world. And so, within that context, there'll be a chapter about these biological systems.

You seem to have a very detailed vision of that future. You should write a sci-fi novel about it.

I'm working on a sci-fi novel currently, yes.

Self-published, yeah. The definition of intelligence, so...

Intelligence is the efficiency with which you acquire new skills at tasks that you did not previously know about, that you did not prepare for. So intelligence is not skill itself. It's not what you know, it's not what you can do; it's how well and how efficiently you can learn new things.

New things.

Yes. The idea of newness there seems to be fundamentally important.

Yes. So you would see intelligence on display, for instance, whenever you see a human being, or an AI creature, adapt to a new environment that it has not seen before, that its creators did not anticipate. When you see adaptation, when you see improvisation, when you see generalization, that's intelligence. In reverse, if you have a system that, when you put it in a slightly new environment, cannot adapt, cannot improvise, cannot deviate from what it's hardcoded to do, or what it has been trained to do, that is a system that is not intelligent. There's actually a quote from Einstein that captures this idea, which is: "The measure of intelligence is the ability to change." I like that quote. I think it captures at least part of this
idea.

You know, there might be something interesting about the difference between your definition and Einstein's. I mean, he's just being Einstein and clever, but: acquisition of the ability to deal with new things, versus the ability to just change. What's the difference between those two things? Just changing itself, do you think there's something to that, just being able to change?

Yes, being able to adapt. So not just change, but change in the right direction, being able to adapt yourself to your environment, whatever the environment. That's a big part of intelligence, yes. And intelligence is, more precisely, how efficiently you're able to adapt, how efficiently you're able to basically master your environment, how efficiently you can acquire new skills. And I think there's a big distinction to be drawn between intelligence, which is a process, and the output of that process, which is skill. So, for instance, if you have a very smart human programmer that considers the game of chess and writes down a static program that can play chess, then the intelligence is the process of developing that program, but the program itself is just encoding the output artifact of that process. The program itself is not intelligent. And the way you can tell it's not intelligent is that, if you put it in a different context, if you ask it to play Go or something, it's not going to be able to perform well without human involvement, because the source of intelligence, the entity that is capable of that process, is the human programmer. So we should be able to tell the difference between the process and its output. We should not confuse the output and the process. It's the same as, you know, do not confuse a road-building company and one specific road, because one specific road takes you from point A to point B, but a road-building company can make a path from anywhere to anywhere else.

Yeah, that's beautifully put. But also, to play devil's advocate a little bit, it's possible
that there's something more fundamental than us humans. So you kind of said the programmer creates the difference between the acquirer of the skill and the skill itself. There could be something, you could argue, the universe is more intelligent. The deeper, base intelligence that we should be trying to measure is something that created humans. We should be measuring God, or the source, the universe, as opposed to humans. There could be a deeper intelligence.

Sure. There's always a deeper intelligence, you can argue that, but that does not take anything away from the fact that humans are intelligent, and you can tell that because they are capable of adaptation and generality. And you see that in particular in the fact that humans are capable of handling situations and tasks that are quite different from anything that any of our evolutionary ancestors has ever encountered. So we are capable of generalizing very much out of distribution, if you consider our evolutionary history as being, in a way, our training data.

Of course, evolutionary biologists would argue that we're not going that far out of the distribution; we're mapping the skills we've learned previously, desperately trying to jam them into these new situations.

I mean, there's definitely a little bit of that, but it's pretty clear to me that most of the things we do on any given day in our modern civilization are things that are very, very different from what our ancestors a million years ago would have been doing on a given day, and our environment is very different. So I agree that everything we do, we do with cognitive building blocks that we acquired over the course of evolution, right, and that anchors our cognition to a certain context, which is the human condition, very much. But still, our mind is capable of a pretty remarkable degree of generality, far beyond anything we can create in artificial systems today. Like,
the degree to which the mind can generalize away from its evolutionary history is much greater than the degree to which a deep learning system today can generalize away from its training data.

And the key point you're making, which I think is quite beautiful, is that, if we talk about measurement, we shouldn't measure the skill; we should measure the creation of the new skill, the ability to create that new skill.

Yes.

But it's tempting, it's weird, because the skill is a little bit of a window into the system, so whenever you have a lot of skills, it's tempting to measure the skills.

Yes. I mean, the skill is the only thing you can objectively measure. But the thing to keep in mind is that, when you see skill in a human, it gives you a strong signal that that human is intelligent, because you know they weren't born with that skill, typically. Like, you see a very strong chess player; maybe you're a very strong chess player yourself.

I think you're saying that because I'm Russian. Now you're prejudiced, you assume...

Yeah, it's a bias.

I'm biased, yeah, okay.

So if you see a very strong chess player, you know they weren't born knowing how to play chess, so they had to acquire that skill with their limited resources, with their limited lifetime, and they did that because they are generally intelligent, and so they may as well have acquired any other skill. You know they have this potential. On the other hand, if you see a computer playing chess, you cannot make the same assumptions, because you cannot just assume the computer is generally intelligent. The computer may be born knowing how to play chess, in the sense that it may have been programmed by a human that has understood chess for the computer and has just encoded the output of that understanding in a static program, and that program is
not intelligent.

So let's zoom out just for a second. What is the goal of the "On the Measure of Intelligence" paper? What do you hope to achieve with it?

So the goal of the paper is to clear up some long-standing misunderstandings about the way we've been conceptualizing intelligence in the AI community, and the way we've been evaluating progress in AI. There's been a lot of progress recently in machine learning, and people are extrapolating from that progress that we are about to solve general intelligence. And if you want to be able to evaluate these statements, you need to precisely define what you're talking about when you're talking about general intelligence, and you need a formal way, a reliable way, to measure how much intelligence, how much general intelligence, a system possesses. And ideally, this measure of intelligence should be actionable. It should not just describe what intelligence is; it should not just be a binary indicator that tells you the system is intelligent or it isn't. It should be actionable, it should have explanatory power, right? So you could use it as a feedback signal; it would show you the way towards building more intelligent systems.

So at the first level, you draw a distinction between two divergent views of intelligence, as we just talked about: intelligence as a collection of task-specific skills, and intelligence as a general learning ability. So what's the difference between this kind of memorization of skills and a general learning ability? We've talked about it a little bit, but can you linger on this topic for a bit?

Yeah. So the first part of the paper is an assessment of the different ways we've been thinking about intelligence and the different ways we've been evaluating progress in AI. The history of cognitive science has been shaped by two views of the human mind. One view is the evolutionary psychology view, in which the mind is a collection of fairly static, special-purpose, ad hoc mechanisms that
have been hard coded by evolution over our our history as a species over a very long time and um early ai researchers people like marvin minsky for instance they clearly subscribed to this view and they saw they saw the mind as a kind of you know collection of static programs uh similar to the programs they would they would run on like mainframe computers and in fact they i think they very much understood the mind uh through the metaphor of the mainframe computer because that was the tool they they were working with right and so you had the static programs this collection of very different static programs operating over a database like memory and in this picture learning was not very important learning was considered to be just memorization and in fact learning is basically not featured in ai textbooks until the 1980s with the rise of machine learning it's kind of fun to think about that learning was the outcast like the the weird people were learning like the mainstream ai world was um i mean i don't know what the best term is but it's non-learning it was seen as like reasoning yes would not be learning based yes it was seen it was considered that the mind was a collection of programs that were primarily logical in nature and that's all you needed to do to create a mind was to write down these programs and they would operate over your knowledge which would be stored in some kind of database and as long as your database would encompass you know everything about the world and your logical rules were uh comprehensive then you would have in mind so the other view of the mind is the brain as a sort of blank slate right this is a very old idea you find it in john locke's writings this is the tabulata and this is this idea that the mind is some kind of like information sponge that starts empty it starts blank and that absorbs uh knowledge and skills from experience right so it's uh it's a sponge that reflects the complexity of the world the complexity of your life 
experience, essentially — everything you know and everything you can do is a reflection of something you found in the outside world. This is a very old idea that was not very popular, for instance, in the 1970s, but that has gained a lot of vitality recently with the rise of connectionism, in particular deep learning. So today deep learning is the dominant paradigm in AI, and I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor: they see the mind as a kind of randomly initialized neural network that starts blank when you're born and then gets trained, acquiring knowledge and skills via exposure to training data.

By the way — a small tangent — I feel like people who are seriously thinking about intelligence are not conceptualizing it that way. I actually haven't met too many people who believe that a neural network will be able to reason, who seriously, rigorously think that. I think it's actually an interesting worldview, and we'll talk about it more, but it's been impressive what neural networks have been able to accomplish, and — you might disagree — to me it's an open question whether scaling size might eventually lead to incredible results that to us mere humans will appear as if they're general.

I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need is — that the mind is just a neural network. However, that view is actually very popular, I think, in the deep learning community; many people are kind of conceptually, intellectually lazy about it. But that's exactly what I'm saying — I haven't met many people, and I think it would be interesting to meet a person who is not intellectually lazy about this particular topic and still believes that neural networks will go all the way. I think january is probably closest to that.

There are definitely people who argue that current deep learning techniques are already the way to general artificial intelligence, and that all you need to do is scale them up to all the available training data. If you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this idea.

So on that topic: GPT-3, similar to GPT-2, has captivated some part of the imagination of the public. There's a bunch of hype of different kinds — I would say it's emergent, not artificially manufactured; people just get excited for some strange reason. In the case of GPT-3, which is funny, there was, I believe, a couple of months' delay from release to hype — maybe I'm not historically correct on that, but it feels like there was a little bit of a lack of hype and then a phase shift into hype. Nevertheless, there are a bunch of cool applications that seem to captivate the imagination of the public about what this language model — trained in an unsupervised way, without any fine-tuning — is able to achieve. So what do you make of that? What are your thoughts about GPT-3?

Yeah, I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just being shown a few examples. If it's actually capable of doing that, that's novel and very interesting, and something we should investigate. That said, I must say I'm not entirely convinced that we have shown it's capable of doing that. It's very likely, given the amount of data the model is trained on, that what it's actually doing is pattern-matching a new task you give it against tasks it's been exposed to in its training data — just recognizing the task, instead of developing a model of the task.

To interrupt — there's a parallel to what you said before, which is that it's possible to see the prompt given to GPT-3 as a kind of SQL
query into this thing that it's learned — similar to what you said before, that language is used to query the memory. Yes. So is it possible that a neural network is a giant memorization machine, but if it gets sufficiently giant, it will memorize sufficiently large amounts of things about the world that it becomes — that intelligence becomes a querying machine? I think it's possible that a significant chunk of intelligence is this giant associative memory. I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.

So do you think GPT-3, 4, 5 — GPT-10 — where's the ceiling? Do you think it will be able to reason? No — that's a bad question. "What is the ceiling" is the better question: how well is it going to scale, how good is GPT-n going to be? I believe GPT-n is going to improve on the strength of GPT-2 and GPT-3, which is that it will be able to generate ever more plausible text in context. Just monotonically improving performance? Yes — if you train a bigger model on more data, then your text will be increasingly context-aware and increasingly plausible, in the same way that GPT-3 is much better at generating plausible text compared to GPT-2. But that said, I don't think just scaling up the model to more transformer layers and more training data is going to address the flaw of GPT-3, which is that it can generate plausible text, but that text is not constrained by anything other than plausibility. In particular, it's not constrained by factualness, or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue, or to generate statements that are even self-contradictory — because its only goal is plausibility, and it has no other constraints. It's not constrained to be self-consistent, for instance.

For this reason, one thing I thought was very interesting with GPT-3 is that you can prime the answer it will give you by asking the question in a specific way, because it's very responsive to the way you ask the question — since it has no understanding of the content of the question. If you ask the same question in two different ways that are basically adversarially engineered to produce certain answers, you will get two different, contradictory answers. So it's very susceptible to adversarial attacks, essentially. Potentially, yes.

In general, the problem with these generative models is that they are very good at generating plausible text, but that's just not enough. One avenue I think would be very interesting for making progress is to make it possible to write programs over the latent space these models operate on: you would rely on these self-supervised models to generate a sort of pool of knowledge and concepts and common sense, and then you would be able to write explicit reasoning programs over it. Because the current problem with GPT-3 is that it can be quite difficult to get it to do what you want. If you want to turn GPT-3 into a product, you need to put constraints on it, you need to force it to obey certain rules — you need a way to program it explicitly.

Yeah — so if you look at its ability to do program synthesis, it generates, like you said, something that's plausible. Yes. If you try to make it generate programs, it will perform well for any program that it has seen in its training data. But because program space is not interpolative, it's not going to be able to generalize to problems it hasn't seen before.

Now — and this is sort of an absurd but, I think, useful intuition builder — GPT-3 has 175 billion parameters; a human brain has about a thousand times that, or more, in terms of number of synapses. Do you think — obviously
very different kinds of things, but there is some degree of similarity — what do you think GPT will look like when it has a hundred trillion parameters? Do you think our conversation might be different in nature? Because you've criticized GPT-3 very effectively just now.

No, I don't think so. To begin with, the bottleneck in scaling up GPT models — generative pre-trained transformer models — is not going to be the size of the model or how long it takes to train it. The bottleneck is going to be the training data, because OpenAI is already training GPT-3 on a crawl of basically the entire web, and that's a lot of data. You could imagine training on more data than that — Google could train on more data than that — but it would still be only incrementally more data. I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least 100x, maybe even 1000x — I don't have the exact number. You're not going to be able to train the model on 100x more data than what you're already doing.

That's brilliant — it's easier to think of compute as the bottleneck and then argue that we can remove that bottleneck. We can remove the compute bottleneck — I don't think it's a big problem. If you look at the pace at which we've improved the efficiency of deep learning models in the past few years, I'm not worried about training-time bottlenecks or model-size bottlenecks. The bottleneck in the case of these generative transformer models is absolutely the training data.

What about the quality of the data? Yeah, the quality of the data is an interesting point. The thing is, if you're going to want to use these models in real products, then you want to feed them data that's as high-quality, as factual, and, I would say, as unbiased as possible — although, you know, there's not really such a thing as unbiased data in the first place
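The data-bottleneck argument here — that only incrementally more data than a full web crawl is available, and that more data helps less and less — can be sketched with a toy power-law curve. This is a minimal illustration under an assumed functional form: `toy_loss`, its coefficient, and its exponent are invented for the sketch, not taken from any published scaling-law study.

```python
# Toy sketch of diminishing returns from scaling training data alone.
# Test loss for large models is often described empirically as a power
# law in dataset size D: loss(D) = a * D**(-alpha). The constants here
# are purely illustrative assumptions.

def toy_loss(num_examples: int, a: float = 10.0, alpha: float = 0.1) -> float:
    """Hypothetical test loss as a function of dataset size."""
    return a * num_examples ** (-alpha)

if __name__ == "__main__":
    sizes = [10**6, 10**7, 10**8, 10**9]
    losses = [toy_loss(n) for n in sizes]
    # Absolute improvement bought by each successive 10x of data:
    gains = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    for n, loss in zip(sizes, losses):
        print(f"{n:>13,} examples -> toy loss {loss:.3f}")
    # Each extra 10x of data helps less than the previous 10x did.
    assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))
```

Under any curve of this shape, each further 10x of data buys a smaller absolute loss reduction than the last — which is why removing the compute bottleneck alone would not remove the diminishing returns.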
but you probably don't want to train it on Reddit, for instance — that sounds like a bad plan. From my personal experience working with large-scale deep learning models: at some point I was working on a model at Google that was trained on about 150 million labeled images. It was an image classification model — that's a lot of images, probably most publicly available images on the web at the time. And it was a very noisy dataset, because the labels were not originally annotated by hand by humans; they were automatically derived from things like tags on social media, or just keywords on the same page the image was found on, and so on. It was very noisy, and it turned out you could easily get a better model — not just by training on more of the noisy data, which gives you an incrementally better model but very quickly hits diminishing returns. Rather, if you train on a smaller dataset with higher-quality annotations — annotations actually made by humans — you get a better model, and it also takes less time to train.

That's fascinating. Is self-supervised learning a way to do better with the automated labeling? Yeah, you can enrich or refine your labels in an automated way — that's correct.

Do you have hope for — I don't know if you're familiar with it — the idea of the Semantic Web? For people who are not familiar: the Semantic Web is the idea of attaching semantic meaning to the words, sentences, and paragraphs on the internet, so as to convert the information on the internet, or some fraction of it, into something that's interpretable by machines. That was the dream of the Semantic Web papers in the '90s: the internet is full of rich, exciting information — even just looking at Wikipedia — and we should be able to use it as data for machines. But that information is not really in a format that's available to machines.

No, I don't think the Semantic Web will ever work, simply because it would be a lot of work to provide that information in structured form, and there is not really any incentive for anyone to do that work. I think the way forward to make the knowledge on the web available to machines is actually something closer to unsupervised deep learning. GPT-3 is actually a bigger step in the direction of making the knowledge of the web available to machines than the Semantic Web was.

Yeah — though perhaps, in a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason. But that might be just the early days. I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns it has seen in its training data. Of course, if you're trained on the entire web, then you can produce an illusion of reasoning in many different situations, but it will break down if it's presented with a novel situation.

That's the open question — between the illusion of reasoning and actual reasoning. Yes — the power to adapt to something that is genuinely new. Because the thing is, even if you could train on every bit of data ever generated in the history of humanity, such a model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. For instance, if you trained a GPT-3 model on data from the year 2002 and then used it today, it would be missing many things — many common-sense facts about the world, even vocabulary, and so on. Yeah, it's interesting that GPT-3, I think, doesn't have any information about the coronavirus. Yes — which is why, you know, you can tell that a system is
intelligent when it's capable of adapting. So intelligence is going to require some amount of continuous learning, but it's also going to require some amount of improvisation. It's not enough to assume that what you're going to be asked to do is something you've seen before, or a simple interpolation of things you've seen before. In fact, that model breaks down even for tasks that look relatively simple from a distance — like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model — it wasn't even L2. And that's a lot of data — a lot more data than the 20 or 30 hours of driving a human needs to learn to drive, given the knowledge they've already accumulated.

Let me ask you on that topic about Elon Musk and Tesla Autopilot — one of the only companies, I believe, really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve level four? L4 is probably achievable; L5 is probably not. What's the distinction there? L5 is complete autonomy — you can just fall asleep. L5 is basically human-level. Well, you have to be careful saying human-level, because — yeah, better than most drivers. That's the clearest example: cars will most likely be much safer than humans in many situations where humans fail, and vice versa.

I'll tell you — the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that at some point in the future we'll develop a system trained on enough data, especially provided that we can simulate a lot of that data — we don't necessarily need actual cars on the road for everything. But it's a massive effort, and it turns out you can create a system that's much more adaptive, that can generalize much better, if you just add explicit models of the surroundings of the car, and if you use deep learning for what it's good at, which is to provide perceptual information. In general, deep learning is a way to encode perception and a way to encode intuition, but it is not a good medium for any sort of explicit reasoning. And in AI systems today, strong generalization tends to come from explicit models — from abstractions in the human mind that are encoded in program form by a human engineer. These are the abstractions you can actually generalize with, not the sort of weak abstractions learned by a neural network.

And the question is how much reasoning, how much strong abstraction, is required to solve particular tasks like driving. Or for human life, for existence: how much strong abstraction does existence require? But more specifically on driving — that seems to be a coupled question about intelligence: how do you build an intelligent system, and, the coupled problem, how hard is the problem — how much intelligence does the problem actually require? And we get to cheat, right, because we get to look at the problem. It's not like we close our eyes, completely new to driving. We get to do what we do as human beings: for the majority of our life, before we ever learn, quote-unquote, to drive, we get to watch other people drive, we get to be in cars, we get to see movies about cars — we get to observe all this stuff. That's similar to what neural networks are doing — getting a lot of data. And the question is: how many leaps of reasoning genius are required to be able to actually, effectively drive?

In the example of driving — I mean, sure, you've seen a
lot of cars in your life before you learn to drive. But let's say you've learned to drive in Silicon Valley, and now you rent a car in Tokyo. Well, now everyone is driving on the other side of the road, the signs are different, the roads are more narrow, and so on. It's a very different environment, and a smart human — even an average human — should be able to just zero-shot it, to be operational in this very different environment right away, despite having had no contact with the novel complexity contained in it. And that novel complexity is not just an interpolation over the situations you've encountered previously, like learning to drive in the US.

The reason I ask is that one of the most interesting active tests of intelligence we have today, in terms of having an impact on the world, is driving. When do you think we'll pass that test of intelligence?

I don't think driving is that much of a test of intelligence, because, again, there is no task for which skill at that task demonstrates intelligence — unless it's a kind of meta-task that involves acquiring new skills. I think you can actually solve driving without any real amount of intelligence. For instance, if you really did have infinite training data, you could literally train an end-to-end deep learning model that drives — provided infinite training data. The only problem with that whole idea is collecting a dataset that's sufficiently comprehensive, that covers the very long tail of possible situations you might encounter — it's really just a scale problem. So I think there's nothing fundamentally wrong with this plan, with this idea; it's just that it strikes me as a fairly inefficient thing to do, because you run into this scaling issue with diminishing returns. Whereas if, instead, you took a more manual engineering approach, where you use deep learning modules in combination with engineering an explicit model of the surroundings of the car, and you bridge the two in a clever way, your model will actually start generalizing much earlier and more effectively than the end-to-end deep learning model. So why would you not go with the more manual, engineering-oriented approach?

Even if you created that system — either the end-to-end deep learning system with infinite data, or the slightly more human-engineered system — I don't think achieving L5 would demonstrate general intelligence, or intelligence of any generality at all. Again, the only possible test of generality in AI would be a test that looks at skill acquisition over unknown tasks. For instance, you could take your L5 driver and ask it to learn to pilot a commercial airplane, and then you would look at how much human involvement and how much training data is required for the system to learn to pilot an airplane. That gives you a measure of how intelligent that system is.

Well, that's a big leap — I get you, but I'm more interested in driving as a problem. To me, driving is a black box that can generate novel situations at some rate — what people call edge cases. So it does have newness that we keep being confronted with — let's say once a month. Once a month is a very long time. Yes, long-term. That doesn't mean you cannot solve it just by training a statistical model on a lot of data — a huge amount of data. It's really a matter of scale. But I guess what I'm saying is: if you have a vehicle that achieves level five, it is going to be able to deal with new situations. Or — I mean, the data is so large that the rate of new situations is very low. Yes, but that's not intelligence. If we go back to your definition of intelligence, it's the efficiency with which you can adapt to new situations — truly new situations, not situations you've seen before, not situations that could be anticipated by your creators, by the creators of
the system, but truly new situations. The efficiency with which you acquire new skills: if, in order to pick up a new skill, you require a very extensive training dataset of most possible situations that can occur in the practice of that skill, then the system is not intelligent — it is mostly just a lookup table. Likewise, if, in order to acquire a skill, you need a human engineer to write down a bunch of rules covering most or every possible situation, likewise the system is not intelligent. The system is merely the output artifact of a process that happens in the minds of the engineers creating it. It is encoding an abstraction produced by a human mind, and intelligence would actually be the process of autonomously producing this abstraction. If you take an abstraction and encode it on a piece of paper, or in a computer program, the abstraction itself is not intelligent; what's intelligent is the agent that's capable of producing these abstractions.

It feels like there's a little bit of a gray area, because you're basically saying that deep learning forms abstractions too, but those abstractions do not seem to be effective for generalizing far outside of the things it's already seen — yet they generalize a little bit. Yeah, absolutely — deep learning does generalize a little bit. Generalization is not binary; it's more like a spectrum. And there's a certain point — it's a gray area — where there's an impressive degree of generalization that happens. I guess exactly what you were saying is: intelligence is how efficiently you're able to generalize far outside the distribution of things you've seen already. Yes. So it's both the distance — how radically new something is — and how efficiently you handle it. Yes, absolutely.

You can think of intelligence as a measure of an information conversion ratio. Imagine a space of possible situations, and you've covered some of them — so you have some amount of information about your space of possible situations. That information is provided by the situations you already know, and also by the prior knowledge the system brings to the table, the prior knowledge embedded in the system. So the system starts with some information about the problem, about the task. Intelligence is about going from that information to a program — what you would call a skill program, a behavioral program — that can cover a large area of possible-situation space. And essentially, the ratio between that area and the amount of information you start with is intelligence. So a very smart agent can make efficient use of very little information about a new problem, and very little prior knowledge as well, to cover a very large area of potential situations in that problem, without knowing what those future new situations are going to be.

One of the other big things you talk about in the paper — we've touched on it a little already, but let's talk about it some more — is actual tests of intelligence. If we look at human and machine intelligence: do you think tests of intelligence should be different for humans and machines? Or are these fundamentally the same kinds of intelligence we're after, and therefore the tests should be similar?

If your goal is to create AIs that are more human-like, then it would obviously be super valuable to have a test that's universal, that applies to both AIs and humans, so that you could establish a comparison between the two and tell exactly how intelligent, in terms of human intelligence, a given system is. That said, the constraints that apply to artificial intelligence and to human intelligence are very different, and your test should account for this difference — because if you look
at artificial systems, it's always possible for an experimenter to buy arbitrary levels of skill at arbitrary tasks — either by injecting hard-coded prior knowledge into the system, via rules and so on that come from the minds of the programmers, or by buying higher levels of skill just by training on more data. For instance, you could generate an infinity of different Go games and train a Go-playing system that way, but you could not directly compare it to human Go-playing skill, because a human who plays Go had to develop that skill in a very constrained environment: they had a limited amount of time, a limited amount of energy, and of course they started from a different set of priors — from innate human priors. So I think if you want to compare the intelligence of two systems, like the intelligence of an AI and the intelligence of a human, you have to control for priors — you have to start from the same set of knowledge priors about the task — and you have to control for experience, that is to say, for training data.

What are priors? A prior is whatever information you have about a given task before you start learning about the task. And how is that different from experience? Well, experience is acquired. For instance, if you're learning to play Go, your experience with Go is all the Go games you've played, or seen, or simulated in your mind, let's say. Your priors are things like: Go is a game on a 2D grid, and we have lots of hard-coded priors about the organization of 2D space — the rules, the dynamics, the physics of the game in this 2D space. Yes — and the idea of what winning is. Yes, exactly. And all the other board games that share some similarities with Go — if you've played those board games, then, with respect to the game of Go, that would be part of your priors about the game.

Well, it's interesting to think about how many priors are actually brought to the table in the game of Go. When you look at self-play, reinforcement-learning-based mechanisms that do the learning, it seems like the number of priors is pretty low. Yes — but there's a 2D spatial prior in the convnet. Right — and you're saying we should be clear about making those priors explicit. Yes. In particular, I think if your goal is to measure a human-like form of intelligence, then you should clearly establish that you want the AI you're testing to start from the same set of priors that humans start with.

So — to me personally, but I think to a lot of people, the human side of things is very interesting. Testing intelligence for humans: what do you think is a good test of human intelligence? Well, that's the question that psychometrics is interested in — there's an entire subfield of psychology that deals with this question. What's psychometrics? Psychometrics is the subfield of psychology that tries to measure and quantify aspects of the human mind — in particular cognitive abilities, intelligence, and personality traits as well.

This might be a weird question, but what are the first principles of psychometrics that it operates on — what are the priors it brings to the table? It's a field with a fairly long history. You know, psychology sometimes gets a bad reputation for not having very reproducible results, but psychometrics actually has some fairly solid, reproducible results. The ideal goals of the field are that tests should be reliable — which is a notion tied to reproducibility; they should be valid — meaning they should actually measure what you say they measure. For instance, if you're saying that you're measuring intelligence, then your test results should be correlated with things you expect to be correlated with intelligence, like success in
school or success in the workplace and so on. It should be standardized, meaning you can administer your test to many different people under the same conditions. And it should be free from bias — meaning, for instance, that if your test involves the English language, you have to be aware that this creates a bias against people who have English as a second language, or who can't speak English at all. Of course, these principles for creating psychometric tests are quite old. I don't think every psychometric test is really reliable, valid, or free from bias, but at least the field is aware of these weaknesses and is trying to address them.

So it's kind of interesting. Ultimately you're only able to measure, like you said previously, skill — but you're trying to take a bunch of measures of different skills that correlate strongly with some general concept of cognitive ability. So what's the g factor?

Right, so there are many different kinds of tests of intelligence, and each of them is interested in different aspects of intelligence: some deal with language, some with spatial vision, maybe mental rotations, numbers, and so on. When you run these very different tests at scale, what you start seeing is that there are clusters of correlations among test results. For instance, if you look at homework at school, you'll see that people who do well at math are also likely, statistically, to do well in physics. What's more, people who do well at math and physics are also statistically likely to do well at things that sound completely unrelated, like writing an English essay. When you see clusters of correlations, in statistical terms you explain them with a latent variable, and the latent variable that would explain the relationship between being good at math and being good at physics would be cognitive ability. The g factor is the latent variable that explains the fact that every test of intelligence you can come up with ends up having correlated results. There's a single variable that explains these correlations — that's the g factor. So it's a statistical construct; it's not really something you can directly measure in a person. But it's there — it's there at scale.

And that's one thing I want to mention about psychometrics. When you talk about measuring intelligence in humans, some people get a little worried. They'll say that sounds dangerous, maybe it's potentially discriminatory, and so on — and they're not wrong. The thing is, personally I'm not interested in psychometrics as a way to characterize one individual person. If I get your psychometric personality assessment or your IQ, I don't think that actually tells me much about you as a person. I think psychometrics is most useful as a statistical tool — most useful at scale, when you start getting test results for a large number of people and cross-correlating them, because that gives you information about the structure of the human mind, particularly the structure of human cognitive abilities. At scale, psychometrics paints a certain picture of the human mind, and that's what's interesting and what's relevant to AI: the structure of human cognitive abilities.

Yeah, it gives you an insight into it. I remember when I learned about the g factor, it seemed like it would be impossible for it to even be real, even as a statistical variable. It felt kind of like astrology — like wishful thinking among psychologists. But the more I learned, the more I realized there's something to it. I'm not sure what to make of the fact, about human beings, that the g factor is real.
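Chollet's description of g as a latent variable — a statistical construct inferred from clusters of correlations rather than measured directly — can be sketched numerically. Below is a minimal illustration with entirely synthetic scores; extracting the first principal component of the correlation matrix serves here as a crude stand-in for the factor-analysis methods psychometricians actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores for 500 people on four tests (math, physics,
# verbal, spatial): one shared latent factor "g" plus test-specific
# noise, so every test correlates with every other.
n_people = 500
g = rng.normal(size=n_people)
loadings = np.array([0.8, 0.7, 0.6, 0.75])
scores = np.outer(g, loadings) + rng.normal(scale=0.5, size=(n_people, 4))

# Cluster of positive correlations (the "positive manifold").
corr = np.corrcoef(scores, rowvar=False)

# First principal component of the correlation matrix: the single
# latent variable that best summarizes the shared variance.
eigvals, _ = np.linalg.eigh(corr)
explained = eigvals[-1] / eigvals.sum()
```

With scores generated this way, every pair of tests correlates positively, and a single component accounts for most of the shared variance — the statistical signature that the g factor summarizes.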
That there's a commonality across the whole human species — that there's destined to be a strong correlation between cognitive abilities — is kind of fascinating.

Yeah. Actually, human cognitive abilities have a structure. The most mainstream theory of the structure of cognitive abilities is called CHC theory — Cattell-Horn-Carroll, named after the psychologists who contributed key pieces of it. It describes cognitive abilities as a hierarchy with three levels. At the top you have the g factor. Then you have broad cognitive abilities — for instance fluid intelligence — each of which encompasses a broad set of related kinds of tasks. And at the last level you have narrow cognitive abilities, which are closer to task-specific skill. There are actually different theories of the structure of cognitive abilities that emerge from different statistical analyses of IQ test results, but they all describe a hierarchy with a kind of g factor at the top. And you're right that the g factor is not quite real in the sense that it's not something you can observe and measure like your height, for instance — but it's real in the sense that you see it in a statistical analysis of the data.

One thing I want to mention is that the existence of a g factor does not mean that human intelligence is general in a strong sense. It does not mean human intelligence can be applied to any problem at all, and that someone with a high IQ is going to be able to solve any problem — that's not quite what it means. One popular analogy for understanding this is the sports analogy. Consider the concept of physical fitness. It's a concept very similar to intelligence: it's a useful concept, something you can intuitively understand. Some people are fit, maybe like you; some people are not as fit, maybe like me. But none of us can fly.

Absolutely — it's so constrained.

Even if you're very fit, that doesn't mean you can do anything at all in any environment. You obviously cannot fly, you cannot survive at the bottom of the ocean, and so on. And if you were a scientist who wanted to precisely define and measure physical fitness in humans, you would come up with a battery of tests — running a hundred meters, playing soccer, playing table tennis, swimming, and so on — and if you ran these tests over many different people, you would start seeing correlations in test results. For instance, people who are good at soccer are also good at sprinting. You would explain these correlations with physical abilities that are strictly analogous to cognitive abilities. Then you would also start observing correlations with biological characteristics — maybe lung volume is correlated with being a fast runner — in the same way that there are neurophysiological correlates of cognitive abilities. And at the top of this hierarchy of physical abilities, you would have a g factor — a physical g factor — which would map to physical fitness.

As you just said, that doesn't mean people with high physical fitness can fly. It doesn't mean human morphology and human physiology are universal. They're actually super specialized: we can only do the things we evolved to do. You could not exist on Venus, or Mars, in the void of space, or at the bottom of the ocean. That said, one thing that's really striking and remarkable is that our morphology generalizes far beyond the environments we evolved for. In a way, you could say we evolved to run after prey in the savanna — that's very much where human morphology comes from — and yet we can do a lot of things that are completely unrelated to that. We can climb mountains, we can swim across lakes, we can play table tennis. Table tennis is very different from what we evolved to do. So our bodies, our sensorimotor affordances, have a degree of generality that is absolutely remarkable. And I think cognition is very similar. Our cognitive abilities have a degree of generality that goes far beyond what the mind was initially supposed to do — which is why we can play music, write novels, go to Mars, and do all kinds of crazy things. But it's not universal. In the same way that human morphology is not appropriate for most of the universe by volume, the human mind is not naturally appropriate for most of potential problem space by volume. We have very strong cognitive biases, which means there are certain types of problems we handle very well, and certain types of problems we are completely unadapted for. That's really how to interpret the g factor: it's not a sign of strong generality; it's just the broadest cognitive ability. Our abilities — whether sensorimotor or cognitive — remain very specialized within the human condition.

Right — within the constraints of human cognition, they're general.

Yes, absolutely. But the constraints, as you're saying, are very limiting. So — our cognition and our body evolved in very specific environments, and because our environment was so variable, fast-changing, and unpredictable, part of the constraints that drove our evolution is generality itself. In a way, we evolved to be able to improvise in all kinds of physical or cognitive environments. For this reason, it turns out that the minds and bodies we ended up with can be applied to a much, much broader scope than what they evolved for. That's truly remarkable, and it's a degree of generalization far beyond anything you can see in artificial systems today. It does not mean, though, that human intelligence is anywhere near universal.

Yeah, it's not general. You know, it's kind of an exciting topic, even for people outside of artificial intelligence — IQ tests, Mensa, the different degrees of difficulty of questions. We talked about this offline a little bit too. What makes a question on an IQ test more difficult or less difficult, do you think?

The thing to keep in mind is that there's no such thing as a question that's intrinsically difficult. It has to be difficult with respect to the things you already know and the things you can already do. An IQ test question would typically be structured as a set of demonstrations — input and output pairs — and then you're given a test input, a prompt, and you need to recognize or produce the corresponding output. In that narrow context, you could say a difficult question is one where the input prompt is very surprising and unexpected given the training examples, even in the nature of the patterns you observe in it. For instance, say you have a rotation problem: you must rotate the shape by 90 degrees. If I give you two examples and then give you a prompt which is actually one of the two training examples, there is zero generalization difficulty — it's a trivial task. You just recognize that it's one of the training examples and produce the same answer. If it's a more complex shape, there's a little more generalization, but you're still doing the same thing at test time as was demonstrated at training time. A difficult task starts to require some amount of test-time adaptation, some amount of improvisation.
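The rotation example above can be made concrete. A small sketch (hypothetical grids, not an actual test item): a pure-retrieval "solver" handles a prompt identical to a training example, but only applying the inferred rule handles a novel one:

```python
# Toy version of the rotation task described above: the rule is
# "rotate the grid 90 degrees clockwise", shown via two demonstrations.
def rotate90(grid):
    """Rotate a grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

train = [
    ([[1, 0], [0, 0]], rotate90([[1, 0], [0, 0]])),
    ([[0, 2], [0, 0]], rotate90([[0, 2], [0, 0]])),
]

# A pure-retrieval solver: memorize the demonstrations verbatim.
memory = {str(x): y for x, y in train}

seen_prompt = [[1, 0], [0, 0]]                    # identical to a training example
novel_prompt = [[3, 0, 0], [0, 0, 0], [0, 0, 4]]  # never demonstrated

# Retrieval answers the seen prompt but has nothing for the novel one;
# only actually applying the inferred rule generalizes.
retrieved = memory.get(str(seen_prompt))
generalized = rotate90(novel_prompt)
```

Retrieval succeeds with zero generalization difficulty on the seen prompt; the novel prompt is absent from memory and requires executing the rule itself.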
Right. So consider — I don't know — you're teaching a class on quantum physics or something. If you wanted to test the understanding the students have of the material, you would come up with an exam that's very different from anything they've seen on the internet while they were cramming. On the other hand, if you wanted to make it easy, you would just give them something very similar to the mock exams they've taken — a simple interpolation of questions they've already seen. That would be an easy exam: very similar to what you've been trained on. A difficult exam is one that really probes your understanding, because it forces you to improvise — it forces you to do things different from what you were exposed to before. That said, it doesn't mean an exam that requires improvisation is intrinsically hard. Maybe you're a quantum physics expert, so when you take the exam, this is stuff that, despite being new to the students, is not new to you. A question can only be difficult with respect to what the test-taker already knows, and with respect to the information the test-taker has about the task. That's what I mean by controlling for priors — the information you bring to the table — and for experience, which is the training data. In the case of the quantum physics exam, that would be the course material itself and all the mock exams the students might have taken online.

Yeah, it's interesting, because I sent you an email asking this curious question: what's a really hard IQ test question? And I've been talking to people who design IQ tests — there are a few folks on the internet; it's a thing people are really curious about. First of all, most of the IQ tests they design — they religiously protect the correct answers. You can't find the correct answers anywhere. In fact, a question is ruined once you know even the approach you're supposed to take.

So the approach is implicit in the training examples — once you've seen the training examples, it's over. Which is why in ARC, for instance, there is a test set that is private and no one has seen it.

No — for really tough IQ questions, the approach isn't obvious, partly because of ambiguity; you have to look for it. Take some number sequences: it's not completely clear. When you look at a number sequence — I don't know, say the Fibonacci sequence — if you look at the first few numbers, that sequence could be completed in a lot of different ways. And if you think deeply, some completions are more correct than others — there's a kind of intuitive simplicity and elegance to the correct solution.

Yes. I am personally not a fan of ambiguity in test questions, actually. I think you can have difficulty without requiring ambiguity, simply by making the test require a lot of extrapolation over the training examples.

But the beautiful question is difficult, yet gives away everything when you give the training examples.

Basically, yes. Meaning that the tests I'm interested in creating are not necessarily difficult for humans, because human intelligence is the benchmark. They're supposed to be difficult for machines in ways that are easy for humans. I think an ideal test of human and machine intelligence is a test that is actionable, that highlights the need for progress, and that highlights the direction in which you should be making progress.

I think we'll talk about the ARC challenge and the test you've constructed. You have these elegant examples that highlight: this is really easy for us humans, but it's really hard for machines.
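The point above about number sequences being completable in many ways is easy to demonstrate: the prefix 1, 1, 2, 3 is consistent both with the Fibonacci rule and with a cubic polynomial through the same four points, and the two rules disagree on the very next term. A quick sketch:

```python
import numpy as np

prefix = [1, 1, 2, 3]               # consistent with Fibonacci...
fib_next = prefix[-1] + prefix[-2]  # ...whose rule predicts 5

# ...but a cubic polynomial also passes exactly through the same
# four points (positions 0..3) and predicts a different continuation.
coeffs = np.polyfit(range(4), prefix, deg=3)
poly_next = round(np.polyval(coeffs, 4))
```

Both rules fit the observed data perfectly, yet they diverge immediately afterward — which is why the "correct" answer to such a question rests on an intuition of simplicity rather than on the data alone.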
But designing an IQ test for IQs higher than 160, say — you have to take that and put it on steroids. You have to think about what is hard for humans, and that's a fascinating exercise in itself. It's an interesting question, what it takes to create a really hard question for humans, because you have to go through the same process you mentioned: find something that, given the experience you're likely to have accumulated throughout your whole life — even if you've prepared for IQ tests, which is a big challenge — will still be novel for you.

Yeah, novelty is a requirement. You should not be able to practice for the questions you're going to be tested on. That's important, because otherwise what you're doing is not exhibiting intelligence — what you're doing is just retrieving what you've been exposed to before. It's the same thing as a deep learning model: if you train a deep learning model on all the possible answers, it will ace your test — in the same way that a student without understanding can still ace a test by cramming for it. They memorize a hundred different mock exams and hope the actual exam will be a simple interpolation of those mock exams. That student could just be a deep learning model at that point. You can do that without any understanding of the material, and in fact many students pass exams in exactly this way. If you want to avoid that, you need an exam that's unlike anything they've seen — one that really probes their understanding.

So how do we design an IQ test for machines — an intelligence test for machines?

In the paper I outline a number of requirements you would expect of such a test. In particular, we should start by acknowledging the priors we expect to be required in order to perform the test — we should be explicit about the priors. And if the goal is to compare machine intelligence and human intelligence, then we should assume human cognitive priors. Secondly, we should make sure we are testing for skill-acquisition ability — skill-acquisition efficiency, in particular — and not for skill itself, meaning that every task featured in the test should be novel and should not be something you can anticipate. For instance, it should not be possible to brute-force the space of possible questions — to pre-generate every possible question and answer. The tasks should not be anticipatable, not just by the system itself, but by the creators of the system.

You know what's fascinating — one of my favorite aspects of the paper, and of the work you've done with the ARC challenge, is the process of making priors explicit. Even that act alone is really powerful. It's a really powerful question to ask of us humans: what are the priors we bring to the table? The next step is, once you have those priors, how do you use them to solve a novel task — but just making the priors explicit is a really difficult and really powerful step. That's a visually beautiful and conceptually, philosophically beautiful part of the work you did, and I guess continue to do, with the paper and the ARC challenge. Can you talk about some of the priors we're talking about here?

Yes. A researcher who has done a lot of work on exactly which knowledge priors are innate to humans is Elizabeth Spelke, from Harvard. She developed the core knowledge theory, which outlines four different core knowledge systems — systems of knowledge we are either born with or hardwired to acquire very early in our development. And there's no strong distinction between the two: if you are primed to acquire a certain type of knowledge in just a few weeks, you might as well be born with it — it's just part of who you are.

So there are four core knowledge systems. The first is the notion of objectness, and basic physics. You recognize that something that moves coherently, for instance, is an object. We intuitively, naturally, innately divide the world into objects, based on this notion of physical coherence. And in terms of elementary physics, there's the fact that objects can bump against each other and occlude each other. These are things we are essentially born with, or at least acquire extremely early because we're hardwired to acquire them.

So a bunch of points, pixels, that move together are part of the same object?

Yes. I mean — I don't smoke weed, but if I did, that's something I could sit and think about all night. I remember reading about that in your paper — just objectness. I wasn't self-aware, I guess, of that particular prior. It's such a fascinating prior, and that's the most basic one.

Yes, just object identity. It's very basic, I suppose, but it's fundamental to human cognition.

Yeah. And the second prior that's also fundamental is agentness — which is not a real word. Agentness: the fact that some of these objects you segment your environment into are agents. What's an agent? Basically, an object that has goals — that is capable of pursuing goals. For instance, if you see two dots moving in a roughly synchronized fashion, you will intuitively infer that one of the dots is pursuing the other — that one dot is an agent whose goal is to avoid the other dot, and the other dot is an agent whose goal is to catch the first one. Spelke has shown that babies as young as three months identify agentness and goal-directedness in their environment.

Another prior is basic geometry and topology — the notion of distance, the ability to navigate in your environment, and so on. This is fundamentally hardwired into our brain; it's in fact backed by very specific neural mechanisms, like grid cells and place cells. So it's something literally hard-coded at the neural level, in the hippocampus.

And the last prior would be the notion of numbers. Numbers are not a cultural construct: we are intuitively, innately able to do some basic counting and to compare quantities. That doesn't mean we can do arbitrary arithmetic — it's counting like one, two, three, and then maybe "more than three." You can also compare quantities: if I show you three dots and five dots, you can tell that the side with five dots has more. So this is an innate prior.

That said, the list may not be exhaustive. Spelke is still pursuing the potential existence of new knowledge systems — for instance, knowledge systems that would deal with social relationships.

Yeah — which is much less relevant to something like ARC, or IQ testing.

Right. So there could be stuff that's missing — like you said, rotation and symmetry are really interesting. Speaking of rotation, it's very likely that there is, in the brain, a hard-coded system capable of performing rotations. One famous experiment — I don't remember who ran it exactly, but it was in the 70s — found that if you give people two different shapes, where one is a rotated version of the other, and ask them "is this shape a rotated version of the first or not?", the time it takes people to answer is linearly proportional to the angle of rotation. It's almost as if you have, somewhere in your brain, a turntable with a fixed rotation speed: to know whether two objects are rotated versions of each other, you put one on the turntable, let it turn for a while, and stop when you have a match. That's really interesting.

So what's the ARC challenge?

In the paper I outline all these principles that a good test of machine intelligence — and human intelligence — should follow, and the ARC challenge is one attempt to embody as many of those principles as possible. I don't think it's anywhere near a perfect attempt — it does not actually follow every principle — but it is what I was able to do given the constraints. The format of ARC is very similar to classic IQ tests, in particular Raven's Progressive Matrices.

Raven's Progressive Matrices — if you've taken an IQ test in the past, you've probably at least seen one, even if you don't know what it's called.

Right. So you have a set of tasks — that's what they're called — and for each task you have training data: a set of input and output pairs. An input or output pair is a grid of colors, and the size of the grid is variable. You're given an input, and you must transform it into the proper output. So you're shown a few demonstrations of a task in the form of existing input/output pairs, then you're given a new input, and you must produce the correct output. The assumption in ARC is that every task should require only core knowledge priors and no outside knowledge. So, for instance, no language — no English, nothing like that — and no concepts taken from human experience, like trees, dogs, cats, and so on. Only reasoning tasks built on top of the core knowledge priors.
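For reference, the publicly released ARC tasks are stored as JSON objects with a "train" list of demonstration pairs and a "test" list, each grid being rows of integers 0-9 denoting colors. Below is a toy task in that shape, with a deliberately trivial rule (recolor 1 to 2) and a naive solver that only handles such plain recolorings — real ARC tasks are far harder than this:

```python
# A toy ARC-style task: "train" holds demonstration input/output
# grid pairs, "test" holds inputs whose outputs must be produced.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1], [0, 0]], "output": [[2, 2], [0, 0]]},
    ],
    "test": [{"input": [[1, 0], [0, 1]]}],
}

def infer_color_map(pairs):
    """Infer a cell-wise color mapping from the demonstrations.
    Only works for tasks that really are plain recolorings."""
    mapping = {}
    for pair in pairs:
        for row_in, row_out in zip(pair["input"], pair["output"]):
            for a, b in zip(row_in, row_out):
                mapping[a] = b
    return mapping

mapping = infer_color_map(task["train"])
prediction = [[mapping[c] for c in row] for row in task["test"][0]["input"]]
```

This shows the data shape and the demonstration/test split; a solver like this one fails the moment a task requires anything beyond a fixed recoloring, which is exactly the point of requiring novel tasks.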
And some of the tasks are explicitly trying to probe specific forms of abstraction. Part of the reason I wanted to create ARC is that I'm a big believer in this: when you're faced with a problem as murky as understanding how to autonomously generate abstraction in a machine, you have to co-evolve the solution and the problem. So part of the reason I designed ARC was to clarify my own ideas about the nature of abstraction. Some of the tasks are actually designed to probe bits of that theory, and they turned out to be very easy for humans to perform, including young kids — but near impossible for machines.

So what did you learn about the nature of abstraction from designing that? Can you clarify what you mean? One of the things you wanted was to try to understand this idea of abstraction.

Yes — clarifying my own ideas about abstraction by forcing myself to produce tasks that would require the ability to form that kind of abstraction in order to solve them.

Got it. And by the way — people should check it out; I'll probably overlay it if you're watching the video — the grid input/output with the different colors on the grid. It's a very simple world, but it's kind of beautiful.

It's very similar to classic IQ tests — it's not very original in that sense. The main difference from IQ tests is that we make the priors explicit, which is not usually the case: we make it explicit that everything should be built only out of core knowledge priors. I also think it's generally more diverse than IQ tests, and it perhaps requires a bit more manual work to produce solutions, because you have to click around on a grid for a while — sometimes the grids can be as large as 30 by 30 cells.

So how did you come up with the questions, if you can reveal that? What was the process — was it mostly you who came up with them? How difficult is it to come up with a question? Is it scalable to a much larger number? With IQ tests for humans you might not need it to be scalable, but with machines you could argue it needs to be.

So there are a thousand tasks, including the test set and the private test set. I think creating them is fairly difficult, in the sense that a big requirement is that every task should be novel, unique, and unpredictable. You don't want to create your own little world that is simple enough that a human could reverse-engineer it and write down an algorithm that generates every possible ARC task and its solution — that would completely invalidate the test.

So you're constantly having to come up with new stuff.

Yeah, you need a source of novelty — of unanticipated novelty. And one thing I found is that, as a human, you are not a very good source of unanticipated novelty, so you have to pace the creation of these tasks quite a bit. There are only so many unique tasks you can come up with in a given day.

So that means coming up with truly original new ideas. Did psychedelics help you at all? I'm just — I mean, it's fascinating to think about. Would you be out walking, constantly thinking of something totally new?

Yes — it's hard. I'm not saying I've done anywhere near a perfect job at it. There is some amount of redundancy, and there are many imperfections in ARC. You should consider ARC a work in progress — the ARC tasks today are not the definitive state of the test. I want to keep refining it in the future. I also think it should be possible to open up the creation of tasks to a broad audience — to do crowdsourcing. That would involve several levels of filtering, obviously, but I think it's possible to apply crowdsourcing to develop a much bigger and much more diverse ARC dataset, which would also be free of some of my own personal biases.

But there would always need to be a part of ARC that's hidden — the test?

Yes, absolutely. It is imperative that the test set you use to actually benchmark algorithms is not accessible to the people developing those algorithms, because otherwise what's going to happen is that the human engineers will just solve the tasks themselves and encode their solutions in program form. But then what you're seeing is the process of intelligence happening in the mind of the human, and you're just capturing its crystallized output. That crystallized output is not the same thing as the process that generated it — it's not intelligent in itself.

That's right. By the way, the idea of crowdsourcing it is fascinating. I think the creation of questions is really exciting for people — there are a lot of really brilliant people out there who love creating this kind of stuff.

Yeah, one thing that kind of surprised me — that I wasn't expecting — is that lots of people seem to actually enjoy ARC as a kind of game. I was really seeing it as a test, a benchmark of fluid general intelligence, and lots of people, including kids, just started enjoying it as a game. I think that's encouraging.

Yeah, I'm fascinated by it. There's a world of people who create IQ questions — I think that's a cool activity, for machines and for humans. And humans are themselves fascinated by taking the questions, by measuring their own intelligence. That's just really compelling to me too. One of the cool things about ARC — you said it's kind of inspired by IQ tests, follows a similar process — but because of its nature, because of the context in which it lives, it immediately forces you to think about the nature of intelligence, as opposed to it just being a test of your own. It forces you to really think. I don't know if that's inherent in the questions, or just the fact that it lives in a context that's supposed to be a test of machine intelligence.

Absolutely. As you solve ARC tasks as a human, you will be forced to introspect on how you come up with solutions, and that forces you to reflect on the human problem-solving process — on the way your own mind generates abstract representations of the problems it's exposed to. I think it's due to the fact that the set of core knowledge priors ARC is built upon is so small: it's all a recombination of a very, very small set of assumptions.

Okay, so what's the future of ARC? You held ARC as a challenge, as part of a Kaggle competition. Do you think that's something that continues for five years, ten years — that just continues growing?

Yes, absolutely. ARC itself will keep evolving. I've talked about crowdsourcing; I think that's a good avenue. Another thing I'm starting is a collaboration with folks from the psychology department at NYU to do human testing on ARC. There are lots of interesting questions you can start asking, especially as you start correlating machine solutions to ARC tasks with the characteristics of human solutions. For instance, you can try to see if there's a relationship between the human-perceived difficulty of a task and some measure of machine-perceived difficulty.

Yeah, it's a nice big playground in which to explore that difference. It's the same thing we talked about with autonomous vehicles: the things that are difficult for humans might be very different from the things that are difficult for machines.
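The correlation just mentioned — between human-perceived and machine-perceived difficulty — would be straightforward to compute once both measurements exist. A sketch with invented per-task failure rates (all numbers are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical per-task difficulty measures: the fraction of human
# testers who failed each task, and the fraction of machine attempts
# that failed the same task. (Numbers are invented.)
human_fail = np.array([0.05, 0.10, 0.40, 0.60, 0.90])
machine_fail = np.array([0.95, 0.20, 0.99, 0.30, 1.00])

# Pearson correlation between the two difficulty rankings.
r = np.corrcoef(human_fail, machine_fail)[0, 1]
```

A weak correlation, as in this made-up data, would be evidence that what's hard for humans and what's hard for machines come apart — exactly the difference worth formalizing.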
things that yes absolutely and uh formalizing or making explicit that difference in difficulty will teach us something may teach us something fundamental about intelligence so one thing i think we did well uh with arc is that it's proving to be a very uh actionable test in the sense that uh machine performance and arcs started at very much zero initially while you know humans found actually the tasks very easy and that that alone was like a big red flashing light saying that something is going on and that we are missing something and at the same time uh machine performance did not stay at zero for very long actually within two weeks of the carol competition we started having a non-zero number and now the state of the art is around uh twenty percent of the test set uh solved um and so arc is actually a challenge where our capabilities start at zero which indicates the need for progress but it's also not an impossible change it's not accessible you can start making progress basically right away at the same time we are still very far from having solved it and that's actually a very positive outcome of the competition is that the competition has has proven that there was no obvious shortcut to solve these tasks right yeah so the test held up yeah exactly that was the primary reason to do the cargo competition is to check if some some you know clever person was going to hack the benchmark and that did not happen right like people who are solving the tasks are essentially doing it uh uh well in a way they're they're they're actually exploiting some flaws of art that we will need to address in the future especially they're essentially anticipating what sort of uh tasks may be contained in the test sets right right um which is kind of yeah that's the kind of hacking it's it's human hacking of the town yes that that said you know uh uh with the state of the art it's like uh 20 percent we're still very very far uh from even level which is closer to and so and i i do believe 
And I do believe that it will take a while until we reach human parity on ARC, and that by the time we have human parity, we will have AI systems that are probably pretty close to human level in terms of general fluid intelligence. They're not going to be necessarily human-like; you would not necessarily recognize them as being an AGI; but they would be capable of a degree of generalization that matches the generalization performed by human fluid intelligence. Sure. This is a good point, in terms of general fluid intelligence, to mention: in your paper you describe different kinds of generalization, local, broad, extreme, and there's a kind of hierarchy that you form. So when we say generalization, what are we talking about? What kinds are there? Right, so generalization is a very old idea; it's even older than machine learning. In the context of machine learning, you say a system generalizes if it can make sense of an input it has not yet seen. That's what I would call system-centric generalization: generalization with respect to what is novel for the specific system you're considering. I think a good test of intelligence should actually deal with developer-aware generalization, which is slightly stronger than system-centric generalization. Developer-aware generalization would be the ability to generalize to novelty or uncertainty that not only the system itself has not had access to, but that the developer of the system could not have had access to either. That's a fascinating meta-definition. So it's basically the edge-case thing we were talking about with autonomous vehicles: neither the developer nor the system knows about the edge cases, so the system should be able to generalize to things that nobody expected, neither the designers of the training data nor, obviously, the contents of the
training data. That's a fascinating definition. So you can see generalization, degrees of generalization, as a spectrum, and the lowest level is what machine learning is trying to do: the assumption that any new situation is going to be sampled from a static distribution of possible situations, and that you already have a representative sample of that distribution, which is your training data. So in machine learning you generalize to a new sample from a known distribution, and the ways in which your new sample will be new or different are ways that are already understood by the developers of the system; you are generalizing to known unknowns for one specific task. That's what you would call robustness: you are robust to things like noise, small variations, and so on, for one fixed, known distribution that you know through your training data. A higher degree would be flexibility in machine intelligence. Flexibility would be something like an L5 self-driving car, or maybe a robot that can pass the coffee cup test, which is the notion that you would be given a random kitchen somewhere in the country and you would have to go make a cup of coffee in that kitchen. So flexibility would be the ability to deal with unknown unknowns, dimensions of variability that could not possibly have been foreseen by the creators of the system, within one specific task. Generalizing to the long tail of situations in self-driving, for instance, would be flexibility. So you have robustness, flexibility, and finally you would have extreme generalization, which is basically flexibility, but instead of just considering one specific domain, like driving or domestic robotics, you're considering an open-ended range of possible domains. A robot would be capable of extreme generalization if, let's say, it's designed and trained for cooking, for instance, and if I buy the robot and it's able to teach itself gardening
in a couple of weeks, it would be capable of extreme generalization, for instance. So the ultimate goal is extreme generalization. Yes: creating a system that is so general that it could essentially achieve human skill parity over arbitrary tasks and arbitrary domains, with the same level of improvisation and adaptation power as humans when it encounters new situations, and it would do so over basically the same range of possible domains and tasks as humans, using essentially the same amount of training experience, of practice, as humans would require. That would be human-level extreme generalization. I don't actually think humans are anywhere near the optimal intelligence bound, if there is such a thing. So you think, for humans, or in general? In general. I think it's quite likely that there is a hard limit to how intelligent any system can be, but at the same time I don't think humans are anywhere near that limit. Yeah, last time we talked I think you had this idea that we're only as intelligent as the problems we face. Yes, we are upper-bounded by the problems; in a way, we are bounded by our environments and we are bounded by the problems we try to solve. Yeah. What do you make of Neuralink and outsourcing some of the brain power, brain-computer interfaces? Do you think we can expand, augment our intelligence? I am fairly skeptical of neural interfaces, because they are trying to fix one specific bottleneck in human-machine cognition, which is the bandwidth bottleneck, the input and output of information in the brain, and my perception of the problem is that bandwidth is not at this time a bottleneck at all, meaning that we already have sensors that enable us to take in far more information than we can actually process. Well, to push back on that a little bit, to play devil's advocate a little bit: if you look at the internet, Wikipedia, let's say Wikipedia, I would say
that humans after the advent of Wikipedia are much more intelligent. Yes, I think that's a good point, but that's also about externalizing our intelligence via information processing systems, external information processing systems, which is very different from brain-computer interfaces. Right, but the question is whether our brain has direct access to Wikipedia. Well, your brain already has direct access to Wikipedia: it's on your phone, and you have your hands and your eyes and your ears and so on to access that information, and the speed at which you can access it is bottlenecked by cognition itself; I think it's already fairly close to optimal, which is why speed reading, for instance, does not work: the faster you read, the less you understand. But maybe that's because it uses the eyes, so maybe. I don't believe so; I think the brain is very slow. The fastest things that happen in the brain operate at the level of 50 milliseconds; forming a conscious thought can potentially take entire seconds, and you can already read pretty fast. So I think the speed at which you can take information in, and even the speed at which you can act on information, can only be very incrementally improved. If you're a very fast typist, a very trained typist, the speed at which you can express your thoughts is already the speed at which you can form your thoughts. Right, so that's the idea that there are fundamental bottlenecks to the human mind. But it's possible that everything we have in the human mind is just what was needed to survive in the environment, and there's a lot more room to expand; maybe, as you said, the speed of thought. Yeah, I think augmenting human intelligence is a very valid and very powerful avenue, and that's what computers are about; in fact, that's what all of culture and civilization is about: culture is externalized
cognition, and we rely on culture to think, constantly. Yeah, and that's not just computers, not just phones and the internet; all of culture, like language for instance, is a form of externalized cognition, and books are obviously externalized cognition. That's right, and you can scale that externalized cognition far beyond the capability of the human brain. You could see civilization itself as having capabilities that are far beyond any individual brain, and we will keep scaling it, because it's not bound by individual brains; it's a different kind of system. Yeah, and that system includes non-humans: first of all it includes all the other biological systems, which are probably contributing to the overall intelligence of the organism, and then computers are part of it. Non-human biological systems are probably not contributing much, but AIs are definitely contributing to that; Google search, for instance, is a big part of it. Yeah, a huge part, a part we probably can't introspect: how the world has changed in the past 20 years is probably very difficult for us to understand. Until, of course, whoever created the simulation we're in measures the progress; there was probably a big spike in performance; they're enjoying this. So what are your thoughts on the Turing test and the Loebner Prize, which is one of the most famous attempts at a test of human intelligence, sorry, of artificial intelligence, via a natural language open dialogue that's judged by humans as far as how well the machine did? I'm not a fan of the Turing test itself or any of its variants, for two reasons. First of all, it's really copping out of trying to define and measure intelligence, because it's entirely outsourcing that to a panel of human judges, and these human judges may not themselves have any proper methodology, they
may not themselves have any proper definition of intelligence, and they may not be reliable. So the Turing test is already failing one of the core psychometrics principles, which is reliability, because you have biased human judges; it's also violating the standardization requirement and the freedom-from-bias requirement. So it's really a cop-out, because you are outsourcing everything that matters, which is precisely describing intelligence and finding a standalone test to measure it; you're outsourcing everything to people. And by the way, we should keep in mind that when Turing proposed the imitation game, he did not mean for the imitation game to be an actual goal for the field of AI, an actual test of intelligence. He was using the imitation game as a thought experiment in a philosophical discussion in his 1950 paper: he was trying to argue that it should theoretically be possible for something very much like the human mind, indistinguishable from the human mind, to be encoded in a Turing machine. At the time that was a very daring idea; it was stretching credibility. But nowadays I think it's fairly well accepted that the mind is an information processing system and that you could probably encode it into a computer. Another reason why I'm not a fan of this type of test is that the incentives it creates are not conducive to proper scientific research. If your goal is to trick, to convince, a panel of human judges that they're talking to a human, then you have an incentive to rely on tricks and prestidigitation, in the same way that, let's say, you're doing physics and you want to solve teleportation, and the test that you set out to pass is that you need to convince a panel of judges that teleportation took place, and they're just sitting there and watching what you're doing. That is something that David
Copperfield could achieve in his show in Vegas. What he's doing is very elaborate, but it's not physics; it's not making any progress in our understanding of the universe. To push back on that: the hope with these kinds of subjective evaluations is that it's easier to solve the problem generally than it is to come up with tricks that convince a large number of judges. That's the hope. In practice, it turns out that it's very easy to deceive people, in the same way that you can do magic in Vegas; you can actually very easily convince people that they're talking to a human when they're actually talking to a machine. I just disagree with that. I wouldn't say it's very easy; it's doable, but not very easy. It is very easy, because we are biased: we have theory of mind, we are constantly projecting emotions, intentions, agentness. Agentness is one of our core innate priors: we are projecting these things on everything around us; if you paint a smiley on a rock, the rock becomes happy. And because we have this extreme bias that permeates everything we see around us, it's actually pretty easy to trick people like this. I so totally disagree with that. You brilliantly put it: there's the anthropomorphization that we naturally do, the agentness. Is that a real word? No, it's not a real word, but I like it. It's exactly why it's useful. Well, it's a useful word; let's make it real. That's a huge part of it, but I still think it's really difficult to convince, if you do something like the Alexa Prize formulation, where you talk for an hour; there are formulations of the test you can create where it's very difficult. I like the Alexa Prize better, because it's more pragmatic, it's more practical; it's actually incentivizing developers to create
something that's useful as a human-machine interface, so that's slightly better than just the imitation game. I like your idea of a test which hopefully helps us in creating intelligent systems as a result: if you create a system that passes it, it'll be useful for creating further intelligent systems. Yes, at least. Yeah. Just to comment: I'm a little bit surprised how little inspiration people draw from the Turing test today. The media and the popular press might write about it every once in a while, the philosophers might talk about it, but most engineers are not really inspired by it, and I know you don't like the Turing test, but we'll have this argument another time; there's something inspiring about it. I think that as a philosophical device in a philosophical discussion, there is something very interesting about it; in practical terms, I don't think it's conducive to progress. And one of the reasons why is that I think being very human-like, being indistinguishable from a human, is actually the very last step in the creation of machine intelligence. The first AIs that will show strong generalization, that will actually implement human-like broad cognitive abilities, will not actually look anything like humans; human-likeness is the very last step in that process, and a good test is a test that points you towards the first step on the ladder, not towards the top of the ladder. Right. So to push back on that, I usually agree with you on most things, and I remember you at some point tweeting something about the Turing test being counterproductive, or something like that, and I think a lot of very smart people agree with that. I, comparatively speaking a not-very-smart person, disagree with that, because I think there's some magic to the interactivity, the interaction
with other humans. So to play devil's advocate on your statement: it's possible that in order to demonstrate the generalization abilities of a system, you have to show your ability in conversation, show your ability to adjust, to adapt to the conversation, not just as a standalone system but through the process of the interaction, game-theoretically, where you really are changing the environment by your actions. In the ARC challenge, for example, you're an observer: you can't steer the test into changing, you can't talk to the test, you can't play with it. So there's some aspect of that interactivity that becomes highly subjective, but it feels like it could be conducive to generalization. You make a great point: interactivity is a very good setting to force the system to show adaptation, to show generalization. That said, at the same time it's not something very scalable, because you rely on human judges, and it's not something reliable, because the judges may not be reliable. So you don't like human judges. Basically, yes. I do love the idea of interactivity, though. I initially wanted an ARC test that had some amount of interactivity, where your score on a task would not be one or zero, whether you can solve it or not, but would be the number of attempts you can make before you hit the right solution, which means that now you can start applying the scientific method as you solve ARC tasks: you can start formulating hypotheses and probing the system to see whether the observations match your hypotheses or not. It would be amazing if you could also, even higher-level than that, measure the quality of your attempts, which of course is impossible, but again, that gets subjective: how good was your thinking, how efficient was it?
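The attempt-count scoring idea described above could be sketched as follows; `attempts_score` is a hypothetical helper for illustration, not part of the actual ARC evaluation (which ended up binary per task):

```python
def attempts_score(guesses, target, max_attempts=10):
    """Score a solver by the number of attempts it needs to reach the
    correct output; fewer attempts means more efficient reduction of
    uncertainty. Returns None if the budget is exhausted first."""
    for n, guess in enumerate(guesses[:max_attempts], start=1):
        if guess == target:
            return n
    return None

# Hypothetical solver output: the correct grid appears on the 3rd guess.
guesses = [[[1, 0]], [[0, 0]], [[0, 1]], [[1, 1]]]
print(attempts_score(guesses, [[0, 1]]))  # 3
```

Under a scheme like this, a more ambiguous task simply admits more plausible hypotheses, and a good solver is one whose successive guesses eliminate them quickly.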
One thing that's interesting about this notion of scoring you by how many attempts you need is that you can start producing tasks that are far more ambiguous, because with multiple attempts you can actually probe that ambiguity. Right, so in a sense it's how well you can adapt to the uncertainty and reduce the uncertainty. Yes, it's how fast, it's the efficiency with which you reduce uncertainty in program space. Exactly. Very difficult to come up with that kind of test, though. Yeah, I would love to be able to create something like this; in practice it would be very, very difficult, but yes. I mean, what you've done with the ARC challenge is brilliant. I'm also surprised that it's not more popular, but I think it's picking up. Is it? It is, yeah. What are your thoughts about another test, by Marcus Hutter: he has the Hutter Prize for compression of human knowledge, and the idea is to quantify, to reduce the test of intelligence purely to the ability to compress. What are your thoughts about this, intelligence as compression? It's a very fun test, because it's such a simple idea: you're given English Wikipedia, basically, and you must compress it. It stems from the idea that cognition is compression, that the brain is basically a compression algorithm. This is a very old idea, and a very striking and beautiful one, I think. I used to believe it; I eventually had to realize that it was very much a flawed idea, so I no longer believe that cognition is compression. But I can tell you what the difference is. It's very easy to believe that cognition and compression are the same thing, because, for instance, Jeff Hawkins says that cognition is prediction, and of course prediction is basically the same thing as compression, just including the temporal axis. And it's very easy to believe this because compression is something that we do all the time, very naturally.
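As a toy version of the Hutter Prize setup (the real prize compresses roughly a gigabyte of English Wikipedia; the repetitive stand-in text below is just for illustration), the quantity being optimized, compression ratio, is easy to measure:

```python
import zlib

# Stand-in corpus; the actual Hutter Prize uses ~1 GB of English Wikipedia.
text = ("the quick brown fox jumps over the lazy dog " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)
ratio = len(compressed) / len(text)
print(f"{len(text)} -> {len(compressed)} bytes, ratio {ratio:.3f}")

# Repetitive text compresses extremely well: a compressor exploits
# regularities in past data, which is exactly the property Chollet
# goes on to argue falls short of cognition, since it cannot hedge
# for future novelty.
```

The prize ranks entries by compressed size (plus decompressor size), so "intelligence" in that framing is literally how small a number this ratio can be driven to on the fixed corpus.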
We are constantly compressing information; we have this bias towards simplicity; we're constantly trying to organize things in our minds and around us to be more regular. So it's a beautiful idea, and it's very easy to believe, but there is a big difference between what we do with our brains and compression. Compression is a tool in the human cognitive toolkit that is used in many ways, but it's just a tool: it is a tool for cognition, it is not cognition itself. And the big fundamental difference is that cognition is about being able to operate in future situations that include fundamental uncertainty and novelty. For instance, consider a child at age 10: they have 10 years of life experience; they've gotten pain, pleasure, rewards, and punishment over that period of time. If you were to generate the shortest behavioral program that would have optimally run that child over these 10 years, the shortest optimal behavioral program given the experience of that child so far, well, that compressed program, which is what you would get if the mind of the child were a compression algorithm, would be utterly inappropriate to process the next 70 years in the life of the child. So in the models we build of the world, we are not trying to make them optimally compressed. We are using compression as a tool to promote simplicity and efficiency in our models, but they are not perfectly compressed, because they need to include things that are seemingly useless today, that have seemingly been useless so far, but that may turn out to be useful in the future, because you just don't know the future. That's the fundamental principle that cognition, that intelligence, arises from: you need to be able to run appropriate behavioral programs, except you have absolutely no idea what sort of context, environment, or situation they are going to be
running in, and you have to deal with that uncertainty, with that future novelty. An analogy you can make is with investing. If I look at the past 20 years of stock market data and I use a compression algorithm to figure out the best trading strategy, it's going to be something like: you buy Apple stock, or maybe for the past few years you buy Tesla stock. But is that strategy still going to be true for the next 20 years? Actually, probably not, which is why, if you're a smart investor, you're not just going to follow the strategy that corresponds to compression of the past; you're going to have a balanced portfolio, because you just don't know what's going to happen. I guess in that same sense, compression is analogous to what you talked about as local or robust generalization, versus extreme generalization; it's much closer to that side, being able to generalize in a local sense. That's why, as humans, when we are children, a lot of our education is driven by play, even by curiosity. We are not efficiently compressing things; we're actually exploring. We are retaining all kinds of things from our environment that seem to be completely useless, because they might turn out to be eventually useful, and that's what cognition is really about. What makes it antagonistic to compression is that it is about hedging for future uncertainty, and hedging is inefficient from a compression standpoint. Yes, especially the hedging. So cognition leverages compression as a tool to promote efficiency. And in that sense, in our models, it's like Einstein said: make it as simple as possible, but not simpler, however that quote goes. Compression simplifies things, but you don't want to make it too simple. Yes, so a good model of the world is going to include all kinds
of things that are completely useless, actually, just in case. Yes, because you need diversity, in the same way that in your portfolio you need all kinds of stocks that may not have performed well so far; you need diversity, and the reason you need diversity is because fundamentally you don't know what you're doing. The same is true of the human mind: it needs to behave appropriately in the future, and it has no idea what the future is going to be like; it's not going to be like the past, so compressing the past is not appropriate, because the past is not predictive of the future. Yeah, history repeats itself, but not perfectly. I don't think I asked you last time the most absurd of questions. We've talked a lot about intelligence, but the bigger question, beyond intelligence, is one of meaning. Intelligent systems are kind of goal-oriented; they're optimizing for a goal. If you look at the Hutter Prize, actually, there's always a clean formulation of a goal, but the natural question for us humans, since we don't know our objective function, is: what is the meaning of it all? So the absurd question is: what, François Chollet, do you think is the meaning of life? What's the meaning of life? Yeah, that's a big question, and I think I can give you my answer, at least one of my answers. So one thing that's very important in understanding who we are is that everything that makes up ourselves, that makes up who we are, even your most personal thoughts, is not actually your own. Even your most personal thoughts are expressed in words that you did not invent, and are built on concepts and images that you did not invent. We are very much cultural beings; we are made of culture; that's what makes us different from animals, for instance. Everything about ourselves is an echo of the past, an echo of people
who lived before us. That's who we are. And in the same way, if we manage to contribute something to the collective edifice of culture, a new idea, maybe a beautiful piece of music, a work of art, a grand theory, a new word maybe, that something is going to become a part of the minds of future humans, essentially forever. So everything we do creates ripples that propagate into the future, and in a way this is our path to immortality: as we contribute things to culture, culture in turn becomes future humans, and we keep influencing people thousands of years from now. So our actions today create ripples, and these ripples, I think, basically sum up the meaning of life. In the same way that we are the sum of the interactions between many different ripples that came from our past, we are ourselves creating ripples that will propagate into the future. And that's why, and this seems like perhaps a naive thing to say, we should be kind to others during our time on earth, because every act of kindness creates ripples, and in reverse every act of violence also creates ripples, and you want to carefully choose which kind of ripples you want to create and propagate into the future. And in your case, first of all, beautifully put, but in your case, creating ripples into future humans and future AGI systems. Yes. It's fascinating. I don't think there's a better way to end it. François, as always, for the second time, and I'm sure many times in the future, it's been a huge honor; you're one of the most brilliant people in the machine learning and computer science world. Again, it's a huge honor; thanks for talking today. It's been a pleasure; thanks a lot for having me, I really appreciate it. Thanks for listening to this conversation with François Chollet, and thank you to our sponsors: Babbel, Masterclass, and Cash App. Click the sponsor links in the description to get a discount
and to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman. And now, let me leave you with some words from René Descartes, written in 1637, an excerpt of which François includes in his "On the Measure of Intelligence" paper: "If there were machines which bore a resemblance to our bodies and imitated our actions as closely as possible for all practical purposes, we should still have two very certain means of recognizing that they were not real men. The first is that they could never use words, or put together signs, as we do in order to declare our thoughts to others. For we can certainly conceive of a machine so constructed that it utters words, and even utters words that correspond to bodily actions causing a change in its organs. But it is not conceivable that such a machine should produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men can do." Here Descartes is anticipating the Turing test, and the argument still continues to this day. Secondly, he continues, "even though some machines might do some things as well as we do them, or perhaps even better, they would inevitably fail in others, which would reveal that they are acting not from understanding but only from the disposition of their organs." This is an incredible quote. "For whereas reason is a universal instrument which can be used in all kinds of situations, these organs need some particular disposition for each particular action; hence it is, for all practical purposes, impossible for a machine to have enough different organs to make it act in all the contingencies of life in the way in which our reason makes us act." That's the debate: mimicry, memorization, versus understanding. So thank you for listening, and I hope to see you next time.