Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115
tg_m_LxxRwM • 2020-08-14
The following is a conversation with Dileep George, a researcher at the intersection of neuroscience and artificial intelligence, co-founder of Vicarious with Scott Phoenix, and formerly co-founder of Numenta with Jeff Hawkins, who's been on this podcast, and Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks to today, Dileep's always sought to engineer intelligence that is closely inspired by the human brain.
As a side note, I think we understand very little about the fundamental principles underlying the function of the human brain, but the little we do know gives hints that may be more useful for engineering intelligence than any idea in mathematics, computer science, physics, and scientific fields outside of biology. And so the brain is a kind of existence proof that says it's possible. Keep at it. I should also say that brain-inspired AI is often over-hyped and used as fodder, just as quantum computing is, for marketing speak, but I'm not afraid of exploring these sometimes over-hyped areas, since where there's smoke, there's sometimes fire.
Quick summary of the ads. Three sponsors: Babbel, Raycon earbuds, and MasterClass. Please consider supporting this podcast by clicking the special links in the description to get the discount. It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter @lexfridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation.
This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from the Russian poem by Alexander Blok that you'll start to understand if you sign up to Babbel. Now, I say that you'll only start to understand this poem because Russian starts with the language and ends with the vodka. Now, the latter part is definitely not endorsed or provided by Babbel and will probably lose me the sponsorship, but once you graduate from Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. I have not yet developed an app for that. It's in progress. So get started by visiting babbel.com and use code LEX to get three months free.

This show is sponsored by Raycon earbuds. Get them at buyraycon.com/lex. They've become my main method of listening to podcasts, audiobooks, and music when I run, do push-ups and pull-ups, or just live life. In fact, I often listen to brown noise with them when I'm thinking deeply about something; it helps me focus. They're super comfortable, pair easily, great sound, great bass, six hours of playtime. I've been putting in a lot of miles to get ready for a potential ultramarathon and listening to audiobooks on World War II. The sound is rich and really comes in clear. So again, get them at buyraycon.com/lex.
This show is sponsored by MasterClass. Sign up at masterclass.com/lex to get a discount and to support this podcast. When I first heard about MasterClass, I thought it was too good to be true. I still think it's too good to be true. For 180 bucks a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration; Neil deGrasse Tyson on scientific thinking and communication; Will Wright, creator of SimCity and The Sims, on game design (every time I do this read, I really want to play a city builder game); Carlos Santana on guitar; Garry Kasparov on chess; Daniel Negreanu on poker; and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.
And now, here's my conversation with Dileep George.

Do you think we need to understand the brain in order to build it?

Yes. If you want to build the brain, we definitely need to understand how it works. Blue Brain, or Henry Markram's project, is trying to build a brain without understanding it: just trying to put details of the brain from neuroscience experiments into a giant simulation, by putting in more and more neurons, more and more details. But that is not going to work, because when it doesn't perform as you expect it to, then what do you do? Do you just keep adding more details? How do you debug it? So unless you understand, unless you have a theory about how the system is supposed to work, how the pieces are supposed to fit together, what they're going to contribute, you can't build it at the functional level.

Understood. So can you actually linger on and describe the Blue Brain project? It's kind of a fascinating principle, an idea to try to simulate the brain. We're talking about the human brain, right?
Right, right. Human brains and rat brains or cat brains have lots in common; the cortex, the neocortex structure, is very similar. So initially they were trying to just simulate a cat brain.

To understand the nature of evil.

To understand the nature of evil. Or, as it happens in most of these simulations, you easily get one thing out, which is oscillations. If you simulate a large number of neurons, they oscillate, and you can adjust the parameters and say that, oh, the oscillations match the rhythm that we see in the brain, et cetera. But...

Oh, I see. So the idea is, is the simulation at the level of individual neurons?
Yeah. So the Blue Brain project, the original idea as proposed, was that you put in very detailed biophysical models of neurons, interconnect them according to the statistics of connections that we have found from real neuroscience experiments, and then turn it on and see what happens. And these neural models are incredibly complicated in themselves, right? Because these neurons are modeled using this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable, and there are active dendrites, all those phenomena, which we don't understand that well themselves. And then we put in connectivity, which is part guesswork, part observed. And of course, if we do not have any theory about how it is supposed to work, we just have to take whatever comes out of it as, okay, this is something interesting.
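The Hodgkin-Huxley formalism mentioned here can be illustrated with a minimal single-neuron sketch. It is a single-compartment (point) model using the textbook squid-axon parameters, integrated with forward Euler; it deliberately ignores cable propagation along dendrites and axons, active dendrites, and everything else a Blue Brain-scale simulation adds. The function names and the 0 mV spike-counting threshold are illustrative assumptions, not anything taken from the Blue Brain project.

```python
import math

# Textbook squid-axon parameters (modern convention, resting potential near -65 mV).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _vtrap(x, y):
    """Compute x / (exp(x / y) - 1), using its limit y when x is near 0."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / y / 2.0)
    return x / (math.exp(x / y) - 1.0)


def rates(v):
    """Opening (alpha) and closing (beta) rates in 1/ms for gates m, h, n at v (mV)."""
    a_m = 0.1 * _vtrap(-(v + 40.0), 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _vtrap(-(v + 55.0), 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def simulate(i_ext, t_max=50.0, dt=0.01):
    """Forward-Euler integration of one HH compartment driven by a constant
    current i_ext (uA/cm^2). Returns the number of spikes, counted as upward
    crossings of 0 mV."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    # Start every gate at its steady-state value for the resting potential.
    m = a_m / (a_m + b_m)
    h = a_h / (a_h + b_h)
    n = a_n / (a_n + b_n)
    spikes = 0
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_na = G_NA * m**3 * h * (v - E_NA)   # sodium current
        i_k = G_K * n**4 * (v - E_K)          # potassium current
        i_l = G_L * (v - E_L)                 # leak current
        v_next = v + dt * (i_ext - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        if v < 0.0 <= v_next:
            spikes += 1
        v = v_next
    return spikes
```

With these parameters, `simulate(0.0)` should stay at rest with no spikes, while a sustained current of about 10 uA/cm^2 produces repetitive firing. A single neuron can be characterized this precisely and the model still says nothing about what a network of them is for.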
In your sense, are these models of the way a signal travels along the axons, all the basic models, too crude?

Oh, well, actually, they are pretty detailed and pretty sophisticated, and they do replicate the neural dynamics. If you take a single neuron and you try to turn on the different channels, the calcium channels and the different receptors, and see what the effect of turning those channels on or off is on the neuron's spike output, people have built pretty sophisticated models of that, and they are, I would say, in the regime of correct.

Well, see, the correctness, that's interesting, because you've mentioned several levels. Is the correctness measured by looking at some kind of aggregate statistics, or would it be more the spiking dynamics?
Spiking dynamics, yeah. And these models, because they go to the level of mechanism, are basically looking at, okay, what is the effect of turning on an ion channel, and you can model that using electric circuits. So it is not just function fitting; people are looking at the mechanism underlying it, putting that in terms of electric circuit theory and signal propagation theory, and modeling that. So those models are sophisticated, but getting a single neuron's model 99% right still does not tell you how to... It would be the analog of getting a transistor model right and now trying to build a microprocessor. If you did not understand how a microprocessor works, but you say, oh, I can now model one transistor well, and now I will just try to interconnect the transistors according to whatever I could guess from the experiments and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor. When you want to produce a functioning microprocessor, you want to understand Boolean logic, how the gates work, all those things, and then understand how those gates get implemented using transistors.
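To make the transistor analogy concrete, here is a toy switch-level sketch (an editorial illustration, not something described in the conversation). Transistors are treated as ideal switches, a static CMOS NAND gate is wired from four of them, and the other gates are then defined purely at the Boolean-logic level in terms of NAND. Knowing that `nmos` and `pmos` behave correctly tells you nothing about why they are wired this way; that knowledge lives at the gate level. All function names are hypothetical.

```python
def nmos(gate: bool) -> bool:
    """n-type transistor as an ideal switch: conducts when its gate is high."""
    return gate


def pmos(gate: bool) -> bool:
    """p-type transistor as an ideal switch: conducts when its gate is low."""
    return not gate


def nand(a: bool, b: bool) -> bool:
    """Static CMOS NAND from four switches: two pMOS in parallel can pull
    the output up to 1, two nMOS in series can pull it down to 0."""
    pull_up = pmos(a) or pmos(b)
    pull_down = nmos(a) and nmos(b)
    # In a correctly wired static gate exactly one network conducts.
    assert pull_up != pull_down
    return pull_up


# Boolean-logic level: every other gate is defined only in terms of NAND.
def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))
```

Reading the truth tables of `and_` and `or_` is easy; recovering them by probing individual `nmos`/`pmos` calls in a large circuit without knowing the gate-level design is the hard problem the analogy points at.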
Yeah, this actually reminds me of a paper, maybe you're familiar with it, that I remember going through in a reading group, that approaches a microprocessor from the perspective of a neuroscientist. I think it basically uses all the tools that we have in neuroscience to try to understand it, as if aliens just showed up to study computers, and to see if those tools could be used to get any kind of sense of how the microprocessor works. I think the takeaway from at least this initial exploration is that we're screwed. There's no way that the tools of neuroscience would be able to get us to anything, not even Boolean logic. I mean, just any aspect of the architecture, of the function, of the processes involved, the clocks, the timing, all that: you can't figure that out from the tools of neuroscience.
Yes, I'm very familiar with this particular paper. I think it was called "Could a Neuroscientist Understand a Microprocessor?" or something like that. Following the methodology in that paper, even an electrical engineer would not understand microprocessors. So I don't think it is that bad, in the sense of saying neuroscientists do find valuable things by observing the brain. They do find good insights, but those insights cannot be put together just as a simulation. You have to investigate what the computational underpinnings of those findings are, how all of them fit together from an information processing perspective. Somebody has to painstakingly put those things together and build hypotheses. So I don't want to dismiss all of neuroscience by saying, oh, they are not finding anything. That paper almost went to that level of "neuroscientists will never understand." No, that's not true. I think they do find lots of useful things, but it has to be put together in a computational framework.
Yeah, I mean, but you know, AI systems will be listening to this podcast a hundred years from now, and there's some nonzero probability they'll find your words laughable. It's like, "I remember humans thought they understood something about the brain; they were totally clueless." There's a sense about neuroscience that we may be in the very, very early days of understanding the brain. But I mean, that's one perspective. In your perspective, how far are we into understanding any aspect of the brain, from the dynamics of individual neuron communication to how, in a collective sense, neurons are able to store information and transfer information, and how intelligence then emerges, all that kind of stuff? Where are we on that timeline?
Yeah. So, you know, timelines are very, very hard to predict, and you can of course be wrong, and it can be wrong on either side. We know that now, when we look back. The first flight was in 1903. In 1900, there was a New York Times article on flying machines that do not fly, saying humans might not fly for another hundred years. That was what that article stated. But no, they flew three years after that. So it's very hard to...

Well, and on that point, one of the Wright brothers, I think two years before, said some number like 50 years. He had become convinced that it's impossible.

Even during their experimentation.

Yeah, yeah, yeah. I mean, that's a tribute to... that's like the entrepreneurial battle of depression, of going through just thinking this is impossible.

Right.

But there's something there: even the person that's in it is not able to estimate correctly.
Exactly. But I can tell you, objectively, what are the things that we know about the brain, and how that can be used to build AI models, which can then go back and inform how the brain works. So my way of understanding the brain would be to look at the insights neuroscientists have found, understand them from a computational angle, an information processing angle, and build models using that. Building that model, which is a functional model doing the task that we want the model to do (it is not just trying to model a phenomenon in the brain; it is trying to do what the brain is trying to do at the whole functional level), will help you fill in the missing pieces. Biology just gives you the hints, and building the model fills in the rest of the pieces of the puzzle. Then you can go and connect that back to biology and say, okay, now it makes sense that this part of the brain is doing this, or this layer in the cortical circuit is doing this. And then continue this iteratively, because now that will inform new experiments in neuroscience.
And of course, building the model and verifying it in the real world will also tell you more about whether the model actually works, and you can refine the model and find better ways of putting these neuroscience insights together. So I would say neuroscientists alone, just from experimentation, will not be able to build a model of the brain, or a functional model of the brain. There are lots of efforts, very impressive efforts, in collecting more and more connectivity data from the brain: how are the microcircuits of the brain connected with each other?

Those are beautiful, by the way.

Those are beautiful. And at the same time, those do not by themselves convey the story of how it works.

Yeah.

Somebody has to understand, okay, why are they connected like that, and what are those things doing? And we do that by building models in AI using hints from neuroscience, and repeat the cycle.

So what aspects of the brain are useful in this whole endeavor? Which, by the way, I should say, you're both a neuroscientist and an AI person. I guess the dream is to both understand the brain and to build AGI systems, so it's like an engineer's perspective of trying to understand the brain. So what aspects of the brain's functioning, speaking like you said, do you find interesting?
Yeah, quite a lot of things, all right. So one is, if you look at the visual cortex... The visual cortex is a large part of the brain. I forget the exact fraction, but a huge part of our brain area is occupied by just vision. So the visual cortex is not just a feed-forward cascade of neurons. There are a lot more feedback connections in the brain compared to the feed-forward connections. And it is surprising, the level of detail to which neuroscientists have actually studied this. If you go into the neuroscience literature and poke around and ask, have they studied what will be the effect of poking a neuron in level IT, in level V1, you will find, yes, they have studied that. So every possible combination? I mean, it's not a random exploration at all; it's very hypothesis-driven, right? Experimental neuroscientists are very, very systematic in how they probe the brain, because experiments are very costly to conduct. They take a lot of preparation; they need a lot of control. So they are very hypothesis-driven in how they probe the brain.
And often what I find is that when we have a question in AI about whether anybody has probed how lateral connections in the brain work, and you go and read the literature, yes, people have probed it, and people have probed it very systematically, and they have hypotheses about how those lateral connections are supposedly contributing to visual processing. But of course, they haven't built very functional, detailed models of it.
By the way, how do those studies start to interpret that? Do they stimulate a neuron in one particular area of the visual cortex and then see how the signal travels, that kind of thing?

Fascinating, very, very fascinating experiments, right? So I can give you one example I was impressed with. But before going to that, let me give you an overview of how the layers in the cortex are organized, right? The visual cortex is organized into roughly four hierarchical levels: V1, V2, V4, IT.

And V3?

Well, yeah, there's another pathway. Okay, so I'm talking about just the object recognition pathway, right?

Okay.

And then in V1 itself, there is a very detailed microcircuit. There is organization within a level itself. The cortical sheet is organized into multiple layers, and there is a columnar structure, and this layer-wise and columnar structure is repeated in V1, V2, V4, all of them, right? And the connections between these layers within a level (in V1 itself there are roughly six layers) have a particular structure to them.
Now, one example of an experiment people did: you present a stimulus which requires separating the foreground from the background of an object. So it's a textured triangle on a textured background, and you can check: does the surface settle first, or does the contour settle first? Settle in the sense that, when you finally form the percept of the triangle, you understand where the contours of the triangle are, and you also know where the inside of the triangle is, right? That's when you form the final percept. Now you can ask, what are the dynamics of forming that final percept? Do the neurons first find the edges and converge on where the edges are, and then find the inner surfaces, or does it go the other way around?

So what's the answer?

In this case, it turns out that it first settles on the edges. It converges on the edge hypothesis first, and then the surfaces are filled in from the edges to the inside.

That's fascinating.

And the detail to which you can study this is amazing. You can actually not only find the temporal dynamics of when this happens, but you can also find which layer in V1 is encoding the edges, which layer is encoding the surfaces, which layer is encoding the feedback, which layer is encoding the feed-forward, and what combination of them produces the final percept.
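As a toy illustration of the ordering described here (contour first, then the surface filled in from the edges toward the inside), the sketch below uses breadth-first propagation on a pixel grid. It only mimics the timing. The grid size, the square stimulus, and the 4-neighbour spreading rule are arbitrary assumptions, and the sketch makes no claim about which V1 layer does what or how the real circuit computes it.

```python
from collections import deque


def fill_in_times(size=9):
    """Toy edge-first filling-in on a size x size patch containing one square.

    Contour pixels settle at step 0; the surface label then spreads one
    4-neighbour per step from the contour toward the centre. Returns a dict
    mapping (row, col) -> step at which that pixel settled. Pixels outside
    the square never settle and are absent from the dict.
    """
    lo, hi = 1, size - 2  # the square spans rows and columns lo..hi

    def in_square(r, c):
        return lo <= r <= hi and lo <= c <= hi

    def on_contour(r, c):
        return in_square(r, c) and (r in (lo, hi) or c in (lo, hi))

    settled = {}
    frontier = deque()
    # Step 0: the edge hypothesis is resolved first.
    for r in range(size):
        for c in range(size):
            if on_contour(r, c):
                settled[(r, c)] = 0
                frontier.append((r, c))
    # Later steps: breadth-first spread into unsettled interior pixels.
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if in_square(nr, nc) and (nr, nc) not in settled:
                settled[(nr, nc)] = settled[(r, c)] + 1
                frontier.append((nr, nc))
    return settled
```

With the default size, the centre pixel `(4, 4)` of the 7x7 square settles at step 3, after the contour (step 0) and the ring just inside it (step 1).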
And these kinds of experiments stand out when you try to explain illusions. One example of a favorite illusion of mine is the Kanizsa triangle. I don't know if you are familiar with this one. This is an example where it's a triangle, but only the corners of the triangle are shown in the stimulus, so they look kind of like Pac-Man.

Oh, the black Pac-Man.

Exactly, yeah. And then you start to see... your visual system hallucinates the edges.

Yeah.

And when you look at it, you will see a faint edge, right? And you can go inside the brain and look: do neurons actually signal the presence of this edge, and if they signal it, how do they do it? Because they are not receiving anything from the input; the input is black for those neurons, right? So how do they signal it? When does the signaling happen? If a real contour is present in the input, then the neurons immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly not in the input; it is coming from the context. So those neurons fire later, and you can say that it's the feedback connections that are causing them to fire, and they happen later, and you can find the dynamics of them. So these studies are pretty impressive and very detailed.
So, by the way, just to take a step back, you said that there may be more feedback connections than feed-forward connections.

Yeah.

First of all, just for the machine learning folks, that's crazy, that there are all these feedback connections. We often think, thanks to deep learning, about the human brain as a kind of feed-forward mechanism, right? So what the heck are these feedback connections? What are their dynamics? What are we supposed to think about them?

Yeah. So this fits into a very beautiful picture about how the brain works, right? The beautiful picture of how the brain works is that our brain is building a model of the world. Our visual system is building a model of how objects behave in the world, and we are constantly projecting that model back onto the world. So what we are seeing is not just a feed-forward thing that gets interpreted in a feed-forward pass. We are constantly projecting our expectations onto the world, and the final percept is a combination of what we project onto the world combined with what the actual sensory input is.
Almost like trying to calculate the difference and then trying to interpret the difference?

Yeah, I wouldn't put it as just calculating the difference. It's more like: what is the best explanation for the input stimulus, based on the model of the world I have?

Got it, got it. And that's where all the illusions come in. But that's an incredibly efficient process. So the feedback mechanism just helps you constantly hallucinate how the world should be based on your world model, and then look at whether there's novelty and try to explain it. Hence we detect movement really well; there are all these kinds of things. And is this at all different levels of the cortex? You're saying this happens at the lowest level, or the highest level?
Yes. In fact, feedback connections are more prevalent everywhere in the cortex. So one way to think about it, and there's a lot of evidence for this, is inference. Basically, if you have a model of the world, when some evidence comes in, what you are doing is inference, right? You are trying to explain this evidence using your model of the world.

Yep.

And this inference includes projecting your model onto the evidence and taking the evidence back into the model, in an iterative procedure. This iterative procedure is what happens using feed-forward and feedback propagation: feedback affects what you see in the world, and it also affects feed-forward propagation. And examples are everywhere; we see these kinds of things everywhere. Take the idea that there can be multiple competing hypotheses in our model trying to explain the same evidence: you have to make them compete, and one hypothesis will explain away the other hypothesis through this competition process.
Wait, what? So you have competing models of the world that try to explain... What do you mean by "explain away"?

So this is a classic example in graphical models, probabilistic models.

What are those? I think it's useful to mention, because we'll talk about them more.

Yeah, yeah. So neural networks are one class of machine learning models. You have a distributed set of nodes, which are called neurons; each one is doing a dot product, and you can approximate any function using a multi-level network of neurons. So that's a class of models that are useful for function approximation.
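As a concrete picture of "each neuron is doing a dot product", here is a minimal sketch with hand-picked (not trained) weights and a hard threshold in place of a smooth nonlinearity. Two levels of neurons compute XOR, which no single threshold neuron can compute. This is a small instance of why multi-level networks are more expressive; it does not demonstrate universal approximation itself, and the function names are illustrative.

```python
def neuron(weights, bias, inputs):
    """One artificial neuron: a dot product of weights and inputs plus a bias,
    passed through a hard-threshold nonlinearity."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def xor_network(x1, x2):
    """Two-level network with hand-set weights that computes XOR."""
    h_or = neuron([1, 1], -0.5, [x1, x2])    # fires if at least one input is on
    h_and = neuron([1, 1], -1.5, [x1, x2])   # fires only if both inputs are on
    # Output: "OR but not AND", i.e. exactly one input is on.
    return neuron([1, -1], -0.5, [h_or, h_and])
```

In practice the weights would be learned from data by gradient descent on a smooth activation; hand-setting them here just keeps the arithmetic checkable.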
There is another class of models in machine learning called probabilistic graphical models. You can think of them as: each node in the model is a variable which is talking about something. It can be a variable representing whether an edge is present in the input or not, and at the top of the network, a node can represent whether there is an object present in the world or not. So it is another way of encoding knowledge. And once you encode the knowledge, you can do inference: what is the best way to explain some sort of evidence using this model that you encoded? When you encode the model, you are encoding the relationship between these different variables: how is the edge connected to the model of the object, how is the surface connected to the model of the object? And of course, this is a very distributed, complicated model. Inference is how you explain a piece of evidence when a stimulus comes in. If somebody tells me there is a 50% probability that there is an edge here in this part of the model, how does that affect my belief on whether the square is present in the image? So this is the process of inference.
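The "50% probability of an edge" question can be worked through on the smallest possible graphical model: a Square node with a single Edge child. The probabilities below are made up, and treating the report as soft evidence via Jeffrey's rule is one standard choice among several (virtual or likelihood evidence is another), so read this as a sketch under those assumptions rather than how any particular brain-inspired model handles it.

```python
def posterior_square(p_square, p_edge_given_square, p_edge_given_no_square,
                     p_edge_reported):
    """Belief that the square is present after a soft report about the edge.

    Two-node model Square -> Edge. The report "there is a p_edge_reported
    chance of an edge" is handled with Jeffrey's rule:
        P'(S) = p * P(S | E) + (1 - p) * P(S | not E)
    """
    # Marginal probability of an edge under the model.
    p_edge = (p_square * p_edge_given_square
              + (1 - p_square) * p_edge_given_no_square)
    # Bayes' rule for the two hard-evidence cases.
    p_s_given_edge = p_square * p_edge_given_square / p_edge
    p_s_given_no_edge = p_square * (1 - p_edge_given_square) / (1 - p_edge)
    # Mix them according to the reported edge probability.
    return (p_edge_reported * p_s_given_edge
            + (1 - p_edge_reported) * p_s_given_no_edge)
```

With `posterior_square(0.2, 0.9, 0.1, 0.5)` the belief in the square rises from the prior 0.2 to roughly 0.36. If the report merely restates the model's own marginal probability of an edge (0.26 with these numbers), the belief stays at 0.2, which is a useful sanity check.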
So one example of inference is the explaining-away effect between multiple causes. Graphical models can be used to represent causality in the world. So let's say your alarm at home can be triggered by a burglar getting into your house, or it can be triggered by an earthquake. Both can be causes of the alarm going off. So now you're in your office, you hear the burglar alarm going off, and you are heading home thinking that there's a burglar.

Got it.

But while driving home, if you hear on the radio that there was an earthquake in the vicinity, now the strength of evidence for a burglar getting into your house is diminished, because that piece of evidence is explained by the earthquake being present. So think about these two causes explaining a lower-level variable, which is the alarm. There is evidence coming in from below for the alarm being present, and initially it was flowing to the burglar being present. But now, since there is side evidence for this other cause, it explains away this evidence, and the evidence will now flow to the other cause. These are two competing causal things trying to explain the same evidence, and the brain has a similar kind of mechanism for doing so.

That's kind of interesting.
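The burglar/earthquake story is the textbook explaining-away example, and it can be reproduced by brute-force enumeration over a three-node network in which Burglary and Earthquake both cause Alarm. The probabilities are illustrative assumptions, and the radio report is simplified to observing the earthquake directly.

```python
from itertools import product

# Illustrative numbers, not measured values.
P_BURGLARY = 0.01
P_EARTHQUAKE = 0.02
# P(alarm = True | burglary, earthquake)
P_ALARM = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def joint(burglary, earthquake, alarm):
    """P(burglary, earthquake, alarm) from the network's factorization."""
    pb = P_BURGLARY if burglary else 1 - P_BURGLARY
    pe = P_EARTHQUAKE if earthquake else 1 - P_EARTHQUAKE
    pa = P_ALARM[(burglary, earthquake)]
    return pb * pe * (pa if alarm else 1 - pa)


def p_burglary(alarm=True, earthquake=None):
    """P(burglary | alarm[, earthquake]) by summing the joint over the
    unobserved variables (exact inference by enumeration)."""
    numerator = denominator = 0.0
    for b, e in product((True, False), repeat=2):
        if earthquake is not None and e != earthquake:
            continue  # inconsistent with the observed earthquake value
        p = joint(b, e, alarm)
        denominator += p
        if b:
            numerator += p
    return numerator / denominator
```

With these numbers, P(burglary | alarm) comes out around 0.58, and additionally learning about the earthquake drops it to about 0.03: the earthquake explains away the alarm. Enumeration is exponential in the number of variables, so real models use message passing or sampling instead.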
And how's that all encoded in the brain? Where's the storage of information? Just to get a little bit more specific: is it in the hardware of the actual connections? Is it in chemical communication? Is it electrical communication? Do we know?

So this is a paper that we are bringing out soon.

Which one?

This is the cortical microcircuits paper that I sent you a draft of. Of course, a lot of it is still hypothesis. One hypothesis is that you can think of a cortical column as encoding a concept. An example of a concept is: is an edge present or not, or is an object present or not? So you can think of it as a binary variable, a binary random variable: the presence of an edge or not, or the presence of an object or not. So each cortical column can be thought of as representing that one concept, one variable, and then the connections between these cortical columns are basically encoding the relationship between these random variables. And then there are connections within the cortical column. Each cortical column is implemented using multiple layers of neurons with very, very rich structure; there are thousands of neurons in a cortical column.
But that structure is similar across the different cortical columns?

Yeah, correct. And also, these cortical columns connect to a substructure called the thalamus; all cortical columns pass through this substructure. So our hypothesis is that the connections between the cortical columns are where the knowledge is stored about how these different concepts connect to each other. And then the neurons inside the cortical column and in the thalamus, in combination, implement the actual computations needed for inference, which include explaining away and competition between the different hypotheses. And what is amazing is that neuroscientists have actually done experiments to the tune of showing these things. They might not be putting it in the overall inference framework, but they will show things like: if I poke this higher-level neuron, it will inhibit this other column through this complicated loop through the thalamus. So they will do such experiments.

But do they use the terminology of concepts, for example? Is it something where it's easy to anthropomorphize and think about concepts, like you start moving into logic-based kinds of reasoning systems? Should I just think of concepts in that kind of way, or is it a lot messier, a lot more gray area, even more gray, even more messy than the artificial neural network kinds of abstractions?
The easiest way to think of it is as a variable, right? It's a binary variable which is showing the presence or absence of something.

But I guess what I'm asking is, is that something that we're supposed to think of as human-interpretable?

It doesn't need to be human-interpretable. There's no need for it to be human-interpretable. But it's almost like you will be able to find some interpretation of it, because it is connected to the other things that you know.

And the point is, it's useful somehow.

Yeah, it's useful as an entity in the graph, in connecting to the other entities that are, let's call them, concepts.

Right, okay. So, by the way, are these the cortical microcircuits?
Correct, these are the cortical microcircuits. That's what neuroscientists use to talk about the circuits within a level of the cortex. So let's think of a neural network in artificial neural network terms. People talk about the architecture: how many layers they build, what the fan-in and fan-out are, et cetera. That is the macro architecture. And then, within a layer of the neural network... The cortical neural network is much more structured; within a level, there's a lot more intricate structure. But even within an artificial neural network, you can think of feature detection plus pooling as one level, and so that is kind of a microcircuit. It's much more complex in the real brain. And so within a level, whatever that circuitry is, within a column of the cortex and between the layers of the cortex, that's the microcircuitry.
I love that terminology. Machine learning people don't use the circuit terminology, right? But they should. It's nice. Okay, so that's the cortical microcircuit. So what does the paper that you're working on propose about the ideas around these cortical microcircuits?
So this is a fully functional model for the microcircuits of the visual cortex.

So the paper, and your idea in our discussions now, is focusing on vision?

Yeah, the visual cortex.

Okay.

Yeah, this is a model. This is a full model; it says this is how vision works.

But this is a model in the scientific sense, a hypothesis.

Yeah. Okay, so let me step back a bit. We looked at neuroscience for insights on how to build a vision model, and we synthesized all those insights into a computational model. This is called the recursive cortical network model, which we used for breaking captchas, and we are using the same model for robotic picking and tracking of objects. And that, again, is a vision system.

That's a computer vision system. It takes in images and outputs what?

On one side, it outputs the class of the image, and it also segments the image. And you can also ask it further queries: where is the edge of the object, where is the interior of the object.
So it's a model that you build to answer multiple questions. You are not trying to build a model for just classification or just segmentation; it's a joint model that can do multiple things. So that's the model that we built using insights from neuroscience, and some of those insights are: what is the role of feedback connections, and what is the role of lateral connections. All those things went into the model. The model actually uses feedback connections, all these ideas from neuroscience.
Yeah, so what the heck is a recursive cortical network? What are the architecture approaches and the interesting aspects of what is essentially a brain-inspired approach to computer vision?

Yeah, so there are multiple layers to this question. I can go from the very top and then zoom in.

Okay.
So one important constraint that went into the model is that you should not think of vision as something in isolation. We should not think of perception as a preprocessor for cognition. Perception and cognition are interconnected, so you should not think of one problem in separation from the other. That means if you finally want a system that understands concepts about the world, learns a very conceptual model of the world, reasons, and connects to language, all of those things, you need to think all the way through and make sure that your perception system is compatible with your cognition system, your language system, and all of them. And one aspect of that is top-down controllability.
What does that mean?

So think of it: you can close your eyes and think about the details of one object, right? I can zoom in further and further. Think of the bottle in front of me. Now you can think about what the cap of that bottle looks like, or about the texture on the cap of that bottle. You can think about what will happen if something hits it. So you can manipulate your visual knowledge in cognition-driven ways.

Yes.

And so there is this top-down controllability, and being able to simulate scenarios in the world.

So you're not just a passive player in this perception game. You can control it. You have imagination.

Correct. So basically it means having a generative network, which is a model. And it is not just some arbitrary generative network; it has to be built in a way that it is controllable top-down. It is not trying to generate a whole picture at once, and it's not trying to generate photorealistic things of the world. You don't have good photorealistic models of the world; human brains do not have them. If I, for example, ask you the question, what is the color of the letter E in the Google logo? You have no idea right now.

Yeah.

Although you have seen it millions of times.

Hundreds of times.

So our model is not photorealistic, but it has other properties: we can manipulate it. You can think about filling in a different color in that logo.
You can think about expanding the letter E. You can imagine the consequences of actions that you have never performed. So these are the kinds of characteristics the generative model needs to have, and this is one constraint that went into our model. When you read just the perception side of the paper, it is not obvious that this top-down controllability of the generative model was a constraint that went into it.

So what does top-down controllability in a model look like? It's a really interesting, fascinating concept. Is it the recursiveness that gives you that, or how do you do it?
Quite a few things. It's like: what does the model factorize? What is the model representing as the different pieces of the puzzle? In the RCN network, the background of an image is modeled separately from the foreground of the image. So the objects are separate from the background; they're different entities.

So there's a kind of segmentation that's built in fundamentally.

And then even that object is composed of parts. Another one is that the shape of the object is modeled separately from the texture of the object.
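To make that factorization concrete, here is a minimal sketch, assuming a tiny character grid stands in for an image. The names `Scene`, `render` and `E_SHAPE` are hypothetical and this is not the RCN implementation; it only shows how keeping shape, texture and background as separate factors lets a top-down request such as "the same letter in a different color" change one factor and leave the others alone.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Scene:
    # Hypothetical factorized scene description (not RCN's actual representation).
    shape: List[List[int]]  # 1 where the object is, 0 elsewhere
    texture: str            # symbol used to fill the object's surface
    background: str         # symbol used everywhere else


def render(scene: Scene) -> List[List[str]]:
    """Generate an image by composing the three independent factors."""
    return [
        [scene.texture if cell else scene.background for cell in row]
        for row in scene.shape
    ]


# A toy letter "E" as a binary shape mask.
E_SHAPE = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]

original = Scene(shape=E_SHAPE, texture="#", background=".")
# Top-down edit: keep the shape and background, swap only the texture.
recolored = replace(original, texture="*")

if __name__ == "__main__":
    for row in render(recolored):
        print("".join(row))
```

Editing `shape` instead (for example, a wider mask) would be the toy analogue of "expanding the letter" while reusing the same texture.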
Got it. So there's like these... you know who François Chollet is?

Yeah.

He developed this IQ-test type of thing, the ARC challenge, and it's kind of cool that there are these concept priors that he defines, which you bring to the table in order to be able to reason about basic shapes and things in the IQ test. So here you're making it quite explicit that these are distinct things that you should be able to model.

Yes. And keep in mind that you can derive this from much more general principles. You don't need to explicitly put it in as foreground versus background, or surface versus structure. These are derivable from more fundamental principles: what's the property of continuity of natural signals.

What's the property of continuity of natural signals?

Yeah.
By the way, that sounds very poetic. So you're saying there are some low-level properties from which emerges the idea that shapes should be different than... like, there should be parts of an object, there should be... I mean, exactly, kind of like François's priors. There's objectness, there's all these things that it's kind of crazy that we humans, I guess, evolved to have, because it's useful for us to perceive the world.

Correct. And it derives mostly from the properties of natural signals.

And so natural signals are the kind of things we perceive in the natural world. I don't know why that sounds so beautiful: natural signals.

Yeah, as opposed to a QR code, which is an artificial signal that we created. Humans are not very good at classifying QR codes. We are very good at saying something is a cat or a dog, but computers are the ones that are very good at classifying QR codes. So our visual system is tuned for natural signals, and there are fundamental assumptions in the architecture that are derived from the properties of natural signals.

I wonder, when you take hallucinogenic drugs, does that count as natural, or is that closer to the QR code?

It's still natural, because it is still operating using your brain.

By the way, on that topic, I haven't been following, but I think they're becoming legalized in certain places. I can't wait until they become legalized to the degree that vision scientists could study it, through medical, chemical ways of modifying it. There could be ethical concerns, but that's another way to study the brain: to be able to chemically modify it. There's probably a very long way to go to figure out how to do it ethically.

Yeah, but I think there are studies on that already.

Yeah, I think so, because it's not unethical to give it to rats.

Oh, that's true. That's true. [Laughter] There's a lot of drugged-up rats out there. Okay, yeah, cool.
Sorry, sorry. Okay, so there are these low-level things from natural signals from which these properties will emerge.

Yes, but it is still a very hard problem how to encode that. You mentioned the priors François wanted to encode in the abstract reasoning challenge, but it is not straightforward how to encode those priors. Some of those challenges, like the object completion challenges, are things that we do purely with our visual system. It looks like abstract reasoning, but it is purely an output of the vision system. For example, completing the corners and the lines of the Kanizsa triangle is purely a visual system property; there is no abstract reasoning involved. It uses all these priors, but they are stored in our visual system in a particular way that is amenable to inference. And that is one of the things we tackled: saying, okay, this is the prior knowledge, which will be derived from the world, but then how is that prior knowledge represented in the model such that inference, when some piece of evidence comes in, can be done very efficiently and in a very distributed way? Because there are so many ways of representing knowledge that are not amenable to very quick inference and quick lookups. So that's one core part of what we tackled in the RCN model: how do you encode visual knowledge to do very quick inference.

Yeah. Can you maybe comment on... folks listening to this may in general be familiar with different kinds of neural network architectures. What are we talking about with RCN? What does the architecture look like? What are the different components? Is it close to neural networks, or far away from them?
Yeah, so you can think of the delta between the model and a convolutional neural network, if people are familiar with those. Convolutional neural networks have this feed-forward processing cascade of feature detectors and pooling, and that is repeated in the hierarchy of a multi-level system. If you want an intuitive idea of what is happening: feature detectors are detecting interesting co-occurrences in the input, which can be a line, a corner, an eye, or a piece of texture, etc. The pooling neurons are doing some local transformation of that and making it invariant to local transformations. So that is the structure of a convolutional neural network.
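A toy, one-dimensional sketch of one such level, assuming a hand-picked two-tap "edge" template and non-overlapping max pooling (both are illustrative choices, not taken from any particular network). The detector's output moves when the edge moves, but the pooled output does not, which is the local invariance described above.

```python
from typing import List, Sequence


def detect(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Feature detection: slide the template over the input and score each
    position by how well it matches (a 1-D cross-correlation, 'valid' mode)."""
    k = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool(responses: Sequence[float], width: int) -> List[float]:
    """Pooling: keep the strongest response in each non-overlapping window,
    so a small shift of the feature inside a window leaves the output unchanged."""
    return [max(responses[i:i + width]) for i in range(0, len(responses), width)]


EDGE = [-1.0, 1.0]  # toy detector: responds to a step up in the signal

a = [0, 0, 1, 1, 1, 1, 1, 1]
b = [0, 1, 1, 1, 1, 1, 1, 1]  # the same edge, shifted one position left

if __name__ == "__main__":
    print(detect(a, EDGE), max_pool(detect(a, EDGE), 4))
    print(detect(b, EDGE), max_pool(detect(b, EDGE), 4))
```

Stacking several such detect-then-pool stages, each working on the previous stage's pooled output, gives the multi-level hierarchy.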
A recursive cortical network has a similar structure when you look at just the feed-forward pathway. But in addition to that, it is also structured so that it is generative, so that it can run backward and combine the forward pass with the backward pass.
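One simple way to picture "running it backward" is sketched below, assuming the forward pooling step also records where in each window the winning response was, and the top-down pass puts the feature back at that position. This "unpooling with switches" trick is borrowed from deconvolutional networks and is only an analogy here, not how RCN itself is specified; `pool_with_switches` and `generate` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def pool_with_switches(responses: Sequence[float], width: int) -> Tuple[List[float], List[int]]:
    """Forward pass: max-pool each window and remember the winning position."""
    values: List[float] = []
    switches: List[int] = []
    for start in range(0, len(responses), width):
        window = responses[start:start + width]
        # max() returns the first index among ties, so the choice is deterministic
        best = max(range(len(window)), key=lambda j: window[j])
        values.append(window[best])
        switches.append(start + best)
    return values, switches


def generate(values: Sequence[float], switches: Sequence[int], length: int,
             threshold: float = 0.5) -> List[int]:
    """Top-down pass: re-place each active pooled feature at its recorded position."""
    out = [0] * length
    for value, position in zip(values, switches):
        if value > threshold:
            out[position] = 1
    return out


# Bottom-up detector responses; a single strong edge at index 1, weak noise at index 4.
responses = [0.0, 1.0, 0.0, 0.0, 0.1, 0.0, 0.0]
values, switches = pool_with_switches(responses, width=4)
reconstruction = generate(values, switches, len(responses))
```

Read together, the two passes say which input positions support the active pooled units, which hints at how a generative model can also localize or segment what it recognized.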
Another aspect is that it has lateral connections. If you have an edge here and an edge here, it has connections between these edges. It is not just feed-forward connections; there is something between the nodes representing these edges, to enforce compatibility between them. It's a constraint. Otherwise, if you do just feature detection followed by pooling, your transformations in different parts of the visual field are not coordinated, and when you generate from the model you will create jagged things and uncoordinated transformations. So these lateral connections are enforcing coordination between the transformations.

Is the whole thing still differentiable?

No, it's not. It's not trained using backprop.

Okay, that's really important. So there's this feed-forward, there are feedback mechanisms, there are some interesting connectivity things. It's still layered?

Yes, there are multiple levels, multiple layers.

Okay, very interesting. And the connections across adjacent units serve as constraints that keep the thing stable. Got it.
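A minimal sketch of what "enforcing compatibility" can mean, assuming each pooling unit along a contour picks one local shift, neighbouring shifts count as compatible only if they differ by at most one step, and the evidence scores are made up for illustration. Picking each unit's best shift independently gives a jagged result. A standard max-sum (Viterbi) pass along the chain, used here as a stand-in for whatever inference RCN actually runs over its lateral connections, picks the best jointly compatible set instead.

```python
from typing import Dict, List

Scores = List[Dict[int, float]]  # per unit: evidence score for each candidate shift


def compatible(a: int, b: int) -> bool:
    """Hypothetical lateral constraint: neighbouring shifts differ by at most 1."""
    return abs(a - b) <= 1


def independent(scores: Scores) -> List[int]:
    """Feed-forward only: every unit picks its own best shift."""
    return [max(unit, key=unit.get) for unit in scores]


def coordinated(scores: Scores) -> List[int]:
    """Max-sum (Viterbi) over a chain: best total score such that every
    neighbouring pair of shifts is compatible."""
    best = [dict(scores[0])]         # best[i][s]: top score for units 0..i ending in shift s
    back: List[Dict[int, int]] = []  # back[i-1][s]: predecessor shift achieving best[i][s]
    for unit in scores[1:]:
        cur: Dict[int, float] = {}
        ptr: Dict[int, int] = {}
        for s, evidence in unit.items():
            options = [(best[-1][p], p) for p in best[-1] if compatible(p, s)]
            if options:
                total, prev = max(options)
                cur[s] = total + evidence
                ptr[s] = prev
        best.append(cur)
        back.append(ptr)
    s = max(best[-1], key=best[-1].get)  # best final shift
    path = [s]
    for ptr in reversed(back):           # follow back-pointers to the first unit
        s = ptr[s]
        path.append(s)
    return path[::-1]


# Three units along a contour; the middle one has noisy evidence favouring +2.
scores: Scores = [
    {-2: 0.0, -1: 0.1, 0: 1.0, 1: 0.1, 2: 0.0},
    {-2: 0.0, -1: 0.0, 0: 0.8, 1: 0.0, 2: 0.9},
    {-2: 0.0, -1: 0.1, 0: 1.0, 1: 0.1, 2: 0.0},
]
```

The independent choice is `[0, 2, 0]`, a jump the compatibility rule forbids; the chain pass returns `[0, 0, 0]`, giving up a little evidence at the middle unit (0.8 instead of 0.9) in exchange for a consistent contour.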
Okay, so what else?

And then there is this idea of doing inference. A neural network does not do inference on the fly. An example of why this inference is important: one of the first applications we showed in the paper was to crack text-based captchas.

What are captchas, by the way? By the way, one of the most awesome terms, which people don't use anymore, is human computation, I think. I love this term. The guy who created captchas, I think, came up with it. I love it. Anyway...
Yeah, what are captchas?

So captchas are those strings that you fill in when, for example, you're opening a new account in Google. They show you a picture; it used to be a set of garbled letters, and you have to figure out what that string of characters is and type it in. The reason captchas exist is that Google or Twitter do not want automatic creation of accounts. You can use a computer to create millions of accounts and use them for nefarious purposes. So you want to make sure that, to the extent possible, the interaction their system is having is with a human. That's why it's called a human interaction proof: a captcha is a human interaction proof. So captchas are, by design, things that are easy for humans to solve but hard for computers.

Hard for robots.

Yeah.
And text-based captchas were the ones that were prevalent around 2014, because at that time text-based captchas were hard for computers to crack. Even now they actually are, in the sense that an arbitrary text-based captcha will be unsolvable even now. But with the techniques we have developed, you can quickly develop a mechanism that solves a given captcha.

They've probably gotten a lot harder too. People have been getting cleverer and cleverer at generating these text characters.

Yeah.

Right. So okay, that was one of the things you tested on: these kinds of captchas in 2014, 2015, that kind of stuff.

Right.

So why, by the way, why captchas?

Yeah.
Even now I would say captcha is a very, very good challenge problem if you want to understand how human perception works and if you want to build systems that work like the human brain. And I wouldn't say captcha is a solved problem. We have cracked the fundamental defense of captchas, but it is not solved in the way that humans solve it. I can give an example: I can take a five-year-old child who has just learned characters and show them any new captcha that we create, and they will be able to solve it. I can show you pretty much any new captcha from any new website, and you'll be able to solve it without getting any training examples from that particular style of captcha.

You're assuming I'm human.

Yes, that's right. If you are human; otherwise I will be able to figure that out using this one.

So this whole podcast is just a Turing test, a long Turing test. Anyway, I'm sorry. So...
Yeah. So humans can figure it out with very few training examples, or no training examples from that particular style of captcha. Even now this is unreachable for current deep learning systems. I don't think a system exists where you can say: train on whatever you want, and now I will show you a new captcha which I did not show you in the training setup; will the system be able to solve it? It still doesn't exist. So that is the magic of human perception.

Yeah.

And Doug Hofstadter put this very beautifully in one of his talks: the central problem in AI is, what is the letter A? If you can build a system that can reliably detect all the variations of the letter A, you don't even need to go to the B and the C.

Yeah, you don't even need the B and C, or the strings of characters.

And so that is the spirit with which we tackled that.

What does it mean by that?
I mean, is it like, without training examples, trying to figure out the fundamental elements that make up the letter A in all of its forms?

In all of its forms. An A can be made with two humans standing, leaning against each other, holding hands. It can be made of leaves. It can be...

Yeah, you might have to understand everything about this world in order to understand the letter A.

Yeah, exactly. So it's common sense reasoning, essentially.

Yeah.

Right. So to finally say that you have really solved captcha, you have to solve the whole problem.

Yeah. Okay, so how does the RCN architecture help us do a better job of that kind of...

Yeah, so as I mentioned, one of the important t