Transcript
s3MuSOl1Rog • MIT Sloan: Intro to Machine Learning (in 360/VR)
Kind: captions
Language: en
The video you're watching now is in 360. The resolution is not great, but we wanted to try something different. If you're on a desktop or laptop you can pan around with your mouse, and if you're on a phone or tablet you should be able to just move your device to look around. Of course, it's best viewed with a VR headset. The video that follows is a guest lecture on machine learning that I gave in an MIT Sloan course on the business of artificial intelligence. The lecture is non-technical and intended to build intuition about these ideas amongst the business students in the audience. The room was a half circle, so we thought: why not film the lecture in 360? We recorded a screencast of the slides and pasted it into the video so that the slides are crisper. Let me know what you think,
and remember, it's an experiment. So, this course is talking about the broad context, the impact of artificial intelligence. There's the global part, which is the global impact of artificial intelligence, and there's the business part, which is this: when you take these fun research ideas that I'll talk about today, a lot of which are cool on toy examples, and you bring them to reality, you face real challenges. That's what I would like to really highlight today. That's the business part: when you want to make real impact, when you really make these technologies a reality. So I'll talk about how amazing the technology is for a nerd like me, but also about the challenges you face when you take it into the real world. So,
machine learning, which is the technology at the core of artificial intelligence: I'll talk about the promise, the excitement that I feel about it, and the limitations, to bring it down a little bit, what the real capabilities of the technology are. For the first time, really, as a civilization, we're exploring the meaning of intelligence. If you pause for a second and just think: maybe many of you want to make money out of this technology, many of you want to save lives, help people, but also, on a philosophical level, we get to explore what makes us human. So while I'll talk about the low-level technologies, also think about the incredible opportunity here: we get to almost psychoanalyze ourselves by trying to build versions of ourselves in the machine. All right, so here's the open
question: how powerful is artificial intelligence? How powerful is the machine learning that lies at its core? Is it simply a helpful tool, a special-purpose tool to help you solve simple problems? That's what it currently is. Currently, machine learning, artificial intelligence, works like this: if you can formally define the problem, formally define the tools you're working with, and formally define the utility function, what you want to achieve with those tools, then as long as you can define those things, we can come up with algorithms that can solve them, as long as you have the right kind of data. And as I'll talk about, data is key. The question is, into the future, can we break past this very narrow definition of what machine learning can give us, which is solving specific problems, to something bigger, to where we approach the general intelligence that we exhibit as human beings? When we're born, we know nothing, and we learn quickly from very little data. The honest answer is: we don't know. We don't know what the limitations of the technology are. What kinds of machine learning are there? There are several
flavors. The first is what's achieved success today: supervised learning. What I'm showing here on the left of the slide is the teacher, the data that is fed to the system, and on the right is the student, the system itself, the machine learning. So there's supervised learning. When everybody talks about machine learning today, for the most part they're referring to supervised learning, which means every single piece of data that is used to train the model is seen by human eyes, and those human eyes, with an accompanying brain, label that data in a way that makes it useful to the machine. This is critical, because that blue box, the human, is really costly. So when every single piece of data that's used to train the machine needs to be seen by a human, first, you need to pay for that human, and second, you're limited by time: the amount of data necessary to label what it means to exist in this world is humongous.
Augmented supervised learning is when you get the machine to help you a little bit. There are a few tricks there, but they're still only tricks; the human is still at the core of it. And the promise of future research that we're pursuing, that I'm pursuing, and perhaps, if we get to discuss the applications, that some of the speakers here are pursuing, is in semi-supervised and reinforcement learning, where the human starts to play a smaller and smaller role in how much data they have to annotate. And the dream that the sort of wizards of the dark arts of deep learning are all excited about is unsupervised learning. It has very few actual successes in real-world applications today, but the idea that you could build a machine that doesn't require a human teacher, a human being, to teach it anything fills us artificial intelligence researchers with excitement.
There's a theme here: machine learning is really simple. There's the learning system in the middle, and there's a training stage where you teach it something. All you need is some input data, and you need to teach it the correct output for that input data, so you have to have a lot of pairs of input data and correct output. There'll be a theme of cats throughout this presentation. So if you want to teach a system the difference between a cat and a dog, you need a lot of images of cats, and you need to tell it that this is a cat, this bounding box here in the images of cats. You have to give it a lot of images of dogs and tell it, okay, in these pictures they're dogs. And then (there's a spelling mistake there) the second stage is the testing stage, when you actually give it new input it has never seen before, and you hope that you've given it enough cat-versus-dog data to guess: is this new image, that it's never seen before, a cat or a dog? Now,
one of the open questions I want you to keep in mind is: what in this world can we not model in this way? What activity, what task, what goal? May I offer to you that there's nothing you can't model in this way. So let's think about what, in terms of machine learning, can be modeled. It starts small. What can be modeled in this way? First, on the bottom left of the slide, is one-to-one mapping, where the input is an image of a cat and the output is a label that says cat or dog. You can also do one-to-many, where the input is an image of a cat and the output is a story about that cat: captioning of the image. You can also do it the other way, many-to-one mapping, where you give it a story about a cat and it generates an image. There's many-to-many; this is Google Translate, where we translate a sentence from one language to another, and there are various flavors of that. Again, same theme here: input data provided with correct output, and then let it go into the wild, where it runs on input data it hasn't seen before to provide guesses. And it's as simple as this: whatever you have, you can convert into one of the following four things. Numbers; a vector of numbers, so a bunch of numbers; a sequence of numbers, where the temporal dynamics matter, like audio and video, where the ordering matters; or a sequence of vectors of numbers. If you can convert it into numbers, and I propose to you that there's nothing you can't convert into numbers, then you can have a system learn to do it. And the same thing with the output: it generates numbers, vectors of numbers, sequences of numbers, or sequences of vectors of numbers. First, are there any questions at this point?
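The "everything can be converted to numbers" idea can be made concrete with a short sketch. This is plain Python chosen for illustration; the encodings (flattened pixel grid, Unicode code points, amplitude samples) are simple stand-ins for the tensors and learned embeddings a real system would use:

```python
# A minimal sketch of the "everything is numbers" idea: three very
# different inputs, each reduced to plain numbers a learning system
# could consume. (Illustrative only; real pipelines use tensors.)

def image_to_numbers(image):
    """A grayscale image is already a grid of pixel intensities (0-255);
    flatten it into one long vector of numbers."""
    return [pixel for row in image for pixel in row]

def text_to_numbers(text):
    """A sentence becomes a sequence of numbers; here, Unicode code
    points. Real systems use learned token embeddings instead."""
    return [ord(ch) for ch in text]

def audio_to_numbers(samples):
    """Audio is a time-ordered sequence of amplitude samples; the
    ordering carries the temporal dynamics mentioned above."""
    return list(samples)  # already numbers; the order matters

tiny_image = [[0, 255], [128, 64]]          # a 2x2 grayscale "image"
print(image_to_numbers(tiny_image))          # [0, 255, 128, 64]
print(text_to_numbers("cat"))                # [99, 97, 116]
print(audio_to_numbers((0.1, -0.2, 0.05)))   # [0.1, -0.2, 0.05]
```

Whether the input is an image, a sentence, or a sound, what reaches the learning system is one of the four forms above: a number, a vector, a sequence, or a sequence of vectors.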
Well, we have a lot of fun slides to get through, but I'll pause every once in a while to make sure we're on the same page. So what kind of input are we talking about? Just to fly through it: images, so faces, or medical applications, looking at scans of different parts of the body to diagnose any kind of medical conditions. Text, so conversations, your texts, articles, blog posts, for sentiment analysis, or question answering, where you ask it a question and the output you hope for is answers. Sound, so voice recognition, anything you could tell from audio. Time series data, so financial data, the stock market; you can use it to predict anything you want about the stock market, including whether to buy or sell, which, if you're curious, doesn't work quite so well as a machine learning application. The physical world, so cars, or any kind of object, any kind of robot that exists in this world: the location of where I am, the location of where other things are, the actions of others. All of it can be converted to numbers. And the correct output, same thing: classification is a bunch of numbers, saying it's a cat or a dog; regression is saying to what degree I turn the steering wheel; sequence is generating audio, generating video, generating stories, captioning text and images. Generate anything you can think of as numbers. And at the core of it is a bunch of data-agnostic machine learning algorithms. There are the traditional ones: nearest neighbors, naive Bayes, support vector machines. A lot of them are limited, and I'll describe how. And then there are neural networks. There's nothing special and new about neural networks, and I'll describe exactly the very subtle thing that is powerful, that's been there all along, and the things that have now been able to unlock that power. But it's still just one flavor of machine learning algorithm. The inspiration
for neural networks, as Jonathan showed last time, is our human brain. That's perhaps why the media, why the hype, is captivated by the idea of neural networks: you immediately jump to this feeling, because there's this mysterious structure to them that scientists don't understand. Artificial neural networks, I'm referring to, and the biological ones: we don't understand either, and the similarity captivates our minds, so we think, well, this approach is perhaps as limitless as our own human mind. But the comparison ends there. In fact, artificial neural networks are much simpler computational units. At the core of everything is this neuron. It's a computational unit that does two very simple operations: on the left side, it takes a set of numbers as inputs, applies weights to those inputs, sums them together, applies a little bias, and provides an output somewhere between 0 and 1. So you can think of it as a computational entity that gets excited when it sees certain inputs and gets totally turned off when it gets other kinds of inputs. So maybe this neuron, with these weights, gets really excited when it sees pictures of cats and totally doesn't care about dogs. Some of us are like that. So that's the job of this neuron: it detects cats.
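The whole neuron fits in a few lines. The weights, bias, and inputs below are hypothetical values chosen for illustration, not from any trained model:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus a bias,
    squashed by a sigmoid into an output between 0 and 1."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))  # sigmoid nonlinearity

# Hypothetical weights for a "cat detector" neuron: it fires (output
# above 0.5) for the input patterns it likes, stays low for others.
weights, bias = [0.7, 0.6, 1.4], -1.0

cat_like = [1.0, 1.0, 1.0]   # inputs this neuron was tuned for
dog_like = [0.0, 0.1, 0.0]   # inputs it doesn't care about

print(neuron(cat_like, weights, bias))  # > 0.5: excited
print(neuron(dog_like, weights, bias))  # < 0.5: turned off
```

That is the entire computational unit: multiply, add, squash. Everything that follows is built by stacking many of these together.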
The way you build an artificial neural network, the way you release the power that I'll talk about in the following slides, in the applications, in what can be achieved, is just by stacking a bunch of these together. Think about it: this is an extremely simple computational unit. So you need to sort of pause whenever we get to the following slides and keep that in mind. There are a few slides I'll show that say neural networks are amazing, and I want you to think back to this slide: everything is built on top of these really simple addition operations, with a simple nonlinear function applied at the end. Just a tiny math operation. We stack them together in a feed-forward way, so there's a bunch of layers, and when people talk about deep neural networks, it means there's a bunch of those layers. And then there are recurrent neural networks, a special flavor that's able to have memory: as opposed to just pushing input directly into output, it's also able to do stuff on the inside, in a loop, where it remembers things. This is useful for natural language processing, for audio processing, whenever the length of the sequence is not defined. Okay, slide number one in the category of neural networks are amazing.
This is perhaps for the math nerds, but I also want you to use your imagination. There's a universality to neural networks, meaning that in this simple computational structure, on the left is the input and on the right is the output of the network, with just a single hidden layer. It's called a hidden layer because it sits there in the middle, between the input and the output layers. A single hidden layer with some number of nodes can represent any function. Any function: that means anything you want to build in this world, everyone in this room, can be represented with a neural network with a single hidden layer. So the power of these things, and this is just one hidden layer, is limitless. The problem, of course, is how you find that network. How do you build a network that is as clever as many of the people in this room? But the fact that you can build such a network is incredible. It's amazing. I want you to think about that.
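A tiny taste of that universality: with hand-picked weights (an assumption for illustration, not learned ones), a single hidden layer can compute XOR, a function no single neuron can represent on its own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hidden_layer(x, W1, b1, w2, b2):
    """A network with a single hidden layer: input -> hidden units
    (each one a neuron) -> one output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hand-picked weights that make this tiny network compute XOR.
W1 = [[10, 10], [-10, -10]]   # hidden unit 1: "at least one input on"
b1 = [-5, 15]                 # hidden unit 2: "not both inputs on"
w2, b2 = [10, 10], -15        # output: fires only if both hidden fire

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(one_hidden_layer(x, W1, b1, w2, b2)))  # XOR truth table
```

The universality theorem says weights like these exist for any function; it says nothing about how to find them, which is exactly the training problem described next.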
And the way you train a network: it's born as a blank slate, with some random weights assigned to the edges. Again, a network is represented by numbers; the core parameters of the network are the numbers on each of those arrows, each of those edges. And it starts knowing nothing. This is a baby network. The way you teach it something, unfortunately, currently, as I said, in a supervised learning mechanism, is that you have to give it pairs of input and output. You have to give it pictures of cats, and labels on those pictures saying that they're cats. And the basic, fundamental operation of learning is this: you compute a measure of error, and you backpropagate it through the network.
What do I mean? Everything is easier with cats; I apologize, too many cats. So the input here is a cat, and the neural network we're training is just guessing; it doesn't know. Say it guesses cat. Well, it happens to be right, so this is the measure of error: yes, you got it right. And you have to backpropagate that error; you have to reward the network for doing a good job. What I mean by reward: there are weights on each of those edges, and the individual neurons that were responsible, back to that cat neuron, need to be rewarded for seeing the cat. So you just increase the weights on the neurons that were associated with producing the correct answer. Now you give it a picture of a dog, and the neural network says cat. Well, that's an incorrect answer, so there's a high error that needs to be backpropagated through the network, and the weights responsible for classifying this picture as a cat need to be punished; they need to be decreased. Simple. And you just repeat this process over and over.
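The guess-measure-nudge loop can be sketched in a few lines. This is a bare perceptron-style update, a simplification of real backpropagation through many layers, and the three "features" (whiskers, pointy ears, barks) are hypothetical hand-made ones, chosen for illustration:

```python
# A toy version of the reward/punish loop: guess, measure the error,
# nudge the responsible weights up or down, repeat.

def predict(weights, features):
    score = sum(w * f for w, f in zip(weights, features))
    return "cat" if score > 0 else "dog"

def train(examples, steps=20, lr=0.1):
    weights = [0.0, 0.0, 0.0]           # born a blank slate
    for _ in range(steps):
        for features, label in examples:
            guess = predict(weights, features)
            if guess != label:          # wrong: reward or punish weights
                sign = 1 if label == "cat" else -1
                weights = [w + sign * lr * f
                           for w, f in zip(weights, features)]
    return weights

# Hypothetical features per example: (whiskers, pointy ears, barks)
examples = [([1, 1, 0], "cat"), ([1, 1, 0], "cat"),
            ([0, 0, 1], "dog"), ([1, 0, 1], "dog")]

w = train(examples)
print(predict(w, [1, 1, 0]))  # cat
print(predict(w, [0, 0, 1]))  # dog
```

Every correct answer leaves the weights alone; every wrong one shifts them toward the right answer, which is the whole mechanism of supervised learning in miniature.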
This is what we do as kids when we're first learning. For the most part, we're also supervised learning machines, in the sense that we have our parents, and we have the environment, the world, that teaches us about what's correct and what's incorrect, and we backpropagate this error and reward through our brains to learn. The difference is that as human beings we don't need too many examples, and I'll talk about some of the drawbacks of these approaches. We don't need too many examples: you fall off your bike once or twice and you learn how to ride the bike. Unfortunately, neural networks need tens of thousands of falls off the bike in order to learn how to not do it. That's one of the limitations.
And one key thing I didn't mention here: when we refer to input data, we usually refer to sensory data, raw data. We have to represent that data in some clever way, in some deeply clever way, where we can reason about it, whether that's in our brains or in the neural network. Here's a very simple example to illustrate that the representation of data matters. The way you represent the data can make the discrimination of one class from another, a cat versus a dog, either incredibly difficult or incredibly simple. Here is a visualization of the same kind of data in Cartesian coordinates and in polar coordinates. On the right, you can just draw a simple line to separate the two. What you want is a system that's able to learn the polar coordinate representation, versus the Cartesian representation, automatically.
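Here's a concrete sketch of that Cartesian-versus-polar point, with synthetic points chosen for illustration: in polar coordinates, a single threshold on the radius separates the two classes; the raw Cartesian coordinates cannot do it with the same simple rule:

```python
import math

# Two classes of 2-D points: class A near the origin, class B on a
# ring farther out.
class_a = [(0.3, 0.1), (-0.2, 0.4), (0.1, -0.3)]   # small radius
class_b = [(2.0, 0.5), (-1.5, 1.5), (0.5, -2.0)]   # large radius

def to_polar(point):
    """Re-represent a point as (radius, angle) instead of (x, y)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

# In the polar representation, one threshold on the radius separates
# the classes perfectly:
threshold = 1.0
print(all(to_polar(p)[0] < threshold for p in class_a))  # True
print(all(to_polar(p)[0] > threshold for p in class_b))  # True

# The same threshold applied to a raw Cartesian coordinate fails:
print(all(p[0] > threshold for p in class_b))  # False: x alone can't do it
```

Here the transform was hand-written; the promise of representation learning is that the system discovers the equivalent of `to_polar` on its own.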
And this is where deep learning has stepped in and revealed the incredible power of this approach. Deep learning, the smallest circle there, is a type of representation learning; machine learning is the second-biggest circle. So this class is about the biggest circle: AI includes robotics, includes all the fun things that are built on learning, and I'll discuss how machine learning, I think, will close this entire circle into one. But for now, AI is the biggest circle, a subset of that is machine learning, and a smaller subset of that is representation learning. So deep learning is not only able, given a few examples of cats and dogs, to discriminate between a cat and a dog; it's able to represent what it means to be a cat. It's able to automatically determine the fundamental units, at the low level and the high level, this very platonic idea of what it means to represent a cat, from the whiskers, to the high-level shape of the head, to the fuzziness and the deformable aspects of the cat. I'm not a cat expert, but I hear these are the features of a cat that are essential to discriminate between a cat and a dog. It learns those features, as opposed to having to have experts. This is the drawback of the systems Jonathan talked about from the 80s and 90s, where you had to bring in experts for any specific domain that you tried to solve; you had to have them encode that information. Deep learning, and this is simply the only big difference between deep learning and other methods, learns the representation for you. It learns what it means to be a cat. Nobody has to step in and help it figure out that cats have whiskers and dogs don't. What does this mean? The
fact that it can learn these features, these whisker features, means that, as opposed to having five, or ten, or a hundred, or five hundred features encoded by brilliant engineers with PhDs, it can find hundreds of thousands, millions of features automatically. Hundreds of millions of features: stuff that can't be put into words or described. In fact, it's one of the limitations of neural networks: they find so many fundamental things about what it means to be a cat that you can't visualize what the network really knows. It just seems to know stuff, and it finds that stuff automatically. What does this mean? The critical thing here is that because it's able to automatically learn those hundreds of millions of features, it's able to utilize data without hitting diminishing returns until, well, we don't know when they hit. The point is, with the classical machine learning algorithms, you start hitting a wall when you have tens of thousands of images of cats; with deep learning, you get better and better with more data. Neural networks
are amazing, slide two. Here's a game, a simple arcade game, where there are two paddles bouncing a ball back and forth. Okay, great: you can build an artificial intelligence agent that can play this game, and not even that well; it kind of learns to do all right and eventually win. Here's the fascinating thing with deep learning: as opposed to encoding the position of the paddles and the position of the ball, having an expert in this game come in and encode the physics of the game, the input to the neural network is the raw pixels of the game. So it's learning in the following way. You give it an evolution of the game; you give it a bunch of pixels. Images are built up of pixels, which are just numbers from 0 to 255, so there's this array of numbers that represents each image, and then you give it several tens of thousands of images that represent the game. So you have this stack of images, this giant stack of numbers, that represents a game, and the only thing the system knows, at the end, is whether it won or lost. That's it. So based on that, it has to figure out how to play the game. It knows nothing about games, nothing about colors or balls or paddles or winning or anything. That's it. So why is this amazing? It's amazing that it even works at all, and it works so well that it wins. It's
amazing because that's exactly what we do as human beings. This is general intelligence, so I need you to pause and think about this. We'll talk about special intelligence too, the usefulness, and, okay, there are cool tricks here and there that we can do to get you an edge on your high-frequency trading system, but this is general intelligence. General intelligence is the same intelligence we use as babies. When we're born, what we get is sensory input. Right now, all of us, most of us, are seeing, hearing, feeling with touch, and that's the only input we get. We know nothing, and with that input we have to learn something; nobody is pre-teaching us stuff. And this is an example of that, a trivial example, but one of the first examples where this is truly working. I'm sorry to linger on this, but it's a fundamental fact: the fact that we now have systems that outperform human beings in these simple arcade games is incredible.
That's the research side of things, but let me step back. Again, the takeaway of that previous slide is why I think machine learning is limitless in the future. Currently, it's limited. Again, the representation of the data matters, and if you want to have impact, we currently can only tackle the small problems. What are those problems? Image recognition: given the entire image of a leopard, of a boat, of a mite, we can classify, with pretty good accuracy, what's in that image. That's image classification. What else? We can find exactly where in that image each individual object is; that's called image segmentation. Again, the process is the same: there's the learning system in the middle, a neural network, and as long as you give it a set of numbers as input and the correct set of labels as output, it learns to do that for data it hasn't seen before. Let me pause a second: does anyone have any questions about
the techniques of neural networks? Yes. So, that's a great question, and in a couple of slides I'll get to it exactly. The data representation, and I'll elaborate in a little bit, but loosely, the data representation for a neural network is in the weights of each of those arrows connecting the neurons. That's where the representation is, and I'll show an example to really clarify what that means. The Cartesian-versus-polar-coordinates example is just a very simple visualization of the concept, but you want to be able to represent the data in an arbitrary way, where there are no limits to the representation: it could be highly nonlinear, highly complex. Any other questions?
So, I have a couple of slides almost asking these questions, because there are no good answers. One could argue, and I think somebody in the last class brought this up: is machine learning just pattern recognition? It's possible that reasoning, thinking, is just pattern recognition, and I'll describe sort of an intuition behind that. We tend to respect thinking a lot because we've only recently, as human beings, learned to do it in our evolutionary time. We think it's somehow special, different from, for example, perception. We've had visual perception for several orders of magnitude longer in our evolution as a living species; we started to learn to reason, I think, about a hundred thousand years ago. So we think it's somehow special, different from the same kind of mechanism we use for seeing things. Perhaps it's exactly the same thing. So perception is pattern recognition; perhaps reasoning is just a few more layers of that. That's the hope. That's an open question.
Yes, that's a great question. There have been very few breakthroughs in neural networks through the AI winters that we discussed, through a lot of excitement in spurts; even recently, there have been very few algorithmic innovations. The big gains came from compute, so improvements in GPUs and better, faster computers. You can't underestimate the power of community: the ability to share code, the ability to communicate through the internet and work on code together. And then the digitization of data, so the ability to have large datasets easily accessible and downloadable. All of those little things. But I think, in terms of the future of deep learning and machine learning, it all rides on compute, meaning continued bigger and faster computers. That doesn't necessarily mean Moore's law, making smaller and smaller chips; it means getting clever in different directions: massive parallelization, coming up with ways to do super efficient, power-efficient implementations of neural networks, and so on. So let me just fly through a few
examples of what we can do with machine learning, just to give you a flavor. I think in future lectures, if possible, different speakers will discuss specific applications and really dig into those. So, as opposed to working with just images, you can work with videos and segment those. I mentioned image segmentation; we do video segmentation too, so through video you segment the different parts of a scene. That's useful for a particular application here, in driving: you can segment the road from cars and vegetation and lane markings. You can also, and this is a subtle but important point, pick up very small pieces of information that we just know are important, like: there is a red light, I have to stop, I have to slow down. So, hard questions. The question was: how do you detect the traffic light and the lights? So how do we do it as human beings? First of all, let's start there. The way we do it is by the knowledge we bring to the table. We know what it means to be on the road; there's a huge network of knowledge that you come with, and that makes the perception problem much easier.
This, though, is pure perception: you take an image and you separate different parts based purely on tiny patterns of pixels. So first it finds all the edges, and it learns that traffic lights have certain kinds of edges around them; then, zooming out a little bit, they have a certain collection of edges that make up this black rectangle type of shape. So it's all about shapes; it kind of builds up, knowing the shape structure of things. But it's a purely perception problem, and one of the things I argue is that if it's purely a perception approach, and you bring no knowledge to the table about the physics of the world, the three-dimensional physics and the temporal dynamics, then you are not going to be able to successfully achieve near-100% accuracy.
So that's exactly the right question. For all of these things, think about how you as a human being would solve these problems, and what is lacking in the machine learning approach, what data is lacking, in order to achieve the same kind of results, the same kind of reasoning you would use as a human. So there is also image
detection. Image detection, and this is the subtle but important point about the stuff I mentioned before: image classification, given an image of a cat, doesn't find the cat. Note the distinction: you don't find the cat; you say whether this image is of a cat or not. Detection, or localization, is when you actually find where in the image the cat is. That problem is much harder, but also doable with machine learning, with deep neural networks. Now, as I said, inputs and outputs can be anything. The input could be a video, the output could be a video, and you can do anything you want with these videos. You can colorize the video; you can take an old black-and-white film and produce color images. Again, in terms of having an impact in the world using these applications, you have to think: this is a cool demonstration, but how well does it actually work in the real world?
Translation, whether that's text to text or image to image: you can translate, here, "dark chocolate" from one language to another; this class, Global Business of Artificial Intelligence. There's a reference below there; you can go and generate your own text. You can generate handwriting, the act of generating handwriting: you can type in some text, and given different styles that it learns from other handwriting samples, it can generate any text as handwriting. Again, the input is language; the output is a sequence of writing, of pen movements on the screen. You can complete sentences. This is kind of a fun one: you can generate language where you feed the system some input first. So in black there it says "life is," and then you have the neural network complete those sentences: life is about kids; life is about the weather. There's a lot of knowledge, I think, being conveyed here. And you can start the sentence with "the meaning of life is": the meaning of life is literary recognition, true for us academics; or the meaning of life is the tradition of ancient human production, also true. But these are all generated by a computer. You can also caption. This has become very popular recently: caption generation, where the input is an image and the output is a set of text that captures the content of the image. You find the different objects in the image, that's a perception problem, and once you find the different objects, you stitch them together in a sentence that makes sense.
You generate a bunch of sentences and classify which sentence is the most likely to fit this image. And, certainly, I've tried to avoid mentioning driving too much, because it is my field, it's what I'm excited about, and the moment I start talking about driving, it'll all be about driving. But I should mention, of course, that deep learning is critical to driving applications, both for perception and, what is really exciting to us now, the end-to-end approach. Whenever you say end-to-end in any application, what that means is that you start from the very raw inputs the system gets and you produce the very final output that's expected of the system. So in the self-driving car case, as opposed to breaking the car down into individual components of perception, localization, mapping, control, and planning, you take the whole stack, ignore all the super complex problems in the middle, take the external scene as input, and as output produce steering, acceleration, and braking commands. And so in this way, taking as input the image of the external world, in this case in a Tesla, we can generate steering commands for the car. Again, the input is a bunch of numbers, that's just images, and the output is a single number that gives you the steering of the car. Okay.
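The end-to-end idea collapses into one function from raw pixels to a single steering number. In this sketch the tiny frame size and the random weights are placeholder assumptions; in a real system the weights would be learned from many (image, human steering) pairs:

```python
import random

random.seed(0)

WIDTH, HEIGHT = 4, 3                     # a toy 4x3 "camera frame"
# Placeholder weights standing in for a trained network's parameters:
weights = [random.uniform(-1, 1) for _ in range(WIDTH * HEIGHT)]

def steering_command(frame):
    """frame: HEIGHT rows of WIDTH pixel intensities (0-255).
    Returns one number: negative = steer left, positive = steer right."""
    pixels = [p / 255.0 for row in frame for p in row]   # normalize pixels
    return sum(w * p for w, p in zip(weights, pixels))   # one number out

frame = [[10, 200, 30, 40],
         [50, 60, 220, 80],
         [90, 100, 110, 240]]
angle = steering_command(frame)
print(type(angle).__name__)   # a bunch of numbers in, one float out
```

Everything that a modular pipeline would do explicitly (find lanes, find cars, plan a path) is left for the learned weights to absorb implicitly.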
so let's step back for a second and
think about what can't we do with
machine learning we talked we talked
about you can map numbers to numbers
let's think about what we can't do this
at the core of artificial intelligence
in terms of making an impact on this
world is robotics so what can't we solve
in robotics and artificial intelligence
with a machine learning approach and
let's break down what artificial
intelligence means here's a stack
starting at the very top is the
environment the world you operate in
there are sensors that sense that world
there is feature extraction and learning
from that data and there's some
reasoning planning and effectors are the
ways you manipulate the world what can't
we learn in this way so we've had a lot
of success as Jonathan talked about in
the history of AI with formal tasks
playing games solving puzzles recently
we're having a lot of breakthroughs with
medical diagnosis we're still
struggling but are very excited
about in the robotic space with more
mundane tasks of walking of basic
perception of natural language written
and spoken and then there is the human
tasks which are perhaps completely out
of reach of this pipeline at the moment
cognition imagination
subjective experience so high-level
reasoning not just common sense but
human level reasoning so let's fly
through this pipeline they're sensors
cameras lidar audio
there is communication that flies through
the air wireless or wired IMUs
measuring the movement of things so
that's the way you think about it that's
the way as human beings and as any kind
of system that you design you measure
the world you don't just get an API to
the world you need to somehow measure
aspects of this world so that's how you
get the data so that's how you convert
the world into data you can play with
and once you have the data this is the
representation side you have to convert
that raw data of raw pixels of raw audio
raw lidar data you have to convert that
into data that's useful for the
intelligence system for the learning
system to use to discriminate between
one thing and another for vision that's
finding edges corners object parts and
entire objects
there's the machine learning that I'll
talk about that I've talked about
there's different kinds of mapping of
the representation that you've learned
to actual outputs and then
this goes to maybe a little bit of
Simon's question there is reasoning this is
something that's out of reach for machine
learning at the moment and this goes to
your question we can build
a world-class machine learning system for
taking an image and classifying that
it's a duck I wonder if this will work
to wake you up so this is a
well studied exceptionally well studied
problem we could take an audio sample of a duck
and tell that it's a duck
in fact what species of bird it's
incredible how much research there is in
bird species classification and you can
look at video and do activity
recognition to tell that it's
swimming but what we can't do with learning
now is reason that if it looks like a
duck swims like a duck and quacks
like a duck it is very likely to be a duck
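the duck step is a reasoning problem, combining separate perception outputs into one conclusion, and a crude stand-in for it is to fuse the confidences of independent classifiers, the numbers below are invented and real reasoning of this kind remains an open problem

```python
# Naive stand-in for the "looks/swims/quacks like a duck" inference:
# fuse per-modality classifier confidences, assuming independence.
# All the confidence values are made up for illustration.

def combine(confidences):
    """Multiply independent per-modality confidences into one belief."""
    p = 1.0
    for c in confidences.values():
        p *= c
    return p

evidence = {
    "looks like a duck (vision)": 0.9,
    "quacks like a duck (audio)": 0.8,
    "swims like a duck (video)": 0.85,
}
p_duck = combine(evidence)
print(p_duck)  # → 0.612, so under these made-up numbers: probably a duck
```

the hard part the lecture is pointing at is exactly what this sketch glosses over, knowing which facts are relevant and how they relate, not just multiplying numbers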
this is the reasoning problem this is
the task that I personally am obsessed
with and that I hope that machine
learning can close and then there is the
planning action and the effectors
so this is another place where machine
learning has not had many strides
there's mechanical issues here that are
incredibly difficult the degrees of
freedom with all the actuators involved
and just the ability to
localize every part of yourself in this
dynamic space where things are
constantly changing when there's degrees
of uncertainty when there's noise just
that basic problem is exceptionally
difficult
let me just pose this question we talked
about what machine learning
can do with the cats and the duck we
could do that given a representation it
could predict what's in the image but
one of the open questions is and deep
learning has been able to do the feature
extraction the representation learning
this is the big breakthrough that
everybody's excited about but can it also
reason these are the open questions can it
reason can it do the planning and action
and as human beings do can it close the
loop entirely from sensors to effectors
so learn not only the brain but the way
you sense the world and the way you
affect the world
so the question was about the
pong game I'll talk about it a little
longer it doesn't get punished when
it doesn't detect the ball this is the
beautiful thing it gets punished only at
the very end of the game for losing the
game and gets rewarded for winning the
game so it knows nothing about that ball
and it learns about that ball that's
something you should really sit and think
about like how do we as human beings imagine
if you're playing with a physical ball
how do you learn what a ball is you
get hurt by it you squeeze it and you
throw it you feel the dynamics of it the
physics of it and nobody tells you
what a ball is you're just using the raw
sensor input we take it for granted
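learning from reward that arrives only at the very end, as in the Pong example, can be sketched on a toy chain world, the agent is never told where anything is, it only finds out at the end of an episode whether it won or lost, this is a Monte-Carlo-style toy, not Atari

```python
import random

# Toy chain: states 0..5, start at 2; reaching 5 wins (+1), 0 loses (-1).
# The only feedback is the single reward at the end of each episode.

N = 6
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
random.seed(0)

def episode(eps=0.2, alpha=0.5, gamma=0.95):
    """Play one episode, then push the terminal reward back along the path."""
    s = 2
    trajectory = []
    while 0 < s < N - 1:
        if random.random() < eps:                       # explore
            a = random.choice((-1, +1))
        else:                                           # exploit estimates
            a = max((-1, +1), key=lambda a: Q[(s, a)])
        trajectory.append((s, a))
        s += a
    reward = 1.0 if s == N - 1 else -1.0                # the only feedback
    g = reward
    for s, a in reversed(trajectory):                   # propagate backwards
        Q[(s, a)] += alpha * (g - Q[(s, a)])
        g *= gamma

for _ in range(500):
    episode()

print(max((-1, +1), key=lambda a: Q[(2, a)]))   # greedy action mid-chain
```

the agent never sees the word win or lose attached to any intermediate state, yet value estimates for actions along winning paths rise, which is the same mechanism that lets the Pong player learn about the ball without being told it exists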
and maybe this is what I can end on
this is something Jonathan
brought up we take the simplicity of
this task for granted because
broadly speaking as
living species on planet Earth
eyes have been evolving for 540 million
years so we have 540 million years of
data we've been walking for close to
that as bipedal mammals and we have been
thinking only very recently a hundred
thousand years versus a hundred million
years and that's why with some of
these problems that we're trying to
solve you can't take for granted how
actually difficult they are so for
example this is Moravec's paradox that
Jonathan brought up that the easy
problems are hard the things we think
are easy are actually really hard this is
a state-of-the-art robot on the right
playing soccer and that was a
state-of-the-art human on the left
playing soccer
and I'll give it a second the question
was you know there's a fundamental
difference between the way we train
neural networks and the way we've trained
biological neural networks through evolution
by discarding through natural selection
a bunch of the neural
networks that didn't work so well
so first of all the process of
evolution is I think not well understood
I have to be
careful here the role of evolution in
the development of our cognition of our
intelligence I don't know so
this is an open question
so maybe to clarify this point
artificial neural networks are
fixed for the most part in size this is
exactly right it's like a single human
being that gets to learn we don't have
mechanisms of modifying or evolving
those neural networks yet although you
could think of researchers as doing
exactly that
you have grad students working on
different neural networks and the ones
that don't do a good job don't get
promoted you know there
is a natural selection there but other
than that it's an open question
stay tuned and keep your head up because
the future I believe is really promising
and the slides will be made available
for sure
I think a lot of the explorations of
what it means to build an intelligent
machine have been in sci-fi movies we're
now beginning to actually make it a
reality this is Space Odyssey to keep
with the theme of the previous lecture
as opposed to the
dreamlike monolith view when the
astronaut is gazing out into the open
sky at the stars we're going to look at
the practice of AI today
if you're familiar with the movie when
this new technology appeared before our
eyes we were full of excitement how do we
transfer that into actual practical
impact on our lives to quickly review
what we talked about last time I
presented the technology and asked the
question of whether this technology
merely serves a special purpose to
answer specific tasks that can be
formalized or whether it can through
the process of transferring the
knowledge learned on one domain be
generalizable to where an intelligent
system that's trained in a small domain
can be used to achieve general
intelligent tasks like we do as human
beings this is kind of a stack of
artificial intelligence going from
all the way up at the top the
environment the world
to the sensors that sense the data the
intelligence system the way it perceives
this world then once you have this you
convert the world into some numbers you're
able to extract some representation of
that world and this is where machine
learning starts to come into play and
then there's the question which I
will raise again today can machine
learning do the following steps
that we can do very well as human beings
the reasoning step you know you can
tell the difference between a cat and a dog
but can you now start to reason about
what it means to be alive what it means
to be a cat a living creature and
what it means to be this kind of
physical object or this kind of physical
object and take what's called common
sense things we take for granted start
to construct models of the world through
reasoning Descartes I think therefore I
am
we want our neural networks to
come up with that on their own and once
you do that action you'll go right back
into the world you start acting in that
world so the question is can this
be learned from data
or do experts need to encode the
knowledge of reasoning the knowledge of
actions the set of actions that's one
of the open questions I raise
that continues throughout the talk today
and so as we start to think about how
artificial intelligence especially
machine learning as it realizes itself
through robotics gets to impact the
world we start thinking about what are
the easy problems what are the hard
problems and it seems to us that vision
and movement walking is easy because
we've been doing it for millions of
years hundreds of millions of years and
thinking it's hard reasoning is hard I
propose to you that it's perhaps because
we've only been doing it for a short
time and so we think we're quite special
because we're able to think so we have
to kind of question what is easy
and what is hard because when we start
to develop some of these systems
you start to realize that all these
problems are equally hard so the problem
of walking that we take for granted the
actuation and the ability
to recognize where you are in the
physical space to sense the world
around you to deal with the
uncertainty of the perception problem
so all of these robots by the
way this is from the most recent DARPA
challenge which MIT was also part of
so what are these robots doing they
only have sparse communication
with human beings
on the periphery so most of the stuff
they have to do autonomously like get
inside a car this is an MIT robot
unfortunately they have to get in
the car and the hardest task they have
to get out of the car that's walking so
this kind of raises to you the very real
aspect here you want to build
applications that actually work in the
real world and that's the first
challenge and opportunity here
then many of the technologies we talked
about currently crumble under the
reality of our world when we transfer
them from a small data set in the lab to
the real world computer vision
is perhaps one of the best illustrations
of this computer vision is the task as
we talked about of interpreting images
and so there's been a lot of
great accomplishments on interpreting
images cats versus dogs now when you try
to create a system like the Tesla
vehicle that we work
with and I always talk about it's a
vision based robot it has radar for
basic obstacle avoidance but most of the
understanding of the world comes from a
single monocular camera now they've
expanded the number of cameras but for
most of the time there have been a hundred
thousand vehicles driving on the roads
today with essentially a single
webcam so when you start to do that you
have to perform all of this extraction
of texture color optical flow the
movement through time the temporal dynamics
of the images you have to construct
these patterns construct the
understanding of objects and entities
and how they interact and from that you
have to act in this world and that's all
based on this computer vision system so
it's no longer cats versus dogs
it's the detection of pedestrians where
the wrong classification the wrong
detection is the difference between life
and death so let's look at cats where
things are a little more comfortable so
computer vision and I would like to
illustrate to you why this is such a
hard task which we've talked about we've
been doing it for 500 million years so
we think it's easy computer vision is
actually incredibly hard all you're
getting with your human eyes is
essentially pixels there's
light coming into your eyes and all
you're getting is the reflection of light
from the different surfaces in here
and there are sensors
inside your eyes
converting that into numbers it's really
very similar to the numbers in
the case of what we use with computers
RGB images where the individual pixels
are numbers from 0 to 255 so 256
possible values and there's just a
bunch of them and that's all we get we
get a collection of numbers where
they're spatially connected ones that
are close together are part of the same
object so cat pixels are all
connected together that's the only thing
we have to help us but the rest of it is
just numbers intensity values and we
have to use those numbers to classify
what's in the image and if you really
think about it this is a really
difficult task all you get is these
numbers how the heck are you supposed to
form a model of the world with which you
can detect pedestrians with
really 99.99999% accuracy whether it's
pedestrians cars or cyclists
in the car context or any kind of
application you're looking at even if
your job is on the factory floor to
detect the defective gummy bears
that are flying past at like a hundred
miles an hour
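the earlier point that an image is just a grid of numbers can be made concrete, a grayscale frame is an array of integers from 0 to 255, and any classifier, a gummy-bear defect detector included, has to work from those raw intensities alone, the 4x4 "image" and the threshold rule below are invented toys

```python
import numpy as np

# A grayscale "image" is nothing but intensities in 0..255.
# This 4x4 frame and the brightness rule are toys for illustration.

image = np.array([
    [  0,  12,  10,   3],
    [  8, 240, 250,   5],
    [  2, 235, 245,   9],
    [  4,   7,  11,   6],
], dtype=np.uint8)              # 0 = black, 255 = white

def looks_defective(img, thresh=200, min_bright=3):
    """Toy detector: flag the item if enough pixels are very bright."""
    return int((img > thresh).sum()) >= min_bright

print(image.min(), image.max())  # all values live in 0..255
print(looks_defective(image))    # True: four pixels exceed the threshold
```

the gap the lecture is describing is between a hand-written rule like this and a system that must get the decision right from raw numbers under every viewpoint, scale, and lighting change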
your task is you don't want that bad
gummy bear to get by into your product
or the brand will be damaged
however serious or not serious your
application is you
have to have a computer vision system
that deals with all of these aspects
viewpoint variation scale variation no
matter the size of the object it's still
the same object no matter the
viewpoint from which you look at
that object it's still the same object
then the lighting the lighting is
consistent here because we're indoors
but when you're outdoors or you're
moving the scene is moving the
complexity of the lighting
variations is incredible from the
illumination to just the movement of the
different objects in the scene think
about this particular one it's
twilight and the light is changing I
think you know almost every time I drive
there's one or two
things that I see where really I'm
drawing on like 200 million years in order
to be able to figure out that it's a
guy who's opened his car door and I can't
see him but I can just see the light
doesn't look quite right on that side of
the road and somehow I know
in my mind it's a person but it
seems like an almost impossible problem
for the machines to get right I will
argue that the pure perception task
is too hard that you come to the table
as human beings with all this huge
amount of knowledge that you're not
actually interpreting all the complex
lighting variations that you're seeing
you actually know enough about the world
enough about your commute home enough
about the way the kinds of things you
would see in this world about Boston
about the way pedestrians move there's a
certain light of day you bring all that
to the table that makes the perception
task doable and that's one of the big
missing pieces in the technology as I'll
talk about that's the open problem of
machine learning it's how to bring all
that knowledge
first of all build that knowledge and
then bring that knowledge to the table
as opposed to starting from scratch
every time and so cats as promised
okay so to me occlusion for most of
the computer vision community this is
one of the biggest challenges and it
really highlights how far we are from
being able to reason about this world
an occlusion is when the object you're
trying to detect something about
classify or detect
is blocked partially by another
object in front of it this is
something you think is trivial perhaps
you don't even really think about it
because we reason in a three-dimensional
way but the occlusion aspect
makes perception incredibly difficult so
think about this so
this image is converted into numbers and
for the task of detecting is there a
cat in this image yes or no you have to
be able to reason about this image with
an object in the scene most of us are able
to very easily detect
that there is a cat in this image now think
about this there's a single eye and
there's an ear so you have to think
about what is it in our brain that
allows us to suppose
with some high degree of confidence that
there's a cat here in this picture I
mean the degree of occlusion here is
immense
and so some of you will think this is in
fact a monkey eating a banana but I
would venture to say that most of us are
able to tell it's nevertheless a cat you
could watch this for hours and so let me give
you another this is kind of a paper
that's often cited a set of papers to
illustrate how difficult computer vision
is how thin the line is that we're walking
with all of these impressive results
that we've been able to show recently in
the machine learning community in this
case the deep neural networks are easily
fooled paper the seminal paper at this
point shows that when you apply a network
trained on imagenet so basically on
detecting cats versus dogs or different
categories inside images
you can find an arbitrary number of
images that look like noise up in the
top row where the algorithm used to
classify those images in imagenet
cat versus dog is able to confidently
say with 99.6% confidence or above that
it's seeing a robin or a cheetah or an
armadillo or a panda you know in that
noise
so it's confidently saying given this
noise that that's obviously a robin so
you have to realize that the kind of
processes it's using to understand what's
contained in the
image is purely a collection of patterns
that it has been able to extract from
other images that have been
annotated by humans and that perhaps is
very limiting when trying to create a
system that's able to operate in the
real world this is a
very clean illustration of that
concept and the same for those images below
where there are strong patterns it's not even
noise strong patterns that have nothing
to do with the entities being detected
again confidently that same algorithm is
able to see a penguin a starfish a
baseball and a guitar in those
patterns what's more serious for people
designing robots like myself on
the sensor side is you can flip that and
say I can take an image and I can distort
it with some very little amount of noise
and if that noise is applied to
the image I can completely change the
confident prediction about what's in
that image so to explain what's being
shown in the column on the
left and again here the
same kind of neural network is able to
predict accurately confidently that
there is a dog in that image but if we
apply just a little bit of noise to that
image to produce that other image
imperceptible to our human eyes the
difference between those two the same
algorithm is confidently saying that there
is an ostrich in that image
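the adversarial effect just described can be sketched, nudge every input dimension slightly in the worst direction for the model and a confident prediction flips, this is the fast-gradient-sign idea on a made-up logistic classifier, and unlike the paper's examples the toy perturbation here is not literally imperceptible, it only shows the mechanism

```python
import numpy as np

# Toy adversarial perturbation: a logistic model scores an input it is
# confident about, then a small signed step against the weights flips it.
# The "model" and "image" are invented stand-ins.

rng = np.random.default_rng(1)
w = rng.normal(size=100)        # stand-in for trained weights: score = w . x
x = 0.03 * w                    # an input the model classifies confidently

def predict(x):
    """P(the input is the predicted class) under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-w @ x))

eps = 0.1
x_adv = x - eps * np.sign(w)    # move each dimension against the model

print(predict(x) > 0.8, predict(x_adv) < 0.2)
```

the reason the step is so effective is dimensionality, a tiny shift per pixel accumulates across thousands of pixels into a large change in the model's score, which is also why real adversarial images can stay imperceptible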
so another thing to really think about
that noise can have such a significant
impact on the prediction of these
algorithms quite
honestly out of all the things I'll say
today that I'm aware of one of the
biggest challenges of machine learning
being applied in the real world is
robustness how much noise can you add
into the system before everything falls
apart so how do you validate sensors so
say a car company has to produce a
vehicle and it has
sensors in that vehicle how do you know
that those sensors will not start
generating slight noise due to
interference of various kinds and
because of that noise instead of seeing
a pedestrian you will see nothing or the
opposite you'll see pedestrians
everywhere so of course the most
dangerous is when it will not see an
object and collide with it in the case
of cars
there's also spoofing which as always
with security a lot of people
are really concerned about and perhaps
people here are really concerned about
this issue I think this is a really
important issue because you can
apply noise and convince the system that
it's seeing an ostrich when there is
in fact no ostrich you can do the same
thing in an attacking way so you
can attack the sensors of a car and make
it believe like with lidar spoofing so
spoof lidar radar or ultrasonic sensors
to believe that you're seeing
pedestrians when they're not there and
the opposite to hide pedestrians make
pedestrians invisible to the sensor when
they're in fact there so whenever you
have intelligent systems operating in
this world they become susceptible to
the fact that so much of the
work is done in software and based on
sensors so at any point in the chain if
there's a failure you have to be able to
detect that failure and right now we
have no mechanisms for automatically
detecting that failure so on the data
side one challenge that we're
constantly dealing with is that
the machine learning
algorithms that we're using need
labeled data and we have very little
labeled data labeled data again is when
you have pairs of input data and the
ground truth the true label the
annotation class that that image belongs
to or concept and it doesn't have to
be an image it could be any source of
data it's a really costly process
because it's so costly
every breakthrough we've had so far
relies on that labeled data and because of
its cost we don't have much of it so all
the problems that come from data can
either be solved by having a lot more of
this data which I and most people
believe is too challenging it's too
challenging to have human beings
annotate huge amounts of data or we have
to develop algorithms that are able to
do something with the unlabeled data that's
the unsupervised semi-supervised
sparsely supervised reinforcement
learning as we talked about last time I
mention it again here so one way you
understand something about data when you
don't have labels is you reason about it
all you're given is a few facts when
you're a baby your parents give you a
few facts and you go into this world
with those facts and you grow your
knowledge graph your knowledge base your
understanding of the world from those
few facts we don't have a good method of
doing that in an automated unrestricted way
then there's the inefficiency of our learners
the machine learning algorithms I've talked
about the neural networks need a lot of
examples of every single concept that
they're given in order to learn anything
about them thousands tens of thousands
of cats are needed to understand the
spatial patterns at every level of the
representation of a cat the visual
representation of a cat we
can't do anything with a single example
there's a few approaches but nothing
quite robust yet and we haven't come up
with a way though it may be possible to make
annotation this labeling process somehow
very cheap one approach is
something called human computation
that term has fallen out of favor a
little bit one of my big passions is
human computation it's using something
about our behavior something about what
we do in this world online or in the
real world to annotate data
automatically so for example as you
drive which is what we all do everybody has
to drive
and we can collect data about you
driving in order to train self-driving
vehicles to drive and that's a
free annotation so here are the
annotated data sets we have the
supervised learning data sets
there's many but these are some of
the more famous ones from
the toy data sets like MNIST
to the large data sets of broad arbitrary
categories of images
which is what imagenet is and there's
healthcare there's audio there's
video you know there's a
huge number of data sets now but each
one of them is usually in the scale of
hundreds of thousands millions tens of
millions not billions or trillions which
is what we need to create systems that
operate in the in the real world and
again these are the kinds of machine
learning algorithms we have there's five
listed here what's on the left is
the input the system
requires to train it
supervised learning at the very top is
where we have all of our successes and
everything else is where the promise
lies the semi-supervised the
reinforcement or the fully unsupervised
learning where the input from the human
is very minimal another way to think
about this is whenever you think
about machine learning today
whenever somebody talks about machine
learning what they're talking about is
systems that memorize that memorize
patterns and so this is one of the big
criticisms of the current machine
learning approaches all they're
doing is memorizing they're only as
good as the human annotated data that
they're provided we don't have
mechanisms for actually understanding
you can pause and think about this in
order to create an intelligent system it
shouldn't just memorize it should
understand the representations inside
that data in order to operate in that
world and that's the open question one
of them and one of the challenges and
opportunities for machine learning
researchers today is to extend machine
learning
from memorization to understanding this is
that duck again the reasoning if you get
information from the perception system
that it looks like a duck from the audio
processing that it quacks like a duck
and from video classification
the activity recognition that it
swims like a duck
the reasoning step is how to connect
those facts to then say that it is in
fact a duck okay so that's the
algorithm side and the data side now
compute computational
power computational hardware is at the core
of the success of machine learning our
algorithms have been the same since the
60s since the 80s 90s depending on how
you're counting the big breakthroughs
came in compute so there's Moore's law
most of you know the way the CPU
side of our computers works a single
CPU is for the most part
executing a single instruction at a time in a
sequence so it's sequential very different
from our brain which is a massively
parallelized system because it's
sequential the clock speed matters
because that's how fast essentially
those instructions are able to be
executed and we're leveling off
physics is stopping us from continuing
Moore's Law so Intel and AMD are
aggressively pushing Moore's law
forward and there's some promise
that it will actually continue for
another ten or fifteen years then
there's another form of parallelism
massive parallelism which is the GPU and this
is essential for neural networks
essential to the recent
success of neural networks the
ability to utilize these inherently
parallel architectures of graphics
processing units GPUs the same thing
used for video games this is
the reason Nvidia stock is
doing extremely well it's GPUs it's
parallelism of basic computational
processes that makes machine learning
work on the GPU one of the limitations
of GPUs one of the challenges in
scaling and bringing
them into real-world applications is
power usage power consumption and so
there are a lot of specialized chips
specialized just for neural network
architectures coming out from Google
with their tensor processing unit from
IBM Intel and so on it's unclear how far
this goes so this is sort of the
direction of trying to design an
electronic brain that has the
efficiency of our human brain which is
exceptionally efficient at running the
neural networks in our heads
orders of magnitude more efficient than
our computers are and this is trying to
design systems that are able to grow
towards that efficiency why do you care
about efficiency for several reasons one
of course as I'm sure we'll talk about
throughout this class is the thing
in our smart phones battery usage
this is the big one community I think I
think it could be attributed to the big
breakthroughs in machine learning
recently in the last decade is the you
know compute as important algorithm
development is important but it's the
community of nerds global this is global
artificial intelligence and I will show
in several ways why global is essential
here is is tens of hundreds of thousands
millions of programmers Mechanical
Engineers building robots building
intelligent systems building machine
learning algorithms the exciting nature
of the growth of the community perhaps
is the key to unlocking
the power of machine learning so this is
just one example github is a repository
for code and this is showing on the
y-axis at the bottom 2008 when github
first opened going up to 2012 a quick near
exponential growth of the number of
users participating and the number of
repositories so these are standalone
unique projects that are being hosted on
github so this is one example I'll show
you this competition that we've
recently been running and then I'll challenge
people here to participate in this
competition if you dare so this is a
chance for you to build a neural network
in your browser so you can do this on
your phone
later tonight of course you
can specify various parameters of
the neural network different
numbers of layers the depth
of the network the number of
neurons in the network the type of layers
and it's pretty
self-explanatory it's super easy in
terms of just tweaking little things and
remember machine learning to a large
part is an art at this point
perhaps more than a
well understood theoretically grounded
science which is one of the challenges
but it's also an opportunity deep
traffic is a chance so we've all been
stuck in traffic
there you go Americans spend 8 billion
hours stuck in traffic every year
that's our pitch for this competition so
deep neural networks can help and so you
have a neural network that drives that
little car with an MIT logo the red one on
this highway and tries to weave in and
out of traffic to get to its destination
trying to achieve a speed of 80
miles an hour which is the
physical speed limit of the
car of course the actual speed limit of
the road is 65 miles an hour but we
don't care about that we just want to
get to work or home as quickly as possible
so what the basic structure of this
game is and I want to explain this game
a little bit and then tell you how
incredibly popular it's gotten and how
incredibly powerful the networks are that
people from all over the world the
community has built over a
single month it's incredible and this
happens for thousands of projects out
there now another challenge and
opportunity ok so you may have seen this
this is kind of ethics
I love philosophy but this kind of
construction of ethics that's often
presented here is one that is not
usually a concern in engineering so what
is this question you know when you have
a car you have a bunch of pedestrians do
you hit the larger group of pedestrians
or the smaller group of pedestrians do
you avoid the group of pedestrians but
put yourself into danger these kinds of
ethical questions of an intelligent
system it's a very interesting question
it's it's one that we can debate and
there's really no good answer quite
honestly but it's a problem that both
humans and machines struggle with and so
it's not interesting on the engineering
side we're interested with problems that
we can solve on the engineering side so
the kind of problem that I am obsessed with and very interested in is the real-world problem of controlling a vehicle through a space like this. So, it happens in a few seconds here. This is a Manhattan, New York intersection, right? These are pedestrians walking perfectly legally; I think they have a green light. Of course there's a lot of jaywalking too. Well, that car just slid in; that's not the point, but yes, exactly, there's an ambulance. And there's another car that starts making a left turn in a little bit; I may have missed it, hopefully not. So yeah, and then there's another car after that, too. This just illustrates what you face when you design an algorithm that's supposed to move through a space like this. Watch this car, the aggression it shows. Now, this isn't a trivial example. For those that try to build robots, this is the real question: how do you design a system that's able to do this? You have to choose the reward functions, objective functions, utility functions under which it performs the planning. So a car like that has several thousand candidate trajectories it can take through that intersection. It can take a trajectory where it speeds up to 60 miles an hour, doesn't stop, and just swerves and hits everything; okay, that's a bad trajectory, right? Then there is the trajectory that most companies take, which is what the Google self-driving car and every company that's concerned about PR does: whenever there's any kind of obstacle, any kind of risk at all, any reasonable chance that you could maybe even touch an obstacle, you're not going to take that trajectory. What that means is you're going to navigate through this intersection at 10 miles an hour, and you let people abuse you by walking in front of you, because they know you're going to stop. And in the middle there are hundreds, thousands of trajectories that are ethically questionable, in the sense that you're putting other human beings at some risk in order to safely and successfully navigate through the intersection. And the design of those objective functions is the kind of question you have to ask for intelligent systems.
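The trajectory-selection idea can be sketched as a toy objective function that trades off progress against risk. The candidate list, the risk numbers, and the weighting scheme below are all invented for illustration, not from any real planner:

```python
# Sketch of trajectory selection under an objective function.
# Each candidate is scored on progress minus weighted risk;
# the weight w_risk encodes how conservative the planner is.
candidates = [
    # (name, speed_mph, collision_risk in [0, 1])
    ("swerve through crowd",  60.0, 0.90),
    ("assertive nudge",       25.0, 0.10),
    ("crawl at walking pace", 10.0, 0.01),
]

def best_trajectory(candidates, w_risk):
    """Pick the candidate maximizing speed minus weighted risk."""
    return max(candidates, key=lambda c: c[1] - w_risk * c[2] * 100.0)

print(best_trajectory(candidates, w_risk=10.0)[0])  # very conservative planner
print(best_trajectory(candidates, w_risk=0.5)[0])   # more assertive planner
```

With a heavy risk weight the planner crawls through the intersection; with a light one it nudges through the crowd. The hard design question is where to set that weight.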
For cars, it's not a grandma and a few children where you have to choose who gets to die; those are very, very difficult problems, of course. But the problem I'm very interested in, on the streets of Boston and the streets of New York, is how to gently nudge yourself through a crowd of pedestrians, the way we all actually do when we drive in New York, in order to be able to safely navigate these environments. And these questions come up in healthcare; these questions come up in factory robots, in arms and humanoid robots that operate with other human beings. And that's one of the big challenges. Another sort of funny illustration, one that folks at OpenAI often use... well, let me just pause for a second. The gamified version of this: there's a game called Coast Runners, where you're racing against other boats along this track. There's your score here at the bottom left, the number of laps, your time, and you're trying to get to the destination as quickly as possible while also collecting these funky little green things along the way. Okay, so what they've done is they built an agent, the general-purpose one that we talked about last time, that learns, oops, that learns how to navigate successfully through this space. So you're trying to maximize the reward, and what this boat learns to do is, instead of finishing the race, it learns to find a loop it can keep going around and around in, collecting those green dots, and it learns the fact that they regenerate with time. So it learns to maximize its score by going around and around.
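A minimal sketch of why short-horizon greed produces this looping behavior: compare the discounted return of looping for small, regenerating rewards against waiting for one large finish-line reward. The reward values and discount factors below are made-up for illustration:

```python
# Toy illustration of reward misspecification and short-term greed:
# looping for small repeated rewards vs. one big delayed reward.
def discounted_return(rewards, gamma):
    """Sum of rewards, each discounted by gamma per time step."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

loop   = [3.0] * 20            # keep circling, picking up green targets
finish = [0.0] * 9 + [100.0]   # nothing until the finish-line bonus

for gamma in (0.5, 0.99):
    prefers_loop = discounted_return(loop, gamma) > discounted_return(finish, gamma)
    print(gamma, prefers_loop)
```

A myopic agent (small gamma) prefers the loop; a far-sighted one (gamma near 1) waits for the finish line. This is the delayed-gratification problem in objective-function form.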
These are the kinds of things... this is the big challenge of reward functions, of designing systems, of designing what you want your system to achieve. Not only are the ethical questions difficult; there's also just avoiding the pitfalls of local optima, of settling on something that's really good in the short term, the greedy choice. It's like those psychology experiments where the kid eats the marshmallow and can't wait, you know, can't delay gratification. This idea of delayed gratification, in the case of designing intelligent systems, is a huge, actually serious problem, and this is a good illustration of that. So we flew through a few concepts here. Are there any questions about some of the compute and the algorithm side we talked about today? Yes.
So the question was: yeah, you highlighted some of the limitations of computer vision and machine learning algorithms, but you haven't highlighted the limitations of human beings, and if you put those in two columns and compare them, are machines doing better overall? Is there any way to compare them? I mean, there's actually interesting work on ImageNet. So ImageNet is this categorization task where you have to classify images, and you can ask the question: when I present you images of cats and dogs, when are machines better than humans, and when are they not? So you can compare when machines do better, what the failure points are for machines, and what the failure points are for humans, and there are a lot of interesting visual perception questions there. I think overall it's certainly true that machines fail differently than human beings. But in order to make an artificial intelligence system that's usable, that could make you a lot of money, that people would want to use, it has to be better for that particular task in every single way. In order for you to want to use a system, it has to be superior to human performance, and usually far superior to human performance. So on the philosophical level it's an interesting thing to compare what we are good at and what we are not, but if you're using the Amazon Echo, or voice recognition, or any kind of natural language chatbot, or a car, you're not going to say, "well, this car is not so good with pedestrians, but I appreciate the fact that it can stay in the lane." Unfortunately, we have a very high standard for every single thing we're good at, and the system has to be superior to that. I think maybe that's unfair to the robots. I'm more of the nerd that makes the technology happen, but certainly on the self-driving car side, policy is probably the biggest challenge, and I don't think there are good answers to some of those ethical questions that come up. Well, it feels like... so, we work a lot with Teslas; I'm driving a Tesla around every day, and we're playing around with it and studying human behavior inside Teslas. And it seems like there's so much hunger amongst the media to jump on something, and it feels like a very shaky PR terrain, a very shaky policy terrain that we're all walking on, because we have no idea how we coexist with intelligent systems. And then of course government is nervous, because how do you regulate this shaky terrain? Everybody's nervous and excited, so I'm not sure there's a good answer there. Same kind of question to Jason in a moment. Thanks a lot, Lex, for another great session.
[Applause]