Stephen Wolfram: Computational Universe | MIT 6.S099: Artificial General Intelligence (AGI)
P7kX7BuHSFI • 2018-03-02
Kind: captions
Language: en
Welcome back to 6.S099: Artificial General Intelligence. Today we have Stephen Wolfram. Wow, that's a first: I didn't even get started and you're already clapping. In his book A New Kind of Science, he has explored and revealed the power, beauty, and complexity of cellular automata as simple computational systems from which incredible complexity can emerge. It's actually one of the books that really inspired me to get into artificial intelligence. He's created the Wolfram|Alpha computational knowledge engine and created Mathematica, which has now expanded to become the Wolfram Language. Both he and his son were involved in helping analyze and create the alien language from the movie Arrival, for which they used the Wolfram Language. Please again give Stephen a warm welcome.

Boy. So I gather the brief here is to talk about how artificial general intelligence is going to be achieved. Is that the basic picture? So maybe I'm reminded of a story I don't think I've ever told in public, but it's something that happened just a few buildings over from here. This was 2009, and Wolfram|Alpha was about to arrive on the scene. I assume most of you have used Wolfram|Alpha, or seen Wolfram|Alpha? How many of you have used Wolfram|Alpha? OK, that's good. So I had long been a friend of Marvin Minsky's, and Marvin was a pioneer of the AI world. He had seen for years question-answering systems that tried to do general-intelligence question answering. So I was going to show Marvin Wolfram|Alpha. He looks at it and he's like, "OK, that's fine, whatever." I said, "No, Marvin, this time it actually works.
You can try real questions. This is actually something useful; this is not just a toy." And it was kind of interesting to see: it took about five minutes for Marvin to realize that this was finally a question-answering system that could actually answer questions that were useful to people.

So one question is, how did we achieve that? You go to Wolfram|Alpha and you can ask it... I don't know what we should ask it. Some random question: what is the population of Cambridge? Actually, here's a question; let's try that. What's the population of Cambridge? It's probably going to figure out that we mean Cambridge, Massachusetts. It's going to give us some number, and it's going to give us some plot. Actually, what I want to know is the number of students at MIT divided by the population of Cambridge. Let's see if it can figure that out. OK, it's kind of interesting, right? Oh no, that's interesting: it guessed that we were talking about Cambridge University as the denominator there. So it says the number of students at MIT divided by the number of students at Cambridge University. That's interesting; I'm actually surprised. Let's see what happens if I say "Cambridge MA" there. Now it'll probably fail horribly. No, that's good. OK, that's interesting: that's a plot, as a function of time, of the fraction. So anyway, I'm glad it works.

So one question is, how did we manage to get this? Many things have to work in order to get stuff like this to work. You have to be able to understand the natural language, you have to have data sources, you have to be able to compute things from the data, and so on.
One of the things that was a surprise to me, in terms of natural language understanding, was that the critical thing turned out to be just knowing a lot of stuff. The actual parsing of the natural language is, I think, kind of clever, and we use a bunch of ideas that came from my New Kind of Science project and so on. But I think the most important thing is that knowing a lot of stuff about the world is really important to actually being able to understand natural language in a useful situation. I think the other thing is actually having access to lots of data.
Let me show you a typical example here of what is needed. So I ask about the ISS, and hopefully it'll wake up and tell us something here. Come on, what's going on here? There we go. OK, so it figured out that we're probably talking about a spacecraft, not a file format, and now it's going to give us a plot that shows us where the ISS is right now. To make this work, we obviously have to have some feed of radar tracking data about satellites and so on, which we have for every satellite that's out there. But it's not good enough to just have that feed. You also have to be able to do celestial mechanics to work out where the ISS actually is right now, based on the orbital elements that have been deduced from radar. And then we want to know things like when it's going to be visible: it's not currently visible from Boston, Massachusetts; it will next rise at 7:36 p.m. on Monday, today. So this requires a mixture of data about what's going on in the world together with models about how the world is supposed to work, being able to predict things, and so on.

I think another thing that I kind of realized about AI and so on from the Wolfram|Alpha effort has been this. One of the earlier ideas for how one would achieve AI was: let's make it work kind of like brains do, and let's make it figure stuff out. So if it has to do physics, let's have it do physics by pure reasoning, like people at least used to do physics. But in the last 300 years we've had a different way to do physics that wasn't based on natural philosophy; it was instead based on things like mathematics. So one of the things that we were doing in Wolfram|Alpha was to kind of cheat relative to what had been done in previous AI systems. Instead of using reasoning-type methods, we just say: OK, we want to compute where the ISS is going to be; we've got a bunch of equations of motion that correspond to differential equations; we're just going to solve the equations of motion and get an answer.
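To make "just solve the equations of motion" concrete, here is a minimal sketch under a simple two-body (Keplerian) model. It assumes a known semi-major axis `a`, eccentricity `e`, and mean anomaly, and it ignores the perturbations that real satellite trackers model. Production tools typically use perturbation models such as SGP4 with TLE data. All function names are hypothetical, and nothing here is how Wolfram|Alpha does it.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, Earth's standard gravitational parameter


def mean_anomaly_at(t_seconds: float, a_km: float, M0: float = 0.0) -> float:
    """Mean anomaly after t seconds, advancing at the mean motion n = sqrt(mu / a^3)."""
    n = math.sqrt(MU_EARTH / a_km ** 3)  # rad/s
    return M0 + n * t_seconds


def solve_kepler(mean_anomaly: float, e: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve Kepler's equation M = E - e*sin(E) for the eccentric anomaly E (radians).

    Uses Newton's method; only valid for closed elliptical orbits (0 <= e < 1).
    """
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1")
    M = math.fmod(mean_anomaly, 2.0 * math.pi)  # reduce M to (-2*pi, 2*pi)
    if M < 0.0:
        M += 2.0 * math.pi                        # then to [0, 2*pi)
    E = M if e < 0.8 else math.pi                 # common starting guess
    for _ in range(max_iter):
        step = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            return E
    raise RuntimeError("Newton iteration did not converge")


def position_in_orbital_plane(a: float, e: float, mean_anomaly: float) -> tuple:
    """(x, y) of the body in its orbital plane: focus at the origin, x pointing to periapsis."""
    E = solve_kepler(mean_anomaly, e)
    x = a * (math.cos(E) - e)
    y = a * math.sqrt(1.0 - e * e) * math.sin(E)
    return x, y
```

Going from this in-plane position to "where is it over the Earth right now" needs further rotations (inclination, node, argument of periapsis, Earth rotation). Those steps are omitted here.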
That's kind of leveraging the last 300 years or so of exact science that have been done, rather than trying to make use of human reasoning ideas. And I might say that, in terms of the history of the Wolfram|Alpha project, when I was a kid, a disgustingly long time ago, I was interested in AI kinds of things. In fact, I was kind of upset recently to find a bunch of stuff I did when I was 12 years old, trying to assemble an early version of Wolfram|Alpha way back before it was technologically possible. But it's also a reminder that one just does the same thing one's whole life, so to speak, at some level. What happened was that I started off working mainly in physics, and then I got involved in building computer systems to do things like mathematical computation and so on. And then I got interested in: OK, can we generalize this stuff? Can we really make systems that can answer arbitrary questions about the world? The promise would be: if there's something that is systematically known in our civilization, make it automatic to answer questions on the basis of that systematic knowledge. Back in the late 1970s and early 1980s, my conclusion was that if you want to do something like that, the only realistic path was to build something much like a brain. So I got interested in neural nets, and I tried to do things with neural nets back in 1980, and nothing very interesting happened; I couldn't get them to do anything very interesting. So I kind of had the idea that the only way to get the kind of thing that now exists in Wolfram|Alpha, for example, was to build a brain-like thing.
And then, many years later, for reasons I can explain, I came back to this and realized that actually it wasn't true that you had to build a brain-like thing; mere computation was sufficient. That was what got me started actually trying to build Wolfram|Alpha. When we started building Wolfram|Alpha, one of the things I did was go on a field trip to a big reference library. You see all these shelves of books and so on, and the question is: can we take all of this knowledge that exists in all of these books and actually automate being able to answer questions on the basis of it? I think we've pretty much done that, at least for the books you find in a typical reference library. It looked kind of daunting at the beginning, because there's a lot of knowledge and information out there. But actually it turns out there are a few thousand domains, and we've steadily gone through and worked on these different domains.

Another feature of the Wolfram|Alpha project was this. I've been involved a lot in doing basic science and in trying to have grand theories of the world. But one of my principles in building Wolfram|Alpha was not to start from a grand theory of the world, that is, not to start from some global ontology of the world and then try to build down into all these different domains. Instead, we worked up from having hundreds, then thousands, of domains that actually work, whether that's information about cars, information about sports, information about movies, or whatever else. Each of these domains was built up from the bottom. Then we found that there were common themes in these domains that we could build into frameworks, and then we constructed the whole system on the basis of that.
That's kind of how it's worked, and I can talk about some of the actual frameworks that we end up using and so on. But maybe I should explain a little bit more. One question is, how does Wolfram|Alpha actually work inside? The answer is it's a big program. The core system is about 15 million lines of Wolfram Language code, plus some number of terabytes of raw data. The thing that made building Wolfram|Alpha possible was this language, the Wolfram Language. It started with Mathematica, which came out in 1988, and has been progressively growing since then. So maybe I should show you some things about the Wolfram Language. It's easy to get: MIT has a site license for it, so you can use it all over the place, you can find it on the web, et cetera.

OK, the basics. Let's start off with something like making a random graph. Let's say we have a random graph with 200 nodes and 400 edges. OK, so there's a random graph. A first important thing about the Wolfram Language is that it's a symbolic language, so I can just pick up this graph and do some analysis of it. That graph is just a symbolic thing that I can do computations on.
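A random graph with a fixed number of vertices and edges is the classic G(n, m) model: pick m distinct vertex pairs uniformly at random. The sketch below is a minimal Python stand-in for that idea, not the Wolfram Language implementation. Once the graph is ordinary data, analyses such as a degree count are just further computations on it. The function names are hypothetical.

```python
import itertools
import random
from collections import Counter
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def random_graph(n_vertices: int, n_edges: int, seed: Optional[int] = None) -> List[Edge]:
    """Uniformly sample a simple undirected graph with exactly n_edges edges (the G(n, m) model)."""
    all_pairs = list(itertools.combinations(range(n_vertices), 2))  # every possible edge (u, v), u < v
    if n_edges > len(all_pairs):
        raise ValueError("too many edges for a simple graph on this many vertices")
    rng = random.Random(seed)
    return sorted(rng.sample(all_pairs, n_edges))  # sampling without replacement: no duplicate edges


def degrees(n_vertices: int, edges: List[Edge]) -> List[int]:
    """Degree of every vertex, including isolated vertices (degree 0)."""
    counts = Counter(v for edge in edges for v in edge)
    return [counts[v] for v in range(n_vertices)]


g = random_graph(200, 400, seed=1)
deg = degrees(200, g)
```

Because every edge touches two vertices, the degrees always sum to twice the edge count. That makes it a handy sanity check on any graph representation.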
Another good thing to always do is get a current image. There we go. Now I could do some basic thing: let's edge-detect that image. Again, this image is just a thing that we can manipulate. We could take the image, partition it into little pieces, and do computations on that. Something simple: let's just say sort each row of the image, then assemble the image again. Whoops. Assemble that image again, and we'll get some mixed-up picture there.
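The partition, sort each row, and reassemble steps can be sketched in a few lines if an image is modeled as a plain 2-D list of pixel values. That representation is an assumption made for illustration; the function names are hypothetical, and this is not how the Wolfram Language stores images.

```python
def partition(image, block_h, block_w):
    """Split a 2-D pixel grid into a grid of blocks (assumes the sizes divide evenly)."""
    return [
        [[row[c:c + block_w] for row in image[r:r + block_h]]
         for c in range(0, len(image[0]), block_w)]
        for r in range(0, len(image), block_h)
    ]


def assemble(blocks):
    """Inverse of partition: stitch a grid of blocks back into one image."""
    image = []
    for block_row in blocks:
        for i in range(len(block_row[0])):  # i-th pixel row within this band of blocks
            image.append([px for block in block_row for px in block[i]])
    return image


def sort_each_row(image):
    """Sort the pixel values within each row, which scrambles the picture."""
    return [sorted(row) for row in image]


img = [[(7 * r + 3 * c) % 10 for c in range(4)] for r in range(4)]
# Sort each row inside each 2x2 block, then put the blocks back together.
scrambled = assemble([[sort_each_row(b) for b in row] for row in partition(img, 2, 2)])
```

Keeping `partition` and `assemble` as exact inverses means any per-block operation can be dropped in between without bookkeeping.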
If I wanted to, I could, for example, make that the current image and make that dynamic. Now it's just running that code in a little loop, and there we go, we can make that work. So one general point here is that an image, for us, is just a piece of data like anything else. If we just have a variable, a thing called x, it just says, OK, that's x. I don't need to know a particular value; it's just a symbolic thing that corresponds to a thing called x. Now, what gets interesting when you have a symbolic language is that we want it to represent stuff about the world as well as abstract kinds of things. I can abstractly say, you know, find some funky integral; that's using symbolic variables to represent algebraic kinds of things. But I could also just say something like "Boston," and Boston is another kind of symbolic thing. If I ask what it really is inside, it's an entity: the city Boston, Massachusetts, United States. Actually, notice that when I typed that in, I was using natural language, and it gave me a bunch of disambiguation here. It said: assuming Boston is a city; assuming Boston, Massachusetts; use Boston, New York, or... OK, let's use Boston in the Philippines, which I've never heard of, but let's try using that instead. Now if I look at that, it'll say it's Boston in some province of the Philippines, et cetera. I might ask something about it; I could say, what's the population of that? OK, it's a fairly small place. Or I could, for example, make a GeoListPlot from that Boston, the one in the Philippines, to... now let's type in Boston again and have it use the default meaning of the word Boston, and then let's join those up. Now this should show me a plot. There we go. So there's the path from the Boston that we picked in the Philippines to the Boston here. Or I could ask it the distance from one to the other, or something like that.

One of the things we found really, really useful in the language is, first of all, that there's a way of representing stuff about the world, like cities. Or let's say I want to do something with cities: let's say "capital cities in South America."
Notice this is a piece of natural language. It gets interpreted into precise symbolic Wolfram Language code that we can then compute with, and that gives us the capital cities in South America. I could, for example, say FindShortestTour, and I'm going to use... oops, no, I don't want to do that. What I want to do first is to say: show me the geo positions of all those cities on line 21 there. So now it will find the geo positions, and then it will compute the shortest tour. That's saying there's a 10,000-mile traveling salesman tour around those cities. So I could take those cities that were on line 21 and order them according to this, then make another GeoListPlot of that and join it up. This should now show us a traveling salesman tour of the capital cities in South America.
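The tour computation combines two ingredients: a distance on the globe and a search over visiting orders. The sketch below assumes a spherical Earth (the haversine formula) and uses exact brute-force search, which only works for roughly ten points or fewer. It is a stand-in for illustration, not the method `FindShortestTour` uses. The talk does not list the city coordinates, so the usage relies on four hypothetical points on the equator.

```python
import math
from itertools import permutations

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp guards float round-off


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order and returning to the start."""
    return sum(haversine_km(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def shortest_tour(points):
    """Exact shortest closed tour by brute force over all orders (point 0 fixed as the start)."""
    if not points:
        return 0.0, []
    best_len, best_order = math.inf, []
    for rest in permutations(range(1, len(points))):
        order = [0, *rest]
        length = tour_length(points, order)
        if length < best_len:
            best_len, best_order = length, order
    return best_len, best_order


# Hypothetical points: four spots on the equator, a quarter of the globe apart.
length, order = shortest_tour([(0, 0), (0, 90), (0, 180), (0, -90)])
```

For these four points the best tour goes once around the equator, so `length` is the Earth's circumference under the spherical assumption. Real traveling-salesman solvers replace the factorial-time loop with heuristics or branch-and-bound.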
So it's interesting to see what's involved in making stuff like this work. My goal has been to automate as much as possible about things that have to be computed, and that means knowing as many algorithms as possible and also knowing as much data about the world as possible. I kind of view this as a knowledge-based programming approach. A typical idea in programming languages is that you have some small programming language with a few primitives that are pretty much tied to what a machine can intrinsically do, and then maybe you'll have libraries that add on to that, and so on.
My kind of crazy idea of many, many years ago has been to build an integrated system where all of the stuff about different domains of knowledge is just built into the system and designed in a coherent way. This has been kind of the story of my life for the last thirty years: trying to keep the design of the system coherent even as one adds all sorts of different areas of capability. We can go and dive into all sorts of different things here, but maybe as an example... what could we do here? How about this: is that a bone? I think so. That's a bone, so let's try that as a mesh region and see if that works. This will now use a completely different domain of human endeavor.
Oops, there are two of those bones. Let's just try the humerus. Let's try the mesh region for that, and now we should have a bone here. OK, there's a representation of a bone. Let's take that bone, and we could, for example, take its surface area in some units. Or let's do something much more outrageous: let's take RegionDistance. We're going to take the distance from that bone to a point, let's say {0, 0, z}, and make a plot of that distance with z going from... I have no idea where this bone is, but let's try something like this. OK, that was really boring. What this is doing, again: a whole bunch of stuff has to work in order for this to operate. This is some region in 3D space that's represented by some mesh, and you have to do the computational geometry to figure out where it is. If I want to, let's try AnatomyPlot3D, and let's say something like "left hand." Now it's going to show us, probably, the complete data that it has about the geometry of the left hand. There we go. So there are the results, and we could take that apart and start computing things from it and so on.
So there's a lot of computational knowledge that's built in here. Now let's talk a little bit about the modern machine learning story. For instance, let's get a picture here. Let's just say "picture of"... anybody got a favorite kind of animal? What? Panda? OK, let's try giant panda. OK, there's a panda. Now, for this panda, let's try ImageIdentify. We'll probably be embarrassed here, but let's just see what happens if I say ImageIdentify of that. Hopefully it'll wake up. This only takes a few hundred milliseconds.
OK, very good: giant panda. Let's see what the runners-up were to the giant panda. Let's say we want the ten runners-up in all categories for that thing. OK, so: giant panda; procyonid, which I've never heard of; carnivore; bamboo shoots. OK, so I was lucky I didn't get that one. It's really sure it's a mammal, and it's absolutely certain it's a vertebrate. So you might ask, how did it figure this out? You can look under the hood. We have a whole framework for representing neural nets symbolically, and this is the actual model that it's using to do this. So there's a neural net, and we can drill down and see a piece of the neural net. We can drill down even further to one of these, and we can see that's a batch normalization layer somewhere deep inside the entrails, not of the panda, but of this thing. Now let's take that object, which is just a symbolic object, and feed it the picture of the panda. Oops, I was not giving it the right thing. What did I just do wrong here? OK, let's take this thing and feed it the picture of the panda, and it says giant panda. How about we do something more outrageous: let's take that neural net and use only the first, let's say, 10 layers. Let's take out 10 layers of the neural net and feed it the panda, and now what we'll get is something from the insides of the neural net. I could, for example, make those into images. So that's what the neural net had figured out about the panda after 10 layers of going through the neural net.
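Using "only the first 10 layers" works because a feed-forward net is a composition of layers. Truncating the list of layers therefore returns an intermediate activation instead of a final answer. The toy sketch below makes that explicit with hand-written weights. The tiny network, its weights, and all function names are hypothetical; this is not the Wolfram Language's neural net framework.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weights: List[Vector], bias: Vector) -> Layer:
    """Fully connected layer computing y = W x + b."""
    def layer(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return layer


def relu(x: Vector) -> Vector:
    """Elementwise max(0, x)."""
    return [max(0.0, xi) for xi in x]


def run_layers(layers: List[Layer], x: Vector) -> Vector:
    """Feed x through the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def take_layers(layers: List[Layer], k: int) -> List[Layer]:
    """Keep only the first k layers, so the output is an intermediate activation."""
    return layers[:k]


# Hypothetical three-layer net: dense -> relu -> dense
net = [dense([[1.0, -1.0], [2.0, 0.0]], [0.0, 1.0]), relu, dense([[1.0, 1.0]], [0.0])]
x = [1.0, 3.0]
```

Here `run_layers(take_layers(net, 2), x)` stops after the ReLU and gives `[0.0, 3.0]`, while the full net continues to `[3.0]`. Visualizing such intermediate vectors is what turning them "into images" does for a real image model.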
And maybe it'd actually be interesting to see... let's do a FeatureSpacePlot of those intermediate things in the brain of the neural net, so to speak. What this is doing is dimension reduction on this space of images. It's not very exciting; it's probably mostly distinguishing these by total gray level. But it is showing us the space of different features of the insides of the neural net. What's also interesting to see here is the symbolic representation of the neural net. If you're wondering how the thing actually works inside, underneath it's using MXNet, which we happen to have contributed to a lot, and there's a bunch of symbolic layers on top of that that feed into it. Maybe I can show you here how you would train one of these neural nets; that's also kind of fun.
We have a data repository that has all sorts of useful data. One piece of data it has is a bunch of neural net training sets, so this is the standard MNIST training set of handwritten digits. OK, so there's MNIST. Notice that each of these things here is just an image, which I could copy out. I could, let's say, do ColorNegate on that image, because it's just an image, and there's the result. Now let's take a simple neural net like LeNet, for example. So let's take LeNet, and then let's take the untrained evaluation network. This is a version of LeNet, a simple standard neural net, that didn't get trained. For example, if I take that symbolic representation of LeNet and say NetInitialize, it will just put random weights into LeNet. If I take those random weights and feed it that image of a zero, it will presumably produce something completely random in this particular case, right? So that was just randomly initializing the weights.
So now what I'd like to do is to take the MNIST training set and actually train LeNet using it. Let's take a random sample of, I don't know, a thousand pieces of MNIST. Come on, why is it having to load it again? There we go. OK, so there's a random sample; it was on line 21. Now let me go down here and... where was it? We can just take this thing here, which is the uninitialized version of LeNet, and we can say NetTrain of that with the thing on line 21, which was those thousand instances. Now it's running training; you see the loss going down and so on. It's running training on those thousand instances of MNIST, and we can stop it if we want to. Actually, this is a new display; this is very nice.
This is a new version of the Wolfram Language that's coming out next week, which I'm showing you, but it's quite similar to what exists today. That's one of the features of running a software company: you always run the very latest version of things, for better or worse. And this is also a good way to debug it. It's supposed to come out next week; if I find some horrifying bug, maybe it will get delayed. But let's try this. OK, now it says it's zero. So this is now a trained version of LeNet, trained with that training data.
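The demo follows an initialize-then-train pattern: random weights give arbitrary outputs, and gradient-based training drives the loss down. The sketch below reproduces that pattern at the smallest possible scale. It swaps in logistic regression for LeNet and a rule-generated toy grid for MNIST, so it is an illustration of the loop, not of the model or data in the talk. All names are hypothetical.

```python
import math
import random


def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, data):
    """Mean cross-entropy over (features, label) pairs."""
    eps = 1e-12  # keeps log() finite if a probability hits 0 or 1
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def initialize(n_features, seed=0):
    """Random initial weights, the analogue of initializing an untrained net."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_features)], 0.0


def train(w, b, data, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean log loss."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y  # derivative of the loss with respect to z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (not MNIST): grid points labeled 1 when x0 + x1 > 1.
grid = [i / 4 for i in range(5)]
data = [((x0, x1), int(x0 + x1 > 1)) for x0 in grid for x1 in grid]

w0, b0 = initialize(2)
w1, b1 = train(w0, b0, data)
```

With inputs in [0, 1] and a step size of 0.5, each full-batch step decreases this convex loss, so the trained parameters always score better than the random ones.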
We can talk about all kinds of details of neural nets and so on, but maybe I should zoom out and talk a little bit about the bigger picture as I see it. One question is what is in principle possible to do with computation. We're building all kinds of things: we're making image identifiers, we're figuring out things like where the International Space Station is, and so on. The question is, what is in principle possible to compute? One place to ask that question is when one looks at models of the natural world. How do we make models of the natural world? A traditional approach has been to use mathematical equations. But if we want to generalize that and ask what all the possible ways to make models of things are, what can we say? I spent many years of my life trying to address that question. What I've thought about a lot is that if you want to make a model of a thing, you have to have definite rules by which the thing operates. What's the most general way to represent possible rules? Well, in today's world we think of that as a program. So the next question is, what does the space of all possible programs look like? Most of the time we're writing programs like the Wolfram Language, which is 50 million lines of code: a big, complicated program that was built for a fairly specific purpose. But if we just look at the space of possible programs more or less at random, what's out there? So I got interested many years ago in cellular automata, which are a really good example of a very simple kind of program. Let me show you an example of one of these.
These are the rules for a typical cellular automaton. You have a row of black and white squares. For each square, you look at its color and the colors of its left and right neighbors, and that rule decides what color the square will be on the next step. OK, so a really simple rule. Now let's take a look at what actually happens if we use that rule a bunch of times. We can take that rule; the 254 is just the binary digits that correspond to those positions in this rule. So now let's do 50 steps. If I run according to the rule I just defined, it turns out to be pretty trivial. If we start off with a black square, it just says: if any square in the neighborhood is black, make a black square. So we've used a very simple program, and we've got a very simple result out.
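The rule-number encoding described above can be written out directly. In the standard numbering for these elementary cellular automata, the output for neighborhood (left, center, right) is bit `4*left + 2*center + right` of the rule number. The sketch evolves a row from a single black cell with that lookup. Function names are hypothetical, and cells beyond the finite row are assumed to be white.

```python
def rule_table(rule: int) -> dict:
    """Map each (left, center, right) neighborhood to the cell's next color (1 = black).

    Bit 4*left + 2*center + right of the rule number gives the output for that
    neighborhood, so each number 0..255 specifies one rule.
    """
    return {(l, c, r): (rule >> (4 * l + 2 * c + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}


def run(rule: int, steps: int) -> list:
    """Evolve from a single black cell; returns steps + 1 rows of width 2*steps + 1.

    Cells beyond the edges are treated as white. This is exact for rules that keep an
    all-white neighborhood white (as rules 254 and 30 do), because the pattern grows by
    at most one cell per side per step.
    """
    table = rule_table(rule)
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        padded = [0] + row + [0]
        row = [table[(padded[i], padded[i + 1], padded[i + 2])] for i in range(width)]
        rows.append(row)
    return rows


for r in run(254, 5):
    print("".join("#" if cell else "." for cell in r))
```

For rule 254 the only white output is for an all-white neighborhood. So the black region grows by one cell on each side per step, giving the plain triangle seen in the demo.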
Let's try a different program. We can try changing this: that's a program with one bit different, and now we get that kind of pattern. So the question is, what happens in general? You might say: if you've got such a trivial program, it's not surprising you're just going to get trivial results out. But you can do an experiment to test that hypothesis. You can just take all possible programs. There are 256 possible programs that are based on these eight bits here. Let's just take, say, the first 64 of those programs and make a table of the results that we get by running them. So here we get the result, and what you see is that most of them are pretty trivial. They start off with one black cell in the middle, and it just goes off to one side. Occasionally we get something more exciting happening. Here's a nice nested pattern, and if we were to continue it longer, it would make more detailed nesting. But then comes my all-time favorite science discovery. If you go on and just look at these, after a while you find this one here, which is rule 30 in this numbering scheme, and that's doing something a bit more complicated. You say, well, what's going on here? We just started off with this very simple rule. Let's see what happens; maybe if we run rule 30 long enough, it will resolve into something simpler. So let's try running it for, let's say, 500 steps. Whoops. That's the result we get. Let's just make it full screen.
It's aliasing a bit on the projector there, but you get the basic idea. This just started off from one black cell at the top, and this is what it made. That's pretty weird, because this is not the way things are supposed to work. What we have here is just that little program down there, and it makes this big, complicated pattern. We can see there's a certain amount of regularity on one side. But, for example, the center column of this pattern is for all practical purposes completely random. In fact, it was used as a random number generator in Mathematica and the Wolfram Language for many years. It was recently retired, after excellent service, because we found a somewhat more efficient one.
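Here is a bare illustration of reading random-looking bits off the center column of rule 30. It uses the fact that rule 30 can be written as `new = left XOR (center OR right)`. This is a sketch of the idea only, not the generator Mathematica actually shipped. It is also slow and has not been vetted for statistical or cryptographic use. Function names are hypothetical.

```python
def rule30_center_bits(n: int) -> list:
    """First n values of rule 30's center column, starting from a single black cell."""
    width = 2 * n + 1  # wide enough that the edges never influence the center within n steps
    mid = n
    row = [0] * width
    row[mid] = 1
    bits = []
    for _ in range(n):
        bits.append(row[mid])
        padded = [0] + row + [0]  # cells beyond the edges are white
        row = [padded[i] ^ (padded[i + 1] | padded[i + 2]) for i in range(width)]
    return bits


def bits_to_int(bits) -> int:
    """Pack a list of bits (most significant first) into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


print(rule30_center_bits(16), bits_to_int(rule30_center_bits(16)))
```

Each call rebuilds the whole pattern, which costs time proportional to n squared. A practical generator would keep the row as state and emit bits incrementally.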
So what do we learn from this? What we learn is that out in the computational universe of possible programs, it's possible to get very rich, complicated behavior even with very simple programs. That's important if you're interested in modeling the natural world, because there might be programs that represent systems in nature that work this way. It's also important for technology. Let's say you're trying to find a program that's a good random number generator. How are you going to do that? You could start thinking very hard and try to write down all kinds of flowcharts about how this random number generator is going to work. Or you can say: forget that, I'm just going to search the computational universe of possible programs and look for one that serves as a good random number generator. In this particular case, after you've searched 30 programs, you'll find one that makes a good random number generator. Why does it work? That's a complicated story, and I don't think it's necessarily a story we can tell very well. But what's important is this idea that out in the computational universe there's a lot of rich, sophisticated stuff that can essentially be mined for our technological purposes. That's the important thing.
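Searching the computational universe, in its simplest form, means enumerating every program in a space and keeping the ones that pass a test. The sketch below runs all 256 elementary cellular automaton rules and filters them with a predicate. The talk's criterion was "makes a good random number generator"; this sketch swaps in a deliberately simple, easily checked stand-in predicate, mirror symmetry of the pattern grown from one black cell. Any real test of randomness would replace it. Function names are hypothetical.

```python
def evolve(rule: int, steps: int) -> list:
    """Rows of an elementary cellular automaton grown from one black cell (white outside the row)."""
    table = [(rule >> k) & 1 for k in range(8)]  # table[4*left + 2*center + right]
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        padded = [0] + row + [0]
        row = [table[4 * padded[i] + 2 * padded[i + 1] + padded[i + 2]] for i in range(width)]
        rows.append(row)
    return rows


def is_mirror_symmetric(rows: list) -> bool:
    """Stand-in predicate: every row reads the same left to right and right to left."""
    return all(row == row[::-1] for row in rows)


def search(predicate, steps: int = 20) -> list:
    """Exhaustively test all 256 rules and return the numbers of those that pass."""
    return [rule for rule in range(256) if predicate(evolve(rule, steps))]


found = search(is_mirror_symmetric)
```

The same loop structure works for any test: only the predicate changes. The interesting part of real searches is choosing a predicate that captures the property you actually want.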
Whether we understand how this works is a different matter. It's like the natural world, the physical world, where we're used to mining things. We started using magnets to do magnetic stuff long before we understood the theory of ferromagnetism and so on. Similarly, we can go out into the computational universe and find stuff that's useful for our purposes. Now, in fact, the world of deep learning and neural nets is a little bit like this. It uses the trick that there's a certain degree of differentiability, so you can home in on something that's incrementally better, and for certain kinds of problems that works pretty well.
The thing that I've done a lot is just exhaustive search in the computational universe of possible programs: just search a trillion programs and try to find one that does something interesting and useful for you. There's a lot to say about that, but, well, actually, in the spirit of searching a trillion programs and finding one that's useful, let me show you another example. I was interested a while ago in, I have to look something up here, sorry, in Boolean algebra, and I was interested in the space of all possible mathematics. Let me just see here. I'm not finding what I wanted to find, sorry; it was a good example. I should have memorized this, but I haven't. There we go, there it is.
So we talked about looking at the space of all possible programs. Another thing you can do is say: if you're going to invent mathematics from nothing, what possible axiom systems could be used? Again, that might seem like a completely crazy thing to do, to just say let's start enumerating axiom systems at random and see if we find one that's interesting and useful. But it turns out that once you have this idea that out in the computational universe of possible programs there's actually a lot of low-hanging fruit to be found, you can apply that in lots of places.
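Here is a hedged Python sketch of the first, cheap stage of enumerating axiom systems. It assumes candidates of the form `expr = c`, where `expr` is built from one binary operator `·` and the variables `a`, `b`, `c`. It keeps only the equations that are true when `·` is NAND on {0, 1}. Passing this filter is a necessary condition for being an axiom for Boolean algebra, but not a sufficient one. For example, the surviving equation `((c·c)·(c·c)) = c` is true but far too weak. A real search would therefore follow up with automated theorem proving on the survivors.

```python
from itertools import product

VARS = ("a", "b", "c")


def expressions(n_ops: int):
    """Yield every expression tree with exactly n_ops uses of the operator '·'.

    Leaves are the variable names; an internal node is a tuple (left, right).
    """
    if n_ops == 0:
        yield from VARS
        return
    for left_ops in range(n_ops):
        for left in expressions(left_ops):
            for right in expressions(n_ops - 1 - left_ops):
                yield (left, right)


def evaluate(expr, env, op) -> int:
    """Evaluate an expression tree under a variable assignment and a binary op."""
    if isinstance(expr, str):
        return env[expr]
    return op(evaluate(expr[0], env, op), evaluate(expr[1], env, op))


def nand(x: int, y: int) -> int:
    return 1 - (x & y)


def holds(lhs, rhs, op) -> bool:
    """True if lhs == rhs for every 0/1 assignment to a, b, c under op."""
    for values in product((0, 1), repeat=len(VARS)):
        env = dict(zip(VARS, values))
        if evaluate(lhs, env, op) != evaluate(rhs, env, op):
            return False
    return True


def candidate_axioms(max_ops: int):
    """Equations 'expr = c' with 1..max_ops operators that are true when '·' is NAND."""
    return [
        expr
        for n in range(1, max_ops + 1)
        for expr in expressions(n)
        if holds(expr, "c", nand)
    ]
```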
The thing to understand is: why do we not see a lot of engineering structures that look like this? The reason is that our traditional model of engineering has been to engineer things in a way where we can foresee what the outcome of our engineering steps is going to be. When it comes to something like this, we can find it out in the computational universe, but we can't readily foresee what's going to happen; we can't do a step-by-step design of this particular thing. So human engineering, as it's been practiced so far, has mostly consisted of building things where we can foresee step by step what the outcome of our engineering is going to be, and we see that in programs and in other kinds of engineering structures.
So there's a different kind of engineering, which is about mining the computational universe of possible programs. And it's worth realizing that a lot more can be done, and a lot more efficiently, by mining the computational universe of possible programs than by just constructing things step by step as a human. For example, if you look for optimal algorithms for things like, I don't know, sorting networks, the optimal sorting networks look very complicated. They're not things you would construct by step-by-step thinking in a typical human way. So if you're really going to have computation work efficiently, you are going to end up with these programs that are just mined from the computational universe. These mined programs make use of computation much more efficiently than a typical thing that we might construct.
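Four-input sorting networks are small enough to mine by brute force, which gives a concrete, if tiny, picture of the idea. The Python sketch below tries every sequence of comparators in order of length. It tests each sequence using the 0-1 principle: a comparator network sorts every input if and only if it sorts every 0/1 input. For n = 4 it should return a 5-comparator network, which is the known optimum. At this size the result still looks fairly regular. The strange-looking optimal networks referred to in the talk only show up at larger n, where this naive search is hopeless.

```python
from itertools import combinations, product


def apply_network(network, values):
    """Run a comparator network: each (i, j) with i < j puts the min at i, the max at j."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def sorts_everything(network, n: int) -> bool:
    """0-1 principle: the network sorts all inputs iff it sorts all 2**n bit vectors."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


def smallest_sorting_network(n: int, max_size: int = 8):
    """Try comparator sequences of increasing length; return the first one that sorts."""
    comparators = list(combinations(range(n), 2))
    for size in range(1, max_size + 1):
        for network in product(comparators, repeat=size):
            if sorts_everything(network, n):
                return list(network)
    return None
```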
Now, one feature of this is that it's hard to understand what's going on, and there's actually a fundamental reason for that. In our efforts to understand what's going on, we get to use our brains, our computers, our mathematics, or whatever. This particular little program did a certain amount of computation to work out this pattern, and the question is: can we outrun that computation and say, oh, I can tell that this particular bit down here is going to be a black bit, so you don't have to go and do all that computation? But it turns out, and this is maybe a digression, that there's this phenomenon I call computational irreducibility, which I think is really common, and it's a consequence of this thing I call the principle of computational equivalence.
The principle of computational equivalence basically says that as soon as you have a system whose behavior isn't easy to analyze, the chances are that the computation it's doing is essentially as sophisticated as it could be. That has consequences: it implies that a typical thing like this will correspond to a universal computer that you can use to program anything. It also has the consequence of this computational irreducibility phenomenon, which says you can't expect our brains to be able to outrun the computations going on inside the system. If there were computational reducibility, then we could expect that even though this thing went to a lot of trouble and did a million steps of evolution, just by using our brains we could jump ahead and see what the answer would be. Computational irreducibility suggests that isn't the case. If we're going to make the most efficient use of computational resources, we will inevitably run into computational irreducibility all over the place, and it has the consequence that we can't readily foresee and understand what's going to happen. So, back to mathematics for a second. This is just an axiom system.
I looked through all possible axiom systems, starting off with really tiny ones, and I asked the question: what's the first axiom system that corresponds to Boolean algebra? It turns out this tiny little thing here generates all theorems of Boolean algebra; it is the simplest axiom for Boolean algebra. Now, I have to show you this, because it's a new feature. If I say FindEquationalProof, let's say I want to prove commutativity of the NAND operation, this is going to try to generate, let's see if this works, an automated proof of that result based on that axiom system. So it had 102 steps in the proof.
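The axiom shown on screen is not reproduced in the transcript. The sketch below assumes it is the single NAND axiom usually quoted from A New Kind of Science, ((a·b)·c)·(a·((a·c)·a)) = c. Checking it by truth table over {0, 1} in Python is easy. NAND (and, dually, NOR) satisfies it, AND does not, and every two-element operation that satisfies it is commutative. That is only a sanity check in one small model. It is not what FindEquationalProof does: that function derives commutativity for every structure satisfying the axiom, hence the 102-step proof.

```python
from itertools import product


def satisfies_axiom(op) -> bool:
    """Check ((a·b)·c)·(a·((a·c)·a)) == c for every a, b, c in {0, 1}."""
    return all(
        op(op(op(a, b), c), op(a, op(op(a, c), a))) == c
        for a, b, c in product((0, 1), repeat=3)
    )


def is_commutative(op) -> bool:
    """Check a·b == b·a for every a, b in {0, 1}."""
    return all(op(a, b) == op(b, a) for a, b in product((0, 1), repeat=2))


def op_from_table(table):
    """Binary operation whose outputs for (0,0), (0,1), (1,0), (1,1) are `table`."""
    return lambda x, y: table[2 * x + y]


# Every one of the 16 two-element operations that satisfies the assumed axiom.
MODELS = [t for t in product((0, 1), repeat=4) if satisfies_axiom(op_from_table(t))]

NAND = (1, 1, 1, 0)
NOR = (1, 0, 0, 0)
```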
Let's look at, for example, the proof network here. Actually, let's look at the proof dataset. No, that's not what I wanted. I should learn how to use this, shouldn't I? Let's see, what I want is the proof dataset. There we go, very good. Actually, first of all, let's look at the proof graph. This is going to show me how that proof was done. There are a bunch of lemmas that got proved, those lemmas were combined, and eventually it proved the result. So let's take a look at what some of those lemmas were. Here's the result: it goes through, these are various lemmas it's using, and eventually, after many pages of nonsense, it gets to the result. Some of these lemmas are kind of complicated. That lemma there is a pretty complicated lemma, et cetera.
So you might ask: what on earth is going on here? I first generated a version of this proof 20 years ago, and I tried to understand what was going on, and I completely failed. It's sort of embarrassing, because this is supposed to be a proof; it's supposed to be demonstrating some result. What we realize is: what does it mean to have a proof of something? What does it mean to explain how a thing is done? What is the purpose of a proof? The purpose of a proof is basically to let humans understand why something is true. So, for example, let's say we go to Wolfram|Alpha and do some random thing, say an integral of something or other. It will be able to work out the answer very quickly; in fact, it takes only milliseconds internally to work out that integral. But then somebody who wants to hand in a piece of homework or something like that needs to explain why this is true. Well, we have this handy step-by-step solution thing here, which explains why it's true. Now, the thing I should admit about the step-by-step solution is that it's completely fake. That is, the steps described in the step-by-step solution have absolutely nothing to do with the way that integral was computed internally. These are steps created purely for the purpose of telling a story to humans about why this integral came out the way it did. So one thing is knowing the answer; the other thing is being able to tell a story about why the answer worked that way.
Well, what we see here is a proof, but it was an automatically generated proof, and it's a really lousy story for us humans. If it turned out that one of these theorems here was one that had been proved by Gauss or something and appeared in all the textbooks, we would be much happier, because then we would start to have a kind of human-representable story about what was going on. Instead, we just get a bunch of machine-generated lemmas that we can't understand, that we can't wrap our brains around. It's sort of the same thing that's going on when we look at these neural nets. When we looked at the innards of that neural net earlier, we could ask: how is it figuring out that that's a picture of a panda? If we humans were asked how you would figure out whether it's a picture of a panda, we might say: look and see if it has eyes, that's a clue for whether it's an animal; look and see if it looks kind of round and furry, that's a version of whether it's a panda; et cetera. But what it's doing is that it learnt a bunch of criteria for whether it's a panda or one of 10,000 other possible things it could have recognized, and it learnt those criteria in a way that was somehow optimal based on the training that it got. But it learnt distinctions that are different from the distinctions that we humans make in the language that we as humans use.
So in some sense, when we start talking about how we'd describe a picture, we have a certain human language for describing that picture. In typical human languages we have maybe thirty to fifty thousand words that we use to describe things, and those words have evolved as being useful for describing the world that we live in. When it comes to the neural net, you could say that it has effectively learnt words, which allow it to make distinctions about what's going on in the analysis it's doing. It has effectively invented words that describe distinctions, but those words have nothing to do with the historically invented words that exist in our languages. So it's an interesting situation. If you ask what its way of thinking is, so to speak, what it's thinking about, how we describe what it's thinking, that's a tough thing to answer, because, just like with the automated theorem proving, we're stuck having to say: we can't really tell a human story, because the things it invented are things for which we don't even have words in our languages. Okay, so
one thing to realize is that in this space of all possible computations, there's a lot of stuff out there that can be done. There's this kind of ocean of sophisticated computation, and the question that we have to ask for us humans is: how do we make use of all of that stuff? On the one hand, we've got the things we know how to think about: human language is our way of describing things, our way of talking about stuff. That's one set of things. The other set of things we have is this very powerful, seething ocean of computation on the other side, where lots of things can happen. So the question is how we make use of this ocean of computation in the best possible way for our human purposes, for building technology and so on. And the way I see it, part of what I've spent a very long time doing is building a language that allows us to take human thinking on the one hand and provide a computational communication language that lets us get the benefit of what's possible over in the ocean of computation, in a way that's rooted in what we humans actually
want to do. So I view Wolfram Language as an attempt to make a bridge: on the one hand there are all possible computations, on the other hand there are the things we think we want to do, and I view Wolfram Language as my best attempt right now to take our human computational thinking and be able to actually implement it. So in a sense it's a language that works on two sides. It's a language the machine can understand: okay, it's looking at this, and that's what it's going to compute. But on the other hand, it's also a language for us humans to think about things in computational terms. So if I take one of these things I was doing here, this one wasn't that exciting, but FindShortestTour of the GeoPosition of the capital cities in South America, that is a representation of something in a precise language. The idea is that it's a language we humans can find useful in thinking about things in computational terms, and it also happens to be a language that the machine can immediately understand and execute. When I think about AI in general, what's the overall problem? Well, part of the overall problem is: how do we tell the AIs what to do, so to speak?
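The shortest-tour request mentioned a moment ago is a good example of a precisely stated task. FindShortestTour and GeoPosition are Wolfram Language functions, and the sketch below does not reproduce them or their algorithms. As a rough Python analogue, it assumes plain planar (x, y) coordinates rather than geographic positions. It finds the exact shortest closed tour by brute force, which is only practical for a handful of points: with a dozen capitals it would already mean 11! (about 40 million) orderings.

```python
from itertools import permutations
from math import dist


def tour_length(points, order) -> float:
    """Length of the closed tour that visits points in the given order and returns."""
    return sum(
        dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def shortest_tour(points):
    """Exact shortest closed tour by brute force; needs at least one point."""
    n = len(points)
    # Fix point 0 as the start so rotations of the same tour are not re-checked.
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda order: tour_length(points, order))
    return list(best), tour_length(points, best)
```

For example, `shortest_tour([(0, 0), (1, 1), (1, 0), (0, 1)])` returns a tour of length 4.0 around the unit square, rather than any tour that crosses a diagonal.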
This very powerful ocean of computation is what we get to mine for purposes of building AI kinds of things, but then the question is how we tell the AIs what to do. What I've tried to do with Wolfram Language is to provide a way of accessing that computation and of making use of the knowledge that our civilization has accumulated. There's the general computation on this side, and there are the specific things that we humans have thought about, and the question is how to make use of the things we've thought about to do things we care about doing. Actually,
if you're interested in these kinds of things, I happened to write a blog post just a couple of days ago. It's kind of a funny blog post, but you can see the title there. It came about because a friend of mine has this crazy project to put little discs, or something like that, representing the best achievements of human civilization, so to speak, hitchhiking on various spacecraft that are going out into the solar system in the next little while. The question is what to put on this little disc that represents the achievements of civilization. It's kind of depressing when you go back and look at what people have tried to do on this before, and realize how hard it is to tell even whether something is an artifact or not. Yeah, that's a good one; that's from 11,000 years ago. The question is: can you figure out what on earth it is and what it means?
What's relevant about this is the whole question of things that are out there in the computational universe. When we think about extraterrestrial intelligence, I find it kind of interesting that artificial intelligence is our first example of an alien intelligence. We don't happen to have found what we view as extraterrestrial intelligence right now, but we are in the process of building a pretty decent version of an alien intelligence here. If you ask questions like what it is thinking, whether it has a purpose in what it's doing, and so on, and you're confronted with things like this, you can do a test run of asking what its purpose is and what it's trying to do, in a way that is very similar to the kinds of questions you would ask about extraterrestrial intelligence. But in any case, the main point is that I see this ocean of computation, and then there's the question of describing what we actually want to do with that ocean of computation. That's one of the primary problems we have. Now, people talk about AI
and what AI is going to allow us to automate. My basic answer would be that we'll be able to automate everything that we can describe. The problem is that it's not clear what we can describe. Put another way: you imagine various jobs where people are doing things, repeated-judgment jobs, things like this, and we can readily automate those things. But the thing we can't really automate is saying: what are we trying to do, that is, what are our goals? Because in a sense, when we see one of these systems, let's say it's a cellular automaton here, the question is: what is this cellular automaton trying to do? Maybe I'll give you another cellular automaton that is a little bit more exciting. Let's do this one. The question is: what is this cellular automaton trying to do? It's got this whole big structure here, and things are happening with it. We can run it for a couple thousand steps. It's a nice example of undecidability in action: what's going to happen here? This is kind of the halting problem. Is this going to halt? What's it going to do?
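One way to make "can we jump ahead?" concrete is to compare two rules in Python. Rule 90, where each cell becomes the XOR of its two neighbors, is computationally reducible when started from a single black cell. The color at step t and offset x is the binomial coefficient C(t, (t + x)/2) mod 2, so any cell can be computed directly without running the steps in between. For rule 30 no comparable shortcut is known, and in this sketch explicit simulation is the only route. The helper names are illustrative.

```python
def simulate(rule: int, steps: int) -> list[list[int]]:
    """Rows 0..steps of an elementary CA grown from one black cell.

    The cyclic lattice has width 2*steps + 3, so the pattern never wraps around;
    offset x lives at index steps + 1 + x.
    """
    width = 2 * steps + 3
    cells = [0] * width
    cells[steps + 1] = 1
    rows = [cells]
    for _ in range(steps):
        cells = [
            (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % width])) & 1
            for i in range(width)
        ]
        rows.append(cells)
    return rows


def rule90_cell(t: int, x: int) -> int:
    """Jump straight to cell x at step t for rule 90, without simulating.

    Rule 90 XORs the two neighbors, so a cell equals the number of left/right
    paths from the origin mod 2, i.e. C(t, (t + x) / 2) mod 2. By Kummer's
    theorem, C(n, k) is odd exactly when k & (n - k) == 0.
    """
    if abs(x) > t or (t + x) % 2:
        return 0
    k = (t + x) // 2
    return 1 if (k & (t - k)) == 0 else 0
```

`simulate(30, T)` works just as well, but there is no `rule30_cell` to check it against. That missing shortcut is the situation computational irreducibility describes.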
There's computational irreducibility, so we actually can't tell. This is a case where we know this is a universal computer. In fact, eventually, well, I won't spoil it for you, but if I went on long enough it would go into some kind of cycle. But we can ask: what is this thing trying to do? What's it thinking about? What's its goal, what's its purpose? And we very quickly get into a big mess thinking about those kinds of things. One of the things that comes out of this principle of computational equivalence is thinking about what kinds of things are capable of sophisticated computation. I mentioned a while back my personal history with Wolfram|Alpha: having thought about doing something like Wolfram|Alpha when I was a kid, and then believing that you sort of had to build a brain to make that possible, and so on. And one of the things that I