Stephen Wolfram: Computational Universe | MIT 6.S099: Artificial General Intelligence (AGI)
P7kX7BuHSFI • 2018-03-02
Welcome back to 6.S099, Artificial General Intelligence. Today we have Stephen Wolfram. Wow, that's a first: I haven't even gotten started and you're already clapping. In his book A New Kind of Science, he has explored and revealed the power, beauty, and complexity of cellular automata as simple computational systems from which incredible complexity can emerge. It's actually one of the books that really inspired me to get into artificial intelligence. He's created the Wolfram Alpha computational knowledge engine, and created Mathematica, which has now expanded to become the Wolfram Language. Both he and his son were involved in helping analyze and create the alien language from the movie Arrival, for which they used the Wolfram Language. Please again give Stephen a warm welcome.

Boy. So I gather the brief here is to talk about how artificial general intelligence is going to be achieved. Is that the basic picture? So maybe I'm reminded of a story I don't think I've ever told in public, but it's something that happened just a few buildings over from here. This was 2009, and Wolfram Alpha was about to arrive on the scene. I assume most of you have used Wolfram Alpha, or seen Wolfram Alpha? Yes? How many of you have used Wolfram Alpha? Okay, that's good.

So I had long been a friend of Marvin Minsky's, and Marvin was a pioneer of the AI world. For years I'd seen question-answering systems that tried to do general-intelligence question answering. And so I was going to show Marvin Wolfram Alpha. He looks at it and he's like, "Okay, that's fine, whatever." I said, "No, Marvin, this time it actually works. You can try real questions. This is actually something useful; this is not just a toy." And it was kind of interesting to see: it took about five minutes for Marvin to realize that this was finally a question-answering system that could actually answer questions that were useful to people. And so one question is: how did we, how do
we achieve that? So you go to Wolfram Alpha and you can ask it... I don't know, what's some random question? What is the population of Cambridge? Actually, here's a question; let's try that. What's the population of Cambridge? It's probably going to figure out that we mean Cambridge, Massachusetts. It's going to give us some number, and it's going to give us some plot. Actually, what I want to know is the number of students at MIT divided by the population of Cambridge; let's see if it can figure that out. Okay, it's kind of interesting, right? Oh no, that's... ah, that's interesting: it guessed that we were talking about Cambridge University as the denominator there. So it says the number of students at MIT divided by the number of students at Cambridge University. That's interesting; I'm actually surprised. Let's see what happens if I say Cambridge, MA there. Now it'll probably fail horribly. No, that's good. Okay, that's interesting: that's a plot, as a function of time, of the fraction. Okay, so anyway, I'm glad it works.

So one question is how we managed to get this working. Many things have to work in order to get stuff like this to work: you have to be able to understand the natural language, you have to have data sources, you have to be able to compute things from the data, and so on. One of the things that surprised me about natural language understanding was that the critical thing turned out to be just knowing a lot of stuff. The actual parsing of the natural language is, I think, kind of clever, and we use a bunch of ideas that came from my New Kind of Science project and so on. But I think the most important thing is that knowing a lot of stuff about the world is really important to actually being able to understand natural language in a useful situation. The other thing is actually having access to lots of data. Let me show you a typical example here of what is needed. So I asked
about the ISS, and hopefully it'll wake up and tell us something here. Come on, what's going on here? There we go. Okay, so it figured out that we're probably talking about a spacecraft, not a file format, and now it's going to give us a plot that shows us where the ISS is right now. To make this work, we obviously have to have some feed of radar tracking data about satellites, which we have for every satellite that's out there. But it's not good enough just to have that feed. You also have to be able to do celestial mechanics to work out where the ISS actually is right now, based on the orbital elements that have been deduced from radar. And then we want to know things like: okay, it's not currently visible from Boston, Massachusetts; it will next rise at 7:36 p.m. today. So this requires a mixture of data about what's going on in the world together with models of how the world is supposed to work, so that we can predict things.

And I think another thing I realized about AI from the Wolfram Alpha effort is this. One of the earlier ideas for how one would achieve AI was: let's make it work kind of like brains do, and let's make it figure stuff out. So if it has to do physics, let's have it do physics by pure reasoning, the way people at least used to do physics. But in the last 300 years we've had a different way to do physics that wasn't based on natural philosophy; it was instead based on things like mathematics. So one of the things we were doing in Wolfram Alpha was to kind of cheat relative to what had been done in previous AI systems. Instead of using reasoning-type methods, we just said: okay, we want to compute where the ISS is going to be; well, we've got a bunch of equations of motion, which correspond to differential equations; we're just going to solve the equations of motion and get an
answer. That's leveraging the last 300 years or so of exact science, rather than trying to make use of human reasoning ideas.

And I might say, in terms of the history of the Wolfram Alpha project: when I was a kid, a disgustingly long time ago, I was interested in AI kinds of things. In fact, I was kind of upset recently to find a bunch of stuff I did when I was 12 years old, trying to assemble a pre-version of Wolfram Alpha way back before it was technologically possible. But it's also a reminder that, at some level, one just does the same thing one's whole life, so to speak. What happened was that I started off working mainly in physics, and then I got involved in building computer systems to do things like mathematical computation. Then I got interested in whether we could generalize this stuff: can we really make systems that can answer arbitrary questions about the world? The promise would be: if there's something that is systematically known in our civilization, make it automatic to answer questions on the basis of that systematic knowledge. Back around the late 1970s and early 1980s, my conclusion was that if you want to do something like that, the only realistic path was to build something much like a brain. So I got interested in neural nets, and I tried to do things with neural nets back in 1980, and nothing very interesting happened; well, I couldn't get them to do anything very interesting. So I had the idea that the only way to get the kind of thing that now exists in Wolfram Alpha, for example, was to build a brain-like thing. Then many years later, for reasons I can explain, I came back to this and realized that it actually wasn't true that you had to build a brain-like thing; mere computation was sufficient. And that was what got me started actually trying to
build Wolfram Alpha. When we started building Wolfram Alpha, one of the things I did was go on a sort of field trip to a big reference library. You see all these shelves of books, and the question is: can we take all of the knowledge that exists in all of these books and actually automate answering questions on the basis of it? And I think we've pretty much done that, at least for the books you find in a typical reference library. It looked kind of daunting at the beginning, because there's a lot of knowledge and information out there, but it turns out there are a few thousand domains, and we've steadily gone through and worked on these different domains.

Another feature of the Wolfram Alpha project: I've been involved a lot in doing basic science and in trying to have grand theories of the world, but one of my principles in building Wolfram Alpha was not to start from a grand theory of the world. That is, not to start from some global ontology of the world and then try to build down into all these different domains. Instead, it was to work up from having hundreds, then thousands, of domains that actually work, whether that's information about cars or sports or movies or whatever else. We built up from the bottom in each of these domains, then found common themes across them that we could build into frameworks, and then constructed the whole system on the basis of that. That's how it's worked, and I can talk about some of the actual frameworks we ended up using. But maybe I should explain a little bit more. So one question is how Wolfram Alpha actually works inside, and the answer is: it's a big program. The core system is about 15 million lines of Wolfram Language code, and it's some
number of terabytes of raw data. The thing that made building Wolfram Alpha possible was the Wolfram Language, which started with Mathematica, which came out in 1988, and has been progressively growing since then. So maybe I should show you some things about the Wolfram Language. It's easy to get at: MIT has a site license for it, you can use it all over the place, you can find it on the web, et cetera. Okay, the basics. Let's start off with something like a random graph: say, a random graph with 200 nodes and 400 edges. Okay, so there's a random graph. A first important thing about the Wolfram Language is that it's a symbolic language, so I can just pick up this graph and do some analysis of it. That graph is just a symbolic thing that I can do computations on. Another good thing to always do is get a current image; see, there we go. And now I could say something like: let's edge-detect that image. Again, this image is just a thing that we can manipulate. We could take the image, partition it into little pieces, and do computations on that. I don't know, something simple: let's sort each row of the image and assemble the image again. Whoops; assemble that image again. We'll get some mixed-up picture there. If I wanted to, I could, for example, make that the current image and make it dynamic, and now I can be running that code, hopefully, in a little loop, and there, we can make that work.

So one general point here is that an image, for us, is just a piece of data like anything else. If we just have a variable, a thing called x, it just says, okay, that's x. I don't need to know a particular value; it's just a symbolic thing that corresponds
to: that's a thing called x. Now, what gets interesting when you have a symbolic language is that we want it to represent stuff about the world as well as abstract kinds of things. I can abstractly say, find some funky integral, and that's using symbolic variables to represent algebraic kinds of things. But I could also just say something like Boston, and Boston is another kind of symbolic thing. If I ask what it really is inside, it's an entity: the city Boston, Massachusetts, United States. Actually, notice that when I typed that in, I was using natural language, and it gave me a bunch of disambiguation here. It said: assuming Boston is a city, assuming Boston, Massachusetts; use Boston, New York, or... Okay, let's use Boston in the Philippines, which I've never heard of, and try using that instead. Now if I look at that, it'll say it's Boston in some province of the Philippines, et cetera. Now I might ask: what's the population of that? Okay, it's a fairly small place. Or, for example, let me make a GeoListPlot from that Boston to... let's type in Boston again and have it use the default meaning of the word Boston, and then let's join those up. Now this should show me a plot. There we go: there's the path from the Boston we picked in the Philippines to the Boston here. Or I could ask it the distance from one to the other, or something like that.

So one of the things we found really useful in the language was, first of all, having a way of representing stuff about the world, like cities, for example. Let's say I want to do something with cities. Let's say
capital cities in South America. Okay, so notice this is a piece of natural language. It will get interpreted into precise, symbolic Wolfram Language code that we can then compute with, and that will give us the capital cities in South America. I could, for example, say FindShortestTour, and I'm going to use... oops, no, I don't want to do that. What I want to do first is to say: show me the geo positions of all those cities on line 21 there. So now it will find the geo positions, and now it will compute the shortest tour. That's saying there's a 10,000-mile traveling salesman tour around those cities. So I could take those cities from line 21, order them according to this, make another GeoListPlot of that, join it up, and this should now show us a traveling salesman tour of the capital cities in South America.

So it's interesting to see what's involved in making stuff like this work. My goal has been to automate as much as possible about things that have to be computed, and that means knowing as many algorithms as possible and also knowing as much data about the world as possible. I view this as a knowledge-based programming approach. A typical idea in programming languages is that you have some small language with a few primitives that are pretty much tied to what a machine can intrinsically do, and then maybe libraries that add on to that. My kind of crazy idea, many, many years ago, was to build an integrated system where all of the stuff about different domains of knowledge is built into the system and designed in a coherent way. This has been kind of the story of my life for the last thirty years: trying to keep the design of the system coherent even as one adds all sorts of different
areas of capability. We can dive into all sorts of different kinds of things here, but maybe as an example... what could we do here? How about this: is that a bone? I think so; that's a bone. So let's try that as a mesh region and see if that works. This will now use a completely different domain of human endeavor. Okay, oops, there are two of those bones. Let's try the humerus, the mesh region for that, and now we should have a bone here. Okay, there's a representation of a bone. We could take that bone and, for example, compute its surface area in some units. Or let's do something much more outrageous: let's take the region distance from that bone to a point, let's say {0, 0, z}, and make a plot of that distance with z going from... I have no idea where the bone is, but let's try something like this. So that was really boring; let's try this. Again, a whole bunch of stuff has to work in order for this to operate. This is a region in 3D space that's represented by a mesh, and you have to do the computational geometry to figure out where it is. Or let's try AnatomyPlot3D with something like the left hand, and now it's going to show us probably the complete data it has about the geometry of the left hand. There we go. So there's the result, and we could take that apart and start computing things from it. So there's a lot of computational knowledge built in here.

Let's talk a little bit about the modern machine learning story. For instance, let's get a picture here; let's just say picture of... got a favorite kind of animal? Panda? Okay,
so let's try: okay, giant panda. Okay, there's a panda. Now, for this panda, let's try ImageIdentify. We'll probably be embarrassed, but let's just see what happens if I say ImageIdentify of that. Now it'll hopefully wake up... this only takes a few hundred milliseconds. Okay, very good: giant panda. Let's see what the runners-up to the giant panda were; let's ask for the ten runners-up in all categories for that thing. Okay, so: giant panda; procyonid, which I've never heard of; are pandas carnivorous? They eat bamboo shoots. Okay, so I was lucky it didn't pick that one. It's really sure it's a mammal, and it's absolutely certain it's a vertebrate.

So you might ask how it figured this out, and you can look under the hood. We have a whole framework for representing neural nets symbolically, and this is the actual model it's using to do this. So there's a neural net, and we can drill down and see a piece of it, drill down even further to one of these, and see that's a batch normalization layer somewhere deep inside the entrails of, not the panda, but this thing. Okay, so now let's take that object, which is just a symbolic object, and feed it the picture of the panda... oops, I was not giving it the right thing; what did I just do wrong here? Okay, let's take this thing and feed it the picture of the panda, and it says giant panda. How about something more outrageous: let's take that neural net and only use the first, let's say, 10 layers of it. So let's take out 10 layers of the neural net and feed it the panda, and now what we'll get is something from the insides of the neural net. And I could say, for example, let's just make those into images.
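Using only "the first 10 layers" of a network means running a prefix of its layer chain. The result is an intermediate activation rather than a class label. The Wolfram Language has NetTake for truncating a net, but we don't know the exact code typed in the demo.

The sketch below shows the idea in plain Python. The tiny dense network, its hand-picked weights, and the helper names (`dense`, `relu`, `run`, `take_first_layers`) are hypothetical stand-ins. They are not the image-identification model from the talk, which is far larger.

```python
from typing import Callable, List, Sequence

# A layer maps one activation vector to the next.
Layer = Callable[[List[float]], List[float]]


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Fully connected layer computing y = W x + b."""
    def apply(x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def relu(x: List[float]) -> List[float]:
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, v) for v in x]


def run(net: Sequence[Layer], x: List[float]) -> List[float]:
    """Feed x through every layer of the chain, in order."""
    for layer in net:
        x = layer(x)
    return x


def take_first_layers(net: Sequence[Layer], k: int) -> List[Layer]:
    """Keep only the first k layers, so run() yields an intermediate activation."""
    return list(net[:k])


# Hypothetical 4-layer chain with hand-picked weights (illustration only).
example_net: List[Layer] = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    relu,
    dense([[2.0, 0.0], [0.0, 3.0]], [1.0, -1.0]),
    relu,
]

x = [1.0, 3.0]
for k in range(1, len(example_net) + 1):
    print(k, run(take_first_layers(example_net, k), x))
# 1 [-2.0, 2.0]
# 2 [0.0, 2.0]
# 3 [1.0, 5.0]
# 4 [1.0, 5.0]
```

Truncation just slices the layer list. Nothing about the earlier layers changes; we simply stop reading the computation partway through.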
Okay, so that's what the neural net had figured out about the panda after 10 layers. And maybe it would be interesting to do a FeatureSpacePlot of those intermediate things in the brain of the neural net, so to speak. What this is doing is dimension reduction on this space of images. It's not very exciting; it's probably mostly distinguishing these by total gray level. But it's showing us the space of different features inside the neural net. What's also interesting here is the symbolic representation of the neural net. If you're wondering how that whole thing works inside: underneath, it's using MXNet, which we happen to have contributed to a lot, and there's a bunch of symbolic layers on top that feed into it.

Let me show you how you would train one of these neural nets; that's also kind of fun. We have a data repository with all sorts of useful data, and one piece of data it has is a bunch of neural net training sets. This is the standard MNIST training set of handwritten digits. Okay, so there's MNIST, and you notice that each of these things is just an image. I could copy it out and, let's say, ColorNegate it, because it's just an image, and there's the result. Now I could take a simple neural net like LeNet, for example. So let's take LeNet, and let's take the uninitialized evaluation network. This is a version of LeNet, a simple standard neural net, that didn't get trained. For example, if I take that symbolic representation of LeNet and say NetInitialize, it will just put random weights into LeNet.
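Going from "random weights" to a trained net is the usual initialize-then-train loop. The sketch below shows that loop in miniature, in plain Python, under several stated assumptions:

- It uses a single softmax (multinomial logistic regression) layer instead of LeNet.
- It trains on six hand-made 2-D points instead of MNIST.
- It initializes weights with Glorot/Xavier uniform, chosen for illustration. We don't claim this is what NetInitialize uses.
- It trains with plain per-example gradient descent on cross-entropy loss.

All helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def glorot_uniform(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """Random n_out x n_in weight matrix drawn from U(-limit, limit), Glorot/Xavier style."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """Class probabilities for one input: softmax(W x + b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def cross_entropy(W: List[List[float]], b: List[float], data: List[Example]) -> float:
    """Mean negative log-probability assigned to the correct class."""
    return sum(-math.log(predict(W, b, x)[y]) for x, y in data) / len(data)


def train(W: List[List[float]], b: List[float], data: List[Example],
          lr: float = 0.5, epochs: int = 100) -> None:
    """Per-example gradient descent on cross-entropy; updates W and b in place."""
    for _ in range(epochs):
        for x, y in data:
            p = predict(W, b, x)
            for k in range(len(W)):
                g = p[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                for j in range(len(x)):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g


# Hypothetical toy data: three well-separated classes in 2-D.
toy_data: List[Example] = [
    ([1.0, 0.0], 0), ([0.9, 0.1], 0),
    ([0.0, 1.0], 1), ([0.1, 0.9], 1),
    ([-1.0, -1.0], 2), ([-0.9, -1.1], 2),
]

rng = random.Random(0)
W = glorot_uniform(2, 3, rng)  # 3 classes, 2 input features
b = [0.0, 0.0, 0.0]
print("untrained probabilities:", predict(W, b, [1.0, 0.0]))  # essentially arbitrary
print("loss before:", cross_entropy(W, b, toy_data))
train(W, b, toy_data)
print("loss after:", cross_entropy(W, b, toy_data))
```

Before training, the output for any input just reflects the random weights. Training pushes the loss down, which is what the falling loss curve in the demo is showing at a much larger scale.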
Okay, so if I take those random weights and feed it that image of a zero, it will presumably produce something completely random; in this particular case, a two. Right. So that was just randomly initializing the weights. Now what I'd like to do is take the MNIST training set and actually train LeNet with it. So let's take a random sample of, let's say, a thousand examples from MNIST... come on, why is it having to load it again? There we go. Okay, so there's a random sample, on line 21. Now let me go down here. We can just take this thing here, the uninitialized version of LeNet, and say NetTrain of that with the thing on line 21, which was those thousand instances. Now it's running training, and you see the loss going down and so on. It's training on those thousand instances, and we can stop it if we want to. Actually, this is a new display; this is very nice. This is a new version of the Wolfram Language that's coming out next week, which I'm showing you, but it's quite similar to what exists today. That's one of the features of running a software company: you always run the very latest version of things, for better or worse. It's also a good way to debug it, because it's supposed to come out next week; if I find some horrifying bug, maybe it will get delayed. But let's try this. Okay, now it says it's zero. So this is now a version of LeNet trained with that training data.

We can talk about all kinds of details of neural nets, but maybe I should zoom out and talk a little bit about the bigger picture as I see it. One question is what is in
principle possible to do with computation. We're building all kinds of things: we're making image identifiers, we're figuring out things like where the International Space Station is, and so on. The question is what is in principle possible to compute. One place to ask that question is when one looks at, for example, models of the natural world. How do we make models of the natural world? A traditional approach has been to use mathematical equations. The question is, if we want to generalize that and ask what all the possible ways to make models of things are, what can we say? I spent many years of my life trying to address that question. What I've thought about a lot is that if you want to make a model of a thing, you have to have definite rules by which the thing operates. What's the most general way to represent possible rules? Well, in today's world we think of that as a program. So the next question is: what does the space of all possible programs look like? Most of the time we're writing programs like the Wolfram Language, which is 50 million lines of code: a big, complicated program built for a fairly specific purpose. But if we just look at the space of possible programs more or less at random, what's out there?

So I got interested many years ago in cellular automata, which are a really good example of a very simple kind of program. Let me show you an example. These are the rules for a typical cellular automaton. You have a row of black and white squares. For each square, you look at what color it is and what colors its left and right neighbors are, and decide what color the square will be on the next step based
on that rule. Okay, so a really simple rule. Now let's take a look at what actually happens if we apply that rule a bunch of times. The number 254 is just the binary digits that correspond to those positions in this rule. So now let's do 50 steps. If I run according to the rule I just defined, it turns out to be pretty trivial: starting from a single black square, it's just saying that if any neighboring square is black, make a black square. So we've used a very simple program and we've got a very simple result out. Okay, let's try a different program. We can try changing this; that's a program with one bit different, and now we get that kind of pattern.

So the question is what happens in general. You might say: if you've got such a trivial program, it's not surprising that you're just going to get trivial results out. But you can do an experiment to test that hypothesis. Let's take all possible programs: there are 256 possible programs based on these eight bits here. Let's take, say, the first 64 of those programs and make a table of the results we get by running them. So here we get the result, and what you see is that most of them are pretty trivial: they start off with one black cell in the middle, and it just goes off to one side. Occasionally we get something more exciting happening; here's a nice nested pattern, and if we were to continue it longer it would make more detailed nesting. But then, my all-time favorite science discovery: if you go on and look at these, after a while you find this one here, which is rule 30 in this numbering scheme, and that's doing something a bit more complicated. You say, well, what's going on here? We just started off with this very
simple rule. Let's see what happens; maybe if we run rule 30 long enough it will resolve into something simpler. So let's try running it for, let's say, 500 steps. Whoops; that's the result we get. Let's just make it full screen. Okay, it's aliasing a bit on the projector there, but you get the basic idea. This just started off from one black cell at the top, and this is what it made. And that's pretty weird, because this is not the way things are supposed to work. What we have here is just that little program down there, and it makes this big, complicated pattern. We can see there's a certain amount of regularity on one side, but, for example, the center column of this pattern is for all practical purposes completely random. In fact, it was used as the random number generator in Mathematica and the Wolfram Language for many years. It was recently retired after excellent service, because we found a somewhat more efficient one.

So what do we learn from this? We learn that out in the computational universe of possible programs, even very simple programs can produce very rich, complicated behavior. That's important if you're interested in modeling the natural world, because there may be programs that represent systems in nature that work this way. It's also important for technology. Let's say you're trying to find a program that's a good random number generator. How are you going to do that? You could start thinking very hard and try to write down all kinds of flowcharts about how this random number generator is going to work. Or you can say: forget that, I'm just going to search the computational universe of possible programs and look for one that serves as a good random number generator. In
this particular case, after you've searched 30 programs, you'll find one that makes a good random number generator. Why does it work? That's a complicated story, and not one I think we can necessarily tell very well. But what's important is this idea that out in the computational universe there's a lot of rich, sophisticated stuff that can essentially be mined for our technological purposes. That's the important thing; whether we understand how it works is a different matter. It's like the physical world, where we're used to mining things: we started using magnets to do magnetic stuff long before we understood the theory of ferromagnetism. Similarly here, we can go out into the computational universe and find stuff that's useful for our purposes.

In fact, the world of deep learning and neural nets is a little bit like this. It uses the trick that there's a certain degree of differentiability, so you can home in step by step on something that's incrementally better, and for certain kinds of problems that works pretty well. What I've done a lot of is just exhaustive search in the computational universe of possible programs: search a trillion programs and try to find one that does something interesting and useful for you.

There's a lot to say about searching a trillion programs to find one that's useful; let me show you another example of that. I have to look something up here, sorry. I was interested a while ago in Boolean algebra, and in the space of all possible mathematics. Let me just see here... I'm not finding what I wanted to find, sorry. It's a good example; I should have memorized this, but I haven't. There we go, there it is. So we
So we've talked about looking at the space of all possible programs. Another thing you can do is say: if you're going to invent mathematics from nothing, what possible axiom systems could be used? So I was curious. Again, it might seem like a completely crazy thing to do, to just start enumerating axiom systems at random and see if we find one that's interesting and useful. But it turns out that once you have this idea that out in the computational universe of possible programs there's actually a lot of low-hanging fruit to be found, you can apply it in lots of places. The thing to understand is: why do we not see a lot of engineering structures that look like this? The reason is that our traditional model of engineering has been to engineer things in a way where we can foresee what the outcome of our engineering steps is going to be. When it comes to something like this, we can find it out in the computational universe, but we can't readily foresee what's going to happen; we can't do a step-by-step design of this particular thing. Human engineering, as it's been practiced so far, has mostly consisted of building things where we can foresee, step by step, what the outcome of our engineering is going to be. We see that in programs, and we see it in other kinds of engineering structures. So there's a different kind of engineering, which is about mining the computational universe of possible programs, and it's worth realizing that a lot more can be done, a lot more efficiently, by mining the computational universe of possible programs than by just constructing things step by step as a human. For example, if you look for optimal algorithms for things like, I don't know, sorting networks, the optimal sorting networks look very complicated. They're not things that you would construct by step-by-step thinking about things in a typical human way.
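The sorting-network remark can be made concrete at toy scale. The sketch below (plain Python, hypothetical helper names) brute-forces comparator sequences for 4 inputs, shortest first, and checks each candidate with the 0-1 principle, the standard result that a comparator network sorts every input if and only if it sorts every sequence of 0s and 1s. Exhaustive search like this is only feasible for tiny networks; it illustrates the idea of mining a space of programs, not the methods actually used to find optimal networks for larger sizes.

```python
from itertools import product


def apply_network(network, values):
    """Run a comparator network: each (i, j) with i < j puts the smaller value at i."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def sorts_everything(network, n: int) -> bool:
    """0-1 principle: checking all 2**n binary inputs suffices."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


def smallest_network(n: int, max_size: int = 6):
    """Exhaustive search, shortest networks first; returns the first one that sorts."""
    comparators = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for size in range(max_size + 1):
        for network in product(comparators, repeat=size):
            if sorts_everything(network, n):
                return list(network)
    return None


if __name__ == "__main__":
    best = smallest_network(4)
    print(len(best), best)
```

For n = 4 the search returns a 5-comparator network, which matches the information-theoretic lower bound of ⌈log2 4!⌉ = 5 comparisons.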
And so, if you're really going to have computation work efficiently, you're going to end up with these programs that are just mined from the computational universe. They make use of computation much more efficiently than a typical thing that we might construct. Now, one feature of this is that it's hard to understand what's going on, and there's actually a fundamental reason for that. In our efforts to understand what's going on, we get to use our brains, our computers, our mathematics, or whatever. This particular little program did a certain amount of computation to work out this pattern, and the question is: can we outrun that computation and say, "Oh, I can tell that this particular bit down here is going to be a black bit; you don't have to go and do all that computation"? It turns out, and this is maybe a digression, that there's a phenomenon I call computational irreducibility, which I think is really common. It's a consequence of a thing I call the principle of computational equivalence. The principle of computational equivalence basically says that as soon as you have a system whose behavior isn't fairly easy to analyze, the chances are that the computation it's doing is essentially as sophisticated as it could be. That has consequences: it implies that a typical system like this will correspond to a universal computer that can be programmed to do anything. It also has the consequence of computational irreducibility, which says you can't expect our brains to be able to outrun the computations that are going on inside the system. If there were computational reducibility, then we could expect that even though this thing went to a lot of trouble and did a million steps of evolution, just by using our brains we could jump ahead and see what the answer will be.
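To make the contrast between reducible and irreducible computation concrete, here is a small Python sketch (the function names are hypothetical). Rule 90, another elementary cellular automaton in which each new cell is the XOR of its two neighbours, is computationally reducible from a single black cell: cell (t, x) equals the binomial coefficient C(t, (t + x)/2) mod 2, which Lucas's theorem evaluates with one bitwise test, so we can jump straight to step one million. For rule 30 no comparable shortcut is known, and the only general method is to run every step. This illustrates the idea; it is not a proof that rule 30 is irreducible.

```python
def rule90_cell(t: int, x: int) -> int:
    """Closed form for rule 90 from one black cell at x = 0.
    Cell (t, x) is C(t, (t + x) / 2) mod 2; by Lucas's theorem C(t, k) is odd
    exactly when the binary digits of k are a subset of those of t."""
    if abs(x) > t or (t + x) % 2:
        return 0
    k = (t + x) // 2
    return 1 if (k & t) == k else 0


def simulate(rule: int, steps: int) -> list[list[int]]:
    """Brute-force evolution for `steps` steps on a ring of width 2*steps+1,
    which is wide enough that wraparound never affects the result."""
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1  # index i corresponds to position x = i - steps
    rows = [row]
    for _ in range(steps):
        row = [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
        rows.append(row)
    return rows


if __name__ == "__main__":
    T = 64
    rows = simulate(90, T)
    # The shortcut agrees with step-by-step simulation everywhere we can check.
    assert all(rows[t][x + T] == rule90_cell(t, x)
               for t in range(T + 1) for x in range(-T, T + 1))
    print(rule90_cell(10**6, 0))  # jump ahead without simulating: prints 0
```

The final line prints 0 because C(2k, k) is even for every k ≥ 1, so rule 90's center cell is white at every even step after the first.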
Computational irreducibility suggests that isn't the case. If we're going to make the most efficient use of computational resources, we will inevitably run into computational irreducibility all over the place, and it has the consequence that we can't readily foresee and understand what's going to happen. So, back to mathematics for a second. This is just an axiom system. I looked through all possible axiom systems, starting off with really tiny ones, and I asked: what's the first axiom system that corresponds to Boolean algebra? It turns out this tiny little thing here generates all theorems of Boolean algebra; it is the simplest axiom for Boolean algebra. Now, I have to show you this, because it's a new feature. If I say FindEquationalProof, let's say I want to prove commutativity of the NAND operation. This is going to try to generate, let's see if this works, an automated proof of that result based on that axiom system. So it had 102 steps in the proof. Let's look at, for example, the proof network here. Actually, let's look at the proof dataset. Now, that's not what I wanted; I should learn how to use this, shouldn't I? Let's see, what I want is the proof dataset. There we go, very good. Actually, first of all, let's look at the proof graph. This is going to show me how that proof was done: there are a bunch of lemmas that got proved, those lemmas were combined, and eventually it proved the result. Let's take a look at what some of those lemmas were. Okay, so here are the results. It goes through, these are various lemmas it's using, and eventually, after many pages of nonsense, it will get to the result.
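Assuming the axiom on screen is the single NAND axiom published in A New Kind of Science, ((a·b)·c)·(a·((a·c)·a)) = c with · read as NAND, a quick Python check can confirm the easy half of the story: ordinary two-valued NAND satisfies the axiom, and every binary operation on {0, 1} that satisfies it is commutative, consistent with commutativity being a theorem of the axiom. The encoding of operations as 4-bit truth tables is a choice made for this sketch. Checking small models is only a sanity check; it is not a substitute for the equational proof, which has to derive commutativity for every possible model.

```python
from itertools import product


def make_op(table: int):
    """Binary operation on {0, 1} from a 4-bit truth table: op(a, b) = bit (2*a + b)."""
    return lambda a, b: (table >> (2 * a + b)) & 1


def satisfies_axiom(op) -> bool:
    """Check ((a.b).c).(a.((a.c).a)) == c for all a, b, c in {0, 1}."""
    return all(op(op(op(a, b), c), op(a, op(op(a, c), a))) == c
               for a, b, c in product((0, 1), repeat=3))


NAND = 0b0111  # 1 except when a = b = 1
NOR = 0b0001   # 1 only when a = b = 0
AND = 0b1000   # 1 only when a = b = 1

# Try all 16 binary operations on {0, 1}.
models = [t for t in range(16) if satisfies_axiom(make_op(t))]
# A table is commutative when op(0, 1) == op(1, 0), i.e. bits 1 and 2 agree.
all_commutative = all(((t >> 1) & 1) == ((t >> 2) & 1) for t in models)

if __name__ == "__main__":
    print(models, all_commutative)
```

Only NAND and its dual, NOR, pass among the 16 candidates, and both are commutative.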
Some of these lemmas are kind of complicated. That lemma there is a pretty complicated lemma, et cetera. So you might ask what on earth is going on here. The answer is, I first generated a version of this proof 20 years ago, and I tried to understand what was going on, and I completely failed. It's sort of embarrassing, because this is supposed to be a proof; it's supposed to be demonstrating some result. What we realize is: what does it mean to have a proof of something? What does it mean to explain how a thing is done? What is the purpose of a proof? The purpose of a proof is basically to let humans understand why something is true. For example, let's say we go to Wolfram|Alpha and we do some random thing, let's say an integral of something or another. It will be able to work out the answer to that integral very quickly; in fact, it takes only milliseconds internally. But then somebody who wants to hand in a piece of homework or something like that needs to explain why this is true. Well, we have this handy step-by-step solution thing here, which explains why it's true. Now, the thing I should admit about the step-by-step solution is that it's completely fake. That is, the steps described in the step-by-step solution have absolutely nothing to do with the way that integral was computed internally. These are steps created purely for the purpose of telling a story to humans about why this integral came out the way it did. So one thing is knowing the answer; the other thing is being able to tell a story about why the answer worked that way. What we see here is a proof, but it was an automatically generated proof, and it's a really lousy story for us humans. I mean, if it turned out that one of these theorems here had been proved by Gauss or something and appeared in all
the textbooks, we would be much happier, because then we would start to have a human-representable story about what was going on. Instead, we just get a bunch of machine-generated lemmas that we can't understand, that we can't wrap our brains around. It's sort of the same thing that's going on when we look at these neural nets, when we were looking, wherever it was, at the innards of that neural net and we say: how is it figuring out that that's a picture of a panda? If we humans were asked how you would figure out if it's a picture of a panda, we might say: look and see if it has eyes, that's a clue for whether it's an animal; look and see if it looks kind of round and furry, that's a version of whether it's a panda; et cetera. But what the network is doing is that it learned a bunch of criteria for whether it's a panda or one of 10,000 other possible things it could have recognized, and it learned those criteria in a way that was somehow optimal based on the training it got. But the distinctions it learned are different from the distinctions that we humans make in the language that we use. When we start to describe a picture, we have a certain human language for describing it; in typical human languages we have maybe thirty to fifty thousand words that we use to describe things, and those words have evolved as being useful for describing the world we live in. When it comes to the neural net, the words it has effectively learned, which allow it to make distinctions in the analysis it's doing, are effectively invented words that describe distinctions, but those words have nothing to do with our historically
invented words that exist in our languages. So it's an interesting situation: if you ask what its way of thinking is, so to speak, what it's thinking about, how we describe what it's thinking, that's a tough thing to answer. Just like with the automated theorem proving, we're stuck having to say we can't really tell a human story, because the things it invented are things for which we don't even have words in our languages. Okay, so one thing to realize is that in this space of all possible computations, there's a lot of stuff out there that can be done. There's this ocean of sophisticated computation, and the question we humans have to ask is: how do we make use of all of that stuff? On the one hand, we've got the things we know how to think about; human language is our way of describing things, our way of talking about stuff. That's one set of things. The other set of things we have is this very powerful, seething ocean of computation on the other side, where lots of things can happen. So the question is how we make use of this ocean of computation in the best possible way for our human purposes, for building technology and so on. Part of what I've spent a very long time doing is building a language that allows us to take human thinking on the one hand and provide a computational communication language that lets us get the benefit of what's possible in the ocean of computation, in a way that's rooted in what we humans actually want to do. So I view the Wolfram Language as an attempt to make a bridge: on the one hand there are all possible computations, on the other hand there are the things we think we want to do, and I view the Wolfram Language as my best attempt right now to make a way to take our
human computational thinking and actually implement it. So, in a sense, it's a language that works on two sides. It's a language the machine can understand: okay, it's looking at this, and that's what it's going to compute. But on the other hand, it's also a language for us humans to think about things in computational terms. So if I take one of these things I'm doing here, whatever it is (this one wasn't that exciting), FindShortestTour of the GeoPosition of the capital cities in South America, that is a representation, in a precise language, of something. The idea is that it's a language we humans can find useful for thinking about things in computational terms, and it also happens to be a language that the machine can immediately understand and execute. When I think about AI in general, what's the overall problem? Well, part of the overall problem is: how do we tell the AIs what to do, so to speak? This ocean of computation is what we get to mine for the purposes of building AI kinds of things, but then the question is how we tell the AIs what to do. What I've tried to do with the Wolfram Language is to provide a way of accessing that computation and of making use of the knowledge that our civilization has accumulated. There's the general computation on this side, and there are the specific things that we humans have thought about, and the question is how to use the things we've thought about to do the things we care about doing.
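As a rough illustration of what a request like "shortest tour of these positions" has to compute, here is a brute-force Python sketch. It is not meant to reflect how FindShortestTour is implemented: it uses plain Euclidean distance on made-up planar points rather than geographic positions, and the function names are hypothetical. Trying every ordering is exact but only practical for a handful of points, since with the start fixed there are (n − 1)! orderings to check.

```python
import math
from itertools import permutations


def tour_length(points, order) -> float:
    """Length of the closed tour visiting points in the given order and returning to the start."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def shortest_tour(points):
    """Exact shortest closed tour by brute force; fixes point 0 as the start."""
    best_length, best_order = math.inf, None
    for rest in permutations(range(1, len(points))):
        order = (0,) + rest
        length = tour_length(points, order)
        if length < best_length:
            best_length, best_order = length, order
    return best_length, best_order


if __name__ == "__main__":
    # Hypothetical points, not real city coordinates.
    pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]
    print(shortest_tour(pts))
```

For the four corners of a unit square, the sketch returns the perimeter, length 4, rather than one of the crossing tours of length 2 + 2√2.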
Actually, if you're interested in these kinds of things, I happened to write a blog post just a couple of days ago. It's kind of a funny blog post; you can see the title there. It came about because a friend of mine has this crazy project to put little discs, or something, that represent the best achievements of human civilization, so to speak, hitchhiking on various spacecraft that are going out into the solar system in the next little while. The question is what to put on this little disc to represent the achievements of civilization. It's kind of depressing when you go back and look at what people have tried to do on this before, and realize how hard it is to tell even whether something is an artifact or not. Yeah, that's a good one; that's from 11,000 years ago. The question is: can you figure out what on earth it is and what it means? What's relevant about this is the whole question of the things that are out there in the computational universe. When we think about extraterrestrial intelligence, I find it kind of interesting that artificial intelligence is our first example of an alien intelligence. We don't happen to have found what we view as extraterrestrial intelligence right now, but we are in the process of building a pretty decent version of an alien intelligence here. If you ask questions like "What is it thinking? Does it have a purpose in what it's doing?" and you're confronted with things like this, you can do a kind of test run of asking what its purpose is, what it's trying to do, in a way that is very similar to the kinds of questions you would ask about extraterrestrial intelligence. But in any case, the main point is that I see this ocean of computation, and then there's describing what we actually want to do with that ocean of computation, and that's one of the primary problems we have now. People talk about AI and what AI is going to allow us
to automate. My basic answer would be: we'll be able to automate everything that we can describe. The problem is that it's not clear what we can describe. Put another way: imagine various jobs where people are doing things, repeated-judgment jobs, things like this; we can readily automate those things. But the thing we can't really automate is saying what we are trying to do, that is, what our goals are. Because in a sense, when we see one of these systems, let's say it's a cellular automaton here, the question is: what is this cellular automaton trying to do? Maybe I'll give you another cellular automaton that is a little bit more exciting; let's do this one. So the question is, what is this cellular automaton trying to do? It's got this whole big structure here, and things are happening with it. We can run it for a couple of thousand steps. It's a nice example of undecidability in action: what's going to happen here? This is kind of the halting problem: is this going to halt, what's it going to do? There's computational irreducibility, so we actually can't tell. This is a case where we know this is a universal computer. In fact, eventually... well, I won't spoil it for you, but if I went on long enough, it would go into some kind of cycle. We can ask: what is this thing trying to do? What's it thinking about? What's its goal, what's its purpose? And we very quickly get into a big mess thinking about those kinds of things.
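The remark that the automaton "would go into some kind of cycle" has a simple computational counterpart when the system is finite. On a ring of n cells there are only 2**n possible states, so any elementary cellular automaton must eventually revisit a state, and recording when each state was first seen gives the transient length and the cycle length. The Python sketch below uses hypothetical names, and the finite ring is an assumption made for this sketch. It does not touch the undecidability point: for unbounded systems such as a universal computer there is no such guarantee, and computational irreducibility means the only general way to find out is to run the system.

```python
def step_ring(state: tuple, rule: int) -> tuple:
    """One step of elementary CA `rule` (Wolfram numbering) on a ring of cells."""
    n = len(state)
    return tuple((rule >> (4 * state[i - 1] + 2 * state[i] + state[(i + 1) % n])) & 1
                 for i in range(n))


def find_cycle(state: tuple, rule: int):
    """Run until a state repeats; return (transient_length, cycle_length).
    Terminates because a ring of n cells has at most 2**n states."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step_ring(state, rule)
        t += 1
    start = first_seen[state]
    return start, t - start


if __name__ == "__main__":
    width = 12
    initial = tuple(1 if i == width // 2 else 0 for i in range(width))
    for rule in (30, 110, 170):
        print(rule, find_cycle(initial, rule))
```

As a check on the bookkeeping, rule 170 simply shifts the row by one cell per step, so from a single black cell on a ring of 12 cells it has no transient and a cycle of length 12.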
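One of the things that comes out of this principle of computational equivalence is thinking about what kinds of things are capable of sophisticated computation. I mentioned a while back my personal history with Wolfram|Alpha: having thought about doing something like Wolfram|Alpha when I was a kid, and then believing that you sort of had to build a brain to make that possible, and so on. And one of the things that I the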