The following is a conversation with Jürgen Schmidhuber. He's the co-director of the IDSIA lab and a co-creator of long short-term memory networks. LSTMs are used in billions of devices today for speech recognition, translation, and much more. Over 30 years, he has proposed a lot of interesting out-of-the-box ideas on meta-learning, adversarial networks, computer vision, and even a formal theory of, quote, creativity, curiosity, and fun. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Jürgen Schmidhuber.

Early on you dreamed of AI systems that self-improve recursively. When was that dream born?

When I was a baby. No, it's not true. I mean, I was a teenager.

And what was the catalyst for that birth? What was the thing that first inspired you?

When I was a boy, I was thinking about what to do in my life, and then I thought the most exciting thing is to solve the riddles of the universe, and that means you have to become a physicist. However, then I realized that there's something even grander: you can try to build a machine, one that isn't really a machine any longer, that learns to become a much better physicist than I could ever hope to be. And that's how I thought maybe I can multiply my tiny little bit of creativity into infinity.

But ultimately that creativity will be multiplied to understand the universe around us. That's the curiosity for that mystery that drove you?

Yes. So if you can build a machine that learns to solve more and more complex problems, and becomes a more and more general problem solver, then you basically have solved all the problems, at least all the solvable problems.

So how do you think, what does the mechanism for that kind of general solver look like? Obviously we don't quite yet have one or know how to build one, but we have ideas,
and you have had, throughout your career, several ideas about it. So how do you think about that mechanism?

So in the 80s I thought about how to build this machine that learns to solve all these problems that I cannot solve myself. And I thought it is clear that it has to be a machine that not only learns to solve this problem here and that problem here, but it also has to learn to improve the learning algorithm itself. So it has to have the learning algorithm in a representation that allows it to inspect it and modify it, such that it can come up with a better learning algorithm. So I called that meta-learning, learning to learn, and recursive self-improvement. That is really the pinnacle of that, where you then not only learn how to improve on that problem and on that one, but you also improve the way the machine improves, and you also improve the way it improves the way it improves itself. And that was my 1987 diploma thesis, which was all about that hierarchy of meta-learners that have no computational limits except for the well-known limits that Gödel identified in 1931, and for the limits of physics.

In recent years, meta-learning has gained popularity in a specific kind of form. You've talked about how that's not really meta-learning with neural networks, that's more basic transfer learning. Can you talk about the difference between the big, general meta-learning and a more narrow sense of meta-learning, the way it's used today, the way it's talked about today?

Let's take the example of a deep neural network that has learned to classify images, and maybe you have trained that network on 100 different databases of images. And now a new database comes along and you want to quickly learn the new thing as well. So one simple way of doing that is you take the network, which already knows 100 types of databases, and then you just take the top layer of that and you retrain that using the new labeled data that you have in the new image database. And then it turns out that it really, really quickly can
learn that too, one shot basically, because from the first 100 data sets it already has learned so much about computer vision that it can reuse that, and that is then almost good enough to solve the new task, except you need a little bit of adjustment on the top. So that is transfer learning, and it has been done in principle for many decades. People have done similar things for decades.

Meta-learning, true meta-learning, is about having the learning algorithm itself open to introspection by the system that is using it, and also open to modification, such that the learning system has an opportunity to modify any part of the learning algorithm, and then evaluate the consequences of that modification, and then learn from that to create a better learning algorithm, and so on, recursively. So that's a very different animal, where you are opening the space of possible learning algorithms to the learning system itself.

Right. So you've, like in the 2004 paper, described Gödel machines, programs that rewrite themselves, right? Philosophically, and even in your paper mathematically, these are really compelling ideas. But practically, do you see these self-referential programs being successful in the near term, having an impact where it sort of demonstrates to the world that this direction is a good one to pursue in the near term?

Yes. We had these two different types of fundamental research on how to build a universal problem solver. One is basically exploiting proof search and things like that, which you need to come up with asymptotically optimal, theoretically optimal self-improvers and problem solvers. However, one has to admit that through this proof search comes in an additive constant, an overhead, an additive overhead, that vanishes in comparison to what you have to do to solve large problems. However, for many of the small problems that we want to solve in our everyday life, we cannot ignore this constant overhead, and that's why we also have been doing other things, non
universal things, such as recurrent neural networks, which are trained by gradient descent and local search techniques, which aren't universal at all, which aren't provably optimal at all, like the other stuff that we did, but which are much more practical as long as we only want to solve the small problems that we are typically trying to solve in this environment here. So the universal problem solvers, like the Gödel machine, but also Marcus Hutter's fastest way of solving all possible problems, which he developed around 2002 in my lab, they are associated with these constant overheads for proof search, which guarantees that the thing that you're doing is optimal.

For example, there is this fastest way of solving all problems with a computable solution, which is due to Marcus Hutter. And to explain what's going on there, let's take traveling salesman problems. With traveling salesman problems, you have a number of cities, n cities, and you try to find the shortest path through all these cities without visiting any city twice. And nobody knows the fastest way of solving traveling salesman problems, TSPs. But let's assume there is a method of solving them within n to the 5 operations, where n is the number of cities. Then the universal method of Marcus is going to solve the same traveling salesman problem also within n to the 5 steps, plus O(1), plus a constant number of steps that you need for the proof searcher, which you need to show that this particular class of problems, the traveling salesman problems, can be solved within a certain time bound, within order n to the 5 steps, basically. And this additive constant does not depend on n, which means as n is getting larger and larger, as you have more and more cities, the constant overhead pales in comparison. And that means that almost all large problems are solved in the best possible way. Already today, we have a universal problem solver like that. However, it's not practical because the overhead, the constant overhead,
is so large that for the small kinds of problems that we want to solve in this little biosphere...

By the way, when you say small, you're talking about things that fall within the constraints of our computational systems. So they can seem quite large to us mere humans, right?

That's right, yeah. So they seem large and even unsolvable in a practical sense today, but they are still small compared to almost all problems, because almost all problems are large problems, which are much larger than any constant.

Do you find it useful, as a person who has dreamed of creating a general learning system, has worked on creating one, has had a lot of interesting ideas there, to think about P versus NP, this formalization of how hard problems are, how they scale, this kind of worst-case analysis type of thinking? Do you find that useful, or is it only just a set of mathematical techniques to give you intuition about what's good and bad?

So P versus NP, that's super interesting from a theoretical point of view. And in fact, as you are thinking about that problem, you can also get inspiration for better practical problem solvers. On the other hand, we have to admit that at the moment the best practical problem solvers for all kinds of problems that we are now solving through what is called AI at the moment, they are not of the kind that is inspired by these questions. There we are using general-purpose computers such as recurrent neural networks, but we have a search technique which is just local search, gradient descent, to try to find a program that is running on these recurrent networks such that it can solve some interesting problems, such as speech recognition, machine translation, and something like that. And there is very little theory behind the best solutions that we have at the moment that can do that.

Do you think that needs to change? Do you think that will change? Or can we create general intelligence systems without ever really proving that that system is
intelligent in some kind of mathematical way, solving machine translation perfectly or something like that, within some kind of syntactic definition of a language? Or can we just be super impressed by the thing working extremely well, and that's sufficient?

There's an old saying, and I don't know who brought it up first, which says there's nothing more practical than a good theory. And a good theory of problem-solving under limited resources, like here in this universe or on this little planet, has to take into account these limited resources. And so probably there is lacking a theory which is related to what we already have, these asymptotically optimal problem solvers, which tells us what we need in addition to that to come up with a practically optimal problem solver. So I believe we will have something like that, and maybe just a few little tiny twists are necessary to change what we already have to come up with that as well. As long as we don't have that, we admit that we are taking suboptimal ways, and recurrent neural networks and long short-term memory equipped with local search techniques, and we are happy that it works better than any competing method, but that doesn't mean that we think we are done.

You've said that an AGI system will ultimately be a simple one. A general intelligence system will ultimately be a simple one, maybe a pseudocode of a few lines to be able to describe it. Can you talk through your intuition behind this idea, why you feel that at its core intelligence is a simple algorithm?

Experience tells us that the stuff that works best is really simple. So the asymptotically optimal ways of solving problems, if you look at them, they're just a few lines of code. It's really true. Although they have these amazing properties, just a few lines of code. Then the most promising and most useful practical things maybe don't have this proof of optimality associated with them. However, they are also just a few lines of code. The most successful recurrent neural
networks, you can write them down in five lines of pseudocode.

That's a beautiful, almost poetic idea. But what you're describing there is, the lines of pseudocode are sitting on top of layers and layers of abstractions, in a sense. So you're saying at the very top it'll be a beautifully written sort of algorithm, but do you think that there are many layers of abstractions we have to first learn to construct?

Yeah, of course. We are building on all these great abstractions that people have invented over the millennia, such as matrix multiplications and real numbers and basic arithmetic and calculus and derivations of error functions and derivatives of error functions and stuff like that. So without that language, which greatly simplifies our way of thinking about these problems, we couldn't do anything. So in that sense, as always, we are standing on the shoulders of the giants who in the past simplified the problem of problem solving so much that now we have a chance to do the final step.

The final step will be a simple one. If you take a step back through all of human civilization and just the universe in general, how do you think about evolution? And what if creating a universe is required to achieve this final step? What if going through the very painful and inefficient process of evolution is needed to come up with this set of abstractions that ultimately lead to intelligence? Do you think there's a shortcut, or do you think we have to create something like our universe in order to create something like human-level intelligence?

So far, the only example we have is this one, this universe in which we live.

You think you can do better?

Maybe not, but we are part of this whole process, right? So it might be the case that the code that runs the universe is really, really simple. Everything points to that possibility, because gravity and other basic forces are really simple laws that can be easily described, also in just a few lines of code, basically. And then there are these other
events, the apparently random events in the history of the universe, which as far as we know at the moment don't have a compact code. But who knows, maybe somebody in the near future is going to figure out the pseudo-random generator which is computing whether the measurement of that spin up or down thing here is going to be positive or negative.

Underlying quantum mechanics?

Yes.

So you ultimately think quantum mechanics is a pseudo-random number generator, so it's all deterministic? There's no randomness in our universe? Does God play dice?

So a couple of years ago, a famous physicist, quantum physicist Anton Zeilinger, wrote an essay in Nature, and it started more or less like that: one of the fundamental insights of the 20th century was that the universe is fundamentally random on the quantum level, and that whenever you measure spin up or down or something like that, a new bit of information enters the history of the universe. And while I was reading that, I was already typing the response, and they had to publish it, because I was right that there's no evidence, no physical evidence, for that. So there's an alternative explanation where everything that we consider random is actually pseudo-random, such as the decimal expansion of pi. Pi is interesting because every three-digit sequence, every sequence of three digits, appears roughly one in a thousand times, and every five-digit sequence appears roughly one in ten thousand times, which is what you would expect if it was random. But there's a very short algorithm, a short program, that computes all of that. So it's extremely compressible. And who knows, maybe tomorrow somebody, some grad student at CERN, goes back over all these data points, beta decay and whatever, and figures out, oh, it's the second billion digits of pi or something like that. We don't have any fundamental reason at the moment to believe that this is truly random and not just a deterministic video game. If it was a deterministic video game, it would be much more
beautiful, because beauty is simplicity, and many of the basic laws of the universe, like gravity and the other basic forces, are very simple, so very short programs can explain what these are doing. And it would be awful and ugly, the universe would be ugly, the history of the universe would be ugly, if for the extra things, the seemingly random data points that we get all the time, we really needed a huge number of extra bits to describe all these extra bits of information. So as long as we don't have evidence that there is no short program that computes the entire history of the entire universe, we are, as scientists, compelled to look further for that shortest program.

Your intuition says there exists a shortest program that can backtrack to the creation of the universe. So the shortest path to the creation?

Yes, including all the entanglement things and all the spin up-and-down measurements that have been taking place since 13.8 billion years ago. And so, yeah, we don't have a proof that it is random. We don't have a proof that it is compressible to a short program. But as long as we don't have that proof, we are obliged as scientists to keep looking for that simple explanation.

Absolutely. So you said simplicity is beautiful, or beauty is simple. Either one works. But you also work on curiosity, discovery, you know, the romantic notion of randomness, of serendipity, of being surprised by things that are about you. Kind of in our poetic notion of reality, we think, as humans, we require randomness. So you don't find randomness beautiful? You find simple determinism beautiful?

Yeah.

Okay. So why?

Because the explanation becomes shorter. A universe that is compressible to a short program is much more elegant and much more beautiful than another one which needs an almost infinite number of bits to be described. As far as we know, many things that are happening in this universe are really simple in terms of short programs that compute gravity and the
interaction between elementary particles and so on. So all of that seems to be very, very simple. Every electron seems to reuse the same subprogram all the time as it is interacting with other elementary particles. If we now require an extra oracle injecting new bits of information all the time for these extra things which are currently not understood, such as beta decay, then the whole description length of the data that we can observe of the history of the universe would become much longer, and therefore uglier.

And uglier. Again, the simplicity is elegant and beautiful.

All the history of science is a history of compression progress.

Yes. So you've described sort of how we build up abstractions, and you've talked about the idea of compression. How do you see this, the history of science, the history of humanity, our civilization, and life on Earth, as some kind of path towards greater and greater compression? What do you mean by that? How do you think of that?

Indeed, the history of science is a history of compression progress. What does that mean? Hundreds of years ago, there was an astronomer whose name was Kepler, and he looked at the data points that he got by watching planets move. And then he had all these data points, and suddenly it turned out that he could greatly compress the data by predicting it through an ellipse law. So it turns out that all these data points are more or less on ellipses around the Sun. And another guy came along whose name was Newton, and before him Hooke, and they said the same thing that is making these planets move like that is what makes the apples fall down. And it also holds for stones and for all kinds of other objects. And suddenly many, many of these observations became much more compressible, because as long as you can predict the next thing given what you have seen so far, you can compress it. You don't have to store that data extra. This is called predictive coding. And then there was still something wrong with that theory of the universe, and
you had deviations from these predictions of the theory. And 300 years later another guy came along whose name was Einstein, and he was able to explain away all these deviations from the predictions of the old theory through a new theory, which was called the general theory of relativity, which at first glance looks a little bit more complicated, and you have to warp space and time, but you can phrase it within one single sentence, which is: no matter how fast you accelerate and how fast or hard you decelerate, and no matter what is the gravity in your local framework, light speed always looks the same. And from that you can calculate all the consequences. So it's a very simple thing, and it allows you to further compress all the observations, because suddenly there are hardly any deviations any longer that you can measure from the predictions of this new theory.

So all of science is a history of compression progress. You never arrive immediately at the shortest explanation of the data, but you're making progress. Whenever you are making progress, you have an insight. You see: oh, first I needed so many bits of information to describe the data, to describe my falling apples, my video of falling apples. I need so many data, so many pixels have to be stored. But then suddenly I realize, no, there is a very simple way of predicting the third frame in the video from the first two. And maybe not every little detail can be predicted, but more or less most of these orange blobs that are coming down, they accelerate in the same way, which means that I can greatly compress the video. And the amount of compression progress, that is the depth of the insight that you have at that moment. That's the fun that you have, the scientific fun, the fun in that discovery. And we can build artificial systems that do the same thing. They measure the depth of their insights as they are looking at the data which is coming in through their own experiments, and we give them a reward, an intrinsic reward, in
proportion to this depth of insight. And since they are trying to maximize the rewards they get, they are suddenly motivated to come up with new action sequences, with new experiments, that have the property that the data that is coming in as a consequence of these experiments has the property that they can learn something about it, see a pattern in there which they hadn't seen yet before.

So there's an idea of PowerPlay that you've described, training a general problem solver in this kind of way of looking for the unsolved problems. Can you describe that idea a little further?

It's another very simple idea. So normally what you do in computer science, you have some guy who gives you a problem, and then there is a huge search space of potential solution candidates, and you somehow try them out, and you have more or less sophisticated ways of moving around in that search space until you finally find a solution which you consider satisfactory. That's what most of computer science is about. PowerPlay just goes one little step further and says: let's not only search for solutions to a given problem, but let's search for pairs of problems and their solutions, where the system itself has the opportunity to phrase its own problem. So we are looking suddenly at pairs of problems and their solutions, or modifications of the problem solver that is supposed to generate a solution to that new problem. And this additional degree of freedom allows us to build curious systems that are like scientists in the sense that they not only try to solve and try to find answers to existing questions, no, they are also free to pose their own questions. So if you want to build an artificial scientist, you have to give it that freedom, and PowerPlay is exactly doing that.

So that's a dimension of freedom that's important to have. But how hard do you think that is, how multi-dimensional and difficult is the space of then coming up with your own questions?

Yeah, so it's one of the things that, as human
beings, we consider to be the thing that makes us special. The intelligence that makes us special is that brilliant insight that can create something totally new.

Yes. So now let's look at the extreme case. Let's look at the set of all possible problems that you can formally describe, which is infinite. Which should be the next problem that a scientist or PowerPlay is going to solve? Well, it should be the easiest problem that goes beyond what you already know. So it should be the simplest problem that the current problem solver, which can already solve 100 problems, cannot solve yet by just generalizing. So it has to be new. So it has to require a modification of the problem solver such that the new problem solver can solve this new thing, but the old problem solver cannot do it. And in addition to that, we have to make sure that the problem solver doesn't forget any of the previous solutions.

Right.

And so by definition PowerPlay is now always trying to search in this set of pairs of problems and problem solver modifications for a combination that minimizes the time to achieve these criteria. So it's always trying to find the problem which is easiest to add to the repertoire.

So just like grad students and academics and researchers can spend their whole career in a local minimum, stuck trying to come up with interesting questions but ultimately doing very little, do you think it's easy, in this approach of looking for the simplest unsolvable problem, to get stuck in a local minimum, never really discovering new, you know, really jumping outside of the hundred problems that you've already solved in a genuinely creative way?

No, because that's the nature of PowerPlay, that it's always trying to break its current generalization abilities by coming up with a new problem which is beyond the current horizon, just shifting the horizon of knowledge a little bit out there, breaking the existing rules such that the new thing becomes solvable but wasn't
solvable by the old thing. So like adding a new axiom, like what Gödel did when he came up with these new sentences, new theorems that didn't have a proof in the formal system, which means you can add them to the repertoire, hoping that they are not going to damage the consistency of the whole thing.

So in the paper with the amazing title, Formal Theory of Creativity, Fun, and Intrinsic Motivation, you talk about discovery as intrinsic reward. So if you view humans as intelligent agents, what do you think is the purpose and meaning of life for us humans? You've talked about this discovery. Do you see humans as an instance of PowerPlay agents?

Yeah, so humans are curious, and that means they behave like scientists, not only the official scientists, but even the babies behave like scientists, and they play around with toys to figure out how the world works and how it is responding to their actions. And that's how they learn about gravity and everything. And yeah, in 1990 we had the first systems like that, which would just try to play around with the environment and come up with situations that go beyond what they knew at that time, and then get a reward for creating these situations, and then become more general problem solvers, able to understand more of the world. So yeah, I think in principle that curiosity strategy, or more sophisticated versions of what I just described, they are what we have built in as well, because evolution discovered that's a good way of exploring the unknown world, and a guy who explores the unknown world has a higher chance of solving problems that he needs to survive in this world. On the other hand, those guys who were too curious, they were weeded out as well. So you have to find this trade-off. Evolution found a certain trade-off. Apparently in our society there is a certain percentage of extremely explorative guys, and it doesn't matter if they die, because many of the others are more conservative. And so, yeah, it would be surprising to
me if that principle of artificial curiosity wouldn't be present in almost exactly the same form here in our brains.

So you're a bit of a musician and an artist. So continuing on this topic of creativity, what do you think is the role of creativity in intelligence? So you've kind of implied that it's essential for intelligence, if you think of intelligence as a problem-solving system, as the ability to solve problems. But do you think it's essential, this idea of creativity?

We never have a program, a subprogram, that is called creativity or something. It's just a side effect of what our problem solvers do. They are searching a space of problems, or a space of candidates, of solution candidates, until they hopefully find a solution to a given problem. But then there are these two types of creativity, and both of them are now present in our machines. The first one has been around for a long time, which is: human gives problem to machine, machine tries to find a solution to that. And this has been happening for many decades, and for many decades machines have found creative solutions to interesting problems where humans were not aware of these particularly creative solutions, but then appreciated that the machine found that. The second is the pure creativity. What I just mentioned I would call the applied creativity, like applied art, where somebody tells you: now make a nice picture of this pope, and you will get money for that. Okay, so here is the artist, and he makes a convincing picture of the pope, and the pope likes it and gives him the money. And then there is the pure creativity, which is more like the PowerPlay and the artificial curiosity thing, where you have the freedom to select your own problem, like a scientist who defines his own question to study. And so that is the pure creativity, if you will, as opposed to the applied creativity, which serves another.

And in that distinction there's almost echoes of narrow AI versus general AI. So this kind of
constrained painting of a pope seems like the approaches of what people are calling narrow AI, and pure creativity seems to be, maybe I'm just biased as a human, but it seems to be an essential element of human-level intelligence. Is that what you're implying, to a degree?

If you zoom back a little bit and you just look at a general problem-solving machine which is trying to solve arbitrary problems, then this machine will figure out in the course of solving problems that it's good to be curious. So all of what I said just now about this prewired curiosity and this will to invent new problems that the system doesn't know how to solve yet should be just a byproduct of the general search. However, apparently evolution has built it into us because it turned out to be so successful, a pre-wiring, a bias, a very successful exploratory bias that we are born with.

And you've also said that consciousness in the same kind of way may be a byproduct of problem-solving. Do you find it's an interesting byproduct? Do you think it's a useful byproduct? What are your thoughts on consciousness in general? Or is it simply a byproduct of greater and greater capabilities of problem-solving, that's similar to creativity in that sense?

Yeah, we never have a procedure called consciousness in our machines. However, we get, as side effects of what these machines are doing, things that seem to be closely related to what people call consciousness. So for example, in 1990 we had simple systems which were basically recurrent networks, and therefore universal computers, trying to map incoming data into actions that lead to success: maximizing reward in a given environment, always finding the charging station in time whenever the battery is low and negative signals are coming from the battery, always finding the charging station in time without bumping against painful obstacles on the way. So complicated things, but very easily motivated. And then we give these little agents a separate recurrent neural
network, which is just predicting what's happening if I do that and that, what will happen as a consequence of these actions that I'm executing. And it's just trained on the long and long history of interactions with the world. So it becomes a predictive model of the world, basically, and therefore also a compressor of the observations of the world, because whatever you can predict, you don't have to store extra. So compression is a side effect of prediction. And how does this recurrent network compress? Well, it's inventing little subprograms, little subnetworks, that stand for everything that frequently appears in the environment, like bottles and microphones and faces. Maybe there are lots of faces in my environment, so I'm learning to create something like a prototype face, and a new face comes along, and all I have to encode are the deviations from the prototype. So it's compressing all the time the stuff that frequently appears.

There's one thing that appears all the time, that is present all the time when the agent is interacting with its environment, which is the agent itself. So just for data compression reasons, it is extremely natural for this recurrent network to come up with little subnetworks that stand for the properties of the agent, the hand, the other actuators, and all the stuff that you need to better encode the data which is influenced by the actions of the agent. So there, just as a side effect of data compression during problem-solving, you have internal self-models. Now you can use this model of the world to plan your future, and that's what we also have done since 1990. So the recurrent network which is the controller, which is trying to maximize reward, can use this model network of the world, this predictive model of the world, to plan ahead and say: let's not do this action sequence, let's do this action sequence instead, because it leads to more predicted reward. And whenever it's waking up these little subnetworks that stand for
itself and it's thinking about itself and it's thinking about itself and it's exploring mentally the consequences of its own actions and now you tell me what is still missing missing the next the gap to consciousness yeah there isn't that's a really beautiful idea that you know if life is a collection of data and life is a process of compressing that data to act efficiently in that data you yourself appear very often so it's useful to form compressions of yourself and it's a really beautiful formulation of what consciousness is a necessary side-effect it's actually quite compelling to me you've described RNNs developed LSTMs long short-term memory networks they're a type of recurrent neural networks they have gotten a lot of success recently so these are networks that model the temporal aspects in the data temporal patterns in the data and you've called them the deepest of the neural networks right so what do you think is the value of depth in the models that we use to learn since you mentioned the long short-term memory and the LSTM I have to mention the names of the brilliant students of course that's first of all my first student ever Sepp Hochreiter who had fundamental insights already in his diploma thesis then Felix Gers had additional important contributions Alex Graves a guy from Scotland who is mostly responsible for this CTC algorithm which is now often used to train the LSTM to do the speech recognition on all the Google Android phones and whatever and Siri and so on so these guys without these guys I would be nothing it's a lot of incredible work what is now the depth what is the importance of depth well most problems in the real world are deep in the sense that the current input doesn't tell you all you need to know about the environment mm-hmm so instead you have to have a memory of what happened in the past and often important parts of that memory are dated they are pretty old and so when you're doing
speech recognition for example and somebody says eleven then that's about half a second or something like that which means it's already fifty time steps and another guy or the same guy says seven so the ending is the same "even" but now the system has to see the distinction between seven and eleven and the only way it can see the difference is it has to store that fifty steps ago there was an s or an l eleven or seven so there you have already a problem of depth fifty because for each time step you have something like a virtual layer in the expanded unrolled version of this recurrent network which is doing the speech recognition so these long time lags they translate into problem depth and most problems in this world are such that you really have to look far back in time to understand what is the problem and to solve it but just like with LSTMs you don't necessarily need to when you look back in time remember every aspect you just need to remember the important aspects that's right the network has to learn to put the important stuff into memory and to ignore the unimportant noise so but in that sense deeper and deeper is better or is there a limitation is there I mean LSTM is one of the great examples of architectures that do something beyond just deeper and deeper networks there's clever mechanisms for filtering data for remembering and forgetting so do you think that that kind of thinking is necessary if you think about LSTM as a leap a big leap forward over traditional vanilla RNNs what do you think is the next leap hmm within this context so LSTM is a very clever improvement but LSTMs still don't have the same kind of ability to see far back in the past as us humans do the credit assignment problem across way back not just 50 time steps or a hundred or a thousand but millions and billions it's not clear what are the practical limits of the LSTM when it comes to looking back already in 2006 I think we had examples where it not only
looked back tens of thousands of steps but really millions of steps and Juan Pérez-Ortiz in my lab I think was the first author of a paper where we really it was 2006 or something had examples where it learned to look back for more than 10 million steps so for most problems of speech recognition it's not necessary to look that far back but there are examples where it does now the looking back thing that's rather easy because there is only one past but there are many possible futures and so a reinforcement learning system which is trying to maximize its future expected rewards and doesn't know yet which of these many possible futures should I select given this one single past it's facing problems that the LSTM by itself cannot solve so the LSTM is good for coming up with a compact representation of the history so far of the history of observations and actions so far but now how do you plan in an efficient and good way among all these how do you select one of these many possible action sequences that a reinforcement learning system has to consider to maximize reward in this unknown future so again we have this basic setup where you have one recurrent network which gets in the video and the speech and whatever and it's executing actions and is trying to maximize reward so there is no teacher who tells it what to do at which point in time and then there's the other network which is just predicting what's going to happen if I do that and that and that could be an LSTM network and it allows to look back all the way to make better predictions of the next time step so essentially although it's predicting only the next time step it is motivated to learn to put into memory something that happened maybe a million steps ago because it's important to memorize that if you want to predict that at the next time step the next event now how can a model of the world like that a predictive model of the world be used by the first guy let's call it the controller and the
model the controller and the model how can the model be used by the controller to efficiently select among these many possible futures so the naive way we had about 30 years ago was let's just use the model of the world as a stand-in as a simulation of the world and millisecond by millisecond we plan the future and that means we have to roll it out really in detail and it will work only if the model is really good and it will still be inefficient because we have to look at all these possible futures and there are so many of them so instead what we do now since 2015 in our CM systems controller model systems we give the controller the opportunity to learn by itself how to use the potentially relevant parts of the M of the model network to solve new problems more quickly and if it wants to it can learn to ignore the M and sometimes it's a good idea to ignore the M because it's really bad it's a bad predictor in this particular situation of life where the controller is currently trying to maximize reward however it can also learn to address and exploit some of the sub programs that came about in the model network through compressing the data by predicting it so it now has an opportunity to reuse that code the algorithmic information in the model network to reduce its own search space such that it can solve a new problem more quickly than without the model compression so you're ultimately optimistic and excited about the power of RL of reinforcement learning in the context of real systems absolutely yeah so you see RL as a potential having a huge impact beyond just sort of the M part is often developed on supervised learning methods you see RL as a for problems of self-driving cars or any kind of applied side of robotics that's the correct interesting direction for research in your view I do think so we have a company called NNAISENSE NNAISENSE which has applied reinforcement learning to little Audis little Audis which learn to park without a teacher the same principles
were used of course so these little Audis they are small maybe like that so much smaller than the real Audis but they have all the sensors that you find in the real Audis you find the cameras the lidar sensors they go up to 120 20 kilometres an hour if they want to and they have pain sensors basically and they don't want to bump against obstacles and other Audis and so they must learn like little babies to park take the raw vision input and translate that into actions that lead to successful parking behavior which is a rewarding thing and yes they learn that so we have examples like that and it's only in the beginning this is just the tip of the iceberg and I believe the next wave of AI is going to be all about that so at the moment the current wave of AI is about passive pattern observation and prediction and that's what you have on your smartphone and what the major companies on the Pacific Rim are using to sell you ads to do marketing that's the current sort of profit in AI and that's only one or two percent of the world economy which is big enough to make these companies pretty much the most valuable companies in the world but there's a much much bigger fraction of the economy going to be affected by the next wave which is really about machines that shape the data through their own actions and you think simulation is ultimately the biggest way that those methods will be successful in the next 10 20 years we're not talking about a hundred years from now we're talking about sort of the near-term impact of RL do you think really good simulation is required or is there other techniques like imitation learning you know observing other humans yeah operating in the real world where do you think this success will come from so at the moment we have a tendency of using physics simulations to learn behavior for machines that learn to solve problems that humans also do not know how to solve however this is not
the future because the future is in what little babies do they don't use a physics engine to simulate the world no they learn a predictive model of the world which maybe sometimes is wrong in many ways but captures all kinds of important abstract high-level predictions which are really important to be successful and that's what was the future thirty years ago when we started that type of research but it's still the future and now we know much better how to go there to move forward and to really make working systems based on that where you have a learning model of the world a model of the world that learns to predict what's going to happen if I do that and that and then the controller uses that model to more quickly learn successful action sequences and then of course always this crazy thing in the beginning the model is stupid so the controller should be motivated to come up with experiments with action sequences that lead to data that improve the model do you think improving the model constructing an understanding of the world in this connection is the in now the popular approaches that have been successful are you know grounded in ideas of neural networks but in the 80s with expert systems there's symbolic AI approaches which to us humans are more intuitive in a sense that it makes sense that you build up knowledge in this knowledge representation what kind of lessons can we draw in our current approaches mmm from expert systems from symbolic AI so I became aware of all of that in the 80s and back then logic programming was a huge thing was it inspiring to you yourself did you find it compelling because a lot of your work was not so much in that realm it is more in learning systems yes or no but we did all of that so my first publication ever actually was 1987 was the implementation of a genetic algorithm of a genetic programming system in Prolog Prolog that's what you learn back then which is a logic
programming language and the Japanese they had this huge fifth-generation AI project which was mostly about logic programming back then although neural networks existed and were well known back then and deep learning has existed since 1965 since this guy in the Ukraine Ivakhnenko started it but the Japanese and many other people they focused really on this logic programming and I was influenced to the extent that I said okay let's take these biologically inspired rules like evolution programs and implement that in the language which I know which was Prolog for example back then and then in many ways this came back later because the Gödel machine for example has a proof searcher on board and without that it would not be optimal Marcus Hutter's universal algorithm for solving all well-defined problems has a proof searcher on board so that's very much logic programming without that it would not be asymptotically optimal but then on the other hand because we are very pragmatic guys also we focused on recurrent neural networks and suboptimal stuff such as gradient based search in program space rather than provably optimal things logic programming certainly has a usefulness when you're trying to construct something provably optimal or provably good or something like that but is it useful for practical problems it's really useful for theorem proving the best theorem provers today are not neural networks right no they are logic programming systems and they are much better theorem provers than most math students in the first or second semester but for reasoning for playing games of go or chess or for robots autonomous vehicles that operate in the real world or object manipulation you know you think learning yeah as long as the problems have little to do with theorem proving themselves then as long as that is not the case you just want to have better pattern recognition so to build a self-driving car you want to have better
pattern recognition and pedestrian recognition and all these things and you want to minimize the number of false positives which currently is slowing down self-driving cars in many ways and all that has very little to do with logic programming yeah what are you most excited about in terms of directions of artificial intelligence at this moment in the next few years in your own research and in the broader community so I think in the not so distant future we will have for the first time little robots that learn like kids and I will be able to say to the robot look here robot we are going to assemble a smartphone let's take a slab of plastic and the screwdriver and let's screw in the screw like that no no not like that like so hmm not like that like that and I don't have a data glove or something he will see me and he will hear me and he will try to do something with his own actuators which will be really different from mine but he will understand the difference and will learn to imitate me but not in the supervised way where a teacher is giving target signals for all his muscles all the time no by doing this high level imitation where he first has to learn to imitate me and then to interpret these additional noises coming from my mouth as helpful signals to do that better and then it will by itself come up with faster ways and more efficient ways of doing the same thing and finally I stop his learning algorithm and make a million copies and sell it and so at the moment this is not possible but we already see how we are going to get there and you can imagine to the extent that this works economically and cheaply it's going to change everything almost all our production is going to be affected by that and a much bigger wave much bigger AI wave is coming than the one that we are currently witnessing which is mostly about passive pattern recognition on your smartphone this is about active machines that shape data
through the actions they are executing and they learn to do that in a good way so many of the traditional industries are going to be affected by that all the companies that are building machines will equip these machines with cameras and other sensors and they are going to learn to solve all kinds of problems through interaction with humans but also a lot on their own to improve what they already can do and lots of old economy is going to be affected by that and in recent years I have seen that old economy is actually waking up and realizing that this is the case and are you optimistic about the future are you concerned there's a lot of people concerned in the near term about the transformation of the nature of work the kind of ideas that you just suggested would have a significant impact on what kind of things could be automated are you optimistic about that future are you nervous about that future and looking a little bit farther into the future there's people like Elon Musk Stuart Russell concerned about the existential threats of that future so in the near term job loss in the long term existential threat are these concerns to you or are you ultimately optimistic so let's first address the near future we have had predictions of job losses for many decades for example when industrial robots came along many people predicted that lots of jobs are going to get lost and in a sense they were right because back then there were car factories and hundreds of people in these factories assembled cars and today the same car factories have hundreds of robots and maybe three guys watching the robots on the other hand those countries that have lots of robots per capita Japan Korea and Germany Switzerland a couple of other countries they have really low unemployment rates somehow all kinds of new jobs were created back then nobody anticipated those jobs and decades ago I already said it's really easy to say which jobs are going to get lost but it's really hard to predict the new
ones 30 years ago who would have predicted all these people making money as YouTube bloggers 200 years ago 60% of all people used to work in agriculture today maybe 1% but still only I don't know 5% unemployment lots of new jobs were created and Homo ludens the playing man is inventing new jobs all the time most of these jobs are not existentially necessary for the survival of our species there are only very few existentially necessary jobs such as farming and building houses and warming up the houses but less than 10% of the population is doing that and most of these newly invented jobs are about interacting with other people in new ways through new media and so on getting new types of kudos and forms of likes and whatever and even making money through that so Homo ludens the playing man doesn't want to be unemployed and that's why he is inventing new jobs all the time and he keeps considering these jobs as really important and is investing a lot of energy and hours of work into those new jobs that's quite beautifully put we're really nervous about the future because we can't predict what kind of new jobs would be created but you're ultimately optimistic that we humans are so restless that we create and give meaning to newer and newer jobs totally new things that get likes on Facebook or whatever the social platform is so what about long-term existential threat of AI where our whole civilization may be swallowed up by these ultra super intelligent systems maybe it's not going to be swallowed up but I'd be surprised if we humans were the last step in the evolution of the universe you've actually had this beautiful comment somewhere that I've seen saying that quite insightful artificial general intelligence systems just like us humans will likely not want to interact with humans they'll just interact amongst themselves just like ants interact amongst themselves and only tangentially interact with humans hmm and
it's quite an interesting idea that once we create AGI they will lose interest in humans and compete for their own Facebook likes on their own social platforms so within that quite elegant idea how do we know in a hypothetical sense that there's not already intelligent systems out there how do you think broadly of general intelligence greater than us how do we know it's out there mmm how would we know it's around us and could it already be I'd be surprised if within the next few decades or something like that we won't have AIs that are truly smart in every single way and better problem solvers in almost every single important way and I'd be surprised if they wouldn't realize what we have realized a long time ago which is that almost all physical resources are not here in this biosphere but further out the rest of the solar system gets 2 billion times more solar energy than our little planet there's lots of material out there that you can use to build robots and self-replicating robot factories and all this stuff and they are going to do that and they will be scientists and curious and they will explore what they can do and in the beginning they will be fascinated by life and by their own origins and our civilization they will want to understand that completely just like people today would like to understand how life works and also the history of our own existence and civilization and also the physical laws that created all of that so in the beginning they will be fascinated by life once they understand that they lose interest like anybody who loses interest in things he understands and then as you said the most interesting sources of information for them will be others of their own kind so at least in the long run there seems to be some sort of protection through lack of interest on the other side and now it seems also clear as far as we understand physics you need matter and energy to compute and to build more robots and infrastructure
and more AI civilization and AI ecologies consisting of trillions of different types of AIs and so it seems inconceivable to me that this thing is not going to expand some AI ecology not controlled by one AI but by trillions of different types of AIs competing in all kinds of quickly evolving and disappearing ecological niches in ways that we cannot fathom at the moment but it's going to expand limited by light speed and physics it's going to expand and now we realize that the universe is still young it's only 13.8 billion years old and it's going to be a thousand times older than that so there's plenty of time to conquer the entire universe and to fill it with intelligence and senders and receivers such that AIs can travel the way they are traveling in our labs today which is by radio from sender to receiver and let's call the current age of the universe one eon one eon now it will take just a few eons from now and the entire visible universe is going to be full of that stuff and let's look ahead to a time when the universe is going to be one thousand times older than it is now they will look back and they will say look almost immediately after the Big Bang only a few eons later the entire universe started to become intelligent now to your question how do we see whether anything like that has already happened or is already in a more advanced stage in some other part of the universe of the visible universe we are trying to look out there and nothing like that has happened so far or is that true do you think we'll recognize it or how do we know it's not among us how do we know planets aren't in themselves intelligent beings how do we know ants seen as a collective are not of much greater intelligence than our own these kinds of ideas when I was a boy I was thinking about these things and I thought hmm maybe it has already happened because back then I knew I learned from popular physics books that the structure the large-scale structure of the
universe is not homogeneous and you have these clusters of galaxies and then in between there are these huge empty spaces and I thought hmm maybe they aren't really empty it's just that in the middle of that some AI civilization already has expanded and then has covered a bubble of a billion light-years diameter and is using all the energy of all the stars within that bubble for its own unfathomable purposes and so it already has happened and we just failed to interpret the signs but then I learned that gravity by itself explains the large-scale structure of the universe and that this is not a convincing explanation and then I thought maybe it's the dark matter because as far as we know today 80% of the measurable matter is invisible and we know that because otherwise our galaxy or other galaxies would fall apart they are rotating too quickly and then the idea was maybe all of these AI civilizations are already out there they are just invisible because they are really efficient in using the energies of their own local systems and that's why they appear dark to us but this is also not a convincing explanation because then the question becomes why are there still any visible stars left in our own galaxy which also must have a lot of dark matter so that is also not a convincing thing and today I like to think it's quite plausible that maybe we are the first at least in our local light cone within a few hundreds of millions of light years that we can reliably observe is that exciting to you that we might be the first and it would make us much more important because if we mess it up through a nuclear war then maybe this will have an effect on the development of the entire universe so let's not mess it up let's not mess it up Jürgen thank you so much for talking today I really appreciate it it's my pleasure you