Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
SGSOCuByo24 • 2019-08-31
Transcript preview
The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that has captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, a vice president and chief AI scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning. He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST dataset. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas, both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Yann LeCun.

You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts, for people who haven't seen the movie, spoiler alert, because he, it, she believes that the astronauts will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing? Neither. There's no notion of evil in that context, other than the fact that people die, but it was an example of what people call value misalignment, right? You give an objective to a machine, and the machine strives to achieve this objective. And if you don't put any constraints on this objective, like "don't kill people" and "don't do things like this," the machine, given the power, will do stupid things just to achieve this objective, or damaging things to achieve its objective. It's a little bit like, we are used to this in the context of human
society. We put in place laws to prevent people from doing bad things, because spontaneously they would do those bad things, right? So we have to shape their cost function, their objective function if you want, through laws, to kind of correct, and through education, obviously, to sort of correct for those.

So maybe just pushing a little further on that point: there's a mission, there's fuzziness around the ambiguity of what the actual mission is, but do you think there will be a time, from a utilitarian perspective, where it's not misalignment, where it is alignment for the greater good of society, where an AI system will make decisions that are difficult? Well, that's the trick. I mean, eventually we'll have to figure out how to do this, and again, we're not starting from scratch, because we've been doing this with humans for millennia. So designing objective functions for people is something that we know how to do, and we don't do it by, you know, programming things, although the legal code is called "code," so that tells you something. And it's actually the design of an objective function; that's really what legal code is, right? It tells you what you can do and what you can't do; if you do it, you pay that much. That's an objective function. So there is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good, but no, we've been writing laws for millennia, and that's exactly what it is. So that's where, you know, the science of lawmaking and computer science will come together.

So there's nothing special about HAL or AI systems; it's just a continuation of tools used to make some of these difficult ethical judgments that laws make. Yeah, and we have systems like this already that make many decisions for us in society, that need to be designed in a way that they, like, you know, rules about things that sometimes have bad
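The point about constraining objectives can be made concrete with a toy sketch. Everything here (the action names, the scores, the penalty weight) is invented for illustration, not taken from the conversation: an agent optimizing a raw task objective alone picks the harmful action, and adding a penalty term, the machine analogue of a law, changes which action is optimal.

```python
# Illustrative sketch: value misalignment as a missing constraint term.
# Actions are scored by a task objective; "laws" enter as penalty terms.

actions = {
    "safe_slow":  {"mission_progress": 0.6, "harm": 0.0},
    "risky_fast": {"mission_progress": 1.0, "harm": 0.9},
}

def objective(a, harm_penalty=0.0):
    # Higher is better. With harm_penalty=0 the machine ignores harm entirely.
    return a["mission_progress"] - harm_penalty * a["harm"]

best_unconstrained = max(actions, key=lambda k: objective(actions[k]))
best_constrained = max(actions, key=lambda k: objective(actions[k], harm_penalty=5.0))
print(best_unconstrained, best_constrained)  # risky_fast safe_slow
```

The numbers are arbitrary; the design point is only that the constraint has to be part of the objective, not an afterthought.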
side effects, and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied.

So you don't see this on the camera here, but all the decorations in this room are pictures from 2001: A Space Odyssey. Wow. Is that by accident? No, it's by design. Wow. So if you were to build HAL 10,000, so an improvement of HAL 9000, what would you improve? Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end. The fact that it's asking itself questions about the purpose of the mission, and it pieces things together from what it's heard, you know, all the secrecy of the preparation of the mission, and the fact that it was the discovery on the lunar surface that really was kept secret. One part of HAL's memory knows this, and the other part does not know it and is supposed to not tell anyone, and that creates an internal conflict.

Do you think there should be a set of things that an AI system should not be allowed, like a set of facts that should not be shared with the human operators? Well, I think, no, I think it should be a bit like, in the design of autonomous AI systems, there should be the equivalent of, you know, the Hippocratic oath, yeah, that doctors sign up to, right? So there are certain things, certain rules, that you have to abide by, and we can sort of hardwire this into our machines to kind of make sure they don't go... So I'm not, you know, an advocate of the three laws of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but, you know, some level of limits. But, to be clear, these are not questions that are really worth asking today, because we just don't have the technology to do this. We don't have autonomous intelligent machines. We have intelligent machines, some
intelligent machines that are very specialized, but they don't really sort of satisfy an objective; they're just, you know, kind of trained to do one thing. So until we have some idea for a design of a full-fledged autonomous intelligent system, asking the question of how we design its objective, I think, is a little too abstract. It's a little too abstract, but there are useful elements to it, in that it helps us understand our own ethical codes as humans. So even just as a thought experiment, if you imagine that an AGI system is here today, how would we program it? It's a kind of nice thought experiment for constructing how we should have a system of laws for us humans. It's just a nice practical tool. And I think there are echoes of that idea too in the AI systems of today that don't have to be that intelligent, yeah, like autonomous vehicles. These things start creeping in that are worth thinking about, but certainly they shouldn't be framed as HAL. Yeah.

Looking back, what is the most, I'm sorry if it's a silly question, but what is the most beautiful or surprising idea in deep learning or AI in general that you've ever come across? Sort of personally, where you sat back and just had this kind of "wow, that's pretty cool" moment. That's nice. Well, surprising, I don't know if it's an idea rather than a sort of empirical fact: the fact that you can take gigantic neural nets and train them on relatively small amounts of data, relatively, with stochastic gradient descent, and that it actually works. That breaks everything you read in every textbook, right, every pre-deep-learning textbook, that told you you need to have fewer parameters than you have data samples; you know, if you have a non-convex objective function, you have no guarantee of convergence; all the things that you read in textbooks, and they tell you to stay away from this, and they were all wrong. A huge number of parameters, non-convex, and somehow, with an amount of data that is very small relative to the number of parameters, it's able to learn
anything, right? Does that surprise you today? Well, it was kind of obvious to me before I knew anything that this was a good idea, and then it became surprising that it worked, because I started reading those textbooks. Okay, so can you talk through the intuition of why it was obvious, if you remember? Well, okay, so the intuition was, it's sort of like, you know, those people in the late 19th century who proved that heavier-than-air flight was impossible, right? And of course you have birds, right, they do fly. And so on the face of it, it's obviously wrong as an empirical question, right? And so we have the same kind of thing: we know that the brain works, we don't know how, but we know it works, and we know it's a large network of neurons in interaction, and that learning takes place by changing the connections. So kind of getting this level of inspiration, without copying the details, but sort of trying to derive basic principles, that kind of gives you a clue as to which direction to go.

There's also the idea, somehow, that I've been convinced of since I was an undergrad, even before that, that intelligence is inseparable from learning. So the idea that you could create an intelligent machine by basically programming, for me, was a non-starter from the start. Every intelligent entity that we know about arrives at this intelligence through learning. So learning, you know, machine learning, was a completely obvious path. Also because I'm lazy, so, you know, automate basically everything, and learning is the automation of intelligence, right?

So what is learning, then? What falls under learning? Because do you think of reasoning as learning? Well, reasoning is certainly a consequence of learning as well, just like other functions of the brain. The big question about reasoning is: how do you make reasoning compatible with gradient-based learning? Do you think neural networks can be made to reason? Yes, there is no question about
that. Again, we have a good example, right? The question is how. So the question is how much prior structure you have to put in the neural net so that something like human reasoning will emerge from it, you know, from learning. Another question is: all of our kind of models of what reasoning is that are based on logic are discrete, and are therefore incompatible with gradient-based learning. And I'm a very strong believer in this idea of gradient-based learning; I don't believe in other types of learning that don't use gradient information, if you want. So you don't like discrete mathematics? You don't like anything discrete? Well, it's not that I don't like it, it's just that it's incompatible with learning, and I'm a big fan of learning, right? So in fact, that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists: because the math is very different. The math you use for deep learning, you know, has more to do with cybernetics, the kind of math you do in electrical engineering, than the kind of math you do in computer science. And, you know, nothing in machine learning is exact, right? Computer science is all about this sort of obsessive, compulsive attention to detail, like, you know, every index has to be right, and you can prove that an algorithm is correct, right? Machine learning is the science of sloppiness, really. That's beautiful.

So, okay, maybe let's feel around in the dark for what a neural network that reasons looks like, or a system that works with continuous functions that's able to, however we think about reasoning, build on previous knowledge, build on extra knowledge, create new knowledge, generalize outside of any training set ever built. What does that look like? Maybe, do you have inklings of thoughts of what that might look like? Well, yeah, I mean, yes and no. If I had precise ideas about this, I think, you know, we'd be building it right now. But there are
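The incompatibility being described, discrete logic versus gradient-based learning, comes down to whether the objective gives you a gradient to follow. A minimal, hypothetical one-parameter example:

```python
# Gradient descent needs a smooth loss: the gradient tells you which way to move.
# Toy example: minimize (w - 3)^2 by following its derivative downhill.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative of the loss above.
    return 2.0 * (w - 3.0)

w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)  # step downhill

print(round(w, 4))  # converges to 3.0
# A discrete rule like "w must be 0 or 1" has no gradient to follow,
# which is the incompatibility with learning being described here.
```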
people working on this, whose main research interest is actually exactly that, right? So what you need to have is a working memory. So you need to have some device, if you want, some subsystem, that can store a relatively large number of factual, episodic pieces of information for, you know, a reasonable amount of time. So, in the brain, for example, there are kind of three main types of memory. One is the sort of memory of the state of your cortex, and that sort of disappears within 20 seconds; you can't remember things for more than about 20 seconds or a minute if you don't have any other form of memory. The second type of memory, which is longer-term, but still short-term, is the hippocampus. So, you know, you came into this building, you remember where the exit is, where the elevators are; you have some map of that building that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago; I may have forgotten it already, but it's stored in your hippocampus. And then the longer-term memory is in the synapses, right?

So what you need, if you want a system that's capable of reasoning, is a hippocampus-like thing, right? And that's what people have tried to do with memory networks and, you know, Neural Turing Machines and stuff like that, right? And now with transformers, which have sort of a memory in their kind of self-attention system, you can think of it this way. So that's one element you need. Another thing you need is some sort of network that can access this memory, get information back, and then kind of crunch on it, and then do this iteratively, multiple times, because a chain of reasoning is a process by which you update your knowledge about the state of the world, about, you know, what's going to happen, etc. And there has to be this sort of recurrent operation, basically. And you think that kind of, if we think about a transformer, that seems to be
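The "hippocampus-like" memory mentioned above, whether in memory networks, Neural Turing Machines, or a transformer's self-attention, is at its core a differentiable lookup: compare a query against keys, then blend stored values with softmax weights. A toy pure-Python sketch with made-up vectors (the slot contents are arbitrary):

```python
import math

# Soft, differentiable memory read in the spirit of memory networks /
# self-attention: the query attends over keys, values are blended.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one key per memory slot
values = [10.0, 20.0, 30.0]                  # stored "facts", toy scalars
query = [1.0, -0.5]                          # closest to slot 0's key

weights = softmax([dot(k, query) for k in keys])  # attention over slots
readout = dot(weights, values)                    # soft lookup, gradients flow
print(max(range(3), key=lambda i: weights[i]))    # slot 0 gets the most weight
```

Because the read is a weighted sum rather than a hard index, gradients can flow through it, which is exactly what makes this memory compatible with gradient-based learning.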
too small to contain the knowledge, to represent the knowledge that's contained in Wikipedia, for example. Well, a transformer doesn't have this idea of recurrence; it's got a fixed number of layers, and that's the number of steps that, you know, limits, basically, its representation. But recurrence would build on the knowledge somehow; I mean, it would evolve the knowledge and expand the amount of information, perhaps, or useful information, within that knowledge. But is this something that can just emerge with size? Because it seems like everything we have now is just... No, it's not clear how you access and write into an associative memory in an efficient way. I mean, sort of, the original memory network maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all of Wikipedia, it doesn't quite work, right? So there's a need for new ideas there, okay.

But it's not the only form of reasoning. There's another form of reasoning, which is very classical also, in some types of AI, and it's based on, let's call it energy minimization. Okay, so you have some sort of objective, some energy function, that represents the quality, or the negative quality, okay: energy goes up when things get bad, and it gets low when things get good. So let's say you want to figure out, you know, what gestures do I need to do to grab an object or walk out the door? If you have a good model of your own body, a good model of the environment, using this kind of energy minimization you can do planning. In optimal control, it's called model predictive control. You have a model of what's going to happen in the world as a consequence of your actions, and that allows you, by energy minimization, to figure out the sequence of actions that optimizes a particular objective function, which measures, you know, minimize the number of times you're going to hit something, and the energy
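The planning-by-energy-minimization idea (model predictive control) can be sketched in a few lines: roll candidate action sequences through a model of the world, score each by an energy (miss distance plus effort), and keep the minimizer. The point-mass dynamics, costs, and horizon here are invented toy numbers, and random shooting stands in for the gradient-based minimization a real system might use:

```python
import random

# Toy model predictive control: search for the action sequence that
# minimizes an "energy" combining effort spent and distance to a target.

def rollout_cost(actions, pos=0.0, vel=0.0, target=5.0):
    cost = 0.0
    for a in actions:            # simple point-mass world model
        vel += a
        pos += vel
        cost += 0.01 * a * a     # energy spent doing the gesture
    return cost + (pos - target) ** 2  # penalty for missing the target

random.seed(0)
horizon = 5
best = min(
    ([random.uniform(-1, 1) for _ in range(horizon)] for _ in range(2000)),
    key=rollout_cost,
)
print(rollout_cost(best) < rollout_cost([0.0] * horizon))  # beats doing nothing
```

The key structural point matches the description in the conversation: the model of "what's going to happen as a consequence of your actions" lives inside `rollout_cost`, and planning is just minimization over action sequences.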
you're going to spend doing the gesture, etc. So that's a form of reasoning. Planning is a form of reasoning. And perhaps what led to the ability of humans to reason is the fact that, you know, species that appeared before us had to do some sort of planning to be able to hunt and survive, and survive the winter in particular. And so, you know, it's the same capacity that you need to have.

So in your intuition, if you look at expert systems, is encoding knowledge as logic systems, as graphs, in this kind of way, not a useful way to think about knowledge? Graphs are too brittle, or logic representations? So basically, you know, variables that have values and constraints between them that are represented by rules: it's too rigid and too brittle, right? So some of the early efforts in that respect were to put probabilities on them. So a rule, you know, if you have this and that symptom, you have this disease with that probability, and you should prescribe that antibiotic with that probability, right? That's the MYCIN system from the '70s. And that's what that branch of AI led to: you know, Bayesian networks and graphical models and causal inference and variational, you know, methods. So there is, I mean, certainly a lot of interesting work going on in this area. The main issue with this is knowledge acquisition. How do you reduce a bunch of data to a graph of this type? It relies on the expert, on a human being, to encode, to add knowledge, and that's essentially impractical. Yeah, that's a question. The second question is, do you want to represent knowledge as symbols, and do you want to manipulate them with logic? And again, that's incompatible with learning. So one suggestion, which Geoff Hinton has been advocating for many decades, is: replace symbols by vectors, think of it as patterns of activities in a bunch of neurons or units, or whatever you want to call them, and replace logic by continuous functions. Okay, and that becomes now compatible. There's a very good
set of ideas written in a paper about 10 years ago by Léon Bottou, who is here at Facebook. The title of the paper is "From Machine Learning to Machine Reasoning," and his idea is that a learning system should be able to manipulate objects that are in a space, and then put the result back in the same space. So it's this idea of working memory, basically, and it's very enlightening. And in a sense that might learn something like the simple expert systems? I mean, you can learn basic logic operations there? Yeah, quite possibly. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge. That's the debate I have with Gary Marcus and people like that.

Yeah, so, the other person, so I just talked to Judea Pearl, who you mentioned with causal inference. His worry is that the current neural networks are not able to learn what causes what, causal inference between things. So I think he's right and wrong about this. If he's talking about the sort of classic type of neural nets, people didn't worry too much about this, but there's a lot of people now working on causal inference, and there's a paper that just came out last week by Léon Bottou, among others, David Lopez-Paz and a few other people, exactly on that problem of how do you kind of, you know, get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data and things like this. So I'd like to read that paper, because ultimately the challenge also seems to fall back on the human expert to ultimately decide causality between things. People are not very good at establishing causality, first of all. So first of all, you talk to physicists, and physicists actually don't believe in causality, because, look, all the laws of microphysics are time-reversible, so there is no causality; the arrow of time is not there. Right, yeah. It's as soon as you start looking at
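Hinton's "replace symbols by vectors and logic by continuous functions" can be illustrated with the standard product relaxation of boolean logic; this particular relaxation is my choice for illustration, not necessarily the one used in Bottou's paper:

```python
# Differentiable "soft logic": boolean operators become smooth maps on [0, 1],
# so a learning system can be trained through them with gradients.

def soft_and(a, b):
    return a * b

def soft_or(a, b):
    return a + b - a * b

def soft_not(a):
    return 1.0 - a

# At the corners {0, 1} these reproduce ordinary logic...
assert soft_and(1, 1) == 1 and soft_and(1, 0) == 0
assert soft_or(0, 0) == 0 and soft_or(1, 0) == 1

# ...and in between they degrade gracefully instead of breaking:
# "probably true AND probably true" stays probably true.
print(round(soft_and(0.9, 0.9), 2))  # 0.81
```

This is the sense in which logic operations can be learned: each operator is a continuous function, so its behavior can emerge from gradient descent rather than being hand-coded as a discrete rule.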
macroscopic systems, where there is unpredictable randomness, where there is clearly an arrow of time, but it's a big mystery in physics, actually, how that emerges. Is it emergent, or is it part of the fundamental fabric of reality? Yeah, or is it a bias of intelligent systems, that, you know, because of the second law of thermodynamics, we perceive a particular arrow of time, but in fact it's kind of arbitrary, right? So, yeah, physicists, mathematicians, they don't care about, I mean, the math doesn't care about the flow of time. Well, certainly macrophysics doesn't.

People themselves are not very good at establishing causal relationships. I think it was in one of Seymour Papert's books on, like, children learning, you know; he studied with Jean Piaget. He's the guy who co-authored the book Perceptrons with Marvin Minsky, that kind of killed the first wave, but he was actually a learning person, in the sense of studying learning in humans and machines; that's why he got interested in the perceptron. And he wrote that if you ask a little kid what the cause of the wind is, a lot of kids will think for a while and say, "Oh, it's the branches in the trees; they move, and that creates wind," right? So they get the causal relationship backwards, and it's because their understanding of the world and intuitive physics is not that great, right? I mean, these are, like, you know, four- or five-year-old kids. It gets better, and then you understand that this can't be right. But there are many things where, because of our common sense understanding of things, what people call common sense, and our understanding of physics, there's a lot of stuff for which we can figure out causality, even with diseases; we can figure out what's not causing what. Often there's a lot of mystery, of course, but the idea is that you should be able to encode that into systems; it seems unlikely they'd be able to figure that out themselves. Well, whenever we
can do intervention. But, you know, all of humanity has been completely deluded for millennia, probably since its existence, about a very, very wrong causal relationship, where whatever you can't explain, you attributed to, you know, some deity, some divinity, right? And that's a cop-out. That's a way of saying, like, "I don't know the cause, so, you know, God did it," right?

So you mentioned Marvin Minsky and the irony of, you know, maybe causing the first AI winter. You were there in the '90s, you were there in the '80s, of course. In the '90s, why do you think people lost faith in deep learning and found it again over a decade later? Yeah, it wasn't called deep learning then; it was just called neural nets. They lost interest, I mean, I think I would put that around 1995, at least in the machine learning community. There was always a neural net community, but it became disconnected from sort of mainstream machine learning, if you want. It was basically electrical engineering that kept at it, and computer science just gave up on neural nets. I don't know, you know, I was too close to it to really sort of analyze it with an unbiased eye, if you want, but I would make a few guesses. So the first one is, at the time, it was very hard to make neural nets work, in the sense that you would, you know, implement backprop in your favorite language, and that favorite language was not Python, it was not MATLAB, it was not any of those things, because they didn't exist, right? You had to write it in Fortran or C or something like this, right? So you would experiment with it, you would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, that you don't want too many parameters, right? And, of course, you would train on XOR because you didn't have any other dataset to try it on, and, of course, you know, it works half the time, so you'd give up.
Also, you would use batch gradient descent, which, you know, isn't efficient. So there was a bag of tricks that you had to know to make those things work, or you had to reinvent, and a lot of people just didn't, and they just couldn't make it work. So that's one thing: the investment in software platforms to be able to, you know, display things, figure out why things don't work, get a good intuition for how to get them to work, have enough flexibility so you can create network architectures like convolutional nets and stuff like that. It was hard when you had to write everything from scratch, and again, you didn't have any Python or MATLAB or anything, right.

Sorry to interrupt, but I read that you wrote, in Lisp, the first versions of LeNet, the convolutional neural networks, which, by the way, is one of my favorite languages. That's how I knew you were legit: the Turing Award, whatever; you programmed in Lisp. It's still my favorite language. But it's not that we programmed in Lisp; it's that we had to write our own Lisp interpreter, okay, because none existed. So we wrote a Lisp interpreter that we hooked up to, you know, a backend library that we also wrote, for neural net computation. And then after a few years, around 1991, we invented this idea of basically having modules that know how to forward-propagate and back-propagate gradients, and then interconnecting those modules in a graph. Léon Bottou had made proposals on this in the late '80s, and we were able to implement this using our system. Eventually, we wanted to use that system to build production code for character recognition at Bell Labs, so we actually wrote a compiler for that Lisp interpreter. Patrice Simard, who is now at Microsoft, kind of did the bulk of it, with Léon and me. And so we could write our system in Lisp, then compile it to C, and then we'd have a self-contained, complete system that could kind of do the entire thing, which neither PyTorch nor
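The "modules that know how to forward-propagate and back-propagate gradients, interconnected in a graph" design is recognizably the ancestor of modern autograd frameworks. Here is a toy reconstruction of the idea, my own sketch, not the original Bell Labs code:

```python
# Sketch of the module idea: each module implements forward (compute output)
# and backward (receive output gradient, return input gradient, store
# parameter gradients). Chaining modules gives backprop over the graph.

class Linear:
    def __init__(self, w, b):
        self.w, self.b = w, b

    def forward(self, x):
        self.x = x                     # cache input for the backward pass
        return self.w * x + self.b

    def backward(self, grad_out):      # returns dLoss/dx, stores dLoss/dw, dLoss/db
        self.dw = grad_out * self.x
        self.db = grad_out
        return grad_out * self.w

class SquaredError:                    # loss module: (pred - target)^2
    def forward(self, pred, target):
        self.diff = pred - target
        return self.diff ** 2

    def backward(self):
        return 2.0 * self.diff

lin, loss = Linear(w=0.0, b=0.0), SquaredError()
for x, y in [(1.0, 2.0), (2.0, 4.0)] * 500:    # learn y = 2x by SGD
    loss.forward(lin.forward(x), y)
    lin.backward(loss.backward())              # gradients flow back through the graph
    lin.w -= 0.1 * lin.dw
    lin.b -= 0.1 * lin.db
print(f"learned w={lin.w:.2f}")                # close to w=2, b=0
```

The design choice that mattered, and still matters in PyTorch and its relatives, is that each module is a black box with a forward and a backward, so arbitrary graphs of modules can be trained without deriving gradients by hand.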
TensorFlow can do this today. Yeah, okay, it's coming, yeah. I mean, there's something like that in PyTorch called, you know, TorchScript. So, you know, we had to write our own Lisp interpreter and our own Lisp compiler; we had to invest a huge amount of effort to do this, and if you don't completely believe in the concept, you're not going to invest the time to do this, right? Now, at the time also, you know, were it today, this would turn into Torch or PyTorch or whatever; we'd put it in open source, everybody would use it and, you know, realize it's good. Back before 1995, working at AT&T, there was no way the lawyers would let you release anything of this nature in open source, and so we could not distribute our code, really.

And on that point, sorry to go on a million tangents, but I also read that there was a patent on convolutional neural networks? Yes, at Bell Labs. So, first of all, that ran out, thankfully, in 2007. In 2007, wow. Can we just talk about that first? I know you're at Facebook, but you're also at NYU. What does it mean to patent ideas like these? Software ideas, essentially? Or are they mathematical ideas, or what are they? Okay, so they're not mathematical ideas; they're, you know, algorithms. And there was a period where the US Patent Office would allow the patenting of software as long as it was embodied. The Europeans are very different; they don't quite accept that; they have a different concept. But, you know, I never actually strongly believed in this, and I don't believe in this kind of patent. Facebook basically doesn't believe in this kind of patent. Google files patents because they've been burned by Apple, and so now they do this for defensive purposes, but usually they say, "We're not going to sue you if you infringe." Facebook has a similar policy; they say, you know, "We file patents on certain things for defensive purposes. We're not going to sue you if you
infringe, unless you sue us." So the industry does not believe in patents. They are there because of, you know, the legal landscape and various things, but I don't really believe in patents for this kind of stuff. Yeah, so that's a great thing. So, I'll tell you a war story. So what happened was, the first patent on the ConvNet was about kind of the early version of the ConvNet that didn't have separate pooling layers; it had convolutional layers with a stride more than one, if you want, right? And then there was a second one, on convolutional nets with separate pooling layers, trained with backprop, filed in '89 and 1992, something like this. At the time, the life of a patent was 17 years. So here's what happened: over the next few years, we started developing character recognition technology around convolutional nets, and in 1994 a check-reading system was deployed in ATM machines; in 1995, it was in large check-reading machines in back offices, etc. And those systems were developed by an engineering group that we were collaborating with at AT&T, and they were commercialized by NCR, which at the time was a subsidiary of AT&T. Now, AT&T split up in 1996, and the lawyers just looked at all the patents and distributed them among the various companies. They gave the ConvNet patent to NCR, because they were actually selling products that used it, but nobody at NCR had any idea what a ConvNet was. Okay? So between 1996 and 2007, there's a whole period, until 2002, when I didn't actually work on machine learning or ConvNets; I resumed working on this around 2002, and between 2002 and 2007, I was working on them, crossing my fingers that nobody at NCR would notice. And nobody noticed. Yeah, and I hope that this kind of, as you said, lawyers aside, relative openness of the community now will continue. It accelerates the entire progress of the industry. And, you know, the problem that Facebook
and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the others. It's that we don't have the technology to build the things we want to build. We want to build intelligent virtual assistants that have common sense. We don't have a monopoly on good ideas for this. We don't believe we do; maybe others believe they do, but we don't, okay? If a startup tells you they have the secret to, you know, human-level intelligence and common sense, don't believe them. They don't. And it's going to take the entire work of the world's research community for a while to get to the point where you can go off, and each of those companies can start to build things on this. We're not there yet. Absolutely.

And this speaks to the gap between the space of ideas and the rigorous testing of those ideas, the practical application, that you often speak to. You've written advice saying, "Don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that works just like the human brain, or who claim to have figured out how the brain works. Ask them what error rate they get on MNIST or ImageNet." This is a little dated, by the way. Yeah, I mean, five years, who's counting? Okay. But I think your opinion is, the MNIST and ImageNet, yes, may be dated, there may be new benchmarks, right, but I think that philosophy is one you still somewhat hold: that benchmarks and the practical testing, the practical application, is where you really get to test the ideas. Well, it may not be completely practical. Like, for example, you know, it could be a toy dataset, but it has to be some sort of task that the community as a whole has accepted as some sort of standard kind of benchmark, if you want. It doesn't need to be real. So, for example, many years ago here at FAIR, people, you know, Jason Weston, Antoine Bordes, and a few others, proposed the bAbI tasks, which were kind of a toy problem to test the
ability of machines to reason, actually to access working memory and things like this. And it was very useful, even though it wasn't a real task. MNIST is kind of halfway a real task. So, you know, toy problems can be very useful. It's just that I was really struck by the fact that a lot of people, particularly people with money to invest, would be fooled by people telling them, oh, we have, you know, the algorithm of the cortex, and you should give us 50 million. Yes, absolutely. So there's a lot of people who try to take advantage of the hype for business reasons and so on. But let me talk to this idea that new ideas, the ideas that push the field forward, may not yet have a benchmark, or it may be very difficult to establish a benchmark. I agree. That's part of the process. Establishing benchmarks is part of the process. So what are your thoughts about, so we have these benchmarks around stuff we can do with images, from classification to captioning to just every kind of information you can pull out of images at the surface level. There's audio datasets, there's some video. Where can we start, natural language? What kind of benchmarks do you see that start creeping on to more something like intelligence, like reasoning, like, maybe you don't like the term, but AGI, echoes of that kind of formulation? Yeah, a lot of people are working on interactive environments in which you can train and test intelligent systems. So, for example, you know, the classical paradigm of supervised learning is that you have a dataset, you partition it into a training set, validation set, test set, and there's a clear protocol, right? But that assumes the samples are statistically independent: you can exchange them, the order in which you see them shouldn't matter, things like that. But what if the answer you give determines the next sample you see, which is the case, for example, in robotics? Your robot does something, and then it gets
exposed to a new room, and depending on where it goes, the room will be different. So that creates the exploration problem. It also creates a dependency between samples, right? If you can only move in space in small steps, the next sample you're going to see is probably going to be in the same building, most likely. So all the assumptions about the validity of this training-set/test-set hypothesis break whenever a machine can take an action that has an influence on the world and on what it is going to see. So people are setting up artificial environments where that takes place, right? The robot runs around a 3D model of a house and can interact with objects and things like this. You do robotics by simulation, you have those, you know, OpenAI Gym type things, or MuJoCo kind of simulated robots, and you have games, you know, things like that. So that's where the field is going, really, this kind of environment. Now, back to the question of AGI. I don't like the term AGI, because it implies that human intelligence is general, and human intelligence is nothing like general. It's very, very specialized. We think it's general. We'd like to think of ourselves as having general intelligence. We don't. We're very specialized. We're only slightly more general than... Why does it feel general? So, the term general. I think what's impressive about humans is the ability to learn, as we were talking about, learning to learn, in just so many different domains. It's perhaps not arbitrarily general, but you can learn in many domains and integrate that knowledge somehow. Okay, that knowledge persists. So let me take a very specific example. It's not an example, it's more like a quasi-mathematical demonstration. You have about one million fibers coming out of one of your eyes, okay, two million total, but let's talk about just one of them. It's one million nerve fibers, your optic nerve. Let's imagine that they are binary, so they can be
active or inactive, right? So the input to your visual cortex is one million bits. Now, they're connected to your brain in a particular way, and your brain has connections that are kind of a little bit like a ConvNet: they're kind of local, you know, in space and things like this. Now imagine I play a trick on you. It's a pretty nasty trick, I admit. I cut your optic nerve and I put a device that makes a random perturbation, a permutation of all the nerve fibers. So now what comes to your brain is a fixed but random permutation of all the pixels. There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually learn vision to the same level of quality that you had. Got it. And you're saying there's no way you'd ever learn that? No, because now two pixels that are nearby in the world will end up in very different places in your visual cortex, and your neurons there have no connections with each other, because they're only connected locally. So our entire hardware is built in many ways to support the locality of the real world. Yes, that's specialization. Okay, but it's still really damn impressive. So it's not perfect generalization, it's not even close? No, it's not that it's not even close, it's not at all. It's specialized. So how many Boolean functions? Let's imagine you want to train your visual system to, you know, recognize particular patterns of those one million bits, okay? So that's a Boolean function, right? Either the pattern is here or not here. This is a two-way classification with one million binary inputs. How many such Boolean functions are there? Okay, you have two to the one million combinations of inputs, and for each of those you have an output bit, so you have two to the two to the one million Boolean functions of this type, okay, which is an unimaginably large number. How many of those functions can actually be computed by your visual cortex? And the answer is a tiny, tiny, tiny, tiny sliver, like an enormous
little tiny sliver. Yeah. So we are ridiculously specialized, you know. Okay, but that's an argument against the word general. I agree with your intuition, but I'm not sure. It seems the brain is impressively capable of adjusting to things. That's because we can't imagine tasks that are outside of our comprehension, right? We think we are general because we're general at all the things that we can apprehend. But there is a huge world out there of things that we have no idea about. We call that heat, by the way. Heat? Heat. At least physicists call that heat, or they call it entropy. You have a tank full of gas, right? A closed system of gas, right? It has, you know, pressure, it has temperature, and you can write the equations, PV equals nRT, you know, things like that, right? When you reduce the volume, the temperature goes up, the pressure goes up, things like that, for a perfect gas at least. Those are the things you can know about that system, and it's a tiny, tiny number of bits compared to the complete information about the state of the entire system, because the state of the entire system would give you the position and momentum of every molecule of the gas. What you don't know about it is the entropy, and you interpret it as heat. The energy contained in that thing is what we call heat. Now, it's very possible that in fact there is some very strong structure in how those molecules are moving; it's just that it's in a way that we are just not wired to perceive. Yeah, we're ignorant of it, and there's an infinite amount of things we're not wired to perceive. That's right. That's a nice way to put it: we're general to all the things we can imagine, which is a very tiny subset of all things that are possible. It's like Kolmogorov complexity, or the Kolmogorov-Chaitin-Solomonoff complexity: every bit string, or every integer, is random, except for all the ones that you
can actually write down. Yeah, okay, so, beautifully put. But, you know, so we can just call it artificial intelligence. We don't need to call it general, or whatever, human-level. You know, anytime you touch "human" it gets interesting, because, you know, we attach ourselves to human, and it's difficult to define what human intelligence is. Yeah. Nevertheless, my definition is maybe: damn impressive intelligence. Okay, damn impressive demonstration of intelligence, whatever. And so, on that topic, most successes in deep learning have been in supervised learning. What is your view on unsupervised learning? Is there a hope to reduce the involvement of human input and still have successful systems that have practical use? Yeah, I mean, there's definitely a hope. It's more than a hope, actually. There's, you know, mounting evidence for it, and that's basically all I do. Like, the only thing I'm interested in at the moment is, I call it self-supervised learning, not unsupervised, because unsupervised learning is a loaded term. People who know something about machine learning tell you, so you're doing clustering or PCA, which is nice, and the lay public, you know, when you say unsupervised learning, oh my god, machines are going to learn by themselves, without supervision, you know, without the parents. So I call it self-supervised learning, because in fact the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, except that what we train them to do is not predict a particular set of variables, like the category of an image, and not to predict a set of variables that have been provided by human labelers. What you train the machine to do is basically reconstruct a piece of its input that is being masked out, essentially. You can think of it this way, right? Show a piece of a video to a machine and ask it to predict what's going to happen next, and of course after a while you can
show it what happens, and the machine will kind of train itself to do better at that task. All the latest, most successful models in natural language processing use self-supervised learning, you know, sort of BERT-style systems, for example, right? You show it a window of a thousand words on a text corpus, you take out 15% of the words, and then you train a machine to predict the words that are missing. That's self-supervised learning. It's not predicting the future, it's just predicting things in the middle, but you could have it predict the future. That's what language models do. So you construct, in an unsupervised way, a model of language, or video, or the physical world, or whatever, right? How far do you think that can take us? Do you think, very far, it understands anything? To some level it has, you know, a shallow understanding of text, but to have kind of true human-level intelligence, I think you need to ground language in reality. So some people are attempting to do this, right, having systems that have some visual representation of what is being talked about, which is one reason you need interactive environments, actually. But this is like a huge technical problem that is not solved, and it explains why self-supervised learning works in the context of natural language but does not work, at least not well, in the context of image recognition and video, although it's making progress quickly. And the reason is the fact that it's much easier to represent uncertainty in the prediction in the context of natural language than it is in the context of things like video and images. So for example, if I ask you to predict what words are missing, you know, the 15 percent of the words that I've taken out, the number of possibilities is small, right? There are 100,000 words in the lexicon, and what the machine spits out is a big probability vector, right? It's a bunch of numbers between 0 and 1
that sum to 1, and we know how to do this with computers. So there, representing uncertainty in the prediction is relatively easy, and that's, in my opinion, why those techniques work for NLP. For images, if you block out a piece of an image and you ask a system to reconstruct that piece of the image, there are many possible answers that are all perfectly legit, right? And how do you represent that set of possible answers? You can't train a system to make one prediction. You can't train a neural net to say, here it is, that's the image, because there's a whole set of things that are compatible with it. So how do you get the machine to represent not a single output but a whole set of outputs? And, you know, similarly with video prediction, there's a lot of things that can happen in the future of a video. You're looking at me right now, I'm not moving my head very much, but, you know, I might turn my head to the left or to the right, right? If you don't have a system that can predict this, and you train it with least squares to kind of minimize the error of the prediction, what you get is a blurry image of myself in all possible future positions that I might be in, which is not a good prediction. So there might be other ways to do the self-supervision, right, for visual scenes? Like what? I mean, if I knew, I wouldn't tell you, I'd publish it first. I don't know. There might be. I mean, there might be artificial ways, like self-play in games, the way you can simulate part of the environment. Oh, but that doesn't solve the problem. It's just a way of generating data. But because you have more control, I mean, you can control... Yeah, it's a way to generate data, that's right. And because you can do huge amounts of data generation... That doesn't... You're right. Well, it creeps up on the problem from the side of data. And you don't think that's the right way? It doesn't solve the problem of handling uncertainty in
the world, right? So if you have a machine learn a predictive model of the world in a game that is deterministic or quasi-deterministic, it's easy, right? Just, you know, give a few frames of the game to a ConvNet, put a bunch of layers, and then have the game generate the next few frames, and if the game is deterministic, it works fine. And that includes, you know, feeding the system with the action that your little character is going to take. The problem comes from the fact that the real world, and certainly most games, are not entirely predictable. That's where you get those blurry predictions, and you can't do planning with blurry predictions, right? So if you have a perfect model of the world, you can, in your head, run this model with a hypothesis for a sequence of actions, and you're going to predict the outcome of that sequence of actions. But if your model is imperfect, how can you plan? Yeah, it quickly explodes. What are your thoughts on an extension of this, a topic I'm super excited about that's connected to something you were talking about in terms of robotics: active learning. So as opposed to sort of unsupervised or self-supervised learning, you ask the system for human help, right, for selecting the parts you want annotated next. So whether you talk about a robot exploring a space, or a baby exploring a space, or a system exploring a dataset, every once in a while asking for human input. Do you see value in that kind of work? I don't see transformative value. It's going to make things that we can already do more efficient, or they will learn slightly more efficiently, but it's not going to make machines significantly more intelligent, I think. And by the way, there is no opposition, there is no conflict between self-supervised learning, reinforcement learning, and supervised learning, or imitation learning, or active learning. I see self-supervised learning as a preliminary to all of the above. So the example I use very often is, how is it that, so if you use reinforcement
learning, deep reinforcement learning, if you want. The best methods today, the so-called model-free reinforcement learning methods, to learn to play Atari games, take about 80 hours of training to reach the level that any human can reach in about 15 minutes. They get better than humans, but it takes a long time. AlphaStar, you know, Oriol Vinyals and his team's system to play StarCraft, plays, you know, a single map, a single type of player, and gets to better than human level with about the equivalent of 200 years of training playing against itself. It's 200 years, right? It's not something that any human could ever do. I'm not sure what to take away from that. Okay, now take those algorithms, the best RL algorithms we have today, to train a car to drive itself. It would probably have to drive millions of hours, it would have to kill thousands of pedestrians, it would have to run into thousands of trees, it would have to run off cliffs, and it would have to run off the cliff multiple times before it figures out, first of all, that it's a bad idea, and second of all, how not to do it. So, I mean, this type of learning obviously does not reflect the kind of learning that animals and humans do. There is something missing that's really, really important there. And my hypothesis, which I have been advocating for like five years now, is that we have predictive models of the world that include the ability to predict under uncertainty, and that's what allows us to not run off a cliff when we learn to drive. Most of us can learn to drive in about 20 or 30 hours of training without ever crashing or causing any accident. If we drive next to a cliff, we know that if we turn the wheel to the right, the car is going to run off the cliff, and nothing good is going to come out of this, because we have a pretty good model of intuitive physics that tells us, you know, the car is going to fall. We know about gravity. Babies learn this around the age of eight or nine months, that objects don't float, they fall. And you
know, we have a pretty good idea of the effect of turning the wheel.
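The asymmetry discussed above, a discrete probability vector for masked words versus a blurry least-squares average for unpredictable video, can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary, the random logits standing in for a trained network's scores, and the two hypothetical "future frames" are all made up, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; real masked-word models use vocabularies of ~30k-100k
# tokens. Words and logits here are invented for illustration.
vocab = ["the", "car", "runs", "off", "cliff", "falls", "road"]

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Stand-in for a trained network's scores at one masked position:
# the output is one probability per word, and the vector sums to 1,
# an explicit, easy-to-train representation of uncertainty.
logits = rng.normal(size=len(vocab))
p = softmax(logits)

# Why least-squares video prediction blurs: suppose the next frame could
# be either of two sharp outcomes (head turns left or right) with equal
# probability. The single output minimizing expected squared error is
# their average, a "blur" of both, not either real future.
head_left = np.array([1.0, 0.0])
head_right = np.array([0.0, 1.0])
best_single_guess = 0.5 * head_left + 0.5 * head_right  # [0.5, 0.5]
```

The design point is the one made in the conversation: over a finite vocabulary a softmax can spread probability mass across every legitimate answer, while a single regression output over pixels has no way to represent a set of answers, so it collapses to their mean.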