Transcript
STFcvzoxVw4 • Vladimir Vapnik: Statistical Learning | Lex Fridman Podcast #5
The following is a conversation with Vladimir Vapnik. He's the co-inventor of support vector machines, support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union and worked at the Institute of Control Sciences in Moscow. Then, in the United States, he worked at AT&T, NEC Labs, Facebook Research, and now is a professor at Columbia University. His work has been cited over 170,000 times. He has some very interesting ideas about artificial intelligence and the nature of learning, especially on the limits of our current approaches and the open problems in the field. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, please subscribe on YouTube or rate it on iTunes or your podcast provider of choice, or simply connect with me on Twitter or other social networks at @lexfridman, spelled F-R-I-D. And now, here's my conversation with Vladimir Vapnik.

Einstein famously said that God doesn't play dice.

Yeah.

You have studied the world through the eyes of statistics, so let me ask you: in terms of the nature of reality, the fundamental nature of reality, does God play dice?

We don't know some factors, and because we don't know some factors which could be important, it looks like God plays dice. But we should describe it. In philosophy they distinguish between two positions: the position of instrumentalism, where you are creating a theory for prediction, and the position of realism, where you are trying to understand what God did.

Can you describe instrumentalism and realism a little bit?

For example, if you have some mechanical laws, what is that? Is it a law which is true always and everywhere, or is it a law which allows you to predict the position of a moving element? What do you believe? Do you believe that it is God's law, that God created the world which obeys this physical law, or is it just a law for predictions?

And which one is instrumentalism?

For predictions.
If you believe that this is a law of God, and it is always true everywhere, that means that you are a realist. You are trying to really understand God's thought.

So the way you see the world is as an instrumentalist?

You know, I am working on some models, models of machine learning. In this model we consider a setting, and we try to resolve the setting, to solve the problem. And you can do it in two different ways. From the point of view of instrumentalists, and that's what everybody does now, they say that the goal of machine learning is to find the rule for classification. That is true, but it is an instrument for prediction. But I can say the goal of machine learning is to learn about conditional probability: how God plays dice, and whether he plays. What is the probability for one, what is the probability for another, in a given situation? For prediction I don't need this, I need the rule. But for understanding, I need conditional probability.

So let me just step back a little bit first. You mentioned, and I read last night, parts of the 1960 paper by Eugene Wigner, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences." It's such a beautiful paper. By the way, it made me feel, to be honest, to confess, that my own work in the past few years on deep learning, heavily applied, was missing out on some of the beauty of nature in the way that math can uncover it. So let me just step away from the poetry of that for a second. How do you see the role of math in your life? Is it a tool? Is it poetry? Where does it sit? And does math, for you, have limits on what it can describe?

Some people say that math is the language which God uses.

Speaks to God, or uses God?

Uses God.

Yeah.

So I believe that this article about the unreasonable effectiveness of math says that if you look at mathematical structures, they know something about reality. And most scientists in natural science are looking at equations and trying
to understand reality. It is the same in machine learning: if you look very carefully at all the equations which define conditional probability, you can understand something about reality, more than from your fantasy.

So math can reveal the simple underlying principles of reality, perhaps.

You know, what does simple mean? It is very hard to discover them. But then, when you discover them and look at them, you see how beautiful they are, and it is surprising that people did not see that before. You look at equations and derive it from equations. For example, I talked yesterday about the least squares method, and people had a lot of fantasy about how to improve the least squares method. But if you go step by step, solving some equations, you suddenly get some term which, after thinking, you understand describes the position of the observation points. In the least squares method you throw out a lot of information. We don't look at the composition of the points of observation; we are looking only at residuals. When you have understood that, it is a very simple idea, but it is not too simple to understand. And you can derive this just from equations.

So some simple algebra, a few steps, will take you to something surprising when you think about it.

And that is proof that human intuition is not too rich and is very primitive. It does not see very simple situations.

So let me take a step back. In general, yes. But what about human ingenuity, as opposed to intuition, the moments of brilliance? Do you have to be so hard on human intuition? Are there moments of brilliance in human intuition that can leap ahead of math, and then the math will catch up?

I don't think so. I think that the best human intuition is in putting in axioms, and then it is technical.

To see where the axioms take you.

Yes, but only if they take the axioms correctly. And the axioms are polished during generations of scientists. This is integral wisdom.

That's beautifully put. But if you maybe look at it, when you think of Einstein and
special relativity, what is the role of imagination coming first there, in the moment of discovery of an idea? There's obviously a mix of math and out-of-the-box imagination there.

That I don't know. Whatever I did, I excluded any imagination, because whatever I saw in machine learning that comes from imagination, like features, like deep learning, is not relevant to the problem. When you look very carefully at the mathematical equations, you derive a very simple theory which goes far beyond, theoretically, whatever people can imagine. Because it is not good fantasy. It is just interpretation, it is just fantasy, but it is not what you need. You don't need any imagination to derive, say, the main principle of machine learning.

When you think about learning and intelligence, maybe thinking about the human brain and trying to describe mathematically the process of learning, something like what happens in the human brain, do you think we have the tools currently? Do you think we will ever have the tools to try to describe that process of learning?

It is not a description of what is going on. It is interpretation. It is your interpretation, and your vision can be wrong. You know, when the guy invented the microscope, Leeuwenhoek, for the first time, only he had this instrument, and he kept the microscope secret. But he wrote reports to the London Academy of Science. In his reports, when he was looking at the blood, he looked everywhere, at the water, at the blood, at the sperm. But he described blood like a fight between queens and kings. He saw blood cells, red cells, and he imagined that it was armies fighting each other. That was his interpretation of the situation. And he sent this report to the Academy of Science. They looked at it very carefully, because they believed that he was right, that he saw something.

Yes.

But he gave the wrong interpretation. And I believe the same can happen with the brain.

With the brain, yeah.

The most important part. You know, I believe in human language. In some proverbs there is so much wisdom. For
example, people say that one day with a great teacher is better than a thousand days of diligent study. But if I ask you what the teacher does, nobody knows. And that is intelligence. But we know from history, and now from math and machine learning, that a teacher can do a lot.

So what, from a mathematical point of view, is a great teacher?

I don't know.

Yeah.

No, but we can say what a teacher can do. He can introduce some invariants, some predicates for creating invariants. How he is doing it, I don't know, because the teacher knows reality and can describe, from this reality, predicates and invariants. But we know that when you are using invariants, you can decrease the number of observations a hundred times.

But maybe try to pull that apart a little. I think you mentioned a piano teacher saying to the student, "play like a butterfly."

Yeah.

I play piano, and I've played guitar for a long time. Maybe it's romantic, poetic, but it feels like there's a lot of truth in that statement, like there's a lot of instruction in it. So can you pull that apart? What is that? The language itself may not contain this information.

It is not blah, blah, blah.

It's not blah, blah, blah, yeah.

It affects you. It affects your playing.

Yes, it does, but it's not the language. It feels like... What is the information being exchanged there? What is the nature of that information? What is the representation of that information?

I believe that it is a sort of predicate, but I don't know. That is exactly what intelligence in machine learning should be, because the rest is just mathematical technique. I think that what was discovered recently is that there are two mechanisms of learning: one is called the strong convergence mechanism and the other the weak convergence mechanism. Before, people used only one convergence. In the weak convergence mechanism you can use predicates. That is what "play like a butterfly" is, and it immediately affects your playing. You know, there is an English proverb, a great one: if it looks
like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.

Yes.

But this is exactly about predicates. "Looks like a duck," what does it mean? You saw many ducks; that is your training data. So you have a description of how, integrally, ducks look.

Yeah, the visual characteristics of a duck.

Yeah. And you have a model for recognition. So you would like the theoretical description from the model to coincide with the empirical description which you saw in the training data. So "looks like a duck" is general. But what about "swims like a duck"? You should know that a duck swims. You can say "plays chess like a duck." Okay, a duck doesn't play chess. It is a completely legal predicate, but it is useless. So how can a teacher recognize a predicate that is not useless? Up to now, we don't use these predicates in existing machine learning. That is why we need zillions of data. But in this English proverb they use only three predicates: looks like a duck, swims like a duck, and quacks like a duck.

So you can't deny the fact that "swims like a duck" and "quacks like a duck" has humor in it, has ambiguity.

Let's talk about "swims like a duck." It does not say "jumps like a duck." Why? Because it is not relevant. That means that you know ducks, you know different birds, you know animals, and you derive from this that it is relevant to say "swims like a duck."

So underneath, in order for us to understand "swims like a duck," it feels like we need to know millions of other little pieces of information that we pick up along the way. You don't think so? There doesn't need to be this knowledge base? Those statements carry some rich information that helps us understand the essence of a duck. How far are we from integrating predicates?

You know, when you consider the complete theory of machine learning, what does it do? You have a lot of functions. And then you are told "it looks like a duck." You see your training data. From the training data you recognize how the expected duck should look. Then you remove all functions which
do not look the way you think it should look, according to the training data. So you decrease the set of functions from which you pick up one. Then you give a second predicate, and again you decrease the set of functions. And after that you pick up the best function you can find. That is standard machine learning. So why do you need not too many examples?

Because your predicates are very good?

Yes, because every predicate is invented to decrease the admissible set of functions.

So you talk about an admissible set of functions, and you talk about good functions. What makes a good function?

An admissible set of functions is a set of functions which has small capacity, or small diversity, small VC dimension, and which contains a good function inside.

By the way, for people who don't know VC, you're the V in VC. So how would you describe to a layperson what VC theory is? How would you describe it?

You have a machine. The machine is capable of picking up one function from the admissible set of functions. But the set of admissible functions can be big. It can contain all continuous functions, and then it is useless: you don't have so many examples to pick up a function. But it can be small. We call it capacity, but maybe it is better called diversity. So there are not very different functions in the set. It is an infinite set of functions, but not very diverse. That is small VC dimension. When the VC dimension is small, you need a small amount of training data. So the goal is to create an admissible set of functions which has small VC dimension and contains a good function. Then you will be able to pick up the function using a small number of observations.

So that is the task of learning?

Yeah.

Creating a set of admissible functions that has a small VC dimension, and then figuring out a clever way of picking up...

No, that is the goal of learning which I formulated yesterday. Statistical learning theory is not involved in creating the admissible set of functions. In classical learning theory, everywhere, one hundred percent of textbooks, the admissible set of functions is given.
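As an illustration of the procedure described above (start from many candidate functions, discard those that disagree with a predicate on the training data, then pick the best of what remains), here is a minimal Python sketch. Everything in it is an assumption made for the example and not something stated in the conversation: the toy data, the family of threshold classifiers, the constant predicate `psi`, and the tolerance `tol`. The check compares the average of `psi(x) * f(x)` with the average of `psi(x) * y` over the training set.

```python
# Toy sketch: prune a candidate set with one predicate, then pick the best survivor.
# All names and numbers here are hypothetical, chosen only for illustration.

def make_threshold(t):
    """Classifier that predicts 1 when x is above the threshold t, else 0."""
    return lambda x: 1 if x > t else 0

def invariant_gap(f, psi, xs, ys):
    """Distance between the model's and the data's average of the predicate."""
    n = len(xs)
    predicted = sum(psi(x) * f(x) for x in xs) / n
    observed = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(predicted - observed)

def training_error(f, xs, ys):
    """Fraction of training points that f labels incorrectly."""
    return sum(f(x) != y for x, y in zip(xs, ys)) / len(xs)

xs = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Candidate set: eleven threshold classifiers with t = 0.0, 0.1, ..., 1.0.
candidates = {i / 10: make_threshold(i / 10) for i in range(11)}

# Assumed predicate: constant 1, so the invariant says that the fraction of
# points predicted positive should match the fraction labeled positive.
psi = lambda x: 1.0
tol = 0.05

# Step 1: keep only the functions that satisfy the invariant.
admissible = {t: f for t, f in candidates.items()
              if invariant_gap(f, psi, xs, ys) <= tol}

# Step 2: among the survivors, pick the one with the smallest training error.
best_t = min(admissible, key=lambda t: training_error(admissible[t], xs, ys))

print(len(candidates), len(admissible), best_t)
```

On this toy data the predicate alone cuts the eleven candidates down to two before any error minimization takes place.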
But this is science about nothing, because the most difficult problem is to create the admissible set of functions. Given, say, a lot of functions, a continuum set of functions, create an admissible set of functions, which means that it has finite VC dimension, small VC dimension, and contains a good function. This was out of consideration.

So what's the process of doing that? I mean, it's fascinating. What is the process of creating this admissible set of functions?

That is invariants.

That's invariants. Can you describe invariants?

Yeah. You are looking at properties of the training data. And properties means that you have some function, and you just count what the average value of the function is on the training data. You have a model, and there is the expectation of this function on the model. And they should coincide. So the problem is about how to pick up functions. It can be any function. In fact, it is true for all functions. But when I was saying that a duck does not jump, you don't ask the question "jumps like a duck," because it is trivial: it does not jump, and that doesn't help you to recognize a duck. But you know something about which questions to ask, and you ask "swims like a duck." "Looks like a duck" is the general situation. "Looks like," say, a guy who has this illness, this disease; it is legal. So there is a general type of predicate, "looks like," and a special type of predicate which is related to the specific problem. And that is the intelligence part of all this business, and that is where the teacher is involved.

Incorporating the specialized predicates. Okay. What do you think about deep learning, neural networks, these arbitrary architectures, as helping accomplish some of the tasks you're thinking about? Their effectiveness or lack thereof: what are the weaknesses and what are the possible strengths?

You know, I think that this is fantasy, everything like deep learning, like features. Let me give you this example. One of the greatest books is Churchill's book about the history of the Second
World War. He starts this book by describing that in old times, when a war was over, the great kings gathered together, almost all of them were relatives, and they discussed what should be done, how to create peace, and they came to an agreement. And when the First World War happened, the general public came to power. They were so greedy that they robbed Germany, and it was clear to everybody that it was not peace, that the peace would last only twenty years, because they were not professionals. I see the same in machine learning. There are mathematicians who are looking at the problem from a very deep point of view, a mathematical point of view. And there are computer scientists, who mostly do not know mathematics. They just have an interpretation of it, and they invented a lot of blah, blah, blah interpretations like deep learning. Why do you need deep learning? Mathematics does not know deep learning. Mathematics does not know neurons. It is just a function. If you like to say piecewise linear functions, say that, and work in the class of piecewise linear functions. But they invent something, and then they try to prove the advantage of it through interpretations, which are mostly wrong. And when that is not enough, they appeal to the brain, which they know nothing about. Nobody knows what is going on in the brain. So I think that it is more reliable to work on math. This is a mathematical problem. Do your best to solve this problem. Try to understand that there is not only one way of convergence, which is the strong way of convergence. There is a weak way of convergence, which requires predicates. And if you go through all this stuff, you will see that you don't need deep learning. Even more, I would say, there is a theorem which is called the representer theorem. It says that the optimal solution of the mathematical problem which describes learning is on a shallow network, not on deep learning.

On a shallow network, yeah.

Absolutely.

So in the end, what you're saying is exactly right. The question is, you see no value in throwing something on the table and playing with it, not math.
It's like a neural network, where you said throwing something in the bucket, or the biological example of looking at kings and queens, or the cells under the microscope. You don't see value in imagining the cells as kings and queens and using that as inspiration and imagination for where the math will eventually lead you? You think that interpretation basically deceives you in a way that's not productive?

I think that if you try to analyze this business of learning, and especially the discussion about deep learning, it is a discussion about interpretation, not about things, but about what you can say about things.

That's right. But aren't you surprised by the beauty of it? Not mathematical beauty, but the fact that it works at all? Or are you criticizing that very beauty, our human desire to interpret, to find our silly interpretations in these constructs? Let me ask you this: are you surprised, and does it inspire you? How do you feel about the success of a system like AlphaGo at beating the game of Go, using neural networks to estimate the quality of a board and the quality of the position?

That is your interpretation, "quality of the board."

Yeah, yes. So it's not our interpretation. The fact is, a neural network system, it doesn't matter, a learning system that I think we don't understand that well mathematically, beats the best human player, does something that was thought impossible.

That means that it is not a very difficult problem.

So we've empirically discovered that this is not a very difficult problem.

Yeah.

It's true. So maybe, I can't argue.

Even more, I would say that if they use deep learning, it is not the most effective way of learning. And usually when people use deep learning, they are using zillions of training data. But you don't need this. So I described a challenge: can we do some problems which deep learning methods, deep nets, do well, using a hundred times less training data? Even more, some problems deep learning cannot solve
because it does not necessarily create an admissible set of functions. To create a deep architecture means to create an admissible set of functions, but you cannot say that you are creating a good admissible set of functions. It is just your fantasy; it does not come from math. But it is possible to create an admissible set of functions, because you have your training data. Actually, for mathematicians, when you consider invariants you need to use the law of large numbers. When you do training in existing algorithms, you need the uniform law of large numbers, which is much more difficult; it requires VC dimension and all that stuff. But nevertheless, if you use both the weak and the strong way of convergence, you can decrease the amount of training data a lot.

You could use the three, the "swims like a duck" and "quacks like a duck." So let's step back and think about human intelligence in general. Clearly that has evolved in a non-mathematical way. As far as we know, God, or whoever, didn't come up with a model of admissible functions and place it in our brain. It kind of evolved. I don't know, maybe you have a view on this. Alan Turing, in the 50s, in his paper, asked and rejected the question "can machines think?" It's not a very useful question, but can you briefly entertain this useless question? Can machines think? Talk about intelligence and your view of it.

I don't know that. I know that Turing described imitation: if a computer can imitate a human being, let's call it intelligent. And he understood that it is not a thinking computer. He completely understood what he was doing, but he set up the problem of imitation. Now we understand that the problem is not in imitation. I am not sure that intelligence is just inside of us. It may also be outside of us. I have several observations. When I prove some theorem, a very difficult one, within a couple of years, in several places, people prove the same theorem. Say, the Sauer lemma: after ours was done, other guys proved the same theorem. In the history of science it has happened all
the time. For example, geometry: it happened simultaneously. First Lobachevsky did it, and then Gauss and Bolyai and other guys, all in approximately a ten-year period of time. And I saw a lot of examples like that. Many mathematicians think that when they develop something, they develop something in general which affects everybody. So maybe our model, that intelligence is only inside of us, is incorrect.

It's our interpretation, yeah.

There might exist some connection with world intelligence. I don't know.

You're almost like plugging in into...

Yeah, exactly.

...and contributing to this...

Into a big network.

Into a big, maybe neural, network. On the flip side of that, maybe you can comment on big O complexity and how you see classifying algorithms by worst-case running time in relation to their input. So, that way of thinking about functions: do you think P equals NP? Do you think that's an interesting question?

Yeah, it is an interesting question. But let me talk about complexity and about the worst-case scenario. There is a mathematical setting. When I came to the United States in 1990, people did not know this theory. They did not know statistical learning theory. In Russia two monographs had been published, our monographs, but in America they did not know them. Then they learned, and somebody told me that it is a worst-case theory and they will create a real-case theory. But till now they did not, because it is a mathematical tool. You can do only what you can do using mathematics, which has a clear understanding and a clear description. And for this reason we introduced complexity. We need this because using it, actually it is diversity, I like this word more, using VC dimension, you can prove some theorems. But we also created a theory for the case when you know the probability measure, and that is the best case that can happen. This is entropy theory. So from a mathematical point of view, you know the best possible case and the worst possible case. You can derive different models in the middle, but it's not so interesting.

You think the edge cases,
the edges, are interesting?

The edges are interesting, because it is not so easy to get a good bound, an exact bound. There are not many cases where you have one; often the bound is not exact. But there are interesting principles which the math discovers.

Do you think it's interesting because it's challenging and reveals interesting principles that allow you to get those bounds? Or do you think it's interesting because it's actually very useful for understanding the essence of a function, of an algorithm? It's like me judging your life as a human being by the worst thing you did and the best thing you did, versus all the stuff in the middle. It seems not productive.

I don't think so, because you cannot describe the situation in the middle, or it will not be general. You can describe edge cases, and it is clear that there is some model, but you cannot describe a model for every new case. So you will never be accurate when you use it.

But from a statistical point of view, the way you've studied functions and the nature of learning in the world, don't you think that the real world has a very long tail, that the edge cases are very far away from the mean, the stuff in the middle? Or no?

I don't know that. I think that, from my point of view, if you use formal statistics, you need the uniform law of large numbers. If you use this invariants business, you need just the law of large numbers. And there is a huge difference between the uniform law of large numbers and the law of large numbers.

Is it useful to describe that a little more, or should we just take it?

No. For example, when I was talking about the duck, I gave three predicates and it was enough. But if you try to do a formal distinction, you will need a lot of observations. So that means that the information in "looks like a duck" contains a lot of bits of information, formal bits of information. We don't know how many bits of information such things from intelligence contain, and that is the subject of analysis. Till now, all this business... I don't like how people consider
artificial intelligence. They consider it as some code which imitates the activity of a human being. It is not science, it is applications. You would like to imitate? Go ahead, it is very useful and a good problem. But you need to learn something more: how people can come to develop, say, predicates like "swims like a duck" or "play like a butterfly" or something like that. Not what the teacher tells you, but how it came into his mind, how he chooses this image. That process is the problem of intelligence.

That is the problem of intelligence. And you see that as connected to the problem of learning?

Absolutely. Because you immediately give this predicate, a specific predicate like "swims like a duck" or "quacks like a duck." It was chosen somehow.

So what is the line of work, would you say, if you were to formulate it as a set of open problems that will take us there, where "play like a butterfly" will get a system to be able to...

Let's separate two stories. One is the mathematical story: if you have predicates, you can do something. The other story is how to get predicates. That is the intelligence problem, and people did not even start to understand intelligence. Because to understand intelligence, first of all try to understand what teachers do, how a teacher teaches, why one teacher is better than another one.

Yeah. So you think we really haven't even started on the journey of generating the predicates?

No. We don't understand. We don't even understand that this problem exists.

Well, you do.

No, I just know the name. I want to understand why one teacher is better than another, and how a teacher affects a student. It is not because he is repeating the problem which is in the textbook. He makes some remarks. He makes some philosophy of reasoning.

Yeah, that's beautiful. So it is a formulation of a question that is the open problem: why is one teacher better than another?

Right. What he does better.

Yeah. At every level: how do they get better? What does it mean to be better? The whole...

Yeah. From
whatever model I have, one teacher can give a very good predicate. One teacher can say "swims like a duck," and another can say "jumps like a duck." And "jumps like a duck" carries zero information.

So what is the most exciting problem in statistical learning you've ever worked on, or are working on now?

I just finished this invariants story, and I am happy that, I believe, it is the ultimate learning story. At least I can show that there are no other mechanisms, only two mechanisms. But they separate the statistical part from the intelligent part, and I know nothing about the intelligent part. If we come to know this intelligent part, it will help us a lot in teaching, in learning.

We'll know it when we see it?

For example, in my talk, the last slide was the challenge. You have, say, the MNIST digit recognition problem, and deep learning claims that they did it very well, say 99.5% correct answers. But they use 60,000 observations. Can you do the same using a hundred times less, but incorporating invariants? What does it mean? You know the digits one, two, three. Just looking at them, explain to me which invariants I should keep in order to use a hundred examples, or say a hundred times fewer examples, to do the same job.

Yeah, that last slide, unfortunately your talk ended quickly, but that last slide was a powerful open challenge and a formulation of the essence here.

That is the exact problem of intelligence. When machine learning started, and it was developed by mathematicians, they immediately recognized that we use much more training data than humans need. But now again we came to the same story: we have to decrease it. That is the problem of learning. It is not like in deep learning, where they use zillions of training data, because maybe zillions are not enough if you do not have good invariants. Maybe you will never collect that number of observations. But now it is a question for intelligence, how to do that, because the statistical part is ready. As soon as you supply us with predicates, we can do a good job with a small number of observations. And the very
first challenge is the well-known digit recognition. You know the digits; please tell me the invariants. I thought about that. I can say, for the digit three, I would introduce the concept of horizontal symmetry. The digit three has horizontal symmetry, say, more than the digit two or something like that. But as soon as I get the idea of horizontal symmetry, I can mathematically invent a lot of measures of horizontal symmetry, or vertical symmetry, or diagonal symmetry, whatever, if I have the idea of symmetry. But what else? Looking at the digits, I see that it is a meta-predicate, which is not shape. It is something like symmetry, like how dark the whole picture is, something like that, which can serve as a predicate.

Do you think such predicates could rise out of something that's not general? Meaning, it feels like for me to be able to understand the difference between a two and a three, I would need to have had a childhood of ten to fifteen years of playing with kids, going to school, being yelled at by parents, all of that, walking, jumping, looking at ducks. And only then would I be able to generate the right predicates for telling the difference between a two and a three. Or do you think there's a more efficient way?

I don't know. I know for sure that you must know something more than digits.

Yes. That's a powerful statement.

Yeah. But maybe there are several languages of description for these elements of digits. I am talking about symmetry, about some properties of geometry. I am talking about something abstract. I don't know that. But this is a problem of intelligence. In one of our articles it is trivial to show that every example can carry not more than one bit of information. Because when you show an example and you say "this is a one," you can remove, say, the functions which do not tell you "one." The best strategy, if you can do it perfectly, is to remove half of the functions. But when you use one predicate, such as "looks like a duck," you can remove many more functions than half. And that means that it contains a lot of bits of information, from a formal
point of view. But when you have a general picture of what you want to recognize, and a general picture of the world, can you invent this predicate? And that predicate carries a lot of information.

Beautifully put. Maybe it's just me, but in all the math you show, in your work, which is some of the most profound mathematical work in the field of learning, AI, and just math in general, I hear a lot of poetry and philosophy. You really kind of talk about philosophy of science. There's a poetry and music to a lot of the work you're doing and the way you're thinking about it. So where does that come from? Do you escape to poetry? Do you escape to music?

I think that there exists ground truth.

There exists ground truth?

Yeah, and that can be seen everywhere. The smart guys, the philosophers: sometimes I am surprised how deep they see. Sometimes I see that some of them are completely off the subject. But the ground truth I see in music.

Music is the ground truth?

Yeah. And in poetry. Many poets believe they take dictation.

So what piece of music, as a piece of empirical evidence, gave you a sense that they are touching something in the ground truth?

It is structure.

The structure.

Because when you are listening to Bach, you see this structure. Very clear, very classic, very simple. And the same in math: when you have axioms in geometry, you have the same feeling. And in poetry, sometimes, you see the same.

Yeah. And if you look back at your childhood, you grew up in Russia, you maybe were born as a researcher in Russia, you developed as a researcher in Russia, you came to the United States and a few places. If you look back, what were some of your happiest moments as a researcher? Some of the most profound moments, not in terms of their impact on society, but in terms of their impact on how damn good you feel that day, so that you remember that moment.

You know, every time you find something, it is great, even simple things. But my general feeling is that most of my time I was wrong. You should go again and again
and again, and try to be honest in front of yourself. Not to make an interpretation, but to try to understand that it is related to ground truth, that it is not my blah, blah, blah interpretation or something like that.

But you're allowed to get excited at the possibility of discovery.

Oh yeah.

You have to double-check it, but...

No, but how is it related to the other ground truth? Is it just temporary, or is it forever? You know, you always have a feeling when you find something: how big is that? Twenty years ago, when we discovered statistical learning theory, nobody believed it, except for one guy, Dudley, from MIT. And then in twenty years it became fashion. And the same with support vector machines, that is, kernel machines.

So with support vector machines and learning theory, when you were working on it, you had a sense of the profundity of it, how this seems to be right, it seems to be powerful?

Right, absolutely, immediately. I recognized that it will last forever. And now, when I found this invariants story, I have a feeling that it is complete, because I have proved that there are no different mechanisms. You can make some, say, cosmetic improvements, but in terms of invariants, you need both invariants and statistical learning, and they should work together. But I am also happy that we can formulate what intelligence is from that, and separate it from the technical part. And that is completely different.

Absolutely. All right, thank you so much for talking today.

Thank you.

It's an honor.
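The one-bit argument from earlier in the conversation can be checked on a toy case. The sketch below is illustrative only: the function class (all Boolean functions of three bits) and the predicate `is_symmetric` are hypothetical choices made for this example, not predicates that Vapnik proposes.

```python
from itertools import product
from math import log2

# The 8 possible inputs: every 3-bit tuple.
INPUTS = list(product([0, 1], repeat=3))

# Every Boolean function on 3 bits, stored as a tuple of its 8 outputs.
FUNCTIONS = list(product([0, 1], repeat=len(INPUTS)))

def consistent_with_example(funcs, x, y):
    """Keep the functions that output y on input x (one labeled example)."""
    i = INPUTS.index(x)
    return [f for f in funcs if f[i] == y]

def is_symmetric(f):
    """Hypothetical predicate: the output depends only on how many bits are 1."""
    value_for_weight = {}
    for x, out in zip(INPUTS, f):
        if value_for_weight.setdefault(sum(x), out) != out:
            return False
    return True

# One labeled example removes exactly half of the candidate functions.
after_example = consistent_with_example(FUNCTIONS, (0, 0, 0), 1)

# One predicate removes far more than half.
after_predicate = [f for f in FUNCTIONS if is_symmetric(f)]

bits_from_example = log2(len(FUNCTIONS) / len(after_example))
bits_from_predicate = log2(len(FUNCTIONS) / len(after_predicate))
print(len(FUNCTIONS), len(after_example), len(after_predicate))
print(bits_from_example, bits_from_predicate)
```

Here a single labeled example halves the 256 candidates to 128, which is one bit, while the symmetry predicate leaves 16, which is four bits.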