What is Statistics? (Michael I. Jordan) | AI Podcast Clips
AQUAPiHahVY • 2020-02-25
It's an absurd question, but what is statistics?

So, here, it's somewhere between math and science and technology; it's somewhere in that convex hull. It's a set of principles that allow you to make inferences that have some reason to be believed, and also principles that allow you to make decisions where you have some reason to believe you're not going to make errors. All of that requires some assumptions about what you mean by an error and what you mean by the probabilities. But once you start making some assumptions, you're led to conclusions: yes, I can guarantee that if you do things in this way, your probability of making an error will be small, your probability of continuing to not make errors over time will be small, and the probability that you've found something real will be high.

The decision-making is a big part of it.

It may be the big part. So the short history of statistics is that it goes back, as a formal discipline, 250 years or so. It was called "inverse probability," because around that era probability was developed especially to explain gambling situations. You would say: given that the state of nature is this, there's a certain roulette wheel with a certain mechanism in it, what kind of outcomes do I expect to see, especially if I play for long amounts of time? The physicists started to pay attention to this. And then people said: well, let's turn the problem around. What if I saw certain outcomes; could I infer what the underlying mechanism was? That's an inverse problem, and in fact for quite a while statistics was called inverse probability; that was the name of the field. And I believe it was Laplace, who was working in Napoleon's government and needed to do a census of France to learn about the people there, so he
went and gathered data, and he analyzed that data to determine policy, and said: let's call this field that does this kind of thing "statistics," because the word "state" is in there; it's the study of data for the state. Anyway, that caught on, and it's been called statistics ever since.

But by the time it got formalized, it was in the 1930s, and around that time game theory and decision theory were developed nearby. People of that era didn't think of themselves as either computer science or statistics or control or econ; they were all of the above. So von Neumann is developing game theory but also thinking about it as decision theory; Wald is an econometrician developing decision theory and then turning that into statistics. So it's all about: here's not just data that you analyze. Here's a loss function; here's what you care about; here's the question you're trying to ask; here's a probability model; and here's the risk you will face if you make certain decisions. To this day, in most advanced statistical curricula, you teach decision theory as the starting point, and then it branches out into the two branches, Bayesian and frequentist. But it's all about decisions, in statistics.

What is the most beautiful, mysterious, maybe surprising idea that you've come across?

Good question. I mean, there are a bunch of surprising ones. There's something that's way too technical for this conversation, called James-Stein estimation, which is kind of surprising and really takes time to wrap your head around.

Can you try to give me a sense of it?

I don't even want to try. Let me just say that a colleague, Steve Stigler at the University of Chicago, wrote a really beautiful paper on James-Stein estimation which helps to explain it. It's viewed as a paradox; it kind of defeats the mind's attempts to understand it, but you can, and Steve has a nice perspective on that.

So, one of the troubles with
statistics is that, like in physics, or in quantum physics, you have multiple interpretations. There's wave-particle duality in physics, and you get used to that over time, but it still kind of haunts you that you don't quite understand the relationship: the electron's a wave and the electron's a particle. Well, the same thing happens here. There are Bayesian ways of thinking and frequentist ways, and they are different. They sometimes become sort of the same in practice, but they're philosophically very different, and in some practice they're not the same at all; they give you rather different answers. So it is very much like wave-particle duality, and that is something you have to get used to in the field.

Can you define Bayesian and frequentist?

Yeah, via decision theory. I have a video people could watch; it's called "Are You a Bayesian or a Frequentist?", and it tries to make it really clear. It comes from decision theory. In decision theory you talk about loss functions, which are a function of the data X and a parameter theta: a function of two arguments. Now, neither of those arguments is known: you don't know the data a priori, it's random, and the parameter is unknown. So you have this function of two things you don't know, and you're trying to say: I want that function to be small; I want small loss. Well, what are you going to do? You say: I'm going to average over these quantities, or maximize over them, or something, so that I turn the uncertainty into something certain. You could look at the first argument and average over it, or you could look at the second argument and average over it; that's frequentist versus Bayesian. The frequentist says: I'm going to look at the X, the data, take that as random, and average over its distribution. I take the expectation of the loss over X while theta is held fixed. That's called the risk. And so it's
looking at all the data sets you could get and saying how well a certain procedure will do under all those data sets. That's called a frequentist guarantee. I think that's very appropriate when, say, you're building a piece of software and you're shipping it out there, and people will use it on all kinds of data sets. You want a stamp, a guarantee, that as people run it on many, many data sets that you never even thought about, ninety-five percent of the time it will do the right thing. Perfectly reasonable.

The Bayesian perspective says: well, no, I'm going to look at the other argument of the loss function, the theta part. That's unknown, and I'm uncertain about it, so I can have my own personal probability for what it is. How many tall people are out there? I'm trying to infer the average height of the population; well, I have an idea of roughly what that height is. So I'm going to average over the theta. Now the loss function again has only one argument; now it's a function of X. What a Bayesian does is say: let's just focus on the particular data set we got; we condition on that. Conditional on the X, I say something about my loss. That's the Bayesian approach. And the Bayesian will argue that it's not relevant to look at all the other data sets you could have gotten and average over them, the frequentist approach; it's really only the data set you got that matters. I do agree with that, especially in situations where you're working with a scientist: you can learn a lot about the domain, you really only focus on certain kinds of data, you've gathered your data, and you make inferences. I don't fully agree, though, in the sense that there are needs for frequentist guarantees: you're writing software, people are using it out there, and you want to say something. So these two things have to fight each other a little bit, but they have to blend. So, long story short, there's a set of
ideas that are right in the middle, called empirical Bayes. Empirical Bayes starts with the Bayesian framework, which is arguably philosophically more reasonable and kosher. You write down a bunch of the math that flows from that, and then you realize there are a bunch of things you don't know, because it's the real world and you don't know everything; you're uncertain about certain quantities. At that point you ask: is there a reasonable way to plug in an estimate for those things? In some cases there's quite a reasonable thing to plug in; there's a natural quantity you can observe in the world that you can plug in, and then you do a little more mathematics and assure yourself it's really good.

So is it the math or the human expertise that's going in?

They're both going in. The Bayesian framework allows you to put a lot of human expertise in, but the math kind of guides you along that path and then reassures you at the end: under certain assumptions, you can put that stamp of approval on it, and this thing will work.

So, back to your question about the most surprising, nice idea: one that is more accessible is something called the false discovery rate. You're making not just one hypothesis test, or one decision; you're making a whole bag of them. In that bag of decisions, you look at the ones where you made a discovery, where you announced that something interesting happened. That's going to be some subset of your big bag. Of the ones where you made a discovery, which subset are bad, are false discoveries? You'd like the fraction of false discoveries among your discoveries to be small. That's a different criterion than accuracy, or precision, or recall, or sensitivity and specificity; it's a different quantity. Those latter ones almost all have more of a frequentist flavor: they say, given that the null hypothesis is true, here's the accuracy I would get,
or given that the alternative is true, here's what I would get. That's going forward, from the state of nature to the data. The Bayesian goes the other direction, from the data back to the state of nature, and that's actually what the false discovery rate is. It says: given that you made a discovery, conditioned on your data, what's the probability that the null hypothesis is actually true? It's going the other direction. The classical frequentist would look at that and say: you can't know that; there are some priors needed. The empirical Bayesian goes ahead and plows forward, starts writing down these formulas, and realizes at some point that some of those quantities can actually be estimated in a reasonable way. So it's a beautiful set of ideas. This line of argument is certainly not mine; it came out of Robbins around 1960. Brad Efron has written beautifully about it in various papers and books, and the FDR is due to Benjamini, in Israel; John Storey did the Bayesian interpretation, and so on. I've just absorbed these things over the years, and I find it a very healthy way to think about statistics.
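The "inverse probability" idea from the start of the conversation, reasoning backward from observed outcomes to the mechanism that produced them, is just Bayes' rule. A minimal sketch with a hypothetical two-coin setup (the coins and probabilities are illustrative assumptions, not from the conversation):

```python
from fractions import Fraction

# Forward probability: given the mechanism, what outcomes do we expect?
# Two hypothetical coins: one fair, one biased toward heads.
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

def likelihood(coin, heads=3):
    """Probability of observing `heads` heads in a row from this coin."""
    return p_heads[coin] ** heads

# Inverse probability: given the outcome (3 heads), infer the mechanism.
evidence = sum(prior[c] * likelihood(c) for c in prior)
posterior = {c: prior[c] * likelihood(c) / evidence for c in prior}
# posterior["biased"] = (27/128) / (35/128) = 27/35
```

Seeing three heads shifts belief toward the biased coin: exactly the turn from "mechanism to outcomes" to "outcomes to mechanism" that gave the field its early name.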
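The frequentist risk defined above, the expectation of the loss over the data X with theta held fixed, can be approximated by simulation. A sketch assuming Gaussian data and squared-error loss (the model and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def squared_loss(estimate, theta):
    """L(X, theta): a function of the data (through the estimate) and the parameter."""
    return (estimate - theta) ** 2

def frequentist_risk(estimator, theta, n=20, reps=100_000):
    """R(theta) = E_X[ L(estimator(X), theta) ]: theta is held fixed,
    and we average over repeated draws of the whole data set X."""
    X = rng.normal(loc=theta, scale=1.0, size=(reps, n))
    return squared_loss(estimator(X), theta).mean()

# The sample mean of n = 20 unit-variance Gaussian draws has risk
# sigma^2 / n = 0.05, for every value of theta.
risk = frequentist_risk(lambda X: X.mean(axis=1), theta=2.0)
```

This is the "all the data sets you could get" guarantee: the procedure is evaluated over the distribution of X, not on the one data set in hand.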
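The Bayesian move, conditioning on the one data set actually observed and averaging over theta, can be illustrated with a conjugate normal model; the prior parameters below are assumptions chosen for illustration:

```python
import numpy as np

# Conjugate model: theta ~ N(mu0, tau0^2), X_i | theta ~ N(theta, sigma^2).
mu0, tau0, sigma = 0.0, 10.0, 1.0

def posterior(x):
    """Condition on the data set actually observed: return the posterior
    mean and variance of theta given X = x."""
    n = len(x)
    prec = 1 / tau0**2 + n / sigma**2                      # posterior precision
    mean = (mu0 / tau0**2 + x.sum() / sigma**2) / prec
    return mean, 1 / prec

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=sigma, size=50)              # one observed data set
mean, var = posterior(x)
# Under squared-error loss, the Bayes decision is the posterior mean,
# and the posterior expected loss is the posterior variance.
```

Nothing here averages over hypothetical other data sets; all statements are conditional on the X that was actually seen.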
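The James-Stein estimator mentioned earlier is itself a classic empirical-Bayes plug-in: the shrinkage factor a Bayesian would compute from a known prior is replaced by an estimate read off the data. A sketch, assuming unit-variance Gaussian observations (the simulation setup is hypothetical):

```python
import numpy as np

def james_stein(x):
    """James-Stein shrinkage for x_i ~ N(theta_i, 1), i = 1..p, p >= 3.
    Empirical-Bayes view: if theta_i ~ N(0, tau^2), the Bayes estimate is
    (1 - 1/(1 + tau^2)) * x; since tau^2 is unknown, (p - 2) / ||x||^2 is
    plugged in as an estimate of the shrinkage factor 1/(1 + tau^2)."""
    p = len(x)
    shrink = 1.0 - (p - 2) / np.dot(x, x)
    return shrink * x

rng = np.random.default_rng(2)
theta = rng.normal(0.0, 1.0, size=500)             # true means
x = theta + rng.normal(0.0, 1.0, size=500)         # one noisy observation each
mse_mle = np.mean((x - theta) ** 2)                # plain estimate: x itself
mse_js = np.mean((james_stein(x) - theta) ** 2)    # shrunken estimate
# The paradox: the shrunken estimate beats the plain one in total
# squared error for every theta whenever p >= 3.
```

The "natural thing you can observe in the world that you can plug in" is here the squared norm of the data itself.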
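The false discovery rate criterion discussed at the end is most commonly controlled with the Benjamini-Hochberg step-up procedure. A sketch over simulated p-values (the mix of nulls and signals is a hypothetical example):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: among m sorted p-values, find the
    largest k with p_(k) <= k * q / m and reject the k smallest.
    Controls the expected fraction of false discoveries among discoveries."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject

# A whole bag of tests: 95 nulls (uniform p-values) plus 5 strong signals.
rng = np.random.default_rng(3)
pvals = np.concatenate([rng.uniform(size=95), np.full(5, 1e-6)])
discoveries = benjamini_hochberg(pvals, q=0.05)
```

Note the criterion conditions on the discoveries made, running from data back to the state of nature, rather than fixing error rates per test as a classical significance threshold does.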