The Statistical Science of Polling I NOVA Now Podcast | PBS
1JnCINa4nB0 • 2020-11-02
Transcript preview
Listeners, I cannot believe that election day is finally here. That means we can just maybe relax and talk about something else for a change. I don't know, maybe we can do an episode about exoplanets or bird mating calls, am I right? Not quite. It's time for the imminent pre-game show. [Music] Soon the last of the votes will be cast, and the last of the ballots will find their way to election offices all over the country. No one's quite sure when we'll know the final outcome. Could be days, weeks, who really knows. But in the waning moments of the 2020 campaign, let's take one more look at the predictions, because one of the big questions looming over this whole election season has really been a math question. But here's what we do know: our final NBC News Wall Street Journal poll completed overnight shows Joe Biden leading President Trump by 10 points. I'm talking about polling. There's an interesting gender gap here: Trump leads with men by 16 points. New polling tells us Biden does have the opportunity. Is this race really up for grabs or not? Could the pollster professionals and data heads get it right, or would their best predictions, honed and polished since 2016, be way off? Polls are tools. They are not perfect. Don't believe these polls. The Trump vote is always being undercounted. Makes me feel a little more uncertain about being confident in any prediction. So today we're diving into the numbers to see what this election can tell us about the state of polling, and how you can better understand any poll or survey you encounter in the future. This is NOVA Now, which, according to our survey with a sample size of one, possible selection bias, and a margin of error I don't want to calculate, is the best podcast you'll ever hear. I am that sample of one, Alok Patel.
So to understand the polls for this election, we need to spend some time on the previous presidential election, 2016, because as you may recall, one of the big stories of that election was how the pollsters were dead wrong. Well, turns out it's a bit more complicated than that. At the national level, it was only a one-point polling error. We're going to get some help here from Professor Joshua Dyck, director of the Center for Public Opinion at the University of Massachusetts Lowell. Yeah, well, I mean, if you need polls, I guess I am your person right now. And according to Josh, to understand the 2016 election, you can't look at just the national polls; you have to go more fine-grained and look at the state level. 2016 was more accurate than 2012; we had a larger polling error in 2012. The problem was the national popular vote is not the electoral college, and so in 2016 what happened was that we had a systematic polling error in those three states that Trump flipped, where everyone was surprised, you know: Wisconsin, Michigan, and Pennsylvania. So it's a weird year. That same year, Josh was doing state polls as well in New Hampshire, another swing state, and he noticed that his polls were picking up something interesting. We had two polls. Our beginning-of-the-month poll in October showed Clinton up by six points, and by the end of the month, our poll that we released right ahead of the election showed that Trump had completely closed the gap and that the race was tied. But when he looked at the polls in other states with similar electorates, like Wisconsin or Pennsylvania, he didn't see that same movement happening in the final weeks of the race. I've often talked with people and wondered, you know, why did we see this coming in New Hampshire? Here's his theory: one of the differences was that we were weighting by education, and there were some other folks who missed in New Hampshire and elsewhere who were not weighting their data. Now, I want to break down exactly what this means, and we'll get there, because luckily I've got some help on the math front as well. Yes, I'm Dr. Talithia Williams. I am a statistician and mathematician, and I am a professor at Harvey Mudd College out in sunny Southern
California. So I'm hoping we can kind of run through a few basic statistical concepts that come into play when, you know, people are designing a poll, when the layperson reads how a poll was designed. Can we start with sample size? Could you just walk us through why that's important? What does it exactly mean? Is it just about the number? Is it about the demographics? Yeah, so, um, if you think of sort of the entire population, right, so let's say everybody in the country, in the United States, a sample is going to be a subset of that. So yeah, I want to take a smaller group of people, a sample, and I want this sample to be as representative of the population as possible, because whatever information the sample gives me, I want to try to generalize it to the entire population. That means a larger sample is not necessarily a better sample, right? I mean, typically larger is better, but not if you're getting more of the same people. Political pollsters actually learned this lesson early on in the history of polling. There was an American magazine called the Literary Digest (this is Joshua Dyck again) that would send postcards to subscribers and collect huge amounts of data about who people were going to vote for in the upcoming election. Roosevelt and Garner for the two highest offices. They would get millions of responses, and they had, for a series of elections, predicted the correct winner. Around what year were these? Like 1928, 1932. Gotcha, okay. And so in 1936 they published their great prediction, which was that Landon defeats FDR. Alfred Landon. Yeah, I hadn't heard of him either. He went by Al for short. A Republican governor from Kansas who lived to be a hundred. Anyway, he was up against FDR, Franklin Delano Roosevelt. You've probably heard of him. But back then, the prediction was that Landon would defeat Roosevelt. They were wrong. They were off by, I think, something like 20 points. They missed. "...in the next four years... the worst part is... the emergency being over..." And, uh, it's a great example of how you can have the biggest sample in the world, but if you have a biased sample, you will be wrong. [Music] Remember, the magazine sent out millions of these postcards. The problem was who was receiving those postcards: people who were on the telephone registry and people from the automobile registry, which, you know, if you know anything about the time period, people who had telephones and automobiles in 1936 were people who had money, and this was the Great Depression. And people who had switched and were voting for FDR and voted for the New Deal were from the lower socioeconomic status. And so folks who never got those postcards overwhelmingly, overwhelmingly voted for FDR, and he crushed Landon in that election. Interesting side note: in that same election, another pollster named George Gallup (maybe you've heard that last name) made his first presidential prediction. His sample was a fraction of the size of the Literary Digest poll: they got 2.4 million postcards, Gallup surveyed just 50,000 people. And yet, because he was using a method to approximate randomization, George Gallup correctly predicted the outcome. Great, problem solved: when conducting a poll, all you need is a smaller random sample of the population and you're set. Yeah, I mean, that's exactly what we're trying to do, right? In a perfect world, we'd be able to pull a perfectly simple random sample from the population. Turns out, not so simple. If we were talking 30 years ago, what everyone would do is a telephone poll, and they would probably use a method called random digit dialing, where they would just randomly dial numbers and acquire their sample in that way. And when, you know, 70 or 80 percent of people answered their telephones and we were able to get response rates that high, it was a lot easier to approximate a simple random sample. But those numbers dropped, and if you look at telephone samples today, the response rates are under 10 percent. So increasingly, professional pollsters like Josh are
switching to other methods, like online polling. In 2018, we actually made the switch to acquiring our samples online. Now, it's important to point out that the highest-rated polls today are still the ones that involve random dialing of both landlines and cell phones, but as Josh said, that method has become increasingly difficult to do because people don't pick up their phones. Hence the online polling. The problem with that is you can't randomly contact people online in the same way that you can randomly contact people by phone, and so what pollsters wind up with is a sample that's biased, and their samples are almost always biased in the same way: their samples are too old, they're too white, and they're too educated. This now circles us back to the idea of weighting. [Music] Waiting, I'm waiting. [Music] As in w-e-i-g-h-t, weighting. There's no sample that is not post-weighted. I can't wait to hear what Josh has to say about waiting, or weighting. You're post-weighting. You can then post-weight your data. Weight it. What did I tell you? It's a lot, and this is where, I mean, all the magic is in polling now. [Music] One concept, one term we keep hearing from pollsters is this idea of weighting. Can you walk us through how that's actually done? Like, what is the actual math involved in weighting a sample? Yeah, so let's say, you know, maybe we sample 10 people (once again, our statistician Dr. Talithia Williams) and eight of them were men and two of them were women. Most of us would say, well, like, 50 percent of our population is female; it doesn't represent, you know, the female population. So I may say, well, I want to give more weight to these two women because they're underrepresented in the sample, and I'm going to down-weight these other eight men because they're overrepresented. So sometimes people will try to give weight to areas where they have less representation. So I can almost imagine in some of these polls you are weighting multiple different variables. You might not have enough of a certain minority group, maybe there's too many people who have college degrees, too many people from the South, you know, just all these different factors. Are you doing multiple weights? People can, so, like, depending on the type of poll, yes, you can do multiple weights. And again, the more you're doing the weighting, and the more you're trying to, um, sort of make your sample accommodate the population, the more that you're also introducing error and mistakes into that, right? So if you give more weight to these two women, what if these two women are both Republican? Like, then you have heavy weight on women who are Republican, but not women who have Democratic views or independent views. So you also have to be careful with weighting, because not everyone in a population has the same view. What all this means is that in any given poll, there are a whole bunch of judgment calls made by the pollster about which variables to weight and how much to weight them by. Age, race, gender: these are all pretty standard in political polling. But weighting by something like education, for example, hasn't traditionally been standard in the polling industry, partly because it wasn't considered as relevant to how a person votes. Which brings us back to 2016.
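The eight-men, two-women example Dr. Williams walks through can be sketched as a few lines of Python. This is a minimal illustration of the arithmetic (population share divided by sample share), not any pollster's production weighting code, and the candidate-support numbers at the end are invented for the example:

```python
# Minimal sketch of post-stratification weighting, using the example from
# the episode: a sample of 10 people, 8 men and 2 women, drawn from a
# population that is roughly 50/50.

def poststratify(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n)
            for g in sample_counts}

sample = {"men": 8, "women": 2}
population = {"men": 0.5, "women": 0.5}

weights = poststratify(sample, population)
# Men are down-weighted to 0.625, women up-weighted to 2.5.
print(weights)

# A weighted estimate multiplies each respondent's answer by their group's
# weight. Suppose (hypothetically) 4 of the men and both women back
# candidate A:
support = (4 * weights["men"] + 2 * weights["women"]) / 10
print(support)  # 0.75, versus 0.6 unweighted
```

Note Dr. Williams' caveat in action: the two women now each count as two and a half people, so any quirk of those two respondents is amplified two-and-a-half-fold in the estimate.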
That was a big part of where Trump was winning votes: he was winning white voters without a college degree, and he had made big inroads in that group, and so you're going to miss unless you're weighting by that category. So, to recap: the polls in Michigan, Wisconsin, and Pennsylvania are thought to have been so far off in 2016 because their samples under-represented voters without a college degree, and the pollsters failed to weight their data to correct for that. There may have been other factors as well, such as phone polls that overwhelmingly favored landlines, which again leads to an unrepresentative sample. And I think that a lot of other pollsters, uh, you know, we've seen a movement towards a lot more people weighting by education this time around, or weighting by party registration or 2016 vote choice, in a variety of different models, and I think that that's probably likely to prevent the same sort of mistake from happening in this upcoming election. [Music] No poll is perfect. No matter how the sample is gathered or how the data is weighted, there's always a possibility that it's going to be wrong, and pollsters have a special way of acknowledging that fact. And we'll acknowledge that after the [Music] break. And welcome back. We were just talking about why polling went off the rails in 2016.
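The Literary Digest lesson from earlier (a huge biased sample loses to a small random one) is easy to see in a quick simulation. All of the numbers below are invented for illustration; they are not the real 1936 figures:

```python
import random

random.seed(0)

# Hypothetical population of 100,000 voters: 60% lower-income (80% back
# FDR), 40% higher-income (30% back FDR). True overall support is
# 0.6 * 0.8 + 0.4 * 0.3 = 0.60.
population = (
    [("low", 1) if random.random() < 0.8 else ("low", 0) for _ in range(60000)]
    + [("high", 1) if random.random() < 0.3 else ("high", 0) for _ in range(40000)]
)

# "Literary Digest" style: a huge sample, but drawn only from the
# higher-income group (telephone and automobile registries).
biased_frame = [v for v in population if v[0] == "high"]
big_biased = random.sample(biased_frame, 20000)

# "Gallup" style: a much smaller sample, but random across everyone.
small_random = random.sample(population, 1000)

def support(sample):
    """Fraction of the sample backing FDR."""
    return sum(vote for _, vote in sample) / len(sample)

print(round(support(big_biased), 3))   # near 0.30: way off, despite 20x the size
print(round(support(small_random), 3)) # near 0.60: close to the truth
```

Twenty times the sample size buys nothing if the sampling frame excludes most of the electorate; the error is systematic, so it doesn't average away.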
But if you look closely at a good poll, you'll see that the pollsters are admitting they're fallible. Or, better said, they're quantifying their fallibility, putting a number on their imperfection. It's called the margin of error. Yeah, I get excited about margin of error. Quite frankly, I think everyone should. We're back with our statistician, Dr. Talithia Williams. Or everyone should at least know that they're there, and they're often not reported. Like, I don't see candidates on stage saying, you know, they have us leading the polls in Florida with a margin of error of three percent, right? They don't report that. So could you break down for us, high school statistics 101, what exactly an error rate is and how you figure it out? Sometimes I feel like pollsters, you know, pull it out of thin air, but I'm sure there's a mathematical reasoning. Totally, totally. So, generally speaking, your margin of error is the way that you try to say, you know, I recognize that there's some error and some bias and some mistakes in the way that I might have gone about this calculation. It also sort of accommodates the fact that if you take a completely different sample, you get a slightly different number, right? So we could take multiple samples over and over and over and over again; all of those numbers would be different. So, you know, imagine if we could take a hundred samples and we average them. The margin of error sort of gives us the range that those samples would ideally fall into. So that's what you're trying to capture: like, based on this one sample, what might I have seen if I, you know, got a different random sample of people? Think of it this way: hypothetically, if you could poll a thousand random voters, a truly random sample, you would get one result. If you polled another thousand voters, again at random, you would get a different result, depending on which voters you happened to pick. Now continue that process a bunch of times, and what you'll have is a whole range of results. Roughly speaking, that range is what the margin of error attempts to account for. And the entire idea is that we're going to get some random error in any survey that we produce; that's going to be captured by a margin of error at the end. Once again, Joshua Dyck from UMass Lowell. But we're trying really hard, through all of these other processes, not to introduce non-random bias, not to do something that introduces bias, so that a choice that we made in the sampling process or weighting biases the survey. So yeah, this makes sense to me. So, I mean, I'm looking at a couple equations right now. I see, like, you know, 1 over the square root of the sample size equals margin of error. I'm not going to go into the proof of how this comes about, but I'm guessing that you have your sample, you plug it into an algorithm, it gives you a margin of error, and we're supposed to trust it. And that's why I hate giving people equations without any kind of basis for them. Um, you know, I always want people to try to understand the concept and what the concept means, because when you see a margin of error on television, even when I see it, my first thought isn't, like, oh, that's one over, you know, I know how to calculate that. My first thought is what goes into it, and how did they accommodate the uncertainty within that margin of error. And I think that's kind of what I want your viewers to think about: not, like, can you memorize this formula, there's going to be a test at the end, but really, what does it mean to say there's a margin of error? It all means uncertainty. You know, I want to try to capture the fact that I'm not certain about this number that I just told you, and here's the extent to which I'm uncertain about it: plus or minus whatever percentage, three percent, five percent. Um, and often you can see overlap, right? And so, you know, if you add the margin of error to polls, you can often see where there's an overlap between a candidate winning or losing. And so that's why it's hard to predict who's going to win or lose: because the margin of error often has overlap in the winners and the losers. You know, I wish we did more to say that we could be wrong. I don't think we say that enough, and I think we can often use polls to lead people to believe that we know the answer before the event actually happens, and that's so far from the truth. [Music] All right, in the end, polls are pointing to one thing, the outcome: who's gonna win, and by how much. And at this point we don't know, and we don't know when we're gonna know. Who knows, maybe we'll all be wrong again. From a mathematical point of view, what's your personal prediction? Do you feel like the polls are going to be correct? So, I do think that pollsters have learned from 2016. I think if anything, it showed us the importance of really getting a better representative sample. And so, you know, based on what I've seen, I'm hesitant to go with the Biden victory, um, because it still feels a little bit tight, uh, but it looks like polls are leaning toward, uh, Biden as the victor, and so we'll see where that lands, but I think that's going to be my prediction. [Music] This is going to be heated, Josh. It sure is. I literally cannot, I can't believe it's already here. On the one hand, I can't believe it's here; on the other hand, it feels like it's been [Music] [Applause] forever. [Music] NOVA Now is a production of GBH and PRX. It's produced by Ian Coss, Ari Daniel, Johnson Gonzalez, Isabel Hibbard, Christina Manan, and Sandra Lopez Monsalve. Julia Cort and Chris Schmidt are the co-executive producers of NOVA. Dante Graves is director of audience development. Tsuki Bennett is senior digital editor. Robin Kazmier is science editor. Lorena Lyon is digital production assistant. Emma is our research intern, and Nina Porzucki is managing producer, podcasts, at GBH. Our theme music is by the innovative DJ Kid Koala, and I'm Alok Patel. We'll be back in two weeks, when we hopefully will know who our president is. Maybe. It's a possibility we won't. Hopefully we will. But listen, either way, we'll be here.
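A postscript on the margin-of-error segment: the rough one-over-root-n rule Alok mentions on air, side by side with the textbook 95-percent formula for a proportion, in a short Python sketch. The 1,000-person sample size is just an example; real pollsters also adjust for weighting (the "design effect"), which this sketch ignores:

```python
import math

def moe_rough(n):
    """The rule of thumb mentioned in the episode: roughly 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def moe_95(p, n):
    """Textbook 95% margin of error for a proportion p, simple random sample of n."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

n = 1000
print(f"rule of thumb:          +/- {moe_rough(n):.1%}")   # about 3.2%
print(f"95% formula at p=0.5:   +/- {moe_95(0.5, n):.1%}") # about 3.1%

# The overlap Dr. Williams describes: a 51%-49% race in a 1,000-person
# poll sits well inside +/- 3 points, so the ranges for the two
# candidates overlap: a statistical tie.
```

The two agree at p = 0.5 because p(1 - p) peaks at 0.25 there and 1.96 x sqrt(0.25) is about 1, which is why the shorthand works.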