Wed October 10, 2012
'Signal' And 'Noise': Prediction As Art And Science
Originally published on Wed October 10, 2012 12:56 pm
No one has a crystal ball, but Nate Silver has perfected the art of prediction. In 2008, he accurately predicted the presidential winner of 49 of the 50 states, and the winners of all 35 Senate races. Before he focused on elections, Silver developed a sophisticated system for analyzing baseball players' potential and became a skilled poker player — which is how he made his living for a while.
Silver is a statistical analyst who's become something of a celebrity for his ability to plumb the meaning of opinion polls and other political data. He writes the New York Times blog FiveThirtyEight (named for the number of votes in the electoral college), which measures the meaning of political polls and predicts election outcomes. Silver's new book, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't, is about the explosion of data available in the Internet age, and the challenge of sorting through it all and making thoughtful decisions.
"According to IBM, 90 percent of the data in the world was created within the last two years," Silver tells Fresh Air's Dave Davies. "So one problem is what we call the signal-to-noise ratio — the amount of meaningful information relative to the overall amount information is declining. We're not that much smarter than we used to be, even though we have much more information — and that means the real skill now is learning how to pick out the useful information from all this noise."
In The Signal and the Noise, Silver looks at analysts in many fields, from weather to the economy to national security, and concludes that those who succeed at spotting new trends and understanding the future are careful to acknowledge what they don't know — and examine the assumptions that underlie their thinking. Humility, he says, is critical.
In an election year, with numerous polls being taken on a daily basis, it's impossible to avoid predictions. That's why Silver finds it's necessary to look at all the data, but also consider it in the larger context of election history. "Sometimes, there's a tendency to take the result, the poll that is most out of line with the consensus because it tells the most dramatic headline," he says. "So I do urge caution about becoming attached or overly despondent about any one polling result."
On his forecasting of the 2008 presidential election
"I think the best thing that our model did in 2008 was that it detected very quickly after the financial crisis became manifest — meaning after Lehman Brothers went belly up — that McCain's goose was cooked — that he'd been a little bit behind before, and there was such a clear trend against him that McCain had very little chance in the race from that point onward. Interestingly enough, Obama had about the same lead pre-Lehman Brothers over McCain that he did before the debate against Romney, so you see in 2008 you had a narrow Obama advantage that broke and opened up toward him, whereas this cycle, you had a narrow advantage that collapsed to close to a tie, based on a news event going the other way."
On the bias of statistical models
"You can build a statistical model and that's all well and good, but if you're dealing with a new type of financial instrument, for example, or a new type of situation — then the choices you're making are pretty arbitrary in a lot of respects. You have a lot of choices when you're designing the model about what assumptions to make. For example, the rating agencies assume basically that housing prices would behave as they had over the previous two decades, during which time there had always been steady or rising housing prices. They could have looked, for example, at what happened during the Japanese real estate bubble, where you had a big crash and having diversified apartments all over Tokyo would not have helped you with that when everything was sinking — so they made some very optimistic assumptions that, not coincidentally, happened to help them give these securities better ratings and make more money."
On predictions of political pundits who appear on the TV program The McLaughlin Group
"These predictions were made over a four-year interval, so it's a big enough chunk of data to make some fair conclusions. We found that almost exactly half of the predictions were right, and almost exactly half were wrong, meaning if you'd just flipped a coin instead of listening to these guys and girls, you would have done just as well. And it wasn't really even the case that the easier predictions turned out to be right more. So, for example, on the eve of the 2008 election, if you go to Vegas you would have seen Obama with a 95 percent of winning. Our forecast model had him with a 98 percent chance. Three of the four panelists said it was too close to call, despite Obama being ahead in every single poll for months and months and the economy having collapsed. One of them, actually, Monica Crowley on Fox News, said she thought McCain would win by half a point. Of course, what happened the next week where she came back on the air and said, 'Oh, Obama's win had been inevitable, how could he lose with the economy' ... so there's not really a lot of accountability."
On the similarities between the invention of the printing press and the current digital age
"Basically, books were a luxury item before the printing press. ... They cost in the equivalent in today's dollars of about $25,000 to produce a manuscript. So unless you were a king or a bishop or something, you probably had never really read a book. And then, all of a sudden, the printing press reduced the cost of publishing a book by about 500 times, so everyone who was literate at least could read. But what happened is that people used those books as a way to proselytize and to spread heretical ideas, some of which are popular now but at the time caused a lot of conflict. The Protestant Reformation had a lot to do with the printing press, where Martin Luther's theses were reproduced about 250,000 times, and so you had widespread dissemination of ideas that hadn't circulated in the mainstream before. And, look, when something is on the page or the Internet, people tend to believe it a lot more, and so you had disagreements that wound up playing into greater sectarianism and even warfare."
TERRY GROSS, HOST:
This is FRESH AIR. I'm Terry Gross. Our guest, Nate Silver, is a statistical analyst who's become something of a celebrity for his ability to decipher the meaning of opinion polls and other political data. He writes a blog called FiveThirtyEight, named for the number of votes in the Electoral College. In 2008, he correctly called all 35 Senate races and the winners of presidential contests in 49 of 50 states.
Two years ago, his blog moved to the New York Times website, where it's closely followed by political junkies. Before he focused on elections, Silver developed a sophisticated system for analyzing baseball players' potential and became a skilled poker player, making his living from the game for a while.
Nate Silver has written a new book about the explosion of data available to us in the Internet age and the challenge of sorting through it and making wise decisions. Silver looks at analysts in many fields, from economics to weather to national security, and concludes those who succeed at spotting new trends and understanding the future are careful to acknowledge what they don't know and examine the assumptions that underlie their thinking. Humility, he says, is critical.
Silver's new book is called "The Signal and the Noise: Why So Many Predictions Fail, But Some Don't." He spoke yesterday with FRESH AIR contributor Dave Davies.
DAVE DAVIES, HOST:
Well, Nate Silver, welcome to FRESH AIR. Let's talk about where the polls are now. We all - it's widely viewed that Mitt Romney had a great performance in his first debate against President Obama. How has his performance been reflected in polls as you see them?
NATE SILVER: So we've seen a significant reversal in the polls since the Denver debate, where you went from Obama having maybe about a four-point lead, four or five points, to a case now where it's hard to tell who's ahead, and it's probably for the first time in many months that it wasn't clear who had the lead.
We think Obama most likely still has a narrow advantage. There's some evidence that Romney's best polling nights came immediately after the debate and that Obama has since regained some ground. But look, before Denver it looked as though this campaign would be close but where you had a very clear frontrunner and where every day that kind of came off the calendar was an opportunity lost for Mitt Romney.
He did himself an awful lot of good just in that one evening in Denver.
DAVIES: OK, there was a Pew research poll that showed him leading by four percent among likely voters, but when looked in the full context, it's awfully close, right?
SILVER: It's awfully close, and as I said, our - we put a betting line on this each day and give the probability of each candidate winning, and that's still - that calculation still has Obama as a modest favorite. But it's important to - kind of on the one hand you want to look at all the data. On the other hand, sometimes there's a tendency to take the result, the poll that is most out of line with the consensus, because it tells the most dramatic headline and make a big deal about that.
So I do urge some caution about becoming overly attached or overly despondent if you're a Democrat, I suppose, about any one polling result.
DAVIES: Right, and you have a, you know, a full discourse on your blog about how you want to look at all of the current polls. Now, of course forecasting the outcome of the popular vote is one thing. That shows it's very close. You also on your blog do a regular forecast of the likely winners of the Electoral College.
And if I read that right, as of the day we're speaking, Tuesday, it shows about a 75-percent chance that the president will win the Electoral College. Is that right?
SILVER: That's right, although I should stipulate that number has been going down every day since the debate. And so by the time this airs, it might be a bit lower. But one thing we don't know yet, we've seen national polls showing everything from a modest bounce for Romney to a pretty clear bounce in the Pew poll.
We don't have a sense yet for how that will work out in the Electoral College. It would be naive to assume that Obama won't be hurt in some of these states. People in Ohio, for example, aren't all that different than people elsewhere in the country. But there were points when it seemed like he was actually polling better, for example, in Ohio than in the national average.
So if he were able to maintain a lead there and in other key swing states like, say, Virginia and Iowa, then Romney would still have some work left to do.
DAVIES: Now, it's clear that your forecast isn't simply an averaging of polls, that you look at a lot of things, including historical trends. What else do you take into account - economic data, what else?
SILVER: Yeah, so we do look at a number of economic data series. For example, we look at the monthly jobs numbers - not the unemployment rate, technically, but what's called the non-farm payrolls number. And so on that basis, interestingly enough, we think Obama ought to be a slight favorite, where the jobs growth numbers - about 150,000 per month so far this year - look a lot like George W. Bush's in 2004, where he added 160,000 jobs per month.
So that's why we think the natural gravity of this race is a narrow Obama lead, by about two points. So the forecast was adjusting his numbers downward after his convention bounce, where to be ahead by five or six points with this economy seemed unlikely.
At the same time, he is a president who has an approval rating of about 50 percent. There are some brighter signs in the economy lately. And so we think he's a slight favorite on that basis, without looking at any polls.
DAVIES: You carefully analyze every poll. You look at economic data. Anything else, the weather?
SILVER: Not the weather, no. I mean, with any model you have to decide, well, how much can I add to it, where I add value and insight versus just kind of starting to plug in too much detail, where you kind of lose track of the big picture and lose the forest for the trees.
So we look at - the polls and economic data are two variables, and they explain an awful lot of what's going on, but yeah, there are always intangible factors that can matter. Foreign policy developments can matter. At the same time, I think that the news media probably gives more attention to some of those factors and this idea of momentum, for example, than might be really warranted based on the empirical evidence.
DAVIES: Now, you did forecasting in the 2008 national elections and were known for calling all 35 Senate races correctly and the outcome in the presidential contest in 49 out of 50 states. Can you give us an example of something you got right in 2008 that other people missed and tell us why?
SILVER: Well, I think the best thing that our model did in 2008 was that it detected very quickly after the financial crisis became manifest, meaning after Lehman Brothers went belly-up, that McCain's goose was cooked. He had been a little bit behind before, and there was such a clear trend toward him - or against McCain, rather, that McCain had very little chance in the race from that point onward.
Interestingly enough, Obama had about the same lead pre-Lehman Brothers over McCain that he did before the debate against Romney. So you see in 2008 you had a narrow Obama advantage that broke and opened up toward him, whereas this cycle you had a narrow advantage that collapsed to close to a tie based on a news event going the other way.
DAVIES: Now, you tell an interesting story in this book about the relationship between all of the information we're getting nowadays and our inability to use it effectively. How much more data do we generate today than we used to?
SILVER: So according to IBM, 90 percent of the data in the world was created within the last two years. Now, not all that data is very useful, though. It might include YouTube videos of people's cats and text messages sent between teenage girls. So the question is - well, two questions, really. Number one...
DAVIES: You're saying that's not important?
SILVER: Well, it might be, but if you have a very literal-minded way of looking at what is data, what is information, that stuff's encoded and beamed into outer space and will survive forever, but there's not a lot of knowledge in there. So one problem is that what we would call the signal-to-noise ratio, the amount of meaningful information relative to the overall amount of information is declining.
We're not that much smarter than we used to be, even though we have much more information, and that means the real skill now is learning how to pick out the useful information from all this noise, as we call it. The second thing people can do, though, when they have so much choice of what to read or which news outlets to learn about the election from, for example, is that people are often very selective about the information that they choose to emphasize.
If you have three polls coming out every day, you can kind of handle those mentally and look at all of them. If you have 20, what happens is that people tend to pick three that they like, depending if they're a Romney or an Obama supporter, and ignore the others.
And so there are actually cases where for some types of people, more information can make their analysis worse.
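The point about selective reading can be made concrete with a little arithmetic. The following is a minimal Python sketch, not anything from Silver's model: the 20 poll margins are invented for illustration (Obama's lead minus Romney's, in points), and the variable names are hypothetical. It compares the average of all 20 polls with the average a partisan would get by keeping only the three most favorable ones.

```python
import statistics

# Hypothetical margins from 20 polls (Obama minus Romney, in points).
# These numbers are invented for illustration only.
polls = [2, -1, 0, 3, 1, -2, 4, 1, 0, -3, 2, 1, -1, 5, 0, 1, -4, 2, 1, 0]

# Reading every poll: the consensus is a narrow lead.
consensus = statistics.mean(polls)

# Reading only the three polls each side likes best.
ranked = sorted(polls)
obama_fan_view = statistics.mean(ranked[-3:])   # three best polls for Obama
romney_fan_view = statistics.mean(ranked[:3])   # three best polls for Romney

print(f"all 20 polls:        {consensus:+.1f}")
print(f"Obama supporter's 3: {obama_fan_view:+.1f}")
print(f"Romney supporter's 3: {romney_fan_view:+.1f}")
```

With these made-up numbers, the full set shows a lead of well under a point, while the two cherry-picked subsets point several points in opposite directions. The extra polls add information only for a reader who averages all of them.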
DAVIES: You have an interesting story. I mean, you worked for several years at an economic consulting firm. And you've developed a great statistical model for predicting the performance of baseball players. Was baseball a love of yours? Is that why you got into this?
SILVER: You know, so I've always been a person who's been interested in data and statistics but always in a hands-on way rather than an abstract way. And so kind of being a little bit of a math nerd growing up and also having been six in 1984, when the Detroit Tigers won the World Series - I grew up in Michigan - I always found that connection very compelling in baseball, and it was kind of - it's always been a lifelong love of mine.
And baseball is such a great way to test out statistical ideas because you have so much information and so much data. So I created a statistical system called Pecota. Pecota is named after Bill Pecota, who is a Kansas City Royals infielder who was always a thorn in the side of the Tigers.
And what it does is it looks at history to guide it. So it can find a player from the past, where if you have, say, Robinson Cano of the Yankees, second baseman, it can find other players who are similar to him throughout history based on their physical characteristics, based on their statistics as well. So it's taking advantage of baseball's very rich history, where we have tens of thousands of players who have played in the major leagues.
So we can say and be more exacting and say let's look at different precedents for how players like this guy did in the past. That was the idea behind the system. We found that this did pretty well. It gave you a slightly better prediction than other systems did and it also gave you the fact that there were different possibilities for how a player might develop, where you might have a player who has the same statistics as, say, Mike Schmidt up through a certain point in his career but then developed a drug problem or something else and tailed off afterwards.
So instead of just saying, oh, we know exactly how many home runs this guy's going to hit next year, it would give you a range of different outcomes, best case and worst case and median case scenarios, so to speak.
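The general idea Silver describes, finding similar players from the past and reporting a range of outcomes rather than a single number, can be sketched in a few lines of Python. This is an illustrative nearest-neighbor toy, not the actual Pecota system: the `HISTORY` records, the two statistics used, the distance scaling, and the function name `forecast_range` are all assumptions made up for the example.

```python
import math
import statistics

# Hypothetical historical records, invented for illustration:
# ((home runs, batting average) in one season, home runs the next season).
HISTORY = [
    ((30, 0.300), 33),
    ((28, 0.310), 25),
    ((31, 0.295), 12),   # a comparable player who tailed off
    ((10, 0.250), 9),
    ((29, 0.305), 30),
    ((8, 0.240), 11),
]

def forecast_range(player, history, k=4):
    """Return (worst, median, best) next-season home runs among the
    k past players whose statistics are closest to `player`."""
    def distance(a, b):
        # Scale batting average by 100 so both statistics carry similar weight.
        return math.hypot(a[0] - b[0], (a[1] - b[1]) * 100)

    nearest = sorted(history, key=lambda rec: distance(player, rec[0]))[:k]
    outcomes = sorted(next_hr for _, next_hr in nearest)
    return outcomes[0], statistics.median(outcomes), outcomes[-1]

low, mid, high = forecast_range((30, 0.302), HISTORY, k=4)
print(f"worst case {low}, median {mid}, best case {high}")
```

For this toy data the four closest comparables went on to hit 12, 25, 30 and 33 home runs, so the sketch reports a spread from 12 to 33 with a median of 27.5 instead of one point estimate.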
DAVIES: You know, it's interesting. I mean one of the main points of the book is that you need the right mix of data analysis and insight. And you write that, you know, there was, you know, years past when the old curmudgeons of baseball, the scouts and coaches, sort of didn't realize what statistics might tell them about real performance, but that now that they've embraced that, they're actually better because they see things and know things that statisticians don't.
SILVER: Yeah, so once you know what you - once you know what you don't know, then you can kind of take more advantage of what you do know, potentially. So the scouts 10 years ago or so, when Michael Lewis wrote "Moneyball," this is about when I was kind of breaking in to the baseball industry, although I was an outsider, I should say, myself. And there really was a lot of tension. Everyone was afraid that the other person would steal their jobs, potentially, that a bunch of stat geeks were going to take down the scouts. And what teams realized is that, number one, in baseball we have a concrete way to measure success, which is how many games do you win or lose, and that reduces some of the poor incentives, where if you're making stupid decisions to be politically correct, then your team will lose and you'll get fired.
So we have natural ways of weeding that out of the system. But the best way you can actually make money, make a good investment in baseball, is by knowing how young players will perform. Based on the major league's contract rules, you don't have to pay a guy his full market value for the first six years of his career.
So if you get a guy who can perform like a superstar, like a Mike Trout or a Bryce Harper, that's worth millions and millions of dollars, given - compared to how much he would cost on the free market. So both stat heads and scouts provide that kind of data.
And the scouts now are sophisticated enough where they understand what the statistics say, that they can concentrate on areas where they add value. A lot of it is actually looking at the kids - because they are kids, they are 18 and 19-year-olds that are just drafted out of high school a lot of the time, looking at what they're like as human beings.
Do they have the willpower and the drive to succeed? How will they handle success and failure? It's a big adjustment when you've gone from being kind of king of your high school to riding a bus in Kannapolis or Boise or something from minor league game to minor league game. And so that stuff is almost where they add more of their value.
DAVIES: You spend a lot more time doing political analysis these days, but I'm sure you follow baseball. Do you have a probability of - can you predict the winner of the World Series?
SILVER: Well, in baseball, because the game does involve, over the near term, so much luck, you would never at this point in the playoffs want to say, oh, any one team is the odds-on favorite necessarily. You'd always want to put things in terms of probabilities.
But, you know, I mean the Yankees and Tigers look fairly decent so far. We'll see how the Nationals do. But no, I don't have as much time to enjoy baseball during election years as I do otherwise. It's one of my - the great pleasures of life denied. So although it would be good for FiveThirtyEight to have an election every year, I appreciate the fact that we have some odd-numbered years instead of even-numbered ones where I can kind of be a little bit more relaxed.
GROSS: We're listening to FRESH AIR contributor Dave Davies' interview with Nate Silver, who writes the FiveThirtyEight political blog for the New York Times and is the author of a new book, "The Signal and the Noise." As you can tell, Silver follows political polls almost around the clock, and since this interview was recorded yesterday, some new data favorable to Mitt Romney has come in. Silver's forecast now gives President Obama a 70 percent chance of winning the Electoral College and Romney a 30 percent chance. We'll hear more of the interview after a break. This is FRESH AIR.
(SOUNDBITE OF MUSIC)
GROSS: Let's get back to the interview FRESH AIR contributor Dave Davies recorded yesterday with Nate Silver. On Silver's New York Times blog FiveThirtyEight, he analyzes political polls and forecasts election outcomes. He's the author of the new book "The Signal and the Noise: Why So Many Predictions Fail, But Some Don't."
DAVIES: Now, we're in a situation today where we have not only far more data generated than ever before, but we also share it more easily because of the Internet. You compare this to the invention of the printing press. How are they similar?
SILVER: Well, so the printing press - basically books were a luxury item before the printing press, where they cost the equivalent in today's dollars of about $25,000 to produce a manuscript. So unless you were a king or a bishop or something, you probably had never really read a book.
And then all of a sudden, the printing press reduced the cost of publishing a book by about 500 times. And so everyone who was literate at least could read. But what happened is that people used those books as a way to proselytize and to spread heretical ideas, some of which are popular now but at the time caused a lot of conflict.
The Protestant Reformation had a lot to do with the printing press, where Martin Luther's Theses were reproduced about 250,000 times. And so you had widespread dissemination of ideas that hadn't circulated in the mainstream before. And look, when something's on the page or on the Internet, people tend to believe it a lot more.
And so you had disagreements that wound up playing into greater sectarianism and even warfare. The first 200 years or so after the printing press was invented were maybe the bloodiest epoch in the history of Western Europe.
DAVIES: And so are we in for a period of intensified conflict because we are sharing all this information?
SILVER: You know, I don't think we'll have another repeat of the Crusades, but we have seen, for example, in Congress that the amount of partisanship has increased quite a bit, and you can time that back kind of to the dawn of the computer age and frankly also the growth in cable news. Right about when cable news comes online you see this uptick in the graph, where fewer and fewer times are Democrats and Republicans voting together, and they're becoming more entrenched in their conservative or liberal views.
So we are moving toward a time where there's greater disagreement, instead of greater consensus, and that could have consequences. It could mean that we cycle back and forth between some very liberal presidents, some very conservative ones. Potentially there's a lot of turnover.
We've had periods like this in the country before, but it can be somewhat damaging. It can be damaging economically, frankly, as well, where you saw the debt ceiling debate really did have economic consequences. It kind of left that Washington bubble in ways that wound up hurting ordinary people.
DAVIES: Now, you write in the book that our challenge is to make sense of all this data, to put it in context. And if there's a glaring example of people having a lot of data and reading it wrong, it's - you know, Wall Street analysts and government regulators who failed to recognize the risks of what was going on in, you know, mortgage-backed securities and all of the derivatives that flowed from them before the economic crash. What was wrong with their approach?
SILVER: So this is an example - it's a two-part problem. Number one is that you have these financial instruments called mortgage-backed securities that were new and complicated and that people didn't have good models for how to handle. They were unprecedented in many respects.
The other problem is that the group that was charged with rating these, the rating agencies, were basically corrupt. They had a lot of skin in the game, where every time a new mortgage-backed security was created, they would get profits by rating it. That's how their business model works. And so they had an incentive to give favorable ratings so that they could rate more of these and make more of a commission.
So those two things, when the person who was supposed to be kind of the lifeguard wound up being in on the scheme, created a disaster for the global economy.
DAVIES: So their analysis of data was tainted by bias?
SILVER: Yeah, I think - look, one thing we talked about in the book is that you can build a statistical model and it's all well and good, but if you're dealing with a new type of financial instrument, for example, or a new type of situation, then the choices you're making are pretty arbitrary in a lot of respects. You have a lot of choices when you're designing the model about what assumptions to make.
For example, the rating agencies assumed, basically, that housing prices would behave as they had over the previous two decades, during which time there had always been steady or rising housing prices. They could have looked, for example, at what happened during the Japanese real estate bubble, where you had a big crash, and having diversified apartments all over Tokyo would not have helped you with that, when everything was sinking.
So they made some very optimistic assumptions that, not coincidentally, happened to help them give these securities better ratings and make more money. But so I do throw up my hands some when people say, oh, you know, this is just what the data says, this is what the model says, when it's some human being or some set of human beings who designed the model, and if they make bad assumptions, they'll be reflected in the model.
The term in computer science is garbage in, garbage out.
GROSS: Nate Silver will continue his conversation with FRESH AIR contributor Dave Davies in the second half of the show. Silver writes the New York Times blog FiveThirtyEight, analyzing political polls and forecasting election outcomes. His new book is called "The Signal and the Noise." I'm Terry Gross, and this is FRESH AIR.
(SOUNDBITE OF MUSIC)
GROSS: This is FRESH AIR. I'm Terry Gross. Let's get back to the interview FRESH AIR contributor Dave Davies recorded yesterday with Nate Silver, who writes the FiveThirtyEight blog for The New York Times, which analyzes polling trends in the elections and forecasts outcomes. Silver is a statistician and has written a new book about the challenges of predicting trends and making wise decisions in an age in which more data is available than ever before in so many fields. His new book is called "The Signal and the Noise."
DAVIES: Right. Now in this book, about eight chapters in, you introduce us to an 18th-century minister named Thomas Bayes, who gave us, who had a real insight into how you look at data, statistical probability, all those years ago. What was his insight?
SILVER: So it's called Bayes' Theorem, and what Bayes' Theorem is is just a mathematical rule, a simple rule based on some addition and subtraction, multiplication of how to weigh new evidence given what you already know. And probably the mathematical abstraction of it is less important than just the practical application, which means that you shouldn't take each piece of evidence in isolation. You should say here's what I know, and how much should this change what I know? And ideally, in fact, you create a set of rules where, before you encounter new evidence, you have an idea of how you would weigh it, conditional upon different things being true.
One example we give in the book is - by the way, a lot of this Bayes' Theorem stuff is actually very intuitive; it's kind of how we think naturally a lot of the time. If you're in a relationship and you encounter, say, a strange pair of underwear in your dresser drawer, which could potentially be a sign of infidelity, the right way to look at that information is to consider it in context of the relationship you have already. It might be very damning if you've already had reasons to suspect that your partner is cheating. But if you've been in a relationship for 20 years or something, and this person is as honest as can be, then you wouldn't weigh it very heavily. And that's all it is. It's just saying take this information given the context that you already know, and it also encourages us to be explicit about what our state of knowledge is and what our biases might be, instead of pretending that we can just kind of look at everything like we are performing a perfect experiment on it but be sealed off from the real world implication.
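The rule Silver describes here can be written out as a short calculation. The following Python sketch applies Bayes' Theorem to the underwear example. The function name `posterior` and every probability fed into it are assumptions chosen for illustration, not figures quoted in this interview: a 50 percent chance of finding the underwear if the partner is cheating, a 5 percent chance if not, and two different starting beliefs.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Theorem: probability the hypothesis is true after seeing the evidence.

    prior                -- belief in the hypothesis before the evidence
    p_evidence_if_true   -- chance of seeing the evidence if the hypothesis is true
    p_evidence_if_false  -- chance of seeing the evidence if it is false
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical inputs, for illustration only.
# A long, trusting relationship: low prior suspicion (4 percent).
trusting = posterior(prior=0.04, p_evidence_if_true=0.5, p_evidence_if_false=0.05)

# Existing reasons for suspicion: the same evidence, but a 50 percent prior.
suspicious = posterior(prior=0.50, p_evidence_if_true=0.5, p_evidence_if_false=0.05)

print(f"trusting prior   -> {trusting:.0%} after the evidence")
print(f"suspicious prior -> {suspicious:.0%} after the evidence")
```

Under these assumed inputs, the same piece of evidence moves the low-suspicion case to roughly 29 percent and the high-suspicion case to roughly 91 percent. That difference is the "context of the relationship you have already" stated as arithmetic.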
DAVIES: All right. So before you draw a conclusion from somebody's economic model of a financial product, or a poll or a player's, you know, ball-to-strike ratio, you have to think carefully about what you think you know, and ask, does this make sense?
SILVER: And this is especially true in cases where we have limited information. So for example, in baseball you have so much data, where you have 750 players every year, and we've played Major League Baseball for about 150 years now. Where - in baseball, you almost can just run up your statistical model and take the results out and be satisfied that it's a pretty good representation of the truth. But in other fields, like presidential elections, for example, we've only had 16 elections since World War II - this is the 17th, I believe. And so, we don't really know all that much based on why people vote the way they do or exactly which economic indicators are most important. And so we have to make some educated guesses and those guesses are based on the historical evidence that we do have, but also based on theory, and what might even be called common sense, for example on having good structural - good structural variables in the model.
For example saying yeah, we probably should weigh the economy some, but we're going to look at a number of different economic data series, because we're not sure exactly what's most important to people. So I think people who are working, both in a casual way and in a technical way, in these fields don't always respect how important it is for how much quality information do you have. Whereas, baseball is at one end of the spectrum - where you have so much data you can be really data-driven - and something like politics or maybe a new type of financial instrument is at the other end, where we've never seen how mortgage-backed securities performed in the long term. So you can't be guided that much by the history alone, and you have to think sensibly about these things and be more of a Bayesian - as it would be called.
DAVIES: And you write there are some people that do this. And you cite the chief economist at Goldman Sachs...
DAVIES: ...a guy named Jan Hatzius? Am I pronouncing that right?
SILVER: Yeah. I think it's Jan Hatzius, but I...
DAVIES: ...who wrote a paper in November 2007 that was strikingly accurate, predicting not just what would happen with mortgage-backed securities but their broader effect on the economy. What did he get that others didn't?
SILVER: So what he understood is he put two different puzzle pieces together. Number one, he realized that these instruments were very risky because of the potential for housing price collapse; they weren't robust to the possibility of a housing price decline. But number two, he realized how large a fraction of the economy this had become. So for every actual dollar that Americans spent on a new home during the bubble, there were about $50 of side bets. These derivatives and securities were being traded very, very rapidly and represented an extremely large chunk of the financial sector.
And then when all of a sudden, trillions of dollars - what people think are profits - turn out to be a mirage, that creates a gigantic hole in the economy. And that takes a long time to repair, where you're not quite sure who owes what to whom and you have to untangle and unwind everything. And very often when you have recessions that are triggered by financial crises, they do perpetuate for years and years and years. And in some ways the American experience since 2007 is quite typical, where finally in 2012 you maybe see more resilient signs of a recovery in the job market, but still not great and it's taken an awful long time to get here.
DAVIES: You write that there are some fields that have managed to make better use of all this explosion of data than others. Weather forecasters, for example: is it that they understand this Bayesian principle, that you have to think carefully about what it all means?
SILVER: So the weather forecasters, I think, are Bayesian in the sense that they're frequently updating their forecasts. Another big Bayesian idea is that you always want to make the best forecast that you can at the time you're making the forecast. There's no glory in sticking to a forecast you made three days ago when the facts change. But the weather forecasters also do a couple of other things. One of the most important ones is thinking probabilistically. By the way, this is another component of Bayes' Theorem is that the number - you think of everything in terms of a probability. So there's a, you know, a sixty percent chance that Obama will be reelected. There's a two percent chance that we'll find life on Mars, and so forth. It is framed in terms of a probability, rather than a certainty, either way.
But weather forecasters really realized very early on, because of something called chaos theory, that they could never make perfect predictions. That as smart as they are, and as good as our computing power is, it's still very small as compared to all of the complexities of Mother Nature - how a cloud forms, or something like that, or a rainstorm develops. So they can get to better and better approximations, but when they tell you that there's a 30 percent chance of rain, people think oh, that's being wishy-washy. Will it rain or not? I want to know. That just reflects a more honest view of mankind's predictive powers. That we can get close to the truth and, kind of, give you the odds almost as a gambler might but we'll never know for sure. None of us are possessed of true clairvoyance. And, lo and behold, in having for many, many years given those probabilistic estimates and thought about the weather that way, they've gotten a lot better.
They aren't perfect by any means, but now when you have a hurricane, for example, sitting in the Gulf of Mexico, they can predict the landfall position of that hurricane within about 100 miles, on average, three days in advance. Which means if you need to evacuate some key counties in Louisiana or Mississippi, you could do so. Twenty-five years ago they could only get within about 250 miles on average, which if you draw a radius, would take you everywhere from about Tallahassee, Florida to Houston, Texas. Not a very useful prediction at all.
DAVIES: And are we talking about the National Weather Service or commercial forecasters, everybody?
SILVER: So here is a slightly important distinction to make, where the data we've seen the most impressive gains is from the National Weather Service itself. And they're scientists, they're very rigorous about what they do. They should maybe be distinguished from the local TV meteorologist who succumbs to various types of news media biases. Where it's been shown, for example, that local meteorologists have what's called a wet bias, which means they put more rain in the forecast than there really is.
SILVER: So the idea that, oh, this guy is always talking about the next Snowpocalypse or the next storm of the century and it doesn't always seem to pan out, if you're watching the Weather Channel it's not a fair complaint. But if you're watching the goofball on Channel 7, then that's actually true. They deliberately make the forecast worse, because it gets better ratings and seems more dramatic to people.
DAVIES: Right. Now one crew that you think does a particularly poor job of using data and forecasting are political pundits.
DAVIES: And I love this. You did a study of nearly a thousand predictions by panelists on The McLaughlin Group. You found a quarter of the predictions were too vague to analyze really. But among those that you could test, what did you find?
SILVER: So the predictions - and these were made over a four-year interval, so it's a big enough chunk of data to make some fair conclusions. We found that almost exactly half of the predictions were right, and almost exactly half were wrong. Meaning if you'd just flipped a coin instead of listening to these guys and girls, you would have done just as well. And it wasn't really even the case that the easier predictions turned out to be right more. So, for example, on the eve of the 2008 election - you can go to Vegas, you would have seen Obama with a 95 percent chance of winning. Our forecast model had him with a 98 percent chance.
Three of the four panelists said it was too close to call, despite Obama being ahead in every single poll for months and months, and the economy having collapsed. One of them, actually, Monica Crowley of Fox News, said she thought McCain would win by half a point. Of course, what happened the next week, where she came back on the air and said, oh, Obama's win had been inevitable, how could he lose, with this economy and so forth. And so there's not really a lot of accountability for these predictions. They're mostly meant as entertainment and it's hoped that the viewer will forget about them just moments later.
(SOUNDBITE OF MUSIC)
DAVIES: If you're just joining us, we're speaking with Nate Silver. He analyzes statistics - particularly political polls, and he writes the 538 blog for The New York Times. He also has a new book about how we predict and use data. It's called "The Signal and the Noise: Why So Many Predictions Fail - but Some Don't."
There was a stretch in the early 2000s, I think, or mid-2000s, where you made your living playing poker?
DAVIES: I guess not surprising for a guy who understands probability like you do. Was it fun?
SILVER: It's fun when you win. It's not so fun when you lose.
SILVER: I'm not a world-class player but I was pretty good, if you understand probability and statistics and know something about game theory. And so for a couple of years, during the bubble, I made a lot of profits from the game. And it was fun until the end, where we saw the ecology of the game changing and those players began to either get better or bust out. And all of a sudden if you're - if you were the second best player at the table before and one of the fish, they're called in poker, busts out of the game, you might go from being a winning player to a losing one. Most poker players are losing players over the long run, because the house takes a share of the profits; the rest is redistributed around. So unless you're really at the top or you find a very weak game, then it's a hard way to make a living.
DAVIES: What interests me here, I mean, is that the book argues that, you know, we need not just to look hard at data, but we also need to bring our own kind of human understanding of events to it, and look at our assumptions and figure that into how we calculate probabilities. How does that work at a poker table?
SILVER: So in poker, this is a very good example of this Bayes' Theorem concept that I talked about, where you're always refining your estimate of what cards your opponent might have as you pick up more and more data. So at first, maybe even based on a person's physical characteristics, poker players, it's unpleasant, but they engage in a fair amount of what might be called stereotyping. So, for example, women are reputed to be less aggressive players than men. Older players are assumed to be more conservative than younger ones. Of course, some players can exploit that. It's a matter of logical deduction, but also one because in Texas Hold'em - the most popular form of the game - you never know exactly what your opponent holds until the cards are turned face up, or the cards stay facedown throughout the hand, that you're refining your estimate of does he have a good hand or a bad hand, and how should it change my actions accordingly?
DAVIES: But - being a careful observer of human behavior helps.
SILVER: Yeah, it helps a little bit. Although, there's much less of these, you know, people watch, say, the movie "Rounders," which is a great poker movie, but where John Malkovich manipulates an Oreo cookie whenever he's about to bluff in a certain way. Real poker is not like that as much.
It's more very mathematical, and there are a certain number of different combinations that a person can have of different hands, and that number gets reduced as they bet or fold or raise with their hand at different points in the betting hand. So it's more - poker is probably about 90 percent a mathematical game and 10 percent a psychological one.
And the psychology stuff is important, but remember, the good players can manipulate your expectations about how they might be behaving, where they might act weak if they have a strong hand, meaning look like they're very scared and tentative, or the reverse. Or if they're really smart, they'll know that you know that and so they'll go one level deeper and act strong when they have a strong hand because they think, they'll expect you to have them acting the opposite way.
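(Editor's sketch, not part of the broadcast: the narrowing of an opponent's possible hands that Silver describes can be written as a Bayes' theorem update. The three hand categories, the prior beliefs and the raise probabilities below are illustrative assumptions, not figures from the interview or the book.)

```python
# Illustrative Bayes update over an opponent's possible holdings.
# Every number here is a made-up assumption for the sketch.

def bayes_update(prior, likelihood):
    """Return P(hand | action) given P(hand) and P(action | hand)."""
    unnormalized = {hand: prior[hand] * likelihood[hand] for hand in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action is impossible under every hand")
    return {hand: weight / total for hand, weight in unnormalized.items()}

# Assumed belief about the opponent's hand before they act.
prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}

# Assumed chance that each kind of hand raises.
p_raise = {"strong": 0.8, "medium": 0.3, "weak": 0.1}

# Belief after seeing the opponent raise.
posterior = bayes_update(prior, p_raise)
for hand, prob in posterior.items():
    print(f"{hand}: {prob:.2f}")
```

(With these assumed numbers, a raise moves the estimate of a strong hand from 20 percent to about 53 percent. A further update on the next bet would take this posterior as its new prior.)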
DAVIES: You make the point in the book that you have to be aware of what your own bias is, how you look at data before you look at it, what assumptions you're bringing to your analysis.
DAVIES: And there are those who would say, well, you know, Nate Silver is pro-Democrat. He's pro-Obama. Do you have to look at your own kind of biases and do you think there's anything to that?
SILVER: Well, look. I think news coverage in general is affected by different types of biases, but you can go both ways. On the one hand you can give a result that reflects your biases, but there are also people who live in perpetual fear of being labeled as being biased and so maybe overcompensate as a result.
You know, technically speaking, I'm a political independent but I think it would be insane not to have any political views at all. But I do think that if you acknowledge those and say, look, here's where I'm coming from when I view the data, then that makes you less susceptible to actually manifesting that bias than if you pretend that you're this neutral arbiter who has no preferences whatsoever.
But what I do try and do during an election - we do this with the different models we have - is we designed a set of rules in the spring for analyzing the data. And really, the rules were designed in 2008, for the most part. So whatever the polls say, according to that program that we designed, is where the forecast will move. I'm not arbitrarily tweaking it upward and downward.
Sometimes I'll say, oh, I think maybe this model is missing something, but I don't have to make a lot of judgment calls on a day-to-day basis. We decide ahead of time how to weigh the evidence. And that, I think, makes me more on the prism of objectivity toward being objective than maybe someone else who is picking and choosing one poll out of a day when there were 20 to tell a preferred narrative.
So I think there are ways to protect yourself against being infected by bias, which you're always going to have some of, by number one, being fairly explicit with yourself about acknowledging it, maybe also with other people. And number two, setting up rules so that you won't have to make calls in difficult gray areas where your bias might influence your conclusion.
DAVIES: Right. So you call them like you see them. But just so we get this on the table, kind of what is your political perspective?
SILVER: I would describe myself as being somewhere between a liberal and a libertarian. A fancy way of saying I'm probably fairly centrist on economic policy, but liberal or libertarian on social policy.
DAVIES: Now, when you got into political forecasting, were you interested in policy and politics or was it more the challenge of prediction?
SILVER: Probably more the challenge of prediction. Although one reason why I did get more interested in politics was because at the time poker was my livelihood and Congress passed a law in 2006 that essentially banned Internet poker. So I wanted to see if those rascals would get voted out of office and that increased my interest in politics.
SILVER: But I more like the - I don't want to call it the game of the election but I like elections a lot more than I like politics themselves, where elections are an interesting thing to look at and look at the data. It's very much like a baseball season where it develops slowly, a little bit at a time, and then you have kind of a climactic ending. But I'm not a fan of the political culture per se. I'm happy that I live in New York in its own kind of bubble and not in D.C., which is a wonderful place but where you can't turn down the street without someone having some kind of tie to Congress or the White House or something else.
DAVIES: Well, Nate Silver, thanks so much for speaking with us.
SILVER: Thank you.
GROSS: Nate Silver spoke yesterday with FRESH AIR contributor Dave Davies. Silver analyzes political polls and forecasts election outcomes on his New York Times blog FiveThirtyEight. His new book is called "The Signal and the Noise." You can read an excerpt on our website, freshair.npr.org. Transcript provided by NPR, Copyright NPR.