The Prediction Blind Spots of Big Data

Kurt Wagner 2013-04-28

American statistician Nate Silver is a math whiz who excels at making predictions from big data. During last year's U.S. presidential election, he correctly called the winner in all 50 states. But he argues that big data is not a cure-all: in some fields, such as earthquakes and the stock market, predictions rarely succeed.

Is it hard to keep your own political beliefs separate from your work predicting elections?

It's always hard for us to be objective in any walk of life. None of us has a monopoly on reality; we all have rather jaded points of view. I do think the sports training helps, though: I can be a Detroit Tigers fan, as I am [and was] growing up, and I still thought Mike Trout [Los Angeles Angels] should have won the MVP award last year. What I think differentiates politics a bit is that you have an industry full of people who not only have views but are [also] used to manipulating public opinion. They're used to thinking they can create their own reality. That's why I think you have such trouble on the uptake there.

People think that, well, if I can spin a fact a certain way or spin polls a certain way, [the problem] goes away. When you have a political press where some people are very good, but some other people are very compliant and happy to pass along spin from the campaigns, I think that's the issue. People aren't used to getting a reality check in politics as much as in sports.

So how are you able to sift through that information then to pick out the BS?

The idea is to ignore what the politicians say and stick with publicly available data. The record shows that in general, most political observers tend to overrate the importance of a gaffe or a debate -- there are always exceptions -- but in general the polls provide a pretty reliable benchmark. And the public, who have real lives and are not constantly consuming political news, are [sometimes] weighing things in a very sophisticated way where they're looking at things like the economy or are we involved in any stupid wars or major scandals from the administration. Those are the things that explain a lot about who wins the elections and not so much the petty stuff that the political pundits can focus on.

There is more data now than ever before. How are you able to determine which information to pull in order to properly answer your question?

Part of it is that you do need -- as Vegas might say -- you do need a system instead of an ad hoc way of doing it. So we have a model that we designed in 2008 that was updated for 2012 that was designed to account for every single poll. Some polls, if they're from a pollster that has a better track record, get more weight in the system. It doesn't mean that others are ignored. So it's not like we're just looking at a poll and sticking our fingers up in the air and saying, "Oh that poll is important, and that poll's not." Basically all the hard work and all the decision-making process comes from designing this model before the fact. Based on theory and practice and past experience, what are a good set of rules for processing this information? And then sticking to that. We don't make any alterations to the model once we launch it in June every year, unless there's a bug, which fortunately there hasn't been. But the principles are always the same, and then you have a disciplined way to analyze data in that context.
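
The weighting idea in the answer above can be illustrated with a small sketch. This is not Silver's actual model, whose rules are not given in the interview. It only shows the general principle of a fixed rule that counts every poll while giving more weight to pollsters with a better track record. The `Poll` class, the `weighted_poll_average` function, the rating values, and the poll numbers are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float  # candidate A minus candidate B, in percentage points


def weighted_poll_average(polls, ratings, default_weight=1.0):
    """Average poll margins, weighting each poll by its pollster's rating.

    Every poll contributes. A pollster missing from `ratings` gets
    `default_weight` instead of being dropped.
    """
    if not polls:
        raise ValueError("need at least one poll")
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        weight = ratings.get(poll.pollster, default_weight)
        if weight <= 0:
            raise ValueError(f"weight for {poll.pollster} must be positive")
        weighted_sum += weight * poll.margin
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical ratings and polls, invented for illustration only.
RATINGS = {"Pollster A": 3.0, "Pollster B": 1.0}
POLLS = [
    Poll("Pollster A", 2.0),
    Poll("Pollster B", -2.0),
    Poll("Pollster C", 4.0),  # unrated, so it gets the default weight
]

if __name__ == "__main__":
    print(f"{weighted_poll_average(POLLS, RATINGS):+.2f}")
```

Because the ratings are fixed before any polls arrive, the same rule applies to every poll, which is the "system instead of an ad hoc way" the answer describes. A real model would presumably also account for factors such as sample size and poll recency, which this sketch leaves out.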
