Monday, April 30, 2012

Predicting v forecasting

I heard an interesting hypothesis yesterday.  If you had access to precise data on everything that has ever happened, you would be able to predict the future.  Predictions of the short-term future would be almost perfect.  Because of stochastic uncertainty, the distant future would be harder to predict, but the predictions would still be a lot better than they are now.  So who needs omniscience if we have good big data and good analytics behind it?

This has a lot of truth behind it, but I would like to restate it by switching two critical words.  I think you could "forecast" the future if you had access to precise "information" on everything that has ever happened. 

The difference is subtle but important.

Data: the high temperature in Boston on June 3, 1926 was 72 degrees Fahrenheit.

Info: the high temperature in Boston on June 3, 1926 ranged from 69 degrees in the north of the city to 73 in the west of the city.  This was 3 degrees above the average for the month and 4 degrees above the average for the other Junes in the 1920s.

Prediction: based on all of the past temperature data and accurate weather models, it will be 75 degrees on June 3, 2012.

Forecast: based on all of the past temperature data and accurate weather models, there is a 92% chance that it will be 75 degrees on June 3, 2012, with a 99% confidence interval from 72 to 79 degrees.
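
To make the contrast concrete, here is a rough sketch in Python. The temperatures are made up for illustration (they are not real Boston records), the function names are my own, and the interval assumes year-to-year variation is roughly normal, which real weather models would not take for granted. The prediction returns one number; the forecast returns the same number plus a range around it.

```python
import statistics

# Hypothetical June 3 high temperatures in degrees Fahrenheit.
# Illustrative values only, not real historical records.
past_highs = [72, 75, 74, 77, 73, 76, 75, 78, 74, 76]


def predict(highs):
    """Point prediction: a single number with no stated uncertainty."""
    return statistics.mean(highs)


def forecast(highs, z=2.576):
    """Forecast: a central estimate plus an approximate 99% interval.

    Assumes roughly normal variation; z=2.576 is the two-sided 99%
    quantile of the standard normal distribution. This ignores the
    uncertainty in the estimated mean and spread, so it is only a sketch.
    """
    center = statistics.mean(highs)
    spread = statistics.stdev(highs)  # sample standard deviation
    return center, center - z * spread, center + z * spread


print(f"Prediction: {predict(past_highs):.0f} degrees")
center, low, high = forecast(past_highs)
print(f"Forecast: about {center:.0f} degrees, 99% interval {low:.1f} to {high:.1f}")
```

Both statements come from exactly the same data.  The only difference is whether the spread is reported or thrown away.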

Why does this matter?  Well, I see pundits in all walks of life (particularly in politics but also in economics, weather, space science and other more quantitative domains) who state their claims as precise predictions when they could not possibly, even with perfect data, be so confident.  The public would be so much better served if we could acknowledge our uncertainty.

The problem is that the general public does not understand concepts like confidence intervals.  Not their fault - public schools don't teach them.  So we need either to change public school curricula (which I doubt will happen - just look at the debates over intelligent design) or to develop a user-friendly language to communicate uncertainty.

The greater challenge is that this language would have to be short enough to fit in a tweet, because that is a common path information takes when people share it with each other.

Any ideas?