web analytics

Repeat after me, “correlation…”

This goofy-looking sod is Tyler Vigen and he’s studying for his doctorate at Harvard. But that doesn’t matter right now. He also runs a site called Spurious Correlations.

He’s written a little algorithm that compares shit tons of data sets and finds correlations. Really stupid pointless ones, for the most part (if his algorithm has found any likely meaningful ones, he doesn’t say). Like, there’s a 0.992558 correlation between the divorce rate in Maine and the US per capita consumption of margarine.
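For the curious, here’s roughly what a brute-force correlation hunt like that could look like. To be clear, this is not his code, just a minimal sketch of the general idea: take every pair of data series, compute the Pearson correlation, and keep the pairs that score high. The function names (`pearson`, `strong_pairs`), the 0.95 cutoff, and the toy numbers are all my own assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_pairs(datasets, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `datasets` maps a series name to a list of numbers; the threshold
    is an arbitrary cutoff chosen for this sketch.
    """
    hits = []
    for (a, xs), (b, ys) in combinations(datasets.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            hits.append((a, b, r))
    # Strongest correlations first.
    return sorted(hits, key=lambda h: -abs(h[2]))


# Made-up toy numbers, NOT real statistics.
datasets = {
    "series_a": [5.0, 4.7, 4.6, 4.4, 4.3],
    "series_b": [8.2, 7.0, 6.5, 5.3, 5.2],
    "series_c": [1.0, 3.0, 2.0, 5.0, 1.5],
}

for name_a, name_b, r in strong_pairs(datasets):
    print(f"{name_a} vs {name_b}: r = {r:.4f}")
```

Notice what’s missing: nothing in there has any idea whether a high-scoring pair means anything. It just scores every pair and sorts.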

That’s lots of fun, and I invite you to browse his charts. Could come in handy next time you get into an argument with a green. His bigger point, though, is that computers are terrific at sifting data and finding correlations, but they’re absolutely crap at sorting the meaningful ones from the silly ones. “Meaning” isn’t an easily quantifiable characteristic.

If I asked you to tell me the current population of Uruguay, I assume you wouldn’t know. Thing is, if you don’t know, somehow you knew instantly that you don’t know. Many years ago, I read that this is something they haven’t worked out how to build into computers: how to recognize instantly when they don’t have data, without sifting through all the data they DO have. I’ve been puzzling over that ever since. Somehow, I think those problems are related.

p.s. Did you have any idea that seven hundred people died in 2009 by becoming entangled in their bedsheets?

May 27, 2014 — 9:39 pm