Caveat Emptor

The opinions expressed on this page are mine alone. Any similarities to the views of my employer are completely coincidental.

Friday 4 June 2010

Wikipedia and Prediction

Andrew Gelman has an interesting piece on his blog about the politics of Wikipedia edits. The scientific point at stake here is that prediction before you have peeked at the data (that is, before you have fitted a model) is a completely different thing from prediction after you have fitted one, and it is, err, essentially dishonest to pretend that the two are one and the same or are of equivalent scientific value. Think about it this way. Fit your favourite model for a binary outcome (discriminant function, logistic regression or whatever) to a sample of data and define a decision rule to count how many cases you got in the right box. Now apply that same model, with the same parameter values, to a new set of data. You will typically do worse, sometimes much worse, because first time round you capitalised on chance: the parameters were tuned to the quirks of that particular sample as well as to any real signal in it. It's multivariate analysis 101, or at least it should be.
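If you want to see the effect for yourself, here is a minimal sketch in Python. It is my own toy illustration, not anything taken from Gelman's piece: the data are pure noise by construction, the "model" is a simple nearest-centroid discriminant rule, and the sample sizes, the number of predictors, the seed and all the function names are assumptions chosen only to make the point visible.

```python
import random

def make_data(n, p, rng):
    """Pure-noise data: the p predictors are unrelated to the binary outcome."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced labels, independent of X
    return X, y

def fit_centroids(X, y):
    """Estimate one mean vector per class (these are the model's parameters)."""
    p = len(X[0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return centroids

def predict(centroids, x):
    """Decision rule: assign x to the class whose centroid is nearer."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 0 if dist2(centroids[0]) <= dist2(centroids[1]) else 1

def accuracy(centroids, X, y):
    """Proportion of cases put in the right box."""
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)

rng = random.Random(0)                      # fixed seed so the run is repeatable
X_train, y_train = make_data(60, 40, rng)   # small sample, many predictors
X_new, y_new = make_data(5000, 40, rng)     # fresh data from the same process

model = fit_centroids(X_train, y_train)     # parameters are fixed from here on
in_sample = accuracy(model, X_train, y_train)
out_of_sample = accuracy(model, X_new, y_new)

print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"out-of-sample accuracy: {out_of_sample:.2f}")
```

Since there is no real relationship to find, any in-sample accuracy above a coin toss is capitalisation on chance, and it should evaporate on the new data. With real data the gap is usually less dramatic, but the direction is the same.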
