Caveat Emptor

The opinions expressed on this page are mine alone. Any similarities to the views of my employer are completely coincidental.

Wednesday, 28 May 2014

Party Pooper

Knowing a little about numbers and having a modicum of common sense tends to make you a party pooper. I woke up this morning to the sound of Penny Young, chief executive of NatCen Social Research, being interviewed on the Today programme. Actually I think she did a good job, and if you listened carefully you would have heard her explain that whatever change there has been in people's willingness to admit racial prejudice has been going on for a long time - probably since 2001.

But that story isn't going to make the headlines - or make an "impact" case study. The framing of the interview suggested that there was some genuinely new finding here, and that certainly seems to be how the Guardian is selling it today: "New data from NatCen’s authoritative British Social Attitudes (BSA) survey, obtained by the Guardian, shows that after years of increasing tolerance, the percentage of people who describe themselves as prejudiced against those of other races has risen overall since 2001." Well, no, not really. What's new is a single data point for 2013, which adds nothing much to a data series that has long been in the public domain, and a bit of puff for volume 31 of the British Social Attitudes series.

Let's take a closer look at the numbers behind the story. To their credit NatCen have put them online in a nice pdf. I hope they don't mind me borrowing one of their graphs (all credit to them, I've just corrected a small typo in the title).

I'll leave it to others to quibble about the question that is posed to respondents and simply assume that it has some claim to validity (if it hasn't then the whole story collapses anyway).

Let's look first at the mauve line which joins up the year on year data points. Except it doesn't quite, because it is clear from the data table that NatCen helpfully provide that there are no data for 1988, 1991, 1993, 1995 and 1997. Full marks to the Guardian for making that clear in their graph, where they represent the missing years with a * on the x axis. I'll return to this below.

My first impression is that there is quite a lot of year on year variation, within a range of 25% to 38% affirming that they are at least a little prejudiced. A 13 percentage point spread in a social attitude is actually a fairly narrow band of variation given the magnitude of non-sampling errors that these sorts of investigations are prone to. It is entirely possible that nothing very much has happened since 1983. Consider the change between 2011 and 2012: this was roughly 12 percentage points in one year! My inclination is to believe that racial prejudice is a fairly stubborn cognitive disposition rather than a febrile attitude, and that leads me to regard such apparent differences as implausible estimates of real attitudinal change at the population aggregate level.

Clearly the NatCen team have similar concerns, hence the inclusion of the blue line for the 5 year moving average (which the Guardian omits in its report). Some smoothing of the data is obviously desirable. But a few things puzzle me. Why doesn't the 5 year moving average start in 1985? And why does it continue to 2013? Obviously a centred average can't be calculated for 2013 (or 2012) because we haven't yet got observations for 2014 and 2015. I'm going to hazard a guess that NatCen have got the moving average plot wrong by plotting points at the 5th year of the average rather than the central year. This would make sense of the dip at the end of the blue line, which mainly reflects the large (and suspect) difference between 2011 and 2012.
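To make the distinction concrete, here is a minimal sketch of the two plotting conventions. The numbers are made up for illustration and are not the BSA figures; the function names `trailing_ma` and `centred_ma` are my own, and the code assumes a series with no missing years.

```python
from typing import List, Tuple


def trailing_ma(years: List[int], values: List[float], window: int = 5) -> List[Tuple[int, float]]:
    """Average each year with the (window - 1) years before it,
    and label the result with the LAST year of the window."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append((years[i], sum(chunk) / window))
    return out


def centred_ma(years: List[int], values: List[float], window: int = 5) -> List[Tuple[int, float]]:
    """Average each year with the years either side of it,
    and label the result with the MIDDLE year of the window (window must be odd)."""
    half = window // 2
    out = []
    for i in range(half, len(values) - half):
        chunk = values[i - half : i + half + 1]
        out.append((years[i], sum(chunk) / window))
    return out


# Hypothetical percentages, purely illustrative (NOT the NatCen data).
years = list(range(2005, 2014))
values = [30.0, 32.0, 31.0, 33.0, 35.0, 34.0, 38.0, 26.0, 30.0]

trailing = trailing_ma(years, values)
centred = centred_ma(years, values)

print("trailing average plotted at:", [y for y, _ in trailing])
print("centred average plotted at: ", [y for y, _ in centred])
```

On a complete series the two functions produce exactly the same averages; the only difference is the year each one is plotted against. The trailing version runs right up to the final year of data, while the centred version stops two years short, which is the pattern I would have expected to see in the blue line.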

But something else worries me. The "natural" year on year variation can be quite large, yet this is already somewhat smoothed out in the first part of the series because of the missing years in which no data were collected. If data had been collected we probably would have seen much more of a saw tooth pattern in the 1990s. The downward trend from 1992 looks quite convincing, but how convincing would it have looked if every other year that line was jumping up and down to the same degree as it does in the second half of the series, where the year on year data are continuous? This is the kind of situation in which a good (multiple) imputation model would come in handy to simulate data for the years in which they are missing (remember the point of imputation is to preserve the pattern of variability in the data). A subsidiary quibble is that the 5 year moving averages in the first part of the data can't be five year averages because of the missing years. In brief, the first half of the series is made much smoother than the second half, and that's a bit undesirable for a fair over time comparison. It would have been more honest, though less aesthetically pleasing, to have plotted points with confidence intervals around them rather than joined up lines.
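As a rough sketch of what such intervals involve, the snippet below computes a normal-approximation 95% confidence interval for a single year's percentage. The sample size of 2,000 is a hypothetical figure chosen for illustration, not one taken from NatCen, and the formula assumes simple random sampling. A real survey design would typically widen the interval, and none of this captures non-sampling error.

```python
import math
from typing import Tuple


def proportion_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion p
    estimated from n respondents (simple random sampling assumed)."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical example: 30% admitting some prejudice, 2,000 respondents.
low, high = proportion_ci(0.30, 2000)
print(f"30% with n=2000: {low:.1%} to {high:.1%}")

# A smaller (equally hypothetical) sample gives a wider interval.
low_small, high_small = proportion_ci(0.30, 500)
print(f"30% with n=500:  {low_small:.1%} to {high_small:.1%}")
```

Even under these generous assumptions each point carries a couple of percentage points of sampling uncertainty either way, before anything is said about design effects, question wording or non-response.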

And then there is item non-response. We aren't told anything about this, but my guess is that a reasonable number of people won't feel too comfortable about answering this question. Is the number of non-respondents stable over time or has it changed in any particular direction? Again this may be a situation where a fair representation of our knowledge probably, somewhat paradoxically, requires some data imputation.

So, what's my best guess at what has actually happened? I'd venture that these data give us some, not particularly strong, evidence to say that (at the aggregate level) willingness to admit to prejudiced attitudes probably declined a little bit between the 1980s and the end of the 1990s, and that since the early years of the new century there may have been a slight increase in the tendency to admit prejudice. However, these changes, if changes they be, are within a fairly narrow range. Nothing particularly noteworthy has happened in the last four or five years and the 2013 BSA data add nothing new to the story.

But you can't make a case for impact, or get on the Today programme, by saying that.


Anonymous said...

Just a tiny note that you can quite ordinarily calculate moving averages that only count preceding years (or future years, for that matter), although for a time series like the one presented a moving average centred on the year probably would make more sense. My guess is that they used Excel, where the default is preceding years only...

Colin said...

Thanks Anonymous. Indeed you can do it these ways & with all sorts of other bells & whistles like giving less weight to years that are further away from the target year.
Like you I don't see any grounds in this case for just paying attention to the past when the point is to smooth data that are noisy in the past, the present and the future. I'd really expect an organization like NatCen to do a bit better.