This is the second part of this post.
The question now arises as to whether there is anything else about Bartlett's data that suggests something untoward. Abernethy points out that the heaping is suspicious. I'm not entirely convinced it is, but let's run with a related idea. I would conjecture that when people "estimate" or make up data, a real giveaway is that they tend to underestimate natural variability. That translates into a pretty straightforward prediction. If Bartlett's numbers aren't entirely kosher then the residual variation from his observations should be smaller than the residual variation from all the other interviewers.
There is a simple test for this. In effect we estimate a regression both for the level of income and for its variance. To keep things simple and feasible, the only predictor I use for the latter is whether or not the observation is attributable to Bartlett. Bruce Western and Deirdre Bloome have a nice paper on how to do this and, more importantly, some Stata code to get the job done. I use their two-step maximum-likelihood method - in effect an iterated gamma regression for the variance. It's also possible to do this kind of thing by REML with Stata's Mixed procedure; however, I lost patience waiting for the full model to converge and gave up. With simpler, less heavily parameterized models the estimates point in the same direction though.
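For readers who want to see the mechanics, here is a minimal sketch of the two-step idea in Python. It is not the Western and Bloome Stata code and not the model from Part 1: the data are simulated, the mean equation has a single made-up covariate, and all the names (fit_variance_function, simulate, lam) are my own inventions for illustration. With only one dummy in the variance equation, the gamma regression of the squared residuals with a log link has a closed-form solution (the group means of the squared residuals), so no general GLM routine is needed.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fit_variance_function(y, x, d, n_iter=10):
    """Iterate between a mean regression and a variance regression.

    Mean equation:      y = b0 + b1 * x + e
    Variance equation:  log(var(e)) = g0 + lam * d, where d is a 0/1 dummy.
    """
    w = [1.0] * len(y)  # first pass is ordinary least squares
    for _ in range(n_iter):
        # Step 1: mean equation, weighted by the inverse fitted variances.
        b0, b1 = wls_line(x, y, w)
        r2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
        # Step 2: gamma regression of squared residuals on the dummy.
        # With a single dummy the fitted values are just the group means.
        s0 = sum(r for r, di in zip(r2, d) if di == 0) / d.count(0)
        s1 = sum(r for r, di in zip(r2, d) if di == 1) / d.count(1)
        w = [1.0 / (s1 if di == 1 else s0) for di in d]
    return b0, b1, math.log(s0), math.log(s1 / s0)


def simulate(n=2000, seed=1):
    """Hypothetical data: group d == 1 has half the residual SD of group 0."""
    rng = random.Random(seed)
    x, y, d = [], [], []
    for group, sd in ((0, 1.0), (1, 0.5)):
        for _ in range(n):
            xi = rng.uniform(0.0, 10.0)
            x.append(xi)
            y.append(1.0 + 2.0 * xi + rng.gauss(0.0, sd))
            d.append(group)
    return x, y, d


if __name__ == "__main__":
    x, y, d = simulate()
    b0, b1, g0, lam = fit_variance_function(y, x, d)
    print(f"mean equation: b0={b0:.3f}, b1={b1:.3f}")
    print(f"variance equation: g0={g0:.3f}, lam={lam:.3f}")
```

In the simulation the second group is given half the residual standard deviation of the first, so the dummy coefficient in the variance equation should come out negative, near log(0.25). A negative coefficient is the pattern the "estimating" conjecture predicts for an interviewer whose figures vary too little.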
So here are the results. Same model for the means as in Part 1, but now with an extra equation for the residual variance. If Bartlett was "estimating" we would expect the variance of his observations to be smaller than the variance of the observations generated by the other interviewers; in other words, the coefficient for the Bartlett dummy should be negative. And this is indeed what we find (λ = -0.14, |t| = 4.52).
Though this proves nothing definite, Abernethy's case seems to gain some strength.
At the beginning of Part 1 I mentioned a detective story, so for those who are really interested in that rather than statistical games with 80-year-old data, here it is. Abernethy notes that "Little is known of G. E. Bartlett..." True, but after a bit of spade work I can make a conjecture as to who he was. The evidence is circumstantial but, taken together, is I think quite convincing. If I'm right it may also explain how he was able to carry out his prodigious interviewing feat.
Using a well-known genealogy site I was able to look through all the Bartletts in the 1929 London Electoral Register. It turns out that there is only one with the initials G. E. - George Edwin Bartlett. It's easy to find George Edwin in the 1911 Census. He is living at 32 Netherfold Road, Clapham, SW with his wife Sarah Louise, two children and a domestic servant. The most important piece of information is that he is an LCC Attendance Officer, in other words somebody employed to make sure that kids go to school. This is significant because we know that a lot of the NSLLL data was collected by school attendance officers, and this is the thing that tips the balance of evidence towards our man.
George Edwin Bartlett was born in Brighton in 1865, the son of a plasterer, and seems to have been an attendance officer at least from the final years of the 1890s. There is in fact a reference to him in a London School Board document of 1898. At the 1881 Census he is recorded as living with his parents in Clerkenwell and his occupation is given as Confectioner's Errand Boy. By 1891 he was lodging in Islington and in the census he is recorded as G. E. Bartlett with the occupation Confectioner's Assistant. In 1901 we know from the electoral register that he was living in Lavender Hill in one furnished room. The Census has him as a visitor at another address and tells us that he is a School Attendance Officer and a widower. By 1918 he has remarried and is living at 30 Union Grove, Clapham, which is where we find him in 1929. He died in 1935 aged 70.
We don't actually know what Bartlett was doing in 1929, but at 64 it is not impossible that he had retired and therefore had a lot of time on his hands. As a School Attendance Officer he was in a sense a professional nosey parker and would have known the circumstances of many of the families on his patch pretty well. More than 30 years of working for the LCC in this capacity may well have acquainted him with a very large number of people. A retired man who was still reasonably vigorous could easily do 20 of the rather minimalist interviews required of him during the day, especially if he was willing to take his data from the most easily accessible source, which would have been the wives of the men who were away at work. This of course raises the possibility that the estimating and rounding were not done by Bartlett and that he was merely faithfully recording what he was told by the wives about their husbands' earnings.
At the end of the day the important question, as Abernethy strongly points out, is should we trust the NSLLL data? I think one thing is clear: no modern survey organization would let one interviewer collect 20% of the data. Even if the "bias" attributable to that individual is small in percentage terms, the sheer weight of their contribution might be important for some questions. It is important, though, to make that judgement within the context of a particular question. To take a modern example, we know that the earnings data from the modern Labour Force Survey, though biased, are good enough for some broad brush stroke comparisons. However, you would be very ill advised to use them for questions which rely on information about the tails of the distribution, i.e. very high or very low earners.
As it happens for what I was interested in - the rank order of occupational average earnings - it really makes very little difference whether you include or exclude Bartlett's contribution. The Pearson correlation between the occupation averages (actually the shrunken level 2 residuals) with and without Bartlett is 0.98 and that is good enough for me.