Caveat Emptor

The opinions expressed on this page are mine alone. Any similarities to the views of my employer are completely coincidental.

Monday, 30 September 2013

Conversation with Edmund Chattoe-Brown about agent-based models

A little while ago I linked to an interesting article by Edmund Chattoe-Brown about agent-based models. It stimulated me to think about the conditions under which new methodological tools are adopted. Edmund got in touch and posed a few tough questions. I think we both found the dialogue useful and enlightening. We also thought that there might be at least a few people who would be interested in our conversation. So here it is; my part is in italics:



Dear Colin,

Thanks very much for your kind mention of my article in your blog. I think I pretty much agree with you (which is why I have been trying to publish more in "subject" journals than simulation ones). However, I have a couple of thoughts/comments:

1) What, in your area (however you define it), would you say a reasonable number of sociologists agree can't be done but needs to be?

That is one hell of a question! I'm not sure it's possible for me to imagine anything but a tiny group of sociologists agreeing about anything. If I had to point to an area where it seems to me that there is already an active interest in the sorts of things that simulation can do, I would point to the interface between family sociology and social demography. I'm thinking principally of the work of people like Rob Mare and our own Francesco Billari. The kinds of things they are interested in tend to be about how observed macro-level demographic patterns can emerge from multiple micro-level processes with lots of endogeneity (i.e. "mediating variables"). My guess is that there are probably lots of connections here with social stratification, mobility, homogamy, etc. The problem with applications to the latter is that we are still struggling to describe accurately what the basic patterns are.

2) Although it is difficult to generalize reliably from one’s own experience, at least a couple of papers I have published might fall under: "publication in mainstream journals of a few articles reporting realistic applications to substantive problems that enough sociologists care about that tell us something believable and important that we didn’t know already." My BJS article "overturned" a result from previous analysis which in turn was based on an extensive empirical literature going back to the early seventies. (Are strict churches strong? Not in a properly dynamic environment rather than a simplified partial equilibrium model.) My BJC article showed that, for poor response rates (as you would expect in criminal networks, for example), unreliable qualitative "third party" data might outperform quantitative data in reconstructing social networks. (Something that both the rather formal SNA community and "real" users of network ideas, such as police forces, might want to know.) Now, of course, one can always argue that the problems one tackles could be "more substantive" or "more popular" than they are (and these articles were considered good enough to go in reasonable journals at least), but I'm not sure the response these papers have received is really in proportion to their "substantiveness".

It seems to me you are doing the right thing, and one has to live in hope that if the message is read and received by the right people, and they realize that it will meet their needs, then they will take it up. Build it and they will come, if they have any need for it. My guess is that road-to-Damascus conversions are rather rare. The basic ideas about log-linear modelling had been knocking about in the early 1960s, but it wasn't really until the late 1960s and early 1970s that they were taken up in the bio-medical field, and only in the late 1970s that they were first introduced into the sociological mainstream. Undoubtedly a big impetus came from the dissemination of Goodman's (relatively) user-friendly ECTA program. What is clear is that publication in methodological ghettos (SM, SM&R, etc.) doesn't necessarily reach the right audience. Also one shouldn't underrate the role of arbitrage. There are some people (I won't name them!) who specialize in "translating" the technical innovations of one field into another. They are often very good at picking examples that will sell.

3) Ages ago, Robert Andersen asked me why simulation and statistical data had so much trouble "getting on". Like all good questions, I have been thinking about it on and off ever since. The other day, when trying to calibrate a simulation of attitude change, I had a sudden (minor) epiphany. I needed to know (very roughly) how often people discuss political matters. One large, reputable survey asks people to report ("never", "rarely", "sometimes", "often") and another, equally large and reputable, reports ("daily", "several times a week", "weekly", "monthly"). Both are perfectly OK if you want to look at statistical association, but one is completely useless if you want to model an underlying process. I knew I wasn't imagining that there is more to these issues than "just" data!

I agree and would go even further. I'm not even sure the use of vague quantifiers is that enlightening in social survey applications without some serious attempt being made to understand how sub-populations understand the category labels. There are ways of modelling this, for example Gary King has been a pioneer, but there are powerful vested interests in the data collection world that sometimes inhibit sensible innovation. And if a behaviour is well defined and the unit of time is sensibly chosen, then I can't see why we shouldn't attempt to get frequencies. Of course, it isn't always that straightforward. What is a political discussion? Is it a discrete, countable event with somewhat obvious boundaries, like, say, visiting your GP's surgery? Then there are questions of time units, seasonality, etc., and of course well-known memory effects like telescoping.
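To make the calibration problem concrete, here is a minimal sketch of why the second survey's categories can drive a simulation while the first's cannot. Everything in it is invented for illustration: the category-to-rate mappings and the marginal shares are guesses, not estimates from either survey.

```python
# Two surveys code "how often do you discuss politics?" on incompatible
# scales. To drive a simulation you need a rate (events per unit time),
# so each category must be mapped to an assumed rate.

# Survey A: vague quantifiers -- any rate mapping is essentially arbitrary.
survey_a_rates = {
    "never": 0.0,
    "rarely": 1 / 90,      # assume roughly once a quarter (per day)
    "sometimes": 1 / 14,   # assume roughly fortnightly
    "often": 1 / 2,        # assume every other day
}

# Survey B: frequency categories -- these at least imply rates directly.
survey_b_rates = {
    "daily": 1.0,
    "several times a week": 3 / 7,
    "weekly": 1 / 7,
    "monthly": 1 / 30,
}

def mean_rate(category_shares, rate_map):
    """Population mean discussion rate implied by category shares."""
    return sum(share * rate_map[cat] for cat, share in category_shares.items())

# Hypothetical marginal distributions from each survey.
shares_a = {"never": 0.2, "rarely": 0.3, "sometimes": 0.35, "often": 0.15}
shares_b = {"daily": 0.1, "several times a week": 0.25,
            "weekly": 0.35, "monthly": 0.3}

print(mean_rate(shares_a, survey_a_rates))  # hinges entirely on assumed mapping
print(mean_rate(shares_b, survey_b_rates))  # pinned down by the labels themselves
```

The point is that Survey B's labels commit the respondent to a time unit, so the implied rate is recoverable; Survey A's answer categories only become a rate after the modeller imposes assumptions the data cannot check.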

4) You talk about simulation models that should be "reporting realistic applications". Statistics has many virtues, but can its uses be assessed as realistic (or otherwise)? To take another example I've just read: simple regression makes normality assumptions about the data. Sometimes one can "fix" data that fail to be suitably normal by logging the variable. For sure that solves the technical problem, but what do we conclude about an association between something and "log age" (or age squared, come to that)? Is there a danger that every method seems "realistic" to its advocates, and that what we need are standards of realism that don't presume the virtues of a particular method?

I was thinking of "realistic" here as meaning something like "a realistic degree of complexity". I think - but this is just the impression of a possibly naive but sympathetic observer - that "toy applications" don't do much to encourage enthusiastic take-up. Another dimension of this problem would be to say that demonstrating that a given outcome could be produced in a particular (simplified) way is not the same as demonstrating that it has in fact been produced in this way. Of course this is not a problem that is in any way unique to simulation. If I think about the aggregate distribution of votes between parties at an election, what tends to impress me is that the reality is the result of lots of different decision processes going on in different sub-populations. Therefore to propose a single "theory of voting" is absurd. Some people are cogitating about the relative advantages to them of voting one way or another, others are just doing what they've always done or what their parents have done, some are voting strategically, and so on. All these processes are going on at the same time to produce the aggregate outcome. In some sense an adequate model would try to capture this (and perhaps produce as a by-product some sort of estimate of the proportions involved).

In terms of statistics, the way I look at it is that statistical models are just smoothing devices that permit the estimation of some quantities that you happen to be interested in. How you go on to explain whatever patterns are revealed is quite another matter (and simulation has a big role to play here). This is, of course, not the standard econometric justification - structural parameters and all that. If economists have believable models and sensible identification techniques then they should estimate their structural parameters. My feeling is that in sociology we are usually a long way from this position, not because our statistics are no good, but more usually because either we don't have a good (precise enough) theory and/or we don't have data that permit identification of what we are really interested in. Problems, of course, remain even when we have identification - consider the classic experimental design. We have a well-defined target and we have identification via randomization. But unless we have a good theory we also have a black box! Well, that discussion will take us off in another direction...

To which Edmund replied in a follow-up email:

On 1: _That's_ why we simulators find it hard to build "generally appealing" models ... :) Interestingly, I certainly have demography on my list, particularly having read with interest this (http://ideas.repec.org/a/bla/popdev/v37y2011i1p89-123.html) and seeing how statisticians, modellers (and even qualis) could have a debate around what each method can contribute to this specific problem and why each thinks that the other "hasn't got it". (One needs to convene a small "fair/broad-minded" group to discuss.)

On 3: This shades into "hard core" simulation methodology (which even some simulators conveniently neglect). The biggest critique I have tried to make (still unpublished, interestingly) is that the average simulation paper still has nothing to do with data (even when it is virtually free) and the "field" has forgotten several old papers (particularly Hagerstrand 1965 on spatial diffusion) which have higher standards. (Bad news for "science".) To do something useful with statistical data, a simulation doesn't necessarily have to have it exactly right (because it doesn't feed straight into parameter estimation). Obviously a model based on the idea that people talk about politics once a year rather than once a day almost certainly won't produce good data, but with all the other social processes represented it may be that 1.5 times per week versus 1.8 times per week won't alter the "basic qualitative behaviour" of the system (like turning points versus trend). And sensitivity analysis can tell you roughly where it is most important that your data be accurate, even before you "get at" any real data.
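For what it's worth, here is a toy sketch of what such a sensitivity analysis might look like. The model, the rates, and the run length are made up for the purpose (they are not taken from Edmund's calibration or from any real survey); the exercise is simply to sweep the poorly measured parameter and see whether the qualitative outcome moves.

```python
import random

def run_opinion_model(discussion_rate, n_agents=100, n_days=365, seed=0):
    """Toy opinion-averaging model: each day, each agent talks politics with
    probability `discussion_rate` and moves 10% of the way towards a randomly
    chosen partner's opinion. Returns the spread (max - min) of final opinions."""
    rng = random.Random(seed)
    opinions = [rng.uniform(-1, 1) for _ in range(n_agents)]
    for _ in range(n_days):
        for i in range(n_agents):
            if rng.random() < discussion_rate:
                j = rng.randrange(n_agents)
                opinions[i] += 0.1 * (opinions[j] - opinions[i])
    return max(opinions) - min(opinions)

# Crude sensitivity analysis: sweep the uncertain parameter and check whether
# the qualitative outcome (consensus vs persistent disagreement) depends on it
# at the resolution the survey data could ever supply.
for rate in [1 / 365, 1 / 30, 1.5 / 7, 1.8 / 7, 1.0]:
    print(f"rate={rate:.3f}  final spread={run_opinion_model(rate):.3f}")
```

In a sketch like this, once-a-year versus once-a-day discussion changes the outcome completely, while 1.5 versus 1.8 times per week leaves the qualitative behaviour alone, which is exactly the kind of information that tells you where measurement precision actually matters.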

On 4: There's a lot in this response.

1) "Realistic degree of complexity" is hard to nail down. Qualis usually say that simulations are excessively formalistic and simplistic. Quants (particularly economists) say they are needlessly complex and ad hoc. (This makes me laugh in seminars.) One argument I am trying out is that we need to distinguish clearly between complexity we think exists and complexity we can show _matters_. (Again, we need less "method-embedded" ways to justify the claim that a model is too simple or not simple enough: "too complicated for my taste" is not at all the same as "too complicated".) For sure, ethnographers can tell us all sorts of things about, for example, family size aspirations (paper above), but how do we tell that these "add up to" a particular pattern of family size (or that one couldn't do just as well with four key variables)? Conversely, quantitative researchers can't usually show that process x (differing socially reproduced family norms for family size?) _doesn't_ affect outcomes, because if they don't already have the data it is a huge faff to collect and, in any event, some reasonable causes just aren't very quantifiable. As far as I know, ABM is the only way that you can say "OK, we are now going to give our agents brains - or social networks or whatever - and see how much difference it makes". This is the "simulations as thought experiments" idea. (One thing I find thought-provoking is that I think Social Network Analysis is fairly convincing that "networks matter", and yet social statistics - which almost never includes network variables - also seems to achieve sensible things. So is one approach "wrong" about the fact that "networks matter", or is it that the methods just aren't geared up to adjudicate on this? Perhaps networks _do_ matter but social statistics can lose these effects in lower R-squared and error terms in ways which don't "ring any alarm bells" with practitioners.) On that score, what do you think of the statistics in http://cumc.columbia.edu/dept/healthandsociety/events/documents/Haynie.pdf (network effects and delinquency)?

2) Mixtures of agents with different decision-making processes (including "no decision") are exactly something that ABM is good at. (But you have to watch to make sure you don't get a good fit by just adjusting the fractions till the graphs match!)
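A toy sketch of the kind of mixture Edmund means, applied to Colin's voting example. None of this comes from the exchange itself: the decision rules, party platforms, poll numbers, and mixture fractions are all invented for illustration; the point is only that heterogeneous decision processes aggregate naturally in an ABM.

```python
import random

rng = random.Random(42)
PARTIES = ["A", "B"]
PLATFORM = {"A": -0.5, "B": 0.5}   # invented positions on a 1-D issue scale

def habitual(agent, polls):
    return agent["last_vote"]       # do what you did last time

def calculating(agent, polls):
    # pick the party closest to your own position on the issue scale
    return min(PARTIES, key=lambda p: abs(agent["position"] - PLATFORM[p]))

def strategic(agent, polls):
    return max(PARTIES, key=polls.get)   # back the likely winner

def abstainer(agent, polls):
    return None                     # no decision at all

RULES = [habitual, calculating, strategic, abstainer]
FRACTIONS = [0.4, 0.3, 0.2, 0.1]   # guessed mixture: the thing NOT to tune freely

agents = [{"rule": rng.choices(RULES, FRACTIONS)[0],
           "position": rng.uniform(-1, 1),
           "last_vote": rng.choice(PARTIES)}
          for _ in range(10_000)]

polls = {"A": 0.48, "B": 0.52}     # hypothetical pre-election polls
votes = [agent["rule"](agent, polls) for agent in agents]
for party in PARTIES:
    # shares don't sum to 1 because abstainers return None
    print(party, votes.count(party) / len(votes))
```

Edmund's parenthetical warning applies to FRACTIONS: with enough free mixture weights one can match almost any aggregate distribution, so the proportions need some independent empirical support rather than being tuned until the graphs line up.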

All the best,

Edmund
