

Scandinavian Journal of Public Health, 2014; 42(Suppl 13): 28–40

Original article

Evidence on public policy: Methodological issues, political issues and examples

ORAZIO P. ATTANASIO, University College London and Institute for Fiscal Studies, London, UK

Abstract

In this paper I discuss how evidence on public policy is generated and, in particular, the issue of the evaluation of public policies. In economics, the issue of attribution and the identification of causal links has recently received considerable attention. Important methodological issues have been tackled and new techniques have been proposed and used. Randomised controlled trials have become something of a gold standard. However, they are not exempt from problems and have important limitations: in some cases they cannot be constructed and, more generally, problems of external validity and transferability of results can be important. The paper then moves on to discuss the political economy of policy evaluations: for policy evaluations to have an impact on the conduct of actual policy, it is important that the demand for evaluation comes directly from the policy-making process and is generated endogenously within it. In this sense, it is important that the institutional design of policy making is such that policy-making institutions are incentivised to use rigorous evaluation in the process of designing policies and allocating resources to alternative options. Economists are currently involved in the design and evaluation of many policies, including policies about health, nutrition and education. The role they can play in these fields is not completely obvious. The paper argues that their main contribution is in the modelling of how individuals react to incentives (including those provided by public policies).

Key Words: causal models, counterfactuals, controlled experiments, policy evaluation, structural models, identification, welfare policy

Correspondence: Orazio P Attanasio, Department of Economics, University College London and Institute for Fiscal Studies, London, UK. E-mail: [email protected]

© 2014 the Nordic Societies of Public Health. DOI: 10.1177/1403494813516717

Introduction

Over the last 15 years, there has been a renewed interest in evaluation issues in economics. On the one hand, a large methodological literature, mainly in econometrics, has developed and analysed a number of important issues. There is now much more methodological clarity on the estimation of the impacts of policies in different contexts. On the other hand, from an applied point of view, in most analyses of public policies there has been a tendency to use rigorous methods that identify the causal effects of the policies under study. Much emphasis and effort have been put into understanding the nature of the variability in the data that allows the causal pathways of specific interventions to be identified. As in other sciences, there has been considerable emphasis on randomised controlled trials (RCTs),

especially in the field of development economics and, to an extent, in labour and public economics. Randomised trials are not new in economics, as they have been used occasionally since the 1960s. However, in the last 15 years they have received a new impetus and have come to be considered the gold standard of empirical work, especially in some fields. This is certainly a positive development, as it has forced economists and, in particular, applied economists to think hard about what variation can be legitimately used to identify causal effects. However, the risk is that researchers lose track of what I think is probably the main contribution economists can make: the identification of behavioural models that explain how individuals react to incentives, how they interact in markets and other social situations, and



how outcomes of interest (such as investment in human capital, school decisions, health behaviour) come about.

In this paper, I will discuss these issues. I will start with a methodological discussion of the issues at stake. I will also briefly review some of the methods that have become standard in applied economics. I will then move on to discuss the political economy of policy evaluations: there, I stress that for policy evaluations to have an impact on the conduct of actual policy, it is important that the demand for evaluation comes directly from the policy-making process and is generated endogenously within it. In this sense, it is important that the institutional design of policy making is such that policy-making institutions are incentivised to use rigorous evaluation in the process of designing policies and allocating resources to alternative options. Economists are currently involved in the design and evaluation of many policies, including policies about health, nutrition, and education. The role they can play in these fields is not completely obvious, as will be discussed. One of the main themes of this paper is that useful evaluations of policies go beyond the identification and estimation of impacts and should be informative about the mechanisms that generate the observed impacts. For this to be possible, rich data with good measurements are necessary, and I discuss the need for good measurement below. Finally, I end the paper with an example that illustrates many of the issues discussed here.

Methodological issues

Impact evaluations in the social sciences are difficult and are typically affected by a number of challenging methodological issues. First and foremost is the so-called attribution issue. One would like to establish the extent to which observed outcomes of interest can be attributed to a policy. In other words, one would like to establish the causal impact of a policy. It is possible to observe the outcome of interest for a given individual targeted by a given policy. However, in order to establish the impact of that policy on that particular individual, it is necessary to know what the outcome of interest would have been for that individual in the absence of the policy. Analogously, it is possible to observe the outcome for a potential beneficiary of a programme who has not received it. To establish the (potential) impact of the programme, however, one would need to know what the outcome would have been if that individual had received the

programme. Obviously, it is not possible to observe the same individual in two different states.

A first and apparently simple solution is to measure the outcomes of interest before and after the intervention. Unfortunately, however, such an approach delivers the impact only under very special circumstances. In particular, one has to assume that the only thing that changes between the two periods considered that is relevant for the outcome of interest is the intervention itself. Such an assumption is clearly extremely unlikely to hold, as a plethora of events – some very idiosyncratic and specific to an individual, others common to many individuals – will affect the outcomes of interest. It will therefore be very difficult to attribute the observed differences in outcomes before and after the intervention to the programme itself.

Given this situation, it is pretty clear that it is virtually and logically impossible to identify and estimate the impact of interventions on specific individuals. On the other hand, such an enterprise is not particularly useful or interesting. From the point of view of a policy maker or even an academic, once one recognises that the impacts of specific programmes are in all likelihood heterogeneous, the interesting exercise is to estimate some of the moments of the distribution of these impacts rather than the specific impact on a given individual. An obvious moment to consider is the mean. Other measures of location (such as the median or the mode) can also be interesting. It is also probably important to establish the degree of impact heterogeneity and consider, therefore, the variance of impacts, or other measures of heterogeneity such as the interquartile range or even the range of variation. Finally, it might also be important and interesting to establish how the impacts of an intervention vary with observable characteristics of intended and actual beneficiaries. For instance, one might want to know whether an intervention aimed at improving the quality of teaching by training teachers has larger impacts on the weakest or strongest pupils, or on girls or boys.

The evaluation of the impact of a social or public policy is particularly hard because it depends on how individuals react to the availability of the policy, starting with the issue of whether they are willing to be subject to the intervention. Among the eligible individuals who participate in an intervention, some will react in ways that are conducive to larger impacts than others. And even in the absence of a specific behavioural reaction, the impact can differ simply because some individuals are in a better position to use the intervention.
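To make the pitfall of the before/after comparison concrete, the following is a minimal simulation sketch (all numbers are hypothetical): a shock common to all individuals between the two periods is indistinguishable from the effect of the intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 2.0     # hypothetical causal impact of the intervention
common_shock = -1.5   # e.g. a recession that hits everyone between the two periods

y_before = 10 + rng.normal(0, 3, n)
y_after = y_before + true_effect + common_shock + rng.normal(0, 1, n)

# The before/after estimator attributes the common shock to the programme:
print((y_after - y_before).mean())   # about 0.5, far from the true effect of 2.0
```

Nothing in the before/after difference distinguishes the intervention from the shock; only a comparison group exposed to the same shock can separate the two.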


The distribution of impacts of a given intervention can only be estimated if one is able to identify a comparison group for the individuals who receive the intervention over a certain period. Without this – what in the literature is referred to as a "control" group – it would be impossible to establish any feature of the impact distribution unless one had prior knowledge of how individuals react to a specific intervention and what the impacts of that intervention are.

It is useful to think of the variables of interest as the outcome of what economists call a production function. Let $Y_i$ denote the variable of interest for individual $i$:

$$Y_i = f_i(D_i, X_i, u_i) \qquad (1)$$
where $X_i$ is a vector of variables, some of which might be controlled by the agent receiving the intervention, that are relevant for the outcome of interest; $u_i$ are random shocks that affect the realisation of the outcome of interest; and $D_i$ is the treatment. The function $f_i$ specifies the relationship between the relevant inputs and the outcome of interest. Notice that I allow the function $f_i$ to be individual specific, so that the impacts of the intervention $D_i$ can be heterogeneous. Notice also that knowledge of the function $f_i(.)$ is not sufficient to establish the impact of the intervention represented by $D_i$. In particular, if one or more of the variables in $X_i$ are controlled by individual agents and react to the treatment, one needs to know how these changes happen in order to establish the overall impact of the policy on $Y_i$. Finally, notice that I have not considered any dynamic effects explicitly in equation 1. It would, however, be easy to add such effects by considering the outcomes of interest and the other variables at different points in time. In such a situation, it would be necessary to specify the persistence of the $X_i$ and $u_i$ variables.

An example will make these arguments clear. Suppose one is interested in the impact of a nutrition intervention on the health and nutritional status of vulnerable children attending a preschool programme. In this case, the variables $X_i$ represent other inputs possibly provided by the parents of the targeted children, such as the food that the children receive at home. It is quite possible that parents might react by reducing the food provided at home, possibly to increase the rations given to the siblings of the targeted children. In such a way, the intervention leaks to a number of other children and might lose its effectiveness on the targeted children, who may be in a particularly sensitive period of their development. In such a situation, even knowledge of how a specific nutritional supplement ($D_i$) affects the nutritional status of children will not be sufficient to

establish the actual impact of the intervention without knowledge of how the intervention changes $X_i$.

The expression in equation 1 is very general. In practice, equation 1 is often made more specific and simpler by adding some (restrictive) assumptions on the way in which the random influences $u_i$ (unobserved by policy makers and researchers) affect the variable of interest. In particular, it is often assumed that the unobservable variables affect the outcome of interest in an additive fashion:

$$Y_i = f_i(D_i, X_i) + u_i \qquad (2)$$
In the rest of this section, I will make this assumption too. I will also assume that the random effects $u_i$ are independently distributed across the population and average out to zero. In such a situation, the presence of a control group (i.e. a group of individuals for whom the variable $D_i$ is set to 0) can be helpful to establish the impact of the intervention. It is then natural to estimate the average impact of the intervention, which in the literature is sometimes referred to as the average treatment effect (ATE), as the difference between the average $Y_i$ for the individuals for whom $D_i = 1$ and the average $Y_i$ for the individuals for whom $D_i = 0$:

$$\mathrm{ATE} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \qquad (3)$$

where the operator $E$ indicates the average and the conditioning indicates the set over which the average is computed. Note that the averages on the right-hand side of equation 3 are independent of the specific mechanism that generates the outcome variable $Y_i$ as specified in equations 1 or 2.

Whether the expression in equation 3 can be interpreted as the causal impact of the intervention $D_i$ on the average level of $Y_i$ depends on what determines the assignment of the treatment $D_i$. In most situations, unfortunately from the evaluation point of view, the assignment of a policy is related to the outcome variables or to other variables (observed and/or unobserved) that are likely to be related to the outcome. This is true both when participation in the programme is a choice of eligible individuals and when the programme is assigned by policy makers. A youth who decides to participate in a job-training programme will probably differ in motivation and ability from a youth who decides not to participate. A government official will target limited resources for a nutrition intervention to the poorest areas under her jurisdiction. An implication of this is that the comparison in equation 3 does not yield the causal impact of the intervention. One can easily think of a situation in


which such comparisons will yield upward or downward biases in the estimated impact of the intervention (in the two examples above, the former will probably yield an overestimate and the latter an underestimate of the impact).
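A stylised simulation (hypothetical numbers throughout) illustrates the first of these selection problems: unobserved motivation raises both programme take-up and the outcome, so the naive comparison in equation 3 overstates the impact, while random assignment recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
motivation = rng.normal(0, 1, n)   # unobserved: raises both take-up and the outcome
true_effect = 1.0

# Self-selection: more motivated individuals are more likely to participate.
d = (motivation + rng.normal(0, 1, n) > 0).astype(float)
y = 5 + true_effect * d + 2 * motivation + rng.normal(0, 1, n)
print(y[d == 1].mean() - y[d == 0].mean())   # well above 1.0: upward bias

# Random assignment: the same comparison now recovers the causal effect.
d_rand = rng.integers(0, 2, n).astype(float)
y_rand = 5 + true_effect * d_rand + 2 * motivation + rng.normal(0, 1, n)
print(y_rand[d_rand == 1].mean() - y_rand[d_rand == 0].mean())   # about 1.0
```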

Random assignment

If the treatment were assigned at random among a sample of eligible individuals, then the comparison in equation 3 would indeed yield the causal impact of the intervention, under relatively weak conditions. The reason is intuitive and simple: random assignment guarantees that the distribution of $X_i$ and $u_i$, that is, of the determinants of $Y_i$, is the same among the treated and untreated samples. If such samples are sufficiently large, influences unrelated to the intervention will average out, and the only systematic difference between the two groups will be the assignment to the programme.

The simplicity of the comparison between "treatment" and "control" samples in equation 3 lies behind the attractiveness of randomised controlled trials (RCTs), which have recently received renewed interest in economics and in the literature on the evaluation of public policies. Before discussing the potential limitations and problems of RCTs, I will first briefly mention alternatives to such an approach. This discussion is relevant because in many situations random assignment of an intervention is not an option. The literature offers several approaches to deal with the many situations in which random assignment of the treatment does not exist. One possible classification distinguishes between approaches that attempt to control for possible selection effects on the basis of observables and those that attempt to model explicitly the process of selection and rely on the existence of variables that mimic random assignment. Matching methods belong to the first group, while instrumental variable methods belong to the second. I will discuss these approaches in turn.

Matching methods

Suppose an intervention is aimed at both men and women but, for some reason, policy makers reserve by design 75% of the places in the programme for women, or simply that, for some other reason, more women receive the programme than men. Suppose also that the variable of interest is larger for women than for men, and consider a random sample of women and men, some of whom receive the programme and some not. Such a sample will deliver an upwardly biased estimate of the ATE, as the treatment sample will contain more women than men who, for reasons independent of the programme, have a higher level of the variable of interest. In such a simple and artificial example, an unbiased estimate of the ATE is easily obtained by conditioning on the gender of the recipients: one first estimates the ATE for women and then the one for men, and then averages the two resulting estimates with weights 0.75 and 0.25. The basic assumption behind this procedure is that, conditional on a specific observable characteristic, the assignment of the programme is essentially random, and one can compare treated and untreated individuals within groups defined by the relevant characteristics. Methods based on such an assumption go under the label of "matching" methods, as they try to match the treatment and control samples in terms of observable variables that are relevant for the outcome of interest (in this case, the gender of the recipient). Formally, the assumption made by matching methods is that the treatment is orthogonal to the outcome of interest, conditional on the observable variables $X$:

$$Y_i \perp D_i \mid X_i \qquad (4)$$
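Continuing the stylised gender example above, a minimal sketch of the stratification logic (hypothetical shares and effect sizes; here the within-group contrasts are averaged with population weights):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
female = rng.random(n) < 0.5                # hypothetical population, half women
p_treat = np.where(female, 0.6, 0.2)        # women over-represented among recipients
d = (rng.random(n) < p_treat).astype(float)
true_effect = 1.0
y = 3 + true_effect * d + 2 * female + rng.normal(0, 1, n)   # women have higher Y anyway

# Naive difference in means is biased upwards by the gender composition:
print(y[d == 1].mean() - y[d == 0].mean())

# Within each gender, assignment is as good as random; average the contrasts:
contrasts = [y[(d == 1) & (female == g)].mean() - y[(d == 0) & (female == g)].mean()
             for g in (True, False)]
print(np.average(contrasts, weights=[female.mean(), 1 - female.mean()]))   # about 1.0
```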

In the simple example given above, the relevant conditioning variable is a simple indicator. In reality, for the assumption to be plausible, the conditioning variables in equation 4 form a possibly high-dimensional vector. An implication of this is that conditioning in a flexible way on many $X$ variables would require large samples in order to cover the possible combinations of values of the various components of $X$. In some situations, some cells defined by specific combinations of the $X$s might be very small or even empty, and it would then be difficult to compute all the relevant comparisons. One possible approach is to consider models that put additional structure onto the data. If one is willing to assume, for instance, that the relationship between the (conditional mean of) $Y$ and $X$ is linear, one can implement the matching approach by ordinary least squares and estimate the following regression model:

$$Y_i = \gamma D_i + \beta' X_i + u_i \qquad (5)$$

where $\gamma$ is the impact of the treatment and the vector $\beta$ contains the coefficients that determine the conditional mean of $Y$. The matching condition translates into the assumption that, conditional on $X$, the residual term $u_i$ is orthogonal to the treatment $D_i$.
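A minimal sketch of equation 5 estimated by ordinary least squares on simulated data, under selection on observables (plain numpy rather than an econometrics package):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
x = rng.normal(0, 1, (n, 2))                 # observed confounders X
p = 1 / (1 + np.exp(-(x[:, 0] + x[:, 1])))   # treatment probability depends on X only
d = (rng.random(n) < p).astype(float)
y = 1.0 * d + x @ np.array([0.5, -0.3]) + rng.normal(0, 1, n)   # gamma = 1.0

# OLS on [1, D, X]: because selection is on observables included in the
# regression, the coefficient on D recovers the treatment effect gamma.
design = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef[1])   # close to 1.0
```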


The linearity assumption used in equation 5 may be considered too restrictive. Moreover, it might imply extrapolation to combinations of the $X$s for which there are no data for both treatment and control groups. An important result that allows more flexibility and still yields a very simple estimate of the average treatment effect is the celebrated propensity score result of Rosenbaum and Rubin [1], who showed that, under equation 4 and some additional assumptions, one can obtain the ATE by first estimating the probability of receiving the treatment as a function of the $X$ variables and then matching treatment and control observations on the basis of the so-called propensity score (i.e. the estimated probability of receiving the treatment). Such an approach limits the analysis to the treated observations for which it is possible to find control observations with a similar propensity score.

Difference in difference methods

Matching methods assume that all systematic differences between treatment and control groups are captured by observable variables. In the presence of longitudinal data or, more generally, data where treatment and control groups are observed before and after the treatment, unobservable differences that are systematically related to the treatment but that are time invariant can be differenced out by using first differences or other dynamic specifications of regressions similar to equation 5. Difference in difference methods make the assumption of common trends (i.e. all unobservable shocks that affect treatment and control observations during treatment periods are the same, and systematic differences between the two groups are constant over time). Under this assumption, considering both treatment and control observations before and after the treatment allows the identification of the causal effect of the treatment. This approach can be combined with matching methods so as to relax the common trends assumption on the unobservables: the assumption is then formulated conditional on the observable variables in $X$.
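A minimal simulation of the difference in difference logic (hypothetical numbers): a fixed gap between groups and a common trend both drop out of the double difference.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
d = rng.integers(0, 2, n).astype(float)   # treated group, NOT randomly assigned
group_gap = 2.0 * d                       # time-invariant difference across groups
trend = 0.8                               # common shock between the two periods
true_effect = 1.0

y_before = 4 + group_gap + rng.normal(0, 1, n)
y_after = 4 + group_gap + trend + true_effect * d + rng.normal(0, 1, n)

# A post-period comparison is contaminated by the fixed group gap:
print(y_after[d == 1].mean() - y_after[d == 0].mean())   # about 3.0, not 1.0

# The double difference removes both the gap and the common trend:
did = ((y_after[d == 1].mean() - y_before[d == 1].mean())
       - (y_after[d == 0].mean() - y_before[d == 0].mean()))
print(did)   # about 1.0
```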

Instrumental variable and control function approaches

An alternative to the matching approaches is to model explicitly the unobservable differences that determine the assignment of the treatment. Suppose that the outcome variable $Y$ is determined by equation 5 and that there are one or more variables $Z$ that determine the assignment (perhaps not precisely, and perhaps in combination with the other variables $X$) but are not directly related to the outcome of interest $Y$ or to its unobservable determinants $u_i$:

$$D_i = \vartheta Z_i + \theta' X_i + \nu_i \qquad (6)$$

An example of such a variable is the outcome of a randomisation procedure used in RCTs, in which case (under full compliance) $D$ is fully determined by such a variable. Even when such an ideal situation does not arise, one can still use the model in equation 6 to isolate the causal effect of $D$ on $Y$. From an econometric point of view, one uses either instrumental variable techniques or a control function approach, which coincide in the linear case. The main idea is to model explicitly the process of selection and to use the assumption that the variable $Z$ affects treatment assignment, but not the outcome of interest, to identify the causal link between the treatment and the variables of interest. Such an approach is particularly attractive because it focuses on the mechanisms that determine the assignment. At the same time, the crucial assumption, which cannot be tested, is that the variable $Z$ does not directly affect the outcome of interest.

Instrumental variable techniques can also be used to deal with situations in which an RCT is contaminated. For example, when implementing an RCT, the treatment may not be delivered to subjects who are not particularly interested, even if they are randomised into the treatment group, while it may be delivered to control subjects who are particularly motivated to obtain it. In such a situation, as long as the original randomisation has predictive power for the assignment of the programme, one can use an instrumental variable approach in which the randomisation variable is used as an instrument.
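As an illustration, a minimal sketch of the Wald/instrumental variable estimator under imperfect compliance, with the randomised offer used as the instrument (hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z = rng.integers(0, 2, n).astype(float)   # randomised offer: the instrument
motivation = rng.normal(0, 1, n)          # unobserved, drives both take-up and outcome
true_effect = 1.0

# Imperfect compliance: actual treatment depends on the offer AND on motivation.
d = (0.8 * z + 0.5 * motivation + rng.normal(0, 1, n) > 0.5).astype(float)
y = 2 + true_effect * d + motivation + rng.normal(0, 1, n)

# Comparing by actual treatment status is contaminated by motivation:
print(y[d == 1].mean() - y[d == 0].mean())

# Wald estimator: reduced-form effect of Z on Y over the first stage of Z on D.
iv = ((y[z == 1].mean() - y[z == 0].mean())
      / (d[z == 1].mean() - d[z == 0].mean()))
print(iv)   # close to 1.0
```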


The limitations of randomised controlled trials

In many situations, the availability of an RCT is extremely useful because it allows the identification of the causal effect of a treatment on one or more outcomes of interest and solves the attribution problem. However, there are many situations in which an exclusive reliance on RCTs can be rather reductive and allow only a very narrow focus. In this subsection, I will discuss some of the limitations of RCTs.

One first and obvious issue is that, in many situations, an RCT is simply not feasible. An RCT might be too expensive, unethical, politically infeasible, or simply impossible. The latter case is relevant for many macroeconomic policies: for example, experimentally evaluating institutional reforms, pension reforms, and tariff liberalisations is simply not conceivable. More importantly, even when an RCT is feasible, it might only yield limited answers to the questions posed by policy makers. And, in some plausible situations, it might give misleading answers. The latter happens when the treatment, although assigned randomly, indirectly affects the outcome of interest for the control observations. This might happen, for instance, when a wage subsidy or a training programme is evaluated with an experiment that is large relative to the size of a local labour market. In such a situation, as employers are more likely to hire treated individuals (either because of the subsidy or because of the training they receive), control individuals will face stronger competition and are thus detrimentally affected by the programme. A comparison of treatment and control individuals will therefore overestimate the impact of the programme, because it combines the potentially positive effect on the treated with the negative effect on the control individuals. A similar situation occurs in the presence of general equilibrium effects. When an effective and successful training programme, say, is scaled up, the increased supply of skills might reduce their price and, therefore, reduce the benefits to the treated individuals.

Even in cases where there are no spillover or general equilibrium effects, inferences from RCTs can be of limited interest. Under suitable conditions (mainly the lack of spillover effects on the control sample), an RCT will deliver the causal impact of a specific intervention in a specific context on the outcome of interest. However, it is not clear what one can infer about the outcome of a similar intervention in a slightly different context. Without the identification of the mechanisms that yield specific impacts, it is not possible to extrapolate the results that one obtains from the analysis of an RCT, and it becomes difficult to use such analysis for the design and improvement of policy interventions. This is not to say that RCTs are not useful. On the contrary, they can be very powerful tools that can be used not only to evaluate the impact of interventions but also to help identify the behavioural responses that individuals have in the face of changing incentives. Once that is possible, one can model such responses explicitly and use the evidence from experiments to extrapolate the results of an RCT to perhaps different programmes and contexts.

The political economy of the evaluation of public policies

The evaluation of public policies is difficult. In addition to the methodological and statistical issues

already discussed, an evaluator has to deal with a number of other problems, ranging from ethical concerns to political restrictions.

The ethical issues are well known in the literature. They arise routinely in the conduct of medical research, and well-established protocols are followed both in the practice of evaluating new products and in the scientific literature behind new developments and products. The economic and public policy literature is increasingly aware of these issues. In conjunction with the renewed methodological attention given to randomised trials, economists are increasingly concerned with following appropriate protocols and well-established ethical standards. The issues to be considered are many, ranging from the ethical justification for excluding a group from a "treatment" that might be beneficial (which can be particularly relevant in the case of treatments that involve the transfer of resources to potential beneficiaries), to the criteria chosen to determine the expansion and reach of a programme over time, to the protocols for dealing with specific situations that might be encountered in the field and might involve cases of crime, abuse, or the like. I will not discuss these issues further, except to say that it is highly desirable that economists, social scientists, and policy makers address directly the ethical issues relevant to the conduct of serious evaluations, especially as RCTs become more common and become established as a desirable standard in the conduct of public policy evaluations. There are some important trade-offs (the exclusion of some beneficiaries might be the price to pay to establish the impact of a given intervention and determine its best design) that need to be discussed in a transparent and open fashion.

The political issues, however, are subtle and formidable. It is to those issues that I would like to dedicate this section. The incentives of elected and unelected policy makers to evaluate the impacts of a programme and to establish its real effectiveness might be very weak, especially if the programme has already been rolled out and its success, or perceived success, affects the career prospects of the policy makers who designed the intervention under consideration and developed its implementation. Evaluating programmes that have already been rolled out can be difficult not only because of the methodological difficulties discussed above, but also because such programmes might have developed political constituencies that might resent direct scrutiny of their actions. Such constituencies can become entrenched and powerful in the decision-making process.

And yet, rigorous evaluations can be a powerful tool in the political debate to promote or maintain a


given intervention or abolish one that seems not to work. In the last section of this paper, I discuss some examples in which the incentives of the policy makers became aligned with those of the evaluators and generated some very desirable outcomes. Unfortunately, however, these situations are the exception rather than the rule.

It might therefore be important to think about the type of institutions and institutional arrangements that generate, within a policy decision process, a demand for rigorous impact evaluation. In such a setting, it might be crucial that this role is played by a strong agent in the policy-making process, with sufficient budgetary muscle, who explicitly uses the results of impact evaluations to assign resources across different projects, ministries, and departments. Imagine a situation in which different ministries compete for budget resources and are required, for instance by the treasury, to justify their requests on the basis of rigorously estimated impacts of the proposed policies. While, ultimately, important decisions will always have a purely political or even ideological nature, such a system could generate a healthy demand for rigorous evaluation that policy makers could use in the early phases of the policy design process. In such an institutional environment, interventions and proposals are evaluated during their design through pilots and consequent experiments, and the results of the evaluations are used to improve their design and roll-out and, crucially, to secure the funds for their implementation at scale. Evaluation during the early phases of a programme is simpler because of the ease with which comparison groups can be identified, because of the possibility of using the results to improve programme design, because of the lack of strong constituencies that might exercise political pressure, and also because policy makers might have, in such a phase, a genuine interest in establishing the effectiveness of a new idea and the best way to implement it.

The contributions that economists can make

In recent years, RCTs have become more and more common among economists, especially in some fields such as development economics. In parallel with this development, economists have started to look at the impact of many interventions in specific sectors, such as nutrition, education, and health, where they do not have specific expertise. While the use of RCTs in economics is not new (going back to the 1960s), it is only over the last 15 years that RCTs have become extremely common. Economists have been learning the protocols associated with rigorous RCTs in other disciplines and have been setting up a variety of

small and large experiments. Economists are now starting to systematise the use of these experiments and are learning lessons on the quality and standards of experiments that other disciplines have known and used for decades. A legitimate question in such a situation is what contribution economists can make to the study of specific interventions whose design (and possibly evaluation) requires very specific knowledge in increasingly specialised fields. It is not clear whether economists have an absolute or even a comparative advantage in analysing health, nutrition, or education interventions. Can their contribution be useful, and what is it?

One perception often heard from specialists in other disciplines who seek interaction and collaboration with economists is that economists can be useful for performing standard cost–benefit analyses, systematically comparing the benefits of specific interventions with their costs. In my opinion, however, the most important contribution that economists can make is an entirely different one. It derives from the fact that economists are trained to think about behavioural responses to incentives. They can therefore be extremely useful in thinking about how a specific intervention yields its impacts, starting with the changes that it puts in motion and the various aspects of individual behaviour that are affected in turn. The identification of these behavioural responses to interventions, and of how individual responses interact to generate market responses, lies at the heart of what has already been discussed when mentioning the limited usefulness of RCTs that are used simply to determine the impact of an intervention. If one wants to go beyond that, which I argue is crucial for the proper design of policy, one needs to understand how individuals react and why. In other words, one needs to model individual behaviour. And this is where economists might have a comparative advantage.

Quantitative models of individual behaviour obviously have limitations. Models are only schematic and somewhat narrow interpretations of reality that exclude some aspects of behaviour that might be important. And yet they are useful because they allow researchers to identify specific aspects of behaviour and regularities that would not otherwise be obvious and that might inform reactions to policies. The price one pays to construct tractable models is expressed in the assumptions that are made to arrive at precise quantitative statements. As a consequence, the predictions of a model depend on the type of assumptions one is forced to make. It is therefore important to keep in mind that the conclusions drawn are valid only under certain circumstances.
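To make the modelling idea concrete, the following is a deliberately stylised sketch (entirely hypothetical parameters, not a model from this paper) of how an estimated behavioural rule permits extrapolation beyond the treatment actually evaluated:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
child_wage = rng.lognormal(1.0, 0.5, n)   # forgone earnings if the child attends school
taste = rng.normal(0, 1, n)               # heterogeneous valuation of schooling

def enrolment_rate(grant):
    # Stylised rule: enrol if the perceived value of schooling plus the grant
    # exceeds the child's forgone wage.
    return np.mean(2.0 + taste + grant > child_wage)

print(enrolment_rate(0.0))   # baseline enrolment (no grant)
print(enrolment_rate(1.0))   # the grant actually evaluated in an experiment
print(enrolment_rate(2.0))   # counterfactual grant never observed in the data
```

An RCT alone pins down only the contrast between the first two numbers; a model of the enrolment decision, once estimated and validated, is what licenses the third.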


However, without a model of individual behaviour, it is impossible to make inferences from, and extrapolate, the empirical evidence that one obtains from RCTs or observational studies. Within this framework, an RCT can take on an additional and important role. A well-designed RCT, accompanied by appropriate measurement, can be used to identify and estimate richer behavioural models that impose less restrictive assumptions on the way individuals behave. Within this approach, the interactions and collaborations that economists have with other disciplines can play an important role.

In a standard economic model, individual agents (who can be households, individuals, or firms) make choices to maximise some sort of function (which can be utility or profit) subject to a number of restrictions. These restrictions include technological or biological factors, the nature of the markets and trading possibilities that individuals have access to, information, and so on. For instance, in modelling parental investment in children's development, an economic model will specify an objective function that parents might have, which will depend on their child's development outcomes (from health to nutritional status to cognitive development) as well as on other factors (such as consumption or leisure) that might constitute an alternative use of (time and monetary) resources. Parental decisions are made taking into account the prices (including wages) faced by individuals, the availability of credit and/or other mechanisms to move resources over time (such as interpersonal transfers), and their perceptions of the relationship between specific investments and child development. Child development outcomes will then depend on individual choices, on the actual (rather than perceived) relationship between investment and development, on genetic endowments, and on a variety of environmental factors other than parental investment to which parents might be reacting. It is in the specification of the details of this model that the collaboration between economists and other researchers (psychologists, nutrition specialists, education specialists, doctors, and so on) can be extremely fruitful. For example, knowledge of human psychology is important in determining the type of objective function used in the economic model, and knowledge of the effects of different types of nutritional inputs on physical growth or even brain development is equally crucial.

The need for measurement

If a researcher running an RCT to study a specific intervention is only (and narrowly) interested in determining its impact on a specific outcome, then

the temptation is simply to measure the outcome of interest among the subjects of the experiment. However, if the interest lies in the behavioural mechanisms that generate that impact, one needs to gather information on many more variables. The construction, identification, and estimation of a model of individual behaviour requires information on the factors and variables that determine individual choices. Of course, it would be foolish and naïve to assume that it is possible to obtain complete information on all such factors. Humans and their choices are too complex to be completely explained by a quantitative model. It is for that reason that a good quantitative model will contain what econometricians call "unobserved heterogeneity". A good model, however, will take into account the presence of these unobservables so that it is possible to "average" them out and identify the relevant averages.

Good measurement has a crucial role to play in this approach. On the one hand, measuring the different factors that inform choices (over and above the intervention) can improve the precision of the estimates and, in some cases in which the allocation of the intervention is not random, allow for an unbiased estimate of its impact. More importantly, on the other hand, measurement of a wide variety of variables and factors that affect individual choices might allow for the identification and estimation of rich behavioural models that can be used, as already discussed, to inform and improve the design of public policies.

Measurement is hard. All data will be affected, to some degree, by measurement error. Survey questions are often very restrictive, and the answers obtained will depend on the way questions are framed. Anchoring effects, recall biases, and framing effects are all important issues that affect data quality. However, good and rich data constitute an invaluable resource for assessing public policies, modelling individual choices, and estimating models of individual behaviour. There is great scope for research into questionnaire design. Innovative measurement tools can yield great returns if they allow researchers to capture aspects of individual behaviour that interact with individual reactions to an intervention. Comprehensive surveys, including measurement of a variety of factors and novel measures of individual tastes and attitudes, can be extremely useful. Many advances have been made in recent years. Only a few decades ago, it was thought that accurate measures of household financial wealth were almost impossible to obtain in standard surveys. The development of innovative survey techniques, such as so-called unfolding brackets, has allowed for the collection of much more precise and better quality


measures of household financial wealth. More recently, a new strand of the literature has been looking at surveys aimed at measuring subjective expectations and subjective perceptions of risk. This type of data has the potential to allow for the estimation of much richer models that avoid some strong assumptions (such as rational expectations). Many other interesting ideas, such as the use of experimental games to measure some aspects of individual tastes, are being tested and developed. The combination and integration of these innovative measurement tools in surveys is potentially very fruitful.

An example

In this concluding section, I would like to illustrate some of the points I have made with concrete examples. I will use the evaluation of specific policies in which I have been involved, as I know these cases better. The first example, the evaluation of PROGRESA, one of the first conditional cash transfer programmes in Latin America, illustrates the point that, in addition to establishing the impact of a programme, an RCT can be used to aid the design of policies. The PROGRESA example also illustrates the role that rigorous evaluations can play in the political economy of social interventions. In the second subsection, I discuss some other examples as cautionary tales on the role that evaluations actually play in the policy debate, without entering into the details of the specific evaluations.

Evaluating PROGRESA in Mexico

In 1997, the Mexican government launched a new welfare programme to combat poverty in rural areas. The programme, which was born from the consolidation and rationalisation of many pre-existing programmes, introduced some important novelties that made it one of the most successful interventions in Latin America and, subsequently, in many other developing countries. PROGRESA, as the programme was originally called, was one of the first of the so-called conditional cash transfer (CCT) programmes that have garnered much attention in recent years. While it is not clear whether PROGRESA was the first CCT to be implemented on a large scale, as at more or less the same time the Brazilian government was developing a similar intervention (Bolsa Familia), it is probably the best known and the most celebrated. One of the reasons for its fame, as I will argue, has been the rigorous evaluation that was implemented by the Mexican government from the inception of the programme.

When the government designed PROGRESA, it decided to exploit the expansion phase of the programme to design a rigorous evaluation. The programme was initially targeted at about 10,000 marginalised communities in rural areas, and it was established that it would take about 2 years to reach all these communities and to enrol all the beneficiaries of the programme. This expansion phase provided the opportunity to design a rigorous evaluation while overcoming the obvious ethical problems connected with the presence of a control group. In 1997, the government identified 506 targeted communities in seven different states that would be part of the evaluation sample and randomised 320 of them for immediate implementation of the programme. The remaining 186 were put at "the end of the queue", so that the programme reached them only at the end of 1999. During this period, a massive data collection effort was implemented in all 506 communities, providing researchers with the possibility of evaluating the short-term impacts of the intervention. It should be stressed that, regardless of the presence of the evaluation, some communities would in any case have started the programme at the end of 1999. What the evaluation design did was to exploit this logistic restriction and, by choosing randomly which communities would start later, guarantee the possibility of the evaluation. The PROGRESA evaluation design is particularly attractive for many reasons, including the fact that the randomisation was done across villages rather than across individuals. I will discuss this and other issues after providing some details on the nature of the programme.

Details of the programme.  As mentioned above, PROGRESA was one of the first CCTs. It consisted of two main components: the nutrition and health component and the education component. The nutrition component was targeted at poor rural families with children aged 0–6. It consisted of a subsidy, which did not depend on the number of children and was supposed to be earmarked for nutrition purposes. For households with children below the age of 2, there was also a small in-kind component, which consisted of a nutritional supplement. The education component was targeted at households with children aged 9–18 and consisted of grants to attend grades 3–9. Grants increased with grade (and substantially so after grade 6, which marks the end of primary school) and were higher for girls than for boys. The grant was conditional on school enrolment and attendance of at least 85% of classes. Attendance was certified by the school authorities. Children who repeated a grade could do


so and qualify for the grant, but they would lose it upon another repetition. Payments were bimonthly and, for households that complied with the conditions, could be substantial, albeit capped. For the average beneficiary household, the PROGRESA grants constituted about 25% of income.

The conditions imposed on the households were substantial. Mothers were supposed to attend a number of courses and meetings that could be as frequent as weekly. These consisted of classes on nutrition, hygiene, contraception, health, and so on. In addition, mothers were supposed to bring their young children to periodic growth and development check-ups. The conditions for the education grants were school enrolment and attendance. Interestingly, failure on the part of the mother to participate in the classes and other programme activities would disqualify the household completely, whereas failure on the part of the children to attend school would only have consequences for the education grant. An important feature of the programme, shared by many other CCTs in Latin America, is that the grant was given to the mothers rather than to the fathers. This choice was made explicitly with the purpose of improving the position of women in rural Mexico.

Targeting of the programme in its first phase was done first at the geographic level. Localities that were "marginalised" according to a number of indicators were identified in 1997 and targeted by the programme. Localities that were "too" poor and marginalised were excluded if the absence of schools and health centres would not allow the beneficiaries to comply with the conditions imposed by the programme; these communities were eventually targeted by a different programme. After a community was targeted, the programme would conduct a census survey (ENCASEH) in which a number of indicators were collected for all the households present in the community. Some of the variables on which information was collected were used to compute a wealth index that was then used to target the programme at the individual household level. A cut-off point was identified in August 1997, and households that scored below that level would qualify for the programme. Following this procedure, about 55% of the households in the targeted communities qualified for the programme. As the government decided to increase the coverage of the programme, the cut-off was raised so that an additional 22% of households qualified in March 1998. The programme started its operation in May 1998.
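The following is a schematic sketch of this kind of proxy-means targeting rule; the indicator variables, index weights, and cut-off below are purely illustrative, not the actual ENCASEH formula:

```python
import numpy as np

rng = np.random.default_rng(7)
n_households = 1_000
# Illustrative census indicators (the real ENCASEH variables and weights differ):
indicators = rng.normal(0, 1, (n_households, 4))   # e.g. dwelling, assets, education, dependency
weights = np.array([0.4, 0.3, 0.2, 0.1])           # hypothetical index weights

wealth_index = indicators @ weights
cutoff = np.quantile(wealth_index, 0.55)   # set so that about 55% qualify, as in the first phase
eligible = wealth_index < cutoff
print(eligible.mean())   # about 0.55; raising the cut-off expands coverage
```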

Evaluation design, the evaluation surveys, and results.  As mentioned above, the PROGRESA evaluation was based on a design that defined a control group by randomly assigning 186 of the 506 communities to a late start of the programme's operation. After a first data collection in March 1998 (in which the evaluation communities complemented the ENCASEH survey collected in August 1997) and the start of the programme in June 1998, there were subsequent data collections in October 1998, March 1999, November 1999, and April 2000. After that, there were two further data collections: one in 2003 and one in 2007. It is important to note that, in terms of the impact evaluation, only the surveys up to November 1999 could be used, as the programme expanded to all communities in the sample after that date.

The surveys conducted in the 506 communities in the evaluation sample were quite comprehensive. First, the survey included every household in the communities. Second, they collected information on a large variety of variables, ranging from detailed food consumption to expenditure on non-food items, labour supply, school enrolment and progression, demographic characteristics, interpersonal transfers, and so on. For a subsample of children, anthropometric measures were also collected. These were crucial in establishing the effects of the programme on the nutritional status of children. The first few waves of the surveys were conducted by the Mexican government. However, starting with the October 1998 wave, the surveys and the evaluation were contracted out to IFPRI, a research institute based in Washington, DC. This change was important both because it improved the quality of the data collection and, more generally, of the design of the evaluation but, above all, because it made the evaluation truly independent of the administration of the programme and, therefore, much more transparent and credible. After 2000, the evaluation was taken over and managed by the Mexican National Institute of Public Health (INSP), a government agency with a remarkable degree of independence.

The results of the impact evaluation have been amply described and discussed in the economics and policy literature. Remarkably, in addition to the evaluation reports produced by IFPRI and subsequently by the INSP (widely available on the relevant web pages), the evaluation produced a flurry of papers that found their way into high-quality academic journals. A number of papers, authored by some of the academics who acted as advisors to the evaluation, discussed the impacts of the programme. Subsequently, the richness of the data allowed the evaluation data to be used for a variety of research


projects not necessarily directly linked to the impacts of the programme.

The impact evaluation indicated some important successes. PROGRESA seemed to have had an impact on the nutritional and health status of young children (as measured by their height and by the occurrence of various illnesses), on consumption, and on school attendance. For the latter, the evidence indicated that most of the effect was concentrated among children of secondary school age, perhaps not surprisingly, given that enrolment rates in primary school were already very high in rural Mexico (above 90%).

The evaluation was a remarkable success in many dimensions. First, it showed that it is possible to rigorously evaluate a major programme implemented at scale and to overcome some of the ethical and political issues that such evaluations pose. Second, the evaluation and its results were instrumental to the survival and expansion of the programme. In Mexico, there had been a long tradition of programmes aimed at reducing poverty in rural areas. These programmes, which were never evaluated, would be systematically changed when a new administration took charge. In 2000, when the administration of President Fox took charge, in the midst of major changes because, for the first time in 70 years, the party that had held power (the PRI) lost the elections, the programme survived and was actually expanded. The only change introduced by the new administration was a change of name: PROGRESA became Oportunidades. While conditional cash transfers are without doubt very popular and politically profitable programmes, there is a general consensus that the availability of strong and rigorous evidence, produced by external evaluators, showing the impacts and effectiveness of the programme played an important role in its survival. Finally, it should also be mentioned that the Mexican government decided to make the evaluation data public and available to researchers. This has stimulated the production of many interesting research studies well beyond those that simply estimated the programme's impacts on specific outcomes. Many papers using the data from the PROGRESA evaluation have been published on a variety of topics in academic, peer-reviewed journals.

Beyond impacts: the impact of the evaluation, mechanisms, and extrapolation.  While the presence of a rigorous impact evaluation that estimated PROGRESA's impacts was, as mentioned above, instrumental to its survival, a possible danger, which in my opinion materialised in the years after the evaluation, is the generalisation and extrapolation of the results to

different contexts in a fairly uncritical manner. This happened both within Mexico and outside. PROGRESA's results heralded the development of CCTs, which were often promoted as a silver bullet that would provide the solution to many policy problems. That, and the political profitability of this type of programme, meant that CCTs spread widely throughout Latin America and, later, to other regions.

However, the extrapolations were not necessarily justified. Without an understanding of how a given programme, such as PROGRESA, obtained certain results (such as increasing school enrolment at certain ages), it is not obvious that the same results would be obtained in a different context. In other words, while the randomisation used in the PROGRESA evaluation allowed a straightforward interpretation of the results and guaranteed their robustness, without the identification of the relevant mechanisms it is not possible to extrapolate these results to different countries or regions, or even to predict the impact of slightly different versions of the programme in the same context. For instance, it would not be possible, on the basis of the simple impact evaluation, to predict the effect of changing the structure of the grants.

Fortunately, the availability of a rich data set allows researchers to identify and estimate models of behaviour that, on the one hand, permit a better understanding of the mechanisms that generate the observed impacts and, on the other, allow the performance of counterfactual experiments that can be used to predict the effect of different versions of the programme. Two papers that have explicitly pursued this avenue are those by Todd and Wolpin [2] and Attanasio et al. [3], the latter also using a model to assess the extent to which the intervention's impacts change across different states in Mexico. Todd and Wolpin [2] and Attanasio et al. [3] estimate structural models of individual behaviour in which households are assumed to decide whether to enrol their children in school, taking into account a variety of economic incentives (including the possibility that the child would work for a wage, and a specific grant structure). The availability of the experimental variation (the fact that the programme is only available in a set of randomly selected communities) allows the estimation of particularly rich models (for instance, it is possible to allow the effect of the grant on the incentive to work or to go to school to differ from the effect of the wage) and/or the validation of a model estimated exclusively on control villages (an exercise performed by Todd and Wolpin). Once the parameters of such a model are estimated, one can change the structure of the grant and simulate how the effect of this modified grant


In the case of PROGRESA, given that the programme did not have an impact on the enrolment of primary school age children, a natural change to consider is the elimination of the primary school grant, perhaps using the resources thus saved to increase the secondary school grant. Such a policy seems natural, given that primary school enrolment was already over 90% in rural Mexico (making the primary school grant particularly expensive from the point of view of the government) and given the results obtained in increasing secondary school enrolment. Attanasio et al., in their paper, perform an experiment in which they eliminate the primary school grant and use the resources generated by this reduction to increase the secondary school grant. Given the estimates of the behavioural model, they can calibrate the increase in the secondary school grant so that, given the increased uptake, the overall operation is budget neutral. The authors show that such a change in the structure of the grant can generate much larger increases in enrolment; a stylised version of this calibration is sketched below.
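The following sketch works through the budget-neutrality logic with made-up numbers. The uptake function, the grant levels and the enrolment rates are all hypothetical; in Attanasio et al. [3] the response of enrolment to the grant comes from the estimated behavioural model, not from a simple reduced form like this one.

```python
from scipy.optimize import brentq

# Stylised budget-neutrality calibration in the spirit of the counterfactual
# in Attanasio et al. [3]. All numbers below are hypothetical.

def secondary_uptake(grant, base=0.60, slope=0.5):
    """Assumed secondary-school enrolment rate as a function of the grant
    (in arbitrary units). In the paper this response comes from the
    estimated structural model."""
    return min(1.0, base + slope * grant)

# Status quo: the primary grant is paid to the 90%+ of children already
# enrolled, so it buys essentially no extra enrolment; the secondary grant
# g_s attracts uptake N_s(g_s).
n_primary, g_primary = 0.92, 0.10
g_s = 0.20
budget = n_primary * g_primary + secondary_uptake(g_s) * g_s

# Counterfactual: drop the primary grant and find the secondary grant g
# that exhausts the same budget, i.e. solve g * N_s(g) = budget. Because
# uptake rises with the grant, the budget-neutral increase is smaller than
# a naive per-child reallocation would suggest.
g_new = brentq(lambda g: g * secondary_uptake(g) - budget, 0.0, 1.0)
print(f"Budget-neutral secondary grant: {g_new:.3f}")
print(f"Uptake rises from {secondary_uptake(g_s):.3f} to {secondary_uptake(g_new):.3f}")
```

Under these illustrative numbers the secondary grant rises by roughly half and enrolment increases while total spending is unchanged; the substantive conclusion in the paper rests, of course, on the estimated model rather than on this toy uptake function.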

Of course, such a conclusion has to be taken with some caution: first, the results depend on the specific model estimated; second, the primary school grant may not be designed solely to increase enrolment and may also play a different role. It should be stressed, however, that the estimated model seems to fit the data very well (for instance, it can reproduce with very few parameters the impacts observed in the experimental evaluation across different ages). Moreover, the conceptual framework is a logical model that identifies specific reactions of individual households to the introduction of the programme. Because of this, it can be used to extrapolate the results of the evaluation to slightly different versions of the programme.

The impact of impact evaluations

The evaluation of PROGRESA and Oportunidades is an example of a successful evaluation, in that it was carried out by an independent entity to high-quality standards. It was one of the few evaluations based on a randomised experiment that evaluated a programme at scale. Moreover, given its rigour and its independence, it received much attention and affected in a fundamental way policy decisions both in Mexico and in other countries, especially within Latin America.

However, one has to keep in mind that this outcome is not the rule. First of all, the success of CCT programmes was probably also due to the fact that this type of programme, which involves the transfer of cash, can be extremely rentable from a political point of view. Many politicians within Latin America (even in countries in which these programmes were not successful, such as Brazil and Colombia) made a very political use of their expansion. Moreover, many of the expansions of CCTs justified on the back of the PROGRESA (and other) evaluations resulted in programmes that replicated the features of PROGRESA in very different contexts, where it was not obvious that they were appropriate. One of the most visible examples in this respect is the urban expansion of PROGRESA/Oportunidades, which was much less successful than the original programme.

Moreover, in many cases the results of evaluations are simply not taken into consideration in the policy process. I could list evaluations that identified unsuccessful programmes that were nonetheless continued and even expanded, and others that identified effective programmes that were subsequently eliminated. The main message to be kept in mind is that the incentives of politicians and policy makers do not always coincide with the desire to design and expand effective policies; they are often driven instead by the political rentability of specific interventions. For this reason, the considerations above on the institutional mechanisms needed to generate a genuine demand for rigorous evaluations, whose results are used effectively to inform and improve policies, are particularly salient.

Conflict of interest

The author declares that there is no conflict of interest.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Notes

1. The literature on impact evaluation is too large to be surveyed here, and I will not even attempt a brief summary. Good examples of relatively recent surveys are Blundell and Costa Dias [4] and Imbens and Wooldridge [5]. Other useful sources include some of the Handbook of Econometrics chapters (Heckman and Vytlacil [6,7] and Abbring and Heckman [8]).
2. The first term on the right-hand side of equation 3, for instance, indicates that the average is computed over the individuals for whom D=1.
3. The weak conditions that need to hold for this result relate to the fact that the assignment of the programme to certain individuals does not affect the outcomes for the "control" group that does not receive the programme. I discuss these issues further in the text.
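Equation 3 itself appears earlier in the paper and is not reproduced in these notes. As a reminder of the kind of expression the notes refer to, the decomposition standard in this literature can be written as follows (a reconstruction under that assumption, with Y_1 and Y_0 the potential outcomes with and without the programme):

```latex
% Standard decomposition of the naive treatment-control comparison;
% a reconstruction of the kind of expression the notes refer to,
% not a verbatim copy of the paper's equation 3.
\[
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{observed difference}}
= \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{effect on the treated}}
+ \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
\]
% Under random assignment the selection term is zero, provided the
% treatment of some individuals does not affect the outcomes of the
% controls (the "weak conditions" of note 3).
```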


References

[1] Rosenbaum PR and Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
[2] Todd P and Wolpin K. Assessing the impact of a school subsidy program in Mexico: using a social experiment to validate a dynamic behavioral model of child schooling and fertility. American Economic Review 2006;96:1384–417.
[3] Attanasio OP, Meghir C and Santiago A. Education choices in Mexico: using a structural model and a randomized experiment to evaluate PROGRESA. Review of Economic Studies 2012;79:37–66.
[4] Blundell R and Costa Dias M. Alternative approaches to evaluation in empirical microeconomics. Journal of Human Resources 2009;44:565–640.
[5] Imbens GW and Wooldridge JM. Recent developments in the econometrics of program evaluation. Journal of Economic Literature 2009;47:5–86.

[6] Heckman JJ and Vytlacil E. Econometric evaluation of social programs – Part I: Causal models, structural models and econometric policy evaluation. In: Heckman JJ and Leamer E (editors), Handbook of Econometrics, Vol. 6B. Amsterdam: North Holland, 2007:4779–874.
[7] Heckman JJ and Vytlacil E. Econometric evaluation of social programs – Part II: Using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. In: Heckman JJ and Leamer E (editors), Handbook of Econometrics, Vol. 6B. Amsterdam: North Holland, 2007:4875–5143.
[8] Abbring J and Heckman JJ. Econometric evaluation of social programs – Part III: Distributional treatment effects, dynamic treatment effects, dynamic discrete choice, and general equilibrium policy evaluation. In: Heckman JJ and Leamer E (editors), Handbook of Econometrics, Vol. 6B. Amsterdam: North Holland, 2007:5145–303.

