Carnegie Commons Blog

Evidence for Improving Schooling: Role of RCTs and the “Tiers of Evidence” Framework

The term “evidence-based” appears some 60 times in the new Every Student Succeeds Act (ESSA). ESSA’s attention to how research can and should play a more central role in improving schooling is a much-needed antidote in our field, where fads tend to run strong but supportive evidence is often weak or non-existent.

But the practical question is left much less clear: “What does evidence for local improvement actually look like?” ESSA refers to the Tiers of Evidence set out in the Investing in Innovation (I3) program and operative in the What Works Clearinghouse. However, the body of such evidence is quite limited, and questions are now being raised about its validity and utility, and even about whether it would be the best guidance we could offer educators were it somehow magically available.

Educators are not alone in this regard. The same questions are being raised about other practical improvement endeavors. Recently, I came across a video of a session organized by the Development Research Institute at New York University. The session focused on a then recently released book, Poor Economics, that sought to synthesize findings and draw out the practical implications from a large body of RCTs carried out on fighting global poverty. In his address at the institute’s 2012 Debates in Development Conference, Angus Deaton, a Nobel Prize-winning development economist at Princeton University, offers a cogent critique of the “RCT as Gold Standard” paradigm. And since the Tiers of Evidence Framework rests on that paradigm, implicit here are words of caution about the limits of this framework as well. Toward the end of his commentary, Deaton offers his own view as to what evidence for practical improvement actually looks like. Although he does not specifically reference improvement science or networked improvement communities, his argument is quite consistent with the six improvement principles.

We present below key segments from Deaton’s presentation. The entirety of the address can be viewed here.

The nature of evidence and its relevance for policy

Deaton begins by talking about the nature of evidence and its relevance for policy aimed at improving practice. “What is it that RCTs actually tell us and don’t?”

The heterogeneity of effects

Professor Deaton then proceeds to talk about the consequences of interventions having different effects in different contexts and how this may confound inferences from even well-designed and well-executed experiments. This phenomenon of heterogeneity of effects is widespread in education. Deaton describes how the results of RCTs can be very misleading when the experimental and control groups are small. His example of small is an experiment in 100 villages, half receiving an intervention and half not. Interestingly, this sample size is actually much larger than that of many education RCTs. So his worries on this account seem quite germane to educational research as well.
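To make the small-sample worry concrete, here is a minimal simulation sketch. It is not drawn from Deaton's talk: the effect distribution (5 percent of villages gain 10 points, the rest gain nothing), the normally distributed baseline outcomes, and the function names are all assumptions chosen for illustration. It repeats a 100-village experiment many times and reports how far the estimated effect can swing from one run to the next.

```python
import random
import statistics

# Assumed, illustrative heterogeneity: a small share of villages responds strongly.
SHARE_RESPONDING = 0.05
EFFECT_SIZE = 10.0

def simulate_rct(n_villages=100, seed=0):
    """Run one hypothetical RCT and return the difference in group means."""
    rng = random.Random(seed)
    villages = []
    for _ in range(n_villages):
        baseline = rng.gauss(0.0, 1.0)  # outcome without the intervention
        effect = EFFECT_SIZE if rng.random() < SHARE_RESPONDING else 0.0
        villages.append((baseline, effect))
    rng.shuffle(villages)  # random assignment: first half treated, second half control
    half = n_villages // 2
    treated = [baseline + effect for baseline, effect in villages[:half]]
    control = [baseline for baseline, _ in villages[half:]]
    return statistics.mean(treated) - statistics.mean(control)

def spread_of_estimates(n_trials=500, n_villages=100):
    """Repeat the experiment and return the min, mean, and max estimate."""
    estimates = [simulate_rct(n_villages, seed=s) for s in range(n_trials)]
    return min(estimates), statistics.mean(estimates), max(estimates)

if __name__ == "__main__":
    low, avg, high = spread_of_estimates()
    true_average = SHARE_RESPONDING * EFFECT_SIZE
    print(f"true average effect: {true_average:.2f}")
    print(f"estimates over 500 trials: min={low:.2f} mean={avg:.2f} max={high:.2f}")
```

Under these assumptions, any single experiment can land well above or well below the true average, depending on how many of the few strongly responding villages happen to fall in the treated group. Increasing `n_villages` narrows the spread.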

Are other methods worse?

“Others often say that other methods are worse. This is not true either.” Deaton proceeds to argue that what is best to do can only be decided on a case-by-case basis. “The answer cannot be determined by a religious belief in the automatic superiority of RCTs.”

The relevance of RCT evidence

“What an RCT does is give you an average; it doesn’t mean it is good for you.” Here Deaton takes up the relevance of RCT evidence for improvement in another context. His argument strikes me as especially germane for education because decision-making about “how to improve” is a local matter and variation among school contexts can be quite substantial.
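A small worked example may help show what an average can hide. The school names and effect values below are invented for illustration; in a real RCT the effect at each individual site is not directly observable, which is part of the point.

```python
import statistics

# Hypothetical per-school effects of one intervention, in test-score points.
site_effects = {
    "School A": 6.0,
    "School B": 4.0,
    "School C": -3.0,
    "School D": 5.0,
    "School E": -2.0,
}

def summarize(effects):
    """Return the average effect and the sites where the effect is negative."""
    average = statistics.mean(effects.values())
    harmed = sorted(name for name, effect in effects.items() if effect < 0)
    return average, harmed

if __name__ == "__main__":
    average, harmed = summarize(site_effects)
    print(f"average effect: {average:+.1f}")
    print(f"sites made worse off: {', '.join(harmed)}")
```

With these made-up numbers the average effect is positive, yet two of the five schools are worse off. A local decision-maker needs to know which kind of school theirs is, and the average alone cannot say.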

Learning from “Angry Birds”

He then proceeds to argue about the power of iterative trials and learning from failures to “find out what will work for you.”

Conclusion

In concluding, Deaton comments that statements based on RCT results often involve unwarranted generalizations, which he characterizes as “outrageous over-reach.” His concluding comment on this account is quite compelling: “Loose words cost lives and this is the opposite of careful thinking.”