Today we feature a follow-up to Andrew Means' progressive and insightful look at program evaluation.
Last week I published The Death of Program Evaluation, highlighting my firm belief that evaluation as it is practiced today needs to change and evolve. Program evaluation is often backward-looking and over-focused on statistical proof, and it can actually undermine program improvement. The post sparked a debate across Twitter, the website, and the Markets for Good community about the role program evaluation should play in the sector.
As a result, the team at Markets for Good has given me the opportunity to elaborate on my views. They have also solicited counterarguments from others engaged in this debate, which I am eager to read as soon as they are published.
Today I want to state unequivocally that I believe nonprofits should be focused on producing outcomes, and that we need evidence that nonprofits are delivering on those outcomes. However, I also believe that the processes and methods we currently use to gather that evidence are in need of improvement.
This raises the question: what role should data play in the sector?
Prove or Improve?
Should data be used to prove or to improve our programs? The obvious and immediate answer is both. The reality, however, is that most organizations cannot afford to prove, to a statistically significant level, that their program is the cause of some outcome. There is a reason it is known as the burden of proof.
Facing that high bar, many organizations decide that since they can't meet it, they won't do anything at all. When we tell organizations that randomized controlled trials with significant results are the only acceptable form of proof, many choose not to examine their programs at all.
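To make that bar concrete, here is a rough back-of-the-envelope sketch of how many participants a program would need just to detect a modest effect. The effect size and thresholds are illustrative assumptions, not figures from the original post:

```python
# A rough power calculation: how many participants per group does it take
# to detect a modest effect (Cohen's d = 0.2) at the conventional
# alpha = 0.05 and 80% power? All numbers here are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.2,  # a small standardized effect, common for social programs
    alpha=0.05,       # conventional significance threshold
    power=0.8,        # conventional target power
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 390+
```

For a small program serving a few dozen people a year, a sample like that is simply out of reach.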
We need to encourage nonprofits of all sizes to gather evidence (read: data) that supports the claim that their program is working. Evidence scales in a way that proof does not.
We also need to do more formative program evaluation and continuous improvement work. We need to take historical data, use it to identify areas for improvement, and then track progress. Organizations should strive to increase their effectiveness over time, and data has a role to play in that work.
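As a minimal sketch of what that continuous-improvement loop can look like in practice, here is some hypothetical Python; the sites, quarters, and outcome column are invented for illustration:

```python
# A minimal continuous-improvement sketch: summarize a hypothetical outcome
# (program completion) by site and quarter to spot where to focus, then
# watch the trend over time. Column names and values are illustrative.
import pandas as pd

records = pd.DataFrame({
    "site":      ["North", "North", "North", "South", "South", "South"],
    "quarter":   ["2013Q1", "2013Q2", "2013Q3", "2013Q1", "2013Q2", "2013Q3"],
    "completed": [1, 0, 1, 1, 1, 0],
})

# Completion rate by site and quarter: a simple baseline to improve against.
trend = (records.groupby(["site", "quarter"])["completed"]
                .mean()
                .unstack("quarter"))
print(trend)  # low cells flag where to dig in; rows show progress over time
```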
Periodic or Ongoing?
Should organizations use data periodically or in an ongoing manner? I think data should be infused throughout an organization and used every day to make better decisions. This is really why I like the term analytics: it implies the kind of ongoing use of data that I think is important for the sector to embrace.
This again goes back to the high bar we set. Most organizations that do evaluation don't do it in an ongoing manner; they do it every five (or so) years. I think organizations should be using data every day to evaluate how they are doing. When we raise the bar of evaluation to the level of proof, and the only reason we use data is to prove our programs are working, it becomes difficult to think about how to engage with data every day.
Causal Inference or Prediction?
Let's get a bit more technical. Should we build models that infer causality, or models that predict outcomes beyond the sample set? Again, we should strive for both. However, most of the program evaluation models I see being built use methods (primarily regression analysis) best suited to causal inference.
Causal inference is extremely important. It is important to understand how various inputs affect an outcome, because understanding that relationship helps me see how to change those inputs to maximize the outcome.
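For illustration, here is a hedged sketch of the kind of interpretative, regression-style model described above, built on synthetic data. The inputs and outcome are invented, and the usual caveat applies: regression alone does not establish causality.

```python
# A sketch of an interpretative, regression-style model: fit OLS and read
# the coefficients as estimates of how each input relates to the outcome.
# Variables and data are illustrative assumptions, not real program data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
hours_tutoring = rng.uniform(0, 10, n)
attendance     = rng.uniform(0.5, 1.0, n)
test_score     = 50 + 2.0 * hours_tutoring + 20 * attendance + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([hours_tutoring, attendance]))
model = sm.OLS(test_score, X).fit()
print(model.params)   # each coefficient: estimated change in score per unit of input
print(model.pvalues)  # conventional significance tests on those estimates
```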
What I don't see happening often enough is program evaluation models being tested for out-of-sample validity. I don't see many models being built on historical data and re-tested on future data. Simply put, I don't see much future validation of past models. That can lead to misinterpretation and overfitting. For more, check out this thread.
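Here is a minimal sketch of what that future validation could look like, assuming a simple temporal split on synthetic data. It also shows how easily an over-flexible model can look perfect in-sample and fall apart out of sample:

```python
# Out-of-sample (future) validation: fit on an earlier cohort, then score
# on a later cohort. With many noisy features and few participants, the
# model fits the past perfectly but fails on the future. Data is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n, p = 60, 40                          # few participants, many noisy features
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + rng.normal(size=n)   # only the first feature truly matters

X_past, y_past = X[:40], y[:40]        # "historical" cohort used to build the model
X_future, y_future = X[40:], y[40:]    # "future" cohort used to re-test it

model = LinearRegression().fit(X_past, y_past)
print("in-sample R^2:    ", r2_score(y_past, model.predict(X_past)))      # near perfect
print("out-of-sample R^2:", r2_score(y_future, model.predict(X_future)))  # far weaker
```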
Lastly, I think we should find more ways to bring predictive models into program decisions. If I can predict what outcomes a program participant will have under various interventions, I should. But good predictive models often employ wholly different techniques than good interpretative models.
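As one hedged illustration of that difference, here is a sketch of a predictive model aimed at decisions rather than interpretation. The model choice, features, and threshold are assumptions for the example, not anything prescribed in the post:

```python
# A sketch of a decision-oriented predictive model: a gradient-boosted
# classifier predicting whether a hypothetical participant completes the
# program, so staff could target extra support. All data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 3))            # e.g., attendance, age, prior score
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)

risk = clf.predict_proba(X_test)[:, 1]  # predicted probability of completion
print("held-out accuracy:", clf.score(X_test, y_test))
print("participants flagged for outreach:", int((risk < 0.5).sum()))
```

Unlike the regression above, a model like this is judged almost entirely on how well it predicts for people it has never seen, not on whether its internals are interpretable.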
What’s Next
Data is changing the social sector. And I think we’re just at the beginning of that change. David Anderson put it well in his comment on the original post, “As data becomes more ubiquitous and the sector more in-tune to its value, we are more likely to see the true emergence of evaluation rather than its death.”
I agree. We need so much more than what we have today. We need new skills, new methods, and new models that embrace data and promote the value it brings to the sector. As that happens, our organizations will improve, our effectiveness will increase, and we can begin to make greater progress in solving some of our world's most intractable problems.