Better is Good Enough: Putting Imperfect Data to Good Use

Andrew Means weighs in on the age-old struggle between perfectionism and practicality — updated for our digital age.

Math is nice, isn’t it? That feeling of getting the right answer. 1 + 1 = 2. There is no other answer. It’s so clear. Something is either correct or incorrect. It’s not left open to interpretation.

Our first exposure to numbers is usually this kind of simple mathematics. We get used to this clarity, this black-and-white, right-and-wrong thinking, when working with numbers. The problem is, numbers are not always clear-cut. They aren’t always black and white. Sometimes they are still left open to interpretation.

And sometimes even incorrect, flawed numbers point us in the right direction.

I was recently at a gathering that brought together data scientists and evaluators. The goal was to foster more dialogue between these two communities that actually view the world quite differently. One of the topics of conversation was how to use flawed data. Evaluators spend a lot of energy making sure their data is as close to perfect as possible. When they say that a program has a certain effect, they want to say that with confidence. They live in a messy world and have been trained to keep the mess away from the data as much as possible.

Data scientists tend to be a bit more pragmatic. Many know they are working with biased or flawed data but realize that doesn’t necessarily mean it is worthless. They take an approach of trying to extract value where they can. Whereas evaluators use data to prove the impact of a program, data scientists and analysts often use data to improve a program.

When you are seeking improvement, perfection is sometimes the wrong goal. Let’s say I’m running an after-school program and I want to increase high school graduation rates. I build a model to predict which of the students in my program are potentially at greater risk of dropping out so I can target some additional resources toward them. My model will definitely label some students as high risk who aren’t and miss some students who actually are. But if it helps me improve the success of the students in my program, is that a problem?

Some people might say it is a problem to label students as high risk who aren’t and to overlook some who are. But if the model helped improve the outcomes of students, isn’t that enough? Sometimes, isn’t better good enough?
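
To make that trade-off concrete, here is a minimal sketch in Python. Every number in it is hypothetical, invented purely for illustration and not drawn from any real program: 100 students, 20 of whom are truly at risk, and a model that flags 25 of them for extra support. The `ModelOutcome` class and its method names are likewise assumptions. The sketch compares how many at-risk students the flawed model reaches with how many you would expect to reach by picking the same number of students at random.

```python
from dataclasses import dataclass


@dataclass
class ModelOutcome:
    """Hypothetical tally of a dropout-risk model's predictions."""
    true_positives: int   # flagged, and truly at risk
    false_positives: int  # flagged, but not actually at risk
    false_negatives: int  # truly at risk, but missed by the model
    true_negatives: int   # not flagged, and not at risk

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_positives
                + self.false_negatives + self.true_negatives)

    @property
    def flagged(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def at_risk(self) -> int:
        return self.true_positives + self.false_negatives

    def precision(self) -> float:
        """Share of flagged students who are truly at risk."""
        return self.true_positives / self.flagged

    def recall(self) -> float:
        """Share of truly at-risk students the model flags."""
        return self.true_positives / self.at_risk

    def expected_reached_at_random(self) -> float:
        """At-risk students reached, on average, if the same number
        of students were chosen at random instead of by the model."""
        return self.flagged * self.at_risk / self.total


# Made-up counts for illustration only.
outcome = ModelOutcome(true_positives=12, false_positives=13,
                       false_negatives=8, true_negatives=67)

print(f"Precision: {outcome.precision():.2f}")
print(f"Recall: {outcome.recall():.2f}")
print(f"At-risk students reached by the model: {outcome.true_positives}")
print(f"Expected if chosen at random: {outcome.expected_reached_at_random():.1f}")
```

With these made-up counts, the model wrongly flags 13 students and misses 8, yet it still reaches 12 at-risk students where random selection would be expected to reach 5. The model is wrong in both directions and still better than having no model.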

It’s easy to point out how models can be wrong. Of course the model is wrong! That’s not the point. The British statistician George E. P. Box put this well when he said, “All models are wrong, but some are useful.”

One of my fears is that organizations are unable to embrace the simultaneous wrongness and usefulness of data. I see organizations waiting for the perfect and passing by the perfectly useful.

I am not advocating that we throw rigor out the window or that accuracy no longer matters. It absolutely does. But I think the bar we sometimes hold ourselves to is the wrong one. Perfection should not be the goal; improvement should be.

How do you balance these goals in your organization? How do you strive for the best while also moving quickly with the useful? Share your thoughts and experiences below!

Special thanks to Andrew Means for his post on making the most out of messy data. Andrew is the Head of the Uptake Foundation and Co-Founder at The Impact Lab. Follow him on Twitter to learn more about his work.


August 2, 2016
