Better is Good Enough: Putting Imperfect Data to Good Use

Andrew Means weighs in on the age-old struggle between perfectionism and practicality — updated for our digital age.

Math is nice, isn’t it? That feeling of getting the right answer. 1 + 1 = 2. There is no other answer. It’s so clear. Something is either correct or incorrect. It’s not left open to interpretation.

Our first exposure to numbers is usually this kind of simple mathematics. We get used to this clarity — this black and white, right and wrong thinking when working with numbers. The problem is, numbers are not always clear-cut. They aren’t always black and white. Sometimes they are still left open to interpretation.

And sometimes even incorrect, flawed numbers point us in the right direction.

I was recently at a gathering that brought together data scientists and evaluators. The goal was to foster more dialogue between these two communities that actually view the world quite differently. One of the topics of conversation was how to use flawed data. Evaluators spend a lot of energy making sure their data is as close to perfect as possible. When they say that a program has a certain effect, they want to say that with confidence. They live in a messy world and have been trained to keep the mess away from the data as much as possible.

Data scientists tend to be a bit more pragmatic. Many know they are working with biased or flawed data but realize that doesn’t necessarily mean it is worthless. They take an approach of trying to extract value where they can. Whereas evaluators use data to prove the impact of a program, data scientists and analysts often use data to improve a program.

When you are seeking improvement, perfection is sometimes the wrong goal. Let’s say I’m running an after-school program and I want to increase high school graduation rates. I build a model to predict which of the students in my program are potentially at greater risk of dropping out so I can target some additional resources toward them. My model will definitely label some students as high risk who aren’t and miss some students who actually are. But if it helps me improve the success of the students in my program, is that a problem?
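The scenario above can be sketched in a few lines. This is a toy illustration, not a real model: the student roster, the attendance/GPA weighting, and the flagging threshold are all made-up assumptions chosen to show how a useful model can still mislabel some students in both directions.

```python
# Hypothetical sketch of a "dropout risk" flag. The weights, threshold,
# and roster are illustrative assumptions, not a fitted model.

def risk_score(attendance, gpa):
    """Higher score = higher assumed dropout risk (toy weighting)."""
    return (1.0 - attendance) * 0.6 + (4.0 - gpa) / 4.0 * 0.4

def flag_high_risk(students, threshold=0.35):
    """Return the names of students whose toy risk score crosses the threshold."""
    return [s["name"] for s in students
            if risk_score(s["attendance"], s["gpa"]) >= threshold]

# Toy roster; "dropped_out" is the outcome, unknown at prediction time.
students = [
    {"name": "A", "attendance": 0.95, "gpa": 3.6, "dropped_out": False},
    {"name": "B", "attendance": 0.60, "gpa": 2.1, "dropped_out": True},
    {"name": "C", "attendance": 0.55, "gpa": 2.5, "dropped_out": False},
    {"name": "D", "attendance": 0.90, "gpa": 3.2, "dropped_out": True},
]

flagged = flag_high_risk(students)
# Students the model missed (dropped out but were never flagged).
misses = [s["name"] for s in students
          if s["dropped_out"] and s["name"] not in flagged]
# Students flagged who did not actually drop out.
false_alarms = [s["name"] for s in students
                if s["name"] in flagged and not s["dropped_out"]]
```

Run on this toy roster, the model flags B and C (C wrongly) and misses D entirely. It is wrong in both directions, yet extra resources still land mostly on students who need them, which is the trade-off the paragraph above describes.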

Some people might say it is a problem to label students as high risk who aren’t and to overlook some who are. But if the model helped improve the outcomes of students, isn’t that enough? Sometimes isn’t better good enough?

It’s easy to point out how models can be wrong. Of course the model is wrong! That’s not the point. The British statistician George E. P. Box put this well when he said, “All models are wrong, but some are useful.”

One of my fears is that organizations are unable to embrace the simultaneous wrongness and usefulness of data. I see organizations waiting for the perfect and passing by the perfectly useful.

I am not advocating that we throw rigor out the window or that accuracy no longer matters. It absolutely does. But I think the bar we sometimes hold ourselves to is the wrong one. Perfection should not be the goal; improvement should be.

How do you balance these goals in your organization? How do you strive for the best while also moving quickly with the useful? Share your thoughts and experiences below!

Special thanks to Andrew Means for his post on making the most out of messy data. Andrew is the Head of the Uptake Foundation and Co-Founder at The Impact Lab. Follow him on Twitter to learn more about his work.

To stay up to date with the latest from Markets For Good, sign up for our newsletter and follow us on Twitter.



  1. Peter Campbell says:

    In the nonprofit world, we have to do the best with what we have, and messy data is often the best we’ll be able to get. When I worked at Goodwill in the early 00’s, I put in new inventory management and point of sale systems and developed a retail reporting system. My Goodwill’s model was to take in donations, sort them, and then categorize them as 1) sellable in our retail stores, 2) sellable in an “as is” store (i.e., too poor quality to individually price, but probably sellable), or 3) best sold in bulk to third parties. The cost of goods is the handling of goods, and the handling of goods is an intense amount of labor.

    Once sent to the stores, we tracked what was sold, but we didn’t track what wasn’t sold. Goods that sat on the racks for a while were eventually shipped back to the warehouse, and nobody counted those. This meant that we tracked only two of the three metrics that would have provided a full picture. The reporting made no allowance for this discrepancy, so it was incorrect every day. We understood that. We also had a ballpark idea that allowed us to make some assumptions. But, the cost of goods being the handling of goods, and our resources heavily constrained, counting the returned goods was not a justifiable expense.

    So we did what we could with the data we had. What we could do was increase our profits 10%, year over year, for two years in a row, once we had the new reporting system, despite closing two of our 18 stores during that period. The data made a large, measurable difference. So I’ll go to my grave as an advocate for accuracy when you can afford it. But bad data, used in the proper context, can still give you great and useful insight.

    1. Andrew Means says:

      Such a great lesson, Peter. You also raise a really important point. Data collection in our sector in particular is not always frictionless. It’s not always machine generated. Therefore the cost–benefit calculation of collecting a new metric should always be done. Sometimes the benefit you get from messy data is higher than the benefit you would get from perfect data, once you consider the additional cost of getting perfect data.

      1. Peter Campbell says:

        Exactly! But I’ve always been a big believer that perfection is the enemy of accomplishment. Or progress. Or something like that. 🙂
