From the Archive

“Markets for What?”: Refining Our Questions Before Defining The Data

Notes From The Field

In this edition of “Notes From The Field,” David Henderson speaks on how better data can actually lead to greater impact by examining a critical assumption behind the request for better data: that it lands in the hands of skilled users. This isn’t a point to leave to assumption. I wouldn’t assume that buying Usain Bolt’s decidedly better track shoes would take many of us under 10 seconds in the 100m. While there are many developing examples of intelligent data practice in the sector, the question of data literacy at both individual and industry scale is a rightly arresting one, and not only for the social sector: business and government also seek to ramp up data literacy.

I make my living helping organizations make sense of their social sector data. Yet the louder the fervor over the promise of data gets, the harder it is for me to tell what anyone is actually talking about.

“Data” is an ambiguous term, and it has no value in itself. Data is only valuable insofar as it informs decision-making. The organizations I work with commonly ask, “What data points should we collect?” This is the wrong way to think about data, and it is the same mistake we are making as a community when we focus on developing data markets without clearly identifying the questions we are trying to answer.

The real question then is, what decisions are we trying to make?

Philanthropic investments are different from financial investments. Financial investors maximize over one outcome, profit, while social investors might minimize or maximize over varying outcomes depending on their utility frameworks. What an investor or organization values determines what data points are relevant in decision-making, not the other way around.

Identifying indicators too hastily can lead one to make data-informed decisions that are at odds with one’s desired result. Take, for example, a homeless services provider. One might think that a reasonable metric to assess success and failure would be a binary variable that indicates whether or not an individual was placed in housing.

What this data point would not consider is how difficult each individual was to house. Under this framework we would value housing a person who was chronically homeless for over fifteen years the same as housing an individual who was only temporarily homeless due to job loss.

Valuing these two outcomes the same might be okay, if that is true to one’s utility framework. But what if one valued placing a chronically homeless person in housing differently? We would need a different set of metrics, perhaps using something like a vulnerability index.
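To make the contrast concrete, here is a minimal Python sketch. The client records, the 0-to-10 vulnerability scores, and the function names are all hypothetical, invented for illustration; a real vulnerability index would be scored differently.

```python
from dataclasses import dataclass


@dataclass
class Client:
    housed: bool          # binary placement outcome
    vulnerability: float  # hypothetical 0-10 score; higher = harder to house


def placement_rate(clients):
    """Binary metric: share of clients placed in housing."""
    return sum(c.housed for c in clients) / len(clients)


def weighted_placement_rate(clients):
    """Share of total vulnerability 'points' that ended up housed."""
    total = sum(c.vulnerability for c in clients)
    housed = sum(c.vulnerability for c in clients if c.housed)
    return housed / total


# Two made-up providers with four clients each.
provider_a = [Client(True, 2), Client(True, 2), Client(True, 2), Client(False, 9)]
provider_b = [Client(True, 9), Client(True, 2), Client(False, 2), Client(False, 2)]

for name, clients in [("A", provider_a), ("B", provider_b)]:
    print(name, placement_rate(clients), round(weighted_placement_rate(clients), 2))
```

On the binary metric, provider A looks better (0.75 versus 0.50). On the weighted metric, provider B does (about 0.73 versus 0.40). Neither ranking is correct in the abstract; which one applies depends on the utility framework.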

My point is not to argue that one type of outcome should be valued over another, but rather to underscore the complexity of working with social sector data and to show how our choices of data points can influence what we believe to be rational, even guiding us to make data-informed decisions that are contrary to our own goals.

So if we don’t start with identifying data points, where do we start?

In my firm’s work we begin each engagement by developing an impact theory: the portion of a theory of change that links outcomes to a set of interventions. We use a database system we developed to programmatically model an impact theory, tying intended outcomes to measurable indicators.

By tying outcomes (abstract goals like creating a safe neighborhood, or family stability) to measurable indicators, we can approximate goals in concrete terms. The discipline of tying indicators to outcomes also answers the question of what data points to collect in decision-relevant ways.
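One way to picture this link is as a simple mapping from outcomes to indicators. The sketch below is not the database system described above. The two outcome names come from the text, while the indicator names and helper functions are hypothetical.

```python
# Hypothetical impact theory: each abstract outcome maps to the
# measurable indicators chosen to approximate it.
impact_theory = {
    "safe neighborhood": [
        "reported_incidents_per_month",
        "resident_safety_survey_score",
    ],
    "family stability": [
        "months_in_same_housing",
        "resident_safety_survey_score",
    ],
}


def data_points_to_collect(theory):
    """Return the de-duplicated, sorted indicators the theory calls for."""
    return sorted({ind for indicators in theory.values() for ind in indicators})


def is_decision_relevant(theory, candidate):
    """A candidate data point is worth collecting only if some outcome uses it."""
    return candidate in data_points_to_collect(theory)


print(data_points_to_collect(impact_theory))
```

Working in this direction, from outcome to indicator, means any proposed data point that no outcome refers to can be flagged as extraneous before anyone pays to collect it.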

Our next step is a process we call “utility elicitation,” where we identify points of indifference across all indicators. Going back to our homeless services example, we might try to identify how many temporarily homeless people an organization would need to house to equal housing one chronically homeless person. By identifying these points of indifference we can develop a utility framework that creates bounds around decision-making.
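As a rough illustration of how a point of indifference becomes a weight, suppose an organization says it is indifferent between housing one chronically homeless person and housing some number of temporarily homeless people. In the sketch below, the ratio of 4, the function names, and the two options are invented. The additive form of the utility is a simplifying assumption, not something the elicitation process guarantees.

```python
def weights_from_indifference(temporary_per_chronic):
    """Turn an elicited indifference point into relative weights.

    A temporary placement is normalized to weight 1, so a chronic
    placement is worth `temporary_per_chronic` of them.
    """
    if temporary_per_chronic <= 0:
        raise ValueError("indifference ratio must be positive")
    return {"temporary": 1.0, "chronic": float(temporary_per_chronic)}


def utility(placements, weights):
    """Additive utility of a set of placements (linearity is assumed)."""
    return sum(weights[kind] * count for kind, count in placements.items())


weights = weights_from_indifference(4)  # hypothetical elicited value

option_a = {"temporary": 10, "chronic": 1}
option_b = {"temporary": 3, "chronic": 3}

print(utility(option_a, weights), utility(option_b, weights))
```

Under these made-up weights, option B (six placements) scores slightly higher than option A (eleven placements), a ranking that a simple count of people housed would reverse.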

Finally, we help organizations develop a data collection strategy, using data collection instruments like surveys or database systems. By developing the data collection instruments with indicators tied to outcomes, we make sure to get the data points an organization needs to improve decision-making without collecting extraneous indicators. This is especially important given the marginal cost of collecting additional data points.

With the impact theory and data collection instruments in place, we work with organizations to develop quarterly feedback loops in which we use data collected in the previous three months to answer a particular policy question. We use statistical techniques like regression analysis and machine learning algorithms to build predictive models based on outcome metrics, treating the impact theory as a hypothesis that is tested and updated on a regular basis.

The subtext of each of my firm’s engagements is helping organizations and social investors understand the limits of data. Indeed, in order to pipe down the data hype and move into the information revolution, the social sector must first focus on improving data literacy. Sweeping generalizations about “what works and what does not” are not made by visionaries with hidden oracles. Instead, they are the uninformed mutterings of those who draw generalizations from small samples and outliers.

Ultimately, the better we collectively understand the limits of data, the more we will be able to get out of it. Too often we make generalizations based on our findings without proper error bounds around our results. These pseudo-scientific findings are perhaps more dangerous than not using any data at all.
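To show what an error bound looks like in practice, the sketch below computes a Wilson score interval for a success rate. The Wilson interval is one standard choice among several, picked here for illustration, and the sample sizes and counts are made up.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The same observed 80% success rate at two hypothetical sample sizes.
small = wilson_interval(8, 10)
large = wilson_interval(800, 1000)

print([round(x, 2) for x in small], [round(x, 2) for x in large])
```

With ten clients, an observed 80% success rate is consistent with a true rate anywhere from roughly 0.49 to 0.94. With a thousand clients the range narrows to roughly 0.77 to 0.82. Reporting "80% success" without that context treats the two cases as equally informative.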

As we think about data markets in the social sector, we should perhaps be more concerned with developing a data-literate workforce. A social sector that knows how to use data can identify for itself the necessary data points to collect, and will be better able and more willing to participate in intelligent data sharing. As the sector stands now, we need better data consumers more than we need better data.