

The “What” and The “How” of Big Data

Here’s a discussion of big data and analytics and their potential application within the social sector. Sunand Menon walks through the landscape, adding further comment on analysis (the skills we’re using to make sense of data) and analytics (the tools being developed to help that work). This kind of exploration is important given that the word “data” is sometimes used, incorrectly, as an implied and casual synonym for “analysis.” But what “the data say” is not as self-evident or certain as that common refrain. Analysis says a lot more. And then, there’s still a decision left to make. Time to hike up a sleeve and take a tour inside the machine.

“What is ‘big data’ and ‘analytics’? How does it apply to my sector? How can it help my organization?”

Those were the opening questions posed by a prospective client, as we sat down to breakfast.

I appreciated the directness, and immediately launched into a simple explanation, without the technical jargon. Big data is really just information that is continually generated, collected and stored. It’s called ‘big’ because it represents a volume that is so immense, and growing so quickly, that we cannot meaningfully quantify it. Think about it: if we gave it a fixed value (e.g., 100 zettabytes, a 1 with 23 zeroes after it in bytes), what would we call it once we pass that mark? ‘Bigger data’?

Big data comes from everywhere. Text messages, emails, climate sensors, Facebook ‘likes’ and ‘comments’, iPhone photos and videos, you name it. In the nonprofit world, it could be financial data, donor information, outcome data, or individual viewpoints on a foundation or a charity.

Big data consists of both ‘structured’ and ‘unstructured’ data. ‘Structured data’ is what we have traditionally called data: lots of numbers, all categorized into specific groupings or fields, and recorded in spreadsheet-type rows and columns. ‘Unstructured data’ is basically everything else, e.g., text, comments, pictures, videos, individual or group sentiments. One of the reasons why our data creation, collection and storage have improved so much recently is that developments in technology now allow us to process unstructured data more extensively and more efficiently.
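To make the distinction concrete, here is a minimal Python sketch with entirely made-up donor records and comments: the structured rows can be summed directly, while the unstructured comments need an extra processing step (here, a crude keyword match standing in for real text analytics) before they can be counted at all.

```python
# A minimal, hypothetical sketch: the same donor information as
# structured rows (numbers in named fields) and as unstructured text.

structured_donations = [
    {"donor_id": 101, "amount": 50.0, "year": 2013},
    {"donor_id": 102, "amount": 120.0, "year": 2013},
]

unstructured_comments = [
    "Loved the after-school program, will give again next year!",
    "Hard to tell where my donation actually went.",
]

# Structured data aggregates directly into rows-and-columns arithmetic.
total = sum(row["amount"] for row in structured_donations)
print(f"Total donations: ${total:.2f}")

# Unstructured data needs processing before it becomes something we can
# count -- here, a crude keyword-based sentiment tag.
positive_words = {"loved", "great", "again"}
for comment in unstructured_comments:
    words = set(comment.lower().replace("!", "").replace(",", "").split())
    tag = "positive" if words & positive_words else "neutral/negative"
    print(f"{tag}: {comment}")
```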

So what can we do with the big data we collect? We can analyze it and make better decisions. This is where the power of ‘analytics’ comes in: the rules that we construct and apply to the data, generally in the form of mathematical/logical algorithms (IF, THEN, AND, ELSE).

For example: IF a registered user ‘likes’ a specific charity on Facebook, AND then ‘comments’ positively on a similar charity on LinkedIn, AND has identified themselves as a previous donor to yet another similar charity, THEN one can assume that donation requests from charitable organizations serving that cause will have a higher chance of success. Imagine the efficiency leap in such situations: rules like these can actually be used to predict future behaviours and patterns.
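To see what such a rule might look like in code, here is a minimal Python sketch. The field names and the scoring weights are illustrative assumptions, not a description of any real donor-scoring system.

```python
# A hypothetical IF/THEN scoring rule for donation outreach.
# Field names and weights are illustrative only.

def donation_likelihood(person: dict) -> str:
    """Combine simple signals into a rough likelihood bucket."""
    score = 0
    if person.get("liked_similar_charity_on_facebook"):
        score += 1
    if person.get("commented_positively_on_linkedin"):
        score += 1
    if person.get("identified_as_past_donor"):
        score += 2  # past giving is the strongest of the three signals here

    if score >= 3:
        return "high"
    elif score >= 1:
        return "medium"
    return "low"

prospect = {
    "liked_similar_charity_on_facebook": True,
    "commented_positively_on_linkedin": True,
    "identified_as_past_donor": True,
}
print(donation_likelihood(prospect))  # -> "high"
```

In practice the weights would be learned from historical giving data rather than hand-picked, but the IF/THEN skeleton is the same.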

So that, in a nutshell, is big data and analytics: how they can be applied to virtually any sector, including the nonprofit world, and how they can help an organization, by facilitating better decision-making.

How do I get to the point where I am making these better decisions?

It all starts with defining the need. For instance, let’s suppose that: “The nonprofit world will benefit tremendously by facilitating the collection and storage of all relevant nonprofit data, and by building relevant performance analytics around them. This, in turn, will facilitate better decision-making (e.g. resource allocation), which will lead to better outcomes, more efficiently.” Not an unreasonable hypothesis; it’s been done in other sectors (see a previous blog post).

In my view, the most difficult part involves defining the scope and breadth of “relevant data” and “relevant performance analytics”. Let’s assume that the data and analytical methodologies are defined to a “good enough” level. How do we now build this?

Firstly, the data has to be collected. It may be coming from different databases, and may be in varying forms, such as numbers or words, or arrive at different time intervals, e.g., one-off batches versus real-time, streaming data (think of stock price data). It can all be loaded thanks to developments like ‘Hadoop’, an open-source software framework that supports distributed storage and processing of large datasets.
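At the scale of a single laptop, the collection step looks something like the following Python sketch. The file names are hypothetical, and at genuinely “big” volumes this same gather-and-join work is what frameworks like Hadoop spread across many machines.

```python
# A small-scale stand-in for the collection step: pull data of different
# shapes (a CSV export, a JSON feed) into one place. File and column
# names are hypothetical.
import pandas as pd

donations = pd.read_csv("donations_export.csv")    # structured, batch export
feedback = pd.read_json("donor_feedback.json")     # semi-structured feed

# Normalize to a common key so later steps can join the two sources.
donations["donor_id"] = donations["donor_id"].astype(str)
feedback["donor_id"] = feedback["donor_id"].astype(str)

combined = donations.merge(feedback, on="donor_id", how="left")
print(combined.head())
```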

Secondly, the data has to be integrated and stored in analytical appliances. Often, the various forms of data are placed in preconfigured hardware and software systems, ready to be retrieved according to a set of criteria. You may have heard about capabilities such as ‘Netezza’, ‘Greenplum’ or ‘Exalytics’ – these are simply analytical appliances owned by IBM, EMC and Oracle, respectively, that do this work.

Thirdly, the data has to be processed and analyzed to derive insights. In big data analysis, a variety of mathematical techniques are used, e.g., collaborative filtering, clustering and categorization, on programming platforms such as ‘MapReduce’ and ‘Mahout’. They essentially identify patterns and linkages between diverse datasets – no matter how unlikely the potential link between them.
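As a toy illustration of this analysis step, here is a small clustering sketch using scikit-learn’s KMeans on made-up donor features; distributed platforms like MapReduce and Mahout apply the same idea to datasets far too large for one machine.

```python
# A toy version of the analysis step: cluster donors by giving behaviour.
# Feature values are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [total donated ($), gifts per year]
donors = np.array([
    [50, 1], [60, 1], [55, 2],      # small, occasional givers
    [500, 4], [450, 5], [520, 6],   # committed mid-level givers
    [5000, 1], [7500, 2],           # large, infrequent gifts
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(donors)
for donor, label in zip(donors, kmeans.labels_):
    print(donor, "-> cluster", label)
```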

And therein lies the power of big data and analytics: sometimes correlations might exist that we could never have imagined.

Finally, the data has to be visualized to convey the results of the analysis. This can be done for complex datasets using statistical packages such as ‘SAS’, ‘SPSS’ and ‘R’. This can also be done using intuitive dashboards or interfaces with a variety of styles, e.g., Tableau, Geckoboard, Domo or Inverra.
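As a minimal stand-in for those dashboards, here is a short matplotlib sketch that charts the (illustrative) donor segments from the clustering example above.

```python
# A minimal dashboard-style chart of the donor segments, using matplotlib
# as a free stand-in for tools like Tableau. Values are illustrative.
import matplotlib.pyplot as plt

totals = [55, 490, 6250]                   # average total donated per segment
labels = ["Occasional", "Committed", "Major"]

plt.bar(labels, totals)
plt.ylabel("Average total donated ($)")
plt.title("Donor segments (illustrative)")
plt.tight_layout()
plt.show()
```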

The good news is that this has been done many times over in several industries, and there are many organizations that can help in the build-out. At the large and complex end of the execution scale are big technology and services multinationals like IBM, Accenture, and EMC. At the smaller end, a multitude of boutique consulting and technology services companies exist. The implementation cost can range from six figures to tens of millions of dollars, depending on the size and complexity of the data and analytics involved. In the context of the vision for Markets for Good, it may be worthwhile to work on a proof-of-concept project involving a subset of “relevant” structured and unstructured data, and a draft set of “relevant” performance analytics. We may be pleasantly surprised with what we find.