The characteristics of digitized information require us to think carefully about how we use it as a resource.



DATA CHARACTERISTIC: Generative

The original use of data is only the beginning. Each use of digital data creates additional datasets that matter in their own right.


This requires understanding how metadata work, as well as where, how, and for what purposes digital interactions can be tracked.


A Twitter dataset might include the content of tweets about trouble voting (being turned away from the polls for lack of ID, being threatened or harassed, being told they cannot vote even though they were in line when the polls closed); the data created by the act of tweeting – timestamps, the location where each tweet was made – become a meaningful dataset as well. That dataset could inform a map of hot zones where voting rights attorneys might focus in the future, or where a mediation organization could host a community dialogue.
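A minimal sketch of that idea, with invented timestamps and coordinates: the tweet metadata (not the tweet text) is aggregated into coarse grid cells to suggest "hot zones" of reported voting trouble.

```python
from collections import Counter

# Hypothetical tweet records: (timestamp, latitude, longitude).
# All values are invented for illustration.
tweets = [
    ("2020-11-03T08:15", 33.71, -84.39),  # two reports near the same area
    ("2020-11-03T08:40", 33.74, -84.41),
    ("2020-11-03T09:02", 40.71, -74.01),  # one report elsewhere
]

def hot_zones(records, precision=1):
    """Count reports per grid cell by rounding coordinates to `precision` decimals."""
    cells = Counter()
    for _timestamp, lat, lon in records:
        cells[(round(lat, precision), round(lon, precision))] += 1
    return cells

zones = hot_zones(tweets)  # the cell around (33.7, -84.4) holds two reports
```

Real analysis would of course use proper geospatial binning, but the point stands: the byproduct data, not the content, drives the map.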



DATA CHARACTERISTIC: Easily copied

Copies of digital datasets are often indistinguishable from the original.


Metadata (data about the dataset) can often distinguish originals from copies. This requires deciding whether the distinction matters and having the skills to interpret the metadata.
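A minimal sketch of that point, using an invented file name and contents: a byte-for-byte copy of a file is indistinguishable from the original by content alone (their hashes match), so telling them apart depends on metadata such as file names and timestamps.

```python
import hashlib
import pathlib
import shutil
import tempfile

# Create a small "original" file in a scratch directory (contents are invented).
workdir = pathlib.Path(tempfile.mkdtemp())
original = workdir / "survey.csv"
original.write_text("respondent,answer\n1,yes\n")

copy = workdir / "survey_copy.csv"
shutil.copy(original, copy)  # duplicates the bytes; the filesystem records new metadata

def sha256(path: pathlib.Path) -> str:
    """Hash a file's contents; identical bytes always yield an identical hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

identical_content = sha256(original) == sha256(copy)  # content cannot tell them apart
different_names = original.name != copy.name          # metadata still can
```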





DATA CHARACTERISTIC: Remixable

Data collected for one purpose can be used for other purposes, and datasets can be easily and cheaply mixed together or compared.


The caveats and limits of an original dataset can get lost as data are mixed and remixed.


Digital data files are easily manipulated – allowing for everything from photoshopping to music sampling. Digital datasets created for one purpose can be added to, retracted, or otherwise manipulated by users. For example, anyone can download and use American Community Survey data, and overlay it with distressed census tract information to create a new dataset.
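A hedged sketch of the overlay described above. The tract IDs, income figures, and "distressed" designations are all invented; real work would start from downloaded ACS tables and a published distressed-tracts list.

```python
# Census tract ID -> median household income (illustrative values only).
acs_estimates = {
    "13121001100": 34000,
    "13121001200": 58000,
    "13121001300": 41000,
}
# Illustrative set of tract IDs flagged as distressed.
distressed_tracts = {"13121001100", "13121001300"}

# Joining the two sources on the shared tract ID yields a new, derived dataset.
overlay = {
    tract: {"median_income": income, "distressed": tract in distressed_tracts}
    for tract, income in acs_estimates.items()
}
```

Note how easily the join happens: nothing in the derived dataset records the caveats that came with either source, which is exactly how original limits get lost in remixing.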



DATA CHARACTERISTIC: Massive scale

Digital data are cheap to collect and store, and they can accumulate on massive scales. Datasets may be stored in separate places, but interoperable standards allow distinct datasets to be analyzed together.
The sheer quantity of available digital data, and its generative nature, are catalyzing new methods of analysis.


The large scale of available data allows different information to be brought together, making the identification of individuals in a dataset more likely.


Volunteers helped Human Rights Watch use readily available data from Facebook posts, online videos, photographs, Google Maps, and Google SketchUp to identify the source of chemical weapons used in Syria. These “data” were readily available online and came in a variety of forms. Most of the photos, for example, were not created for use as evidence; they were simply documentation of a moment. The analysts created a new data tool by drawing together many different digital sources.



DATA CHARACTERISTIC: Cheap to store

The decreasing cost of storage can make it easier to collect and hold data than to make hard choices about what to keep.


The sheer size of a dataset does not guarantee its validity, representativeness, or research utility.
The rise of cloud (remote) storage allows new kinds of access and also raises new questions about security and ownership.


Low storage costs have changed our personal photography practices. Many people now take and save hundreds, even thousands, of digital photographs because it’s so easy to do.
Similarly, many organizations are choosing to collect and hold digital data they are not sure what to do with, simply because it is cheap. This raises new challenges as well – from the “too big to find” problem of our photos to legal questions about stored information (if you don’t hold it, it can’t be subpoenaed), the re-use of information, and security concerns about storage leaks and break-ins.



DATA CHARACTERISTIC: Easily accessible

Posting data online makes it easily available to anyone with an internet connection. Structuring the information so it can be used from a mobile device or accessed from multiple locations is becoming easier and more important.


Technological shifts (such as the rise of mobile devices) require maintaining accessibility over time. And once something is online and accessible to the public, there are few ways to control what can be done with it.


A foundation’s grant list can be easily shared online. Others may use this data to inform their own decision making or to learn about their community; it can also be used by organizations with different viewpoints.
How the information is posted – as a static PDF, in a searchable database, or as a downloadable dataset that can be mixed with other datasets – will influence how easily it can be used for a variety of purposes. Note that even static data can be scraped from websites and digitized.
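A sketch of that contrast, using an invented grant list: the same information posted as a downloadable CSV parses in one line, while the same figures posted as flat page text must be scraped with brittle pattern matching.

```python
import csv
import io
import re

# The grantee names and amounts below are invented for illustration.
# Structured form: a downloadable CSV is trivially machine-readable.
csv_text = "grantee,amount\nRiver Arts Collective,25000\nCity Food Bank,40000\n"
grants_csv = list(csv.DictReader(io.StringIO(csv_text)))

# Static form: the same facts as page text must be scraped out.
page_text = "Grants 2023: River Arts Collective ($25,000); City Food Bank ($40,000)"
grants_scraped = re.findall(r"([A-Z][\w ]+?) \(\$([\d,]+)\)", page_text)
```

Both routes recover the data, which is the point of the "even static data can be scraped" caveat, but the structured version is far easier to reuse and remix.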



DATA CHARACTERISTIC: Retention

Digital data may be held for generations. Once data are put on the internet, others can cache and copy them, so removing your copy doesn’t “erase” the information.


Combined with the low cost of storage, the issue of perpetuity requires considering how data collected for one purpose might be used for other purposes later on. It creates a need to establish destruction options, criteria, and timelines, and it raises the question of whether individuals should be able to withdraw their data from a dataset over time.


Data collected by school districts on a student may be stored in ways that make it available over the lifetime of that individual. It is hard to predict all the possible benefits and downsides of a lifelong digital trail.
Retention is related to the low cost of storage, but it raises questions about what kind of consent you need to get when you first collect certain information (for a single purpose or for anything that becomes interesting later) as well as what protections need to be provided in case the information is sought by third parties (including law enforcement) or stolen.


DATA CHARACTERISTIC: Unclear ownership

Since many exact copies can be made of digital data, it is hard to know which is the original, which are copies, and who owns what. And what ownership rights, if any, do individuals represented in a dataset have over their data?


Digital datasets are raising new questions of consent and ownership for persons represented within a dataset. Do people in the dataset have rights to their information? Who owns the original and copies of datasets?


The Digital Public Library of America provides software that links digital copies of items owned by libraries around the world and stores those copies so users can find them. The originals stay with the host organization – but what are the meaningful differences between a digital copy and the original? What are the limits on the uses of those copies and how should creators be compensated?
As with the issue of retention, there may also be cases where someone wants to “take back” their information out of a dataset. When and how should this happen?

“Digital data” is simply the digitized form of any material or information: text, photos, videos, reports, location data, financial information, databases, spreadsheets and much more.
The digitization of data lets us share, store, re-use and analyze information at a scale, speed, and time horizon that were not possible before.

In economic terms, digital data are “non-rival” (many people can use the same data at once) and “non-excludable” (it’s hard to prevent others from using data). Compared to time and money – two other valuable philanthropic resources – digital data have unique characteristics that require new systems and approaches to manage them safely, ethically and effectively.

Making information digital is only the first step, and it leads to many other decisions about data collection, analysis, storage, access, security, and destruction. Next: Explore the full Data Lifecycle.