The characteristics of digitized information require us to think carefully about how we use it.

DATA CHARACTERISTIC: Generative

The original use of data is only the beginning. Each use of digital data creates additional datasets that matter.

Considerations:

Requires understanding how metadata work, as well as where, how, and for what purposes digital interactions can be tracked.

Example:

A Twitter dataset might include the content of tweets about trouble voting (being turned away from polls due to lack of ID, being threatened or harassed, being told that they can’t vote if they are in line at the time of the poll closure). The data created by the act of tweeting, such as timestamps and the location where each tweet was made, become a meaningful dataset as well. That second dataset could inform a map of hot zones where voting rights attorneys might focus in the future or where a mediation organization could host a community dialogue.
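
The idea of metadata becoming a dataset in its own right can be sketched in a few lines of Python. This is a minimal illustration, not a real Twitter integration: the records, the field names (`text`, `timestamp`, `county`), and the `hot_zones` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical records for illustration only. The field names are
# assumptions and do not reflect Twitter's actual data format.
tweets = [
    {"text": "Turned away for lack of ID", "timestamp": "09:15", "county": "Adams"},
    {"text": "Told the line was closed", "timestamp": "19:02", "county": "Adams"},
    {"text": "Harassed outside the polls", "timestamp": "10:40", "county": "Baker"},
    {"text": "No ID, no ballot", "timestamp": "11:05", "county": "Adams"},
    {"text": "Threatened while waiting", "timestamp": "12:30", "county": "Baker"},
    {"text": "Sent home from the polls", "timestamp": "13:45", "county": "Clark"},
]

def hot_zones(records, min_reports=2):
    """Count reports per county using only the location metadata."""
    counts = Counter(r["county"] for r in records if r.get("county"))
    # Keep counties with at least `min_reports` reports, most reports first.
    return [(county, n) for county, n in counts.most_common() if n >= min_reports]

print(hot_zones(tweets))
```

Note that the function never reads the tweet text. The new dataset comes entirely from data generated as a by-product of posting.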

DATA CHARACTERISTIC: Replicable

Copies of digital datasets are often indistinguishable from the original.

Considerations:

The metadata (data about the dataset) can often distinguish between originals and copies. Requires decisions about whether this distinction is important and the skills to interpret metadata.
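
As a rough sketch of this point, the Python below copies a small file and compares the two versions. The file names and contents are made up for illustration. A content hash cannot tell the copy from the original, while file-system metadata such as modification time is reported separately and may or may not differ, depending on how the copy was made.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Return the SHA-256 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def compare_copy(original_path, copy_path):
    """Compare the contents and the modification times of two files."""
    return {
        "same_content": sha256_of(original_path) == sha256_of(copy_path),
        "original_mtime": os.stat(original_path).st_mtime,
        "copy_mtime": os.stat(copy_path).st_mtime,
    }

def demo():
    # Hypothetical file used only for this illustration.
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "grants.csv")
        duplicate = os.path.join(folder, "grants_copy.csv")
        with open(original, "w") as f:
            f.write("grantee,amount\nExample Org,1000\n")
        shutil.copy(original, duplicate)  # copies contents, not timestamps
        return compare_copy(original, duplicate)

print(demo())
```

Whether a difference in timestamps matters is exactly the kind of decision the consideration above describes.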

DATA CHARACTERISTIC: Mixable

Data collected for one purpose can be used for other purposes and datasets can be easily and cheaply mixed together or compared.

Considerations:

The caveats and limits of an original data set can get lost as data are mixed and remixed.

Example:

Digital data files are easily manipulated, allowing for everything from photoshopping to music sampling. Digital datasets created for one purpose can be added to, subtracted from, or otherwise manipulated by users. For example, anyone can download and use American Community Survey data, and overlay it with distressed census tract information to create a new dataset.
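
A minimal sketch of such an overlay follows. The tract IDs, income figures, and caveat text are invented for illustration and are not real American Community Survey values. The `overlay` helper is hypothetical, and carrying a `caveat` field along with each row is one possible way to keep the limits of the original data from getting lost in the mix.

```python
# Hypothetical survey rows; the values are made up for this example.
survey_rows = [
    {"tract": "001", "median_income": 41000},
    {"tract": "002", "median_income": 67000},
    {"tract": "003", "median_income": 38000},
]

# Hypothetical list of tracts flagged as distressed by a second source.
distressed_tracts = {"001", "003"}

def overlay(rows, distressed, caveat):
    """Mix two datasets by tract ID, keeping a note about the source's limits."""
    mixed = []
    for row in rows:
        new_row = dict(row)  # copy so the original dataset is left unchanged
        new_row["distressed"] = row["tract"] in distressed
        new_row["caveat"] = caveat
        mixed.append(new_row)
    return mixed

mixed = overlay(survey_rows, distressed_tracts, "Survey estimates; margins of error apply.")
for row in mixed:
    print(row)
```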

DATA CHARACTERISTIC: Scalable

Digital data are cheap to collect and store, and they can accumulate on massive scales. Datasets may be stored in separate places, but interoperable standards allow distinct datasets to be analyzed together.
The sheer quantity of available digital data and their generative nature are catalyzing new methods of analysis.

Consideration:

The large scale of available data allows different pieces of information to be brought together, which makes it easier to identify individuals.
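
To make the identification risk concrete, here is a small sketch using two invented datasets. A survey with names removed is matched against a public roster on a few shared fields. Every record, field name, and the `link` helper are hypothetical. The point is only that a combination of ordinary fields can single out a person once datasets are brought together.

```python
# Hypothetical "anonymous" survey: no names, but some descriptive fields.
survey = [
    {"zip": "00001", "birth_year": 1980, "gender": "F", "response": "answer 1"},
    {"zip": "00002", "birth_year": 1975, "gender": "M", "response": "answer 2"},
]

# Hypothetical public roster that includes names and the same fields.
roster = [
    {"name": "Person A", "zip": "00001", "birth_year": 1980, "gender": "F"},
    {"name": "Person B", "zip": "00002", "birth_year": 1975, "gender": "M"},
    {"name": "Person C", "zip": "00002", "birth_year": 1975, "gender": "M"},
]

def link(survey_rows, roster_rows, keys=("zip", "birth_year", "gender")):
    """Return (name, response) pairs where the shared fields match exactly one person."""
    index = {}
    for person in roster_rows:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])
    matches = []
    for row in survey_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the respondent
            matches.append((names[0], row["response"]))
    return matches

print(link(survey, roster))
```

In this made-up case the first respondent is singled out, while the second stays ambiguous because two roster entries share the same combination of fields.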

Example:

Volunteers helped Human Rights Watch use readily available data from Facebook posts, online videos, photographs, Google Maps, and Google SketchUp to identify the source of chemical weapons use in Syria. These “data” were readily available online and took a variety of forms. Most of the photos, for example, were not created for use as evidence; they were simple documentation of a moment. The analysts created a new data tool by drawing together many different digital sources.

DATA CHARACTERISTIC: Storable

The decreasing cost of storage can make it easier to collect and hold data than to make hard choices about what to keep.

Considerations:

The sheer size of a dataset doesn’t equate with its validity, representative nature, or research utility.

The rise of cloud storage (remote storage) allows for new kinds of access and also raises new questions about security and ownership.

Example:

Low storage costs have changed our personal photography practices. Many people now take and save hundreds, even thousands, of digital photographs because it’s so easy to do.
Similarly, many organizations are choosing to collect and hold onto digital data that they are not sure what to do with, just because it’s cheap. This raises new challenges as well – from the “too big to find” problem of our photos to legal issues regarding stored information (if you don’t have it, it can’t be subpoenaed), re-use of information, and security concerns about storage leaks and break-ins.

DATA CHARACTERISTIC: Accessible

By posting data online you make it easily available to anyone with an internet connection. Structuring the information so it can be used from a mobile device or accessed from multiple locations is becoming easier and more important.

Considerations:

Technological shifts (such as the rise of mobile devices) mean that accessibility has to be maintained over time. Once something is online and accessible to the public, there are few ways to control what can be done with it.

Example:

A foundation grant list can be easily shared online. Others may use this data to inform their own decision making or learn about their community. It can also be used by organizations with different viewpoints.
How the information is posted (in a static .pdf, in a searchable database, or as a downloadable dataset that can be mixed with other datasets) will influence how easily it can be used for a variety of purposes. Note that even static data can be scraped from websites and digitized.

DATA CHARACTERISTIC: Persistent

Digital data may be held for generations. Once data are put on the Internet, others can cache and copy them, so removing your copy doesn’t “erase” the information.

Considerations:

Combined with the low cost of storage, the issue of perpetuity requires consideration of how data collected for one purpose might be used for other purposes later on. Creates a need to consider destruction options, criteria, and timelines. Also raises questions about whether individuals should be able to withdraw their data from a dataset over time.

Example:

Data collected by school districts on a student may be stored in ways that make it available over the lifetime of that individual. It is hard to predict all the possible benefits and downsides of a lifelong digital trail.
Retention is related to the low cost of storage, but it raises questions about what kind of consent you need to get when you first collect certain information (for a single purpose or for anything that becomes interesting later) as well as what protections need to be provided in case the information is sought by third parties (including law enforcement) or stolen.

DATA CHARACTERISTIC: Unclear ownership

Since many exact copies can be made of digital data, it is hard to know which is the original, which are copies, and who owns what. What, if any, ownership rights do individuals represented in a dataset have over their data?

Considerations:

Digital datasets are raising new questions of consent and ownership for persons represented within a dataset. Do people in the dataset have rights to their information? Who owns the original and copies of datasets?

Example:

The Digital Public Library of America provides software that links digital copies of items owned by libraries around the world and stores those copies so users can find them. The originals stay with the host organization – but what are the meaningful differences between a digital copy and the original? What are the limits on the uses of those copies and how should creators be compensated?
As with the issue of retention, there may also be cases where someone wants to “take back” their information out of a dataset. When and how should this happen?

Digital data refer to the digitized form of any material: text, photos, videos, reports, databases, spreadsheets. In traditional economic parlance, digital data act like public goods. They are non-rival (lots of people can use them at once) and non-excludable (it’s hard to prevent people from using them even if they don’t pay).

In contrast, both time and money (two other philanthropic resources) are rival and excludable.

Digitization of data allows information to be shared, stored, re-used, and analyzed at a scale, pace, and time horizon that were not previously feasible.

Making information digital is only one step. There are many decisions to be made about making that digital information available and usable. Decisions about access, storage, sharing, and retention are discussed in the “Data Lifecycle Section.”

It is important to state up front that securing internet-connected data is difficult and expensive. There are steps you can take to decrease the chance of usable information being accessed by unauthorized people, but there is no such thing as 100% secure information. If information is connected to the internet, it is vulnerable.