Toronto, Civic Data, and Trust

Opinion

A controversial smart city initiative in Canada has proposed using a data trust to help manage how it collects and uses data. Will it be enough?

Reposted from Medium.

Yesterday, Sidewalk Labs proposed using a Civic Data Trust as a governance vehicle to manage data collection and use in its smart city development project in Quayside. It’s the largest-scale proposal of the Civic Data Trust model to date, and it comes in the midst of a tense political environment, meaning it’s already the center of quite a bit of debate. It’s early days, both for Quayside and for civic trusts as a model, and this is, according to all sides, the beginning of a longer public data governance exploration and discussion.

I’ve been researching and working on data trusts at Digital Public for a few years, and have had the privilege of interacting with dedicated Toronto community activists like Bianca Wylie, Sidewalk privacy leaders like Alyssa Harvey Dawson, and policy leadership at the Centre for International Governance Innovation (CIGI), around the use of civic data trusts in Quayside. If anything is clear, though, it’s that we’ll all be better off with a bit more context, community-driven capacity building, and process around whatever data governance solution is implemented.

Nevertheless, it is the beginning of an important conversation and there’s plenty to dig into in the initial proposal. If you’re new to this conversation, or to some part of it, here’s my backgrounder on Civic Data Trusts, Sidewalk Labs’ proposal post, and Bianca Wylie’s response post. Also, here are some materials produced as part of an early education campaign with CIGI.

A few quick notes: (1) this is a technical review of the legal proposal, not a political feasibility or positional/tactical review; (2) Sidewalk Labs framed this as a proposal, so we can’t assume that anything beyond what they’ve written is “decided,” and I don’t treat anything undecided as settled; and (3) this analysis is offered as an investment in the public dialogue, which may be interpreted as a bias unto itself. As a result, this stays at a relatively high level and focuses on key points, but there’s a lot more to unpack going forward.

The Good

First things first, it’s worth giving Sidewalk Labs credit for proposing a model publicly — the vast majority of private vendors contracting to collect data in public spaces do so in closed boardrooms, with opaque contracts, and open-ended licenses. That’s not to suggest the status quo is an acceptable baseline, but this debate is significant progress over business-as-usual.

The proposal also opens with an attempt at common ground — pointing to public “ownership” of data, focusing on open standards and common rules, and calling for a multi-stakeholder, independent decision-making body. Sidewalk Labs’ proposal includes a rough outline of the common policy infrastructure they think should underpin a public data market.

Sidewalk Labs suggests that all parties default to open licensing for any non-identifiable data and that they won’t include collected data in their advertising business, turning down two potential advantages. They’ve additionally offered to use “contractual provisions, technical protections, and edge computing,” to ensure that data is managed and used in compliance with Canadian Law, even if they refuse to localize its physical storage. The ability to specifically partition data collected through Quayside suggests more of a separation than most Alphabet entities afford, creating at least some space for competitors.

The Bad

Sidewalk Labs’ proposed Civic Data Trust is riddled with contradictions. At the highest level, the proposal seems to conflate, or at least move between, several theories of control over data, and it uses civic trusts to pursue a set of conflicting goals with very few resources and very little power to achieve them.

To recap, the proposal suggests a Civic Data Trust govern (a) collection and use of personally identifiable data; (b) maintenance and management of public access to open, de-identified “urban” data; (c) investigation and enforcement of all use-based licensing of data derived from Quayside devices; and (d) the creation of exemptions for data “without implications for personal privacy.” It’s worth breaking these into component elements, toward understanding how they might work (or not) together.

First, the proposal bundles licensing of data collection with data use, when a Civic Trust should be able to control each independently. The act of bundling licensing for data collection and use is, in and of itself, a form of vertical integration, and one that makes it significantly more difficult for the Civic Data Trust, or any public actor, to aggregate influence through data collected across vendors or over time. Similarly, the requirement to share, open, or otherwise invite trust management of data is based on whether that data is ‘personally identifiable’ under privacy law, or ‘urban’ under Sidewalk Labs’ definition (more on ‘urban’ data below). Simply put, there’s no inherent reason to bundle licenses to collect data with licenses to use data, and any governance body should have the freedom to be as granular with its licensing as it chooses.

Second, the proposal conflates several theories of data ownership, ultimately leaving the most sensitive data outside of trust protection. The proposal limits the scope of the Civic Data Trust’s authority to data collected through this project, as opposed to the significantly larger amount of data that companies collect through mobile devices, websites, and apps. Sidewalk Labs suggests that all Quayside companies should go through a process to receive a license to collect data, then de-identify the data they collect, and give it to the Civic Data Trust, whose role is to index, host, and maintain public access to it. Personally identifying information, however, presumably defaults to licensee ownership — though its use may be subject to audit, investigation, and enforcement. Said slightly differently, the proposed trust would grant licenses to collect and use data — and the more sensitive the data, the more proprietary it would be.

Third, the Civic Data Trust’s open publication requirements also limit its ability to effectively enforce governance, especially without some additional grant of authority. Sidewalk Labs proposed the trust will manage a registry of, and access to, open ‘urban’ data. The proposal is intentionally open-ended on how a trust might actually fulfill that requirement, so the most useful thing to note is that the “openness” of the data often affects how effective “managing” that data can be. The primary goals of the “Registry,” “Managing Access,” “Enforcement,” and “Exemptions” components appear to be preserving public access, selectively controlling public access, building a policy apparatus around access, and enforcing use-based license limitations. In most governance systems, those functions are at least separate, if not sometimes in direct conflict. Even if the contradiction is intentional, the presumption of open data publication limits the Civic Data Trust’s access to management and enforcement powers. As proposed, the Civic Data Trust would have conflicting mandates, each with entirely separate operational requirements to implement.

Fourth, the proposed Civic Data Trust has a fairly sweeping audit authority, and an ambiguous palette of punitive tools. Sidewalk Labs suggests that the primary enforcement tool of the Civic Data Trust should be through access to openly published data. In light of the legal and technical challenges surrounding the enforcement of use-based licensing, and intellectual property more broadly, any approach to enforcement — whether a Civic Data Trust or something else entirely — will need considerably greater resourcing, investigatory powers, and punitive authority than currently envisaged.

Sidewalk Labs’ proposal is, understandably, an incomplete vision — but, in being an introductory document, it also affords room for contradictions that shouldn’t survive the consultation process. The proposed Civic Data Trust is a piecemeal adaptation of existing pieces of civic data infrastructure, situated in one of the highest profile public procurement processes in recent history. We should expect it to be incomplete, but there are a few varyingly obvious assumptions that are harder to swallow.

The Ugly

There are at least four core ideas in this proposal that deserve significant scrutiny from other negotiating parties: (1) the default to open publishing; (2) the concept of ‘urban’ data; (3) localization; and (4) theory of law and authority. Each of these ideas goes a little further in specifying the roles in a Civic Data Trust ecosystem, which will determine how effective it can be as a protection of the public interest, or as a safeguard against wholesale corporate capture.

As I’ve written about before, “default to open” ecosystems favor those with other market advantages, like significant stores of proprietary data, algorithmic modeling capacity, and commercial distribution infrastructure. In other words, Sidewalk Labs does not need proprietary data access to monetize the Quayside project, and enforcing a “default to open” approach, especially if compelled, is likely to end up as a defensive advantage. This proposal suggests that all data gathered through licenses approved or exempted by the Civic Data Trust should be made open. The push to position data as a “public asset,” and thereby compel its accessibility, could be read as an attempt to use government to quasi-nationalize competitor data through open publication requirements.

This data governance proposal also attempts to reorient the country’s data ownership laws based on characteristics, either of the data itself or of its means of collection. The proposal advocates for Toronto to specially consider “urban” data as a unique category, which is then treated differently: here, “urban” data would be declared a “public asset,” and then published. There’s nothing new about proposals for tailoring data’s treatment to its characteristics, including Linnet Taylor’s work on group harms and Nathaniel Raymond’s work on demographically identifiable data.

There is, however, something very new about suggesting that because of those characteristics, some data should be publicly owned and published. Sidewalk’s proposal blends ownership and privacy law, using concepts, like the degree of control an individual has over collection or the potential harm to data subjects, as a logic for changing its default ownership. Proposing quasi-nationalization of data is a big deal — and this proposal bundles that idea with the suggestion that nationalized data should be published openly.

As a part of the proposal, Sidewalk Labs introduced a taxonomy of “urban” data, as well as the suggestion that a Civic Data Trust should have the authority to exempt specific types of data from use limitations. The ability to grant exemptions suggests the trust would also define some way to interpret the relative risk of different types of data. There’s a significant amount of scholarly debate about whether data can ever truly be anonymized and, more importantly, whether we should be focusing on data as an object, instead of its uses. As many people have been saying for years, analyzing data as an object is an ineffective way to understand its value or risk. Data, like the rest of us, gains value through connection and context more than specific characteristics.

This proposal develops the idea of characteristic ownership by specifically suggesting that data collected in public and quasi-public spaces should be a “public asset.” Sidewalk Labs also extends the idea of public assets, saying that public data should also be published “open” by default, and not owned by any private entity. Extrapolated out, even one step, that would suggest that all non-personally identifiable data collected in public or quasi-public spaces should be published open by default. Given the controversy over whether data is ever not personally identifiable, that represents a more aggressive “open” stance than most open data advocates take. Proposing that Toronto should base ownership determinations on the urbanity of a data set is a departure from Canadian data ownership law and a precedent that, if approved, could extend far beyond this project.

Sidewalk Labs, interestingly, also resists data localization in this proposal. The language is notable for a few reasons: (1) it’s the only specifically articulated requirement for data architecture; (2) it suggests that ‘urban’ data should be owned and opened locally because of its relationship to the place of its collection, but not required to be stored there; and (3) it offers a combination of mechanisms to virtualize compliance with Canadian Law. As a matter of law, once opened, data could be stored anywhere — so this suggests there’s a specific, unexplained interest in being able to store personally identifiable data about Quayside residents extra-nationally. This is particularly noticeable because of how the proposal otherwise grounds data ownership and openness, particularly of ‘urban’ data, in its relationship to public consent. And, of course, the idea of virtualizing a legal enforcement regime through a combination of contract, technical protections, and “edge computing,” is no small undertaking, and may not even be possible.

Lastly, the Civic Data Trust, as proposed, would have to draw on a wide range of powers, at least a few of which would need some form of public resources and mandate. As described, the core functions of the trust are (1) to ensure that personally identifiable information is only used within prescribed license conditions; and (2) to ensure that all non-personally identifiable information is openly published, indexed, and accessible. The first function would require conferring on a Civic Data Trust some form of investigatory authority to access and audit licensee databases, and articulating some spectrum of punitive measures to punish bad action, backed either by an adjudicator, a regulator, or a court.

Similarly, while third-party auditors aren’t a new addition to procurement compliance, the size, speed, global liquidity, and complexity of data use all suggest that a Civic Data Trust capable of fulfilling this mandate would need a legally novel enforcement authority. While there’s a significant amount of flexibility in contracts, it’s relatively unusual to give auditors the power to compel a company to license and publish data in a format, or the ability to prevent them from using data. Regardless of the jurisdictional issues around data localization, giving any entity the ability to direct companies on matters of legal ownership, open publication, and/or access to data is no trivial undertaking.

Sidewalk Labs’ proposal doesn’t explicitly argue, but does imply, that Civic Data Trusts should have the relatively new authority to track data supply chains and licenses, and to punish violation with, at the least, reduced access. All of that gives Civic Data Trusts the ability to substantially project their authority into the backbone of data collectors, which is a relatively rare amount of enforcement power for someone other than a government regulator. In order to service the kind of oversight, investigation, and governance described, we would have to substantially devolve and privatize limited forms of regulatory investigation and punishment authority. The other option, of course, is that this could be intended as a form of self-regulation, which Alphabet is famous for pre-empting. Regardless of the motivation, it will be both important and difficult to ensure that any Civic Data Trust created in Toronto has meaningful leverage to investigate and enforce decisions with data collectors.

This section is called “the ugly” because these are unsettled questions, and the above is intended to help point to the magnitude of the challenges posed, and to places where there may be hard decisions ahead.

Conclusion

There’s a lot to like about this proposal, most importantly that we can read it. Given the very real public issues implied in the development of any smart city, it’s worth remembering that public outreach, education, and ongoing, meaningful engagement are hallmarks of most major, successful community development projects. There’s an unprecedented opportunity, for all parties involved, in getting this right, and there’s no question the world’s watching.