Jake Stein
Oxford Martin School Doctoral Researcher for Ethical Web and Data Architectures in the Age of AI

Ethical Data Architectures: Regulating (Soft) Platform Power

Today’s Web is an unequal one. A few centralized platforms capture the vast majority of the data users generate on the Web, while the everyday users who produce that data have little say in how it is monetized and rarely benefit from the insights, models, and value it yields.1 This consolidation of control over data is enough to demand change on its own, but the potential it creates for platforms to dominate the production and use of Artificial Intelligence (AI) is even more threatening. Protecting privacy and individual autonomy and preventing algorithmic discrimination have been the primary motivations for regulating AI.2 However, regulating how and where AI is applied is not enough. We argue that the potential for unilateral control over AI created by platforms’ domination of Web data demands intervention in the data collection and processing activities behind AI development as well.

Scholars, policy makers, and technologists have advocated powerful countermeasures against platforms’ solitary control over data, from using competition law to break up companies, to outlawing targeted advertising, to restructuring the Web around immutable distributed ledger technologies.3 While such top-down measures may be effective, they are insufficient to address the constellation of low-level architectures through which platforms maintain their control over user data. Under the Oxford Martin School’s Ethical Web and Data Architectures (EWADA) project, we propose flexible Web architectures that, in tandem with regulatory responses, aim to reorient the Web’s underlying data flows to empower individuals rather than entrench platform hegemony.

The Architectural Influence of Platforms

To understand what an ethical Web or data architecture might look like, we first need to deconstruct the Web architectures that have produced today’s asymmetric distribution of control over the data essential to producing AI. The socio-technical scholarship of the techlash era has supplied a vocabulary for describing society-scale patterns of “Surveillance Capitalism,” “Data Colonialism,” or “Carceral Vision.” It is also critical that we examine these phenomena from the bottom up, tracing how the roots of the platform-dominated Web are entangled in the architectural design of Web 2.0.4

Some early clues are visible in a 2005 guest lecture Mark Zuckerberg gave to a Harvard computer science course. In the lecture, the founder of the nascent Facebook makes various comments that bode ill for the future of privacy, but it is a seemingly innocuous remark about data architecture design that would prove most consequential.5 Responding to a question about rapidly scaling the system architecture to accommodate the volume of user data the platform elicited, Zuckerberg mentions centralizing user accounts in unified cloud computing infrastructure, replacing earlier models that required dedicated servers for each university the platform supported. Long before Facebook would sell its first targeted advertisement, this seemingly neutral optimization decision laid the groundwork for what is today the core architecture enabling cross-site remarketing by platforms like Facebook and Google.

Today, the control of data by platforms remains latent in Web architectures that otherwise seem neutral and motivated purely by technical optimization. A prominent example of this pattern is federated identity management systems such as OAuth, the Web’s most pervasive standard for third-party authentication and data authorization.6 On its own, OAuth would appear to give users a convenient way to log in and move data like contacts across sites, while also relieving individual Web developers of the burden of managing and securing user identities.

OAuth was a great leap forward from past practices that would be unconscionable by today’s security standards. Before OAuth, sites like Facebook would ask users to hand over the usernames and passwords of their email accounts so that the sites could fetch contacts and connect users with friends already on the platform. OAuth facilitates that process securely, but adds the caveat that data or authentication providers (most often Apple, Facebook, or Google) must first approve each request, as the sketch below illustrates. This intervention gives platforms a de facto right of refusal over movements of user data in transactions otherwise irrelevant to them. Indeed, we should not expect platforms to be generous in this new gatekeeping position: they have already become infamous for restricting or obfuscating access to user data for competing platforms and users alike. Platforms have gone to court to protect the contents of their walled gardens against users with legally protected access rights; Uber and Ola have prevented unions and data trusts from accessing trip data, providing it in non-machine-readable formats or not at all.7 Amazon has gone so far as to remove purchased items from order confirmation emails to prevent Google from mining them from users’ inboxes.8 OAuth may project a willingness to share data, but one needn’t look far to observe platforms’ true reluctance to do so.
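To make the gatekeeping concrete, the following is a minimal sketch of the client side of an OAuth 2.0 authorization code flow. The endpoint URLs, client credentials, and scope name are hypothetical placeholders, not any real provider’s API; what matters is that the flow cannot begin until the provider has issued the client credentials and approved the requested scope, which is precisely the point of architectural control described above.

```typescript
// Minimal sketch of the OAuth 2.0 authorization code flow from a relying
// site's perspective. All URLs, credentials, and scope names below are
// hypothetical placeholders.

const PROVIDER_AUTH_URL = "https://provider.example/oauth/authorize";
const PROVIDER_TOKEN_URL = "https://provider.example/oauth/token";
const CLIENT_ID = "relying-site-client-id";   // issued by the provider
const CLIENT_SECRET = "relying-site-secret";  // issued by the provider
const REDIRECT_URI = "https://relying-site.example/callback";

// Step 1: send the user to the provider's consent screen. The provider,
// not the user alone, decides whether this client may request the scope.
function buildAuthorizationUrl(state: string): string {
  const params = new URLSearchParams({
    response_type: "code",
    client_id: CLIENT_ID,
    redirect_uri: REDIRECT_URI,
    scope: "contacts.read", // hypothetical scope name
    state,                  // CSRF protection, echoed back on redirect
  });
  return `${PROVIDER_AUTH_URL}?${params}`;
}

// Step 2: after the user consents, the provider redirects back with a
// short-lived code, which the site exchanges for an access token.
async function exchangeCodeForToken(code: string): Promise<string> {
  const res = await fetch(PROVIDER_TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "authorization_code",
      code,
      redirect_uri: REDIRECT_URI,
      client_id: CLIENT_ID,
      client_secret: CLIENT_SECRET,
    }),
  });
  if (!res.ok) throw new Error(`token exchange failed: ${res.status}`);
  const { access_token } = (await res.json()) as { access_token: string };
  return access_token;
}
```

Every step of the flow passes through provider-operated endpoints, so the provider can revoke a client’s credentials, or deny a category of scopes altogether, at any time.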

(Re)Imagining Ethical Architectures

With an understanding of how platform interests are embedded in the functional architectures of the Web, we can begin to work out how regulation might enable Web architectures that embody individual autonomy by design. The most straightforward reaction to the status quo might be to seek decentralization wherever centralization is the perceived culprit. We argue that while decentralization may have benefits, it can only promise so much when it comes to redistributing platform power.

Decentralized architectures take many forms in service of counteracting asymmetric data control. These include Web 3.0 technologies, which propose to store data (or references to data) on immutable public ledgers, in theory democratizing access to data.9 At the other end of the spectrum, personal data store architectures decentralize data by storing it in individual silos accessible only with the subject’s one-off consent (sketched below).10 Finally, a variety of intermediary institutions, such as data trusts, commons, and trusted research environments, steward data on behalf of data subjects but stop short of making it universally available or immutable.11
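As an illustration of the personal data store pattern, here is a minimal sketch of one-off, subject-granted access, loosely modeled on the Solid platform cited above. The PersonalDataStore class, its grant mechanism, and the resource paths are hypothetical, illustrative constructs rather than Solid’s actual API; the point is that access flows from an explicit, expiring grant issued by the subject, with no platform intermediary holding a right of refusal.

```typescript
// Hypothetical sketch of one-off, subject-granted access to a personal
// data store. Data lives in a store the subject controls, and each
// application must hold an explicit, unexpired grant before reading.

interface AccessGrant {
  application: string; // which app the subject authorized
  resource: string;    // which resource path the grant covers
  expires: Date;       // one-off grants lapse rather than persist
}

class PersonalDataStore {
  private grants: AccessGrant[] = [];
  constructor(private data: Map<string, unknown>) {}

  // The subject (not a platform) issues grants against their own store.
  grant(application: string, resource: string, ttlMs: number): void {
    this.grants.push({
      application,
      resource,
      expires: new Date(Date.now() + ttlMs),
    });
  }

  // Applications read data only through a valid, unexpired grant;
  // there is no platform intermediary with a right of refusal.
  read(application: string, resource: string): unknown {
    const ok = this.grants.some(
      (g) =>
        g.application === application &&
        g.resource === resource &&
        g.expires > new Date()
    );
    if (!ok) throw new Error(`no active grant for ${application}`);
    return this.data.get(resource);
  }
}

// Usage: the subject grants a contacts app ten minutes of access.
const pod = new PersonalDataStore(
  new Map<string, unknown>([["/contacts", ["alice", "bob"]]])
);
pod.grant("contacts-app", "/contacts", 10 * 60 * 1000);
console.log(pod.read("contacts-app", "/contacts")); // ["alice", "bob"]
```

In Solid itself, such grants are expressed through access control documents attached to resources rather than in-memory objects, but the control relationship is the same: the subject’s store, not a platform, is the arbiter of access.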

Of course, subverting platform power is not as straightforward as simply adopting decentralized architectures. Indeed, power is the operative term: Web 3.0 technologies, for instance, might represent the greatest level of decentralization with respect to data itself, yet they can also exacerbate or entrench centralized control of power and monetary value. The same financiers who have profited most from Web 2.0’s capitalization on data centralization are presently pursuing a historic lobbying effort to gain “a leg up by exempting [Web 3.0 platforms] from certain tax reporting, consumer protection and anti-money-laundering requirements”.12 Further, the distribution of wealth within cryptocurrencies has been shown to be orders of magnitude less equitable than that of even the countries with the greatest wealth gaps.13 A decentralized architecture, in practice, is not always an ethical one.

Regulation to Support a Plurality of Autonomy

By now, two things should be clear: first, top-down regulation of AI will be insufficient on its own, given how deeply platform control is entrenched in the Web’s architectures; second, building ethical Web and data architectures will not be as simple as surveying what we have and building the opposite. Indeed, architectures that support the autonomy of data subjects must reach beyond the systems they inhabit to intervene in the economic and social institutions within which they are situated. This demands a bottom-up approach to infrastructure building that recognizes user needs in context, rather than attacking problems at the scale of society or the economy with equally maximalist designs and regulations.

Personal autonomy in the age of AI is contingent on context. For gig workers, accessing personal data alone is pointless if they cannot also access the algorithms, inferences, or statistics derived from the aggregate data used to manage their work. For everyday Web users, self-sovereignty over personal data might enhance privacy, but it also brings the burden of personal data governance and consent box-ticking rather than addressing the power dynamics between individuals and platforms. For the subjects of mass surveillance, rights to transparency in personal data protection are useful, but they offer little agency to prevent further data from being gathered.

Regulation should not be limited to addressing how, when, and by whom AI is used; it should also address the processes by which AI is produced. Further, regulation should provide the individuals and collectives who generate the data on which AI depends with resources and opportunities to shape or intervene in its production. Regulation can equip movements with the tools they need to mount informed responses to AI and its associated data collection techniques, and to assert the interpretations of autonomy that meet subjects’ contextually defined needs.

When considering AI regulation, policy makers should actively seek opportunities to secure the ability of institutions (such as NGOs, unions, consumer protection groups, and public service providers) to access, process, and intervene in data collection, giving them a stronger foothold from which to ensure that data subjects’ priorities are considered in data capture and AI development. Further, regulation should expand positive rights for data protection, granting access to data in forms relevant to its purpose rather than merely in terms of personal privacy. Much of the influence essential to the maintenance of platform power is so embedded in critical Web architectures that it is inevitably out of reach for top-down regulation. However, regulation can empower institutions to build ethical Web and data architectures in their place.

  1. Jean-Christophe Plantin et al., ‘Infrastructure Studies Meet Platform Studies in the Age of Google and Facebook’, New Media & Society 20, no. 1 (January 2018): 293–310, https://doi.org/10.1177/1461444816661553; Jean-Christophe Plantin and Aswin Punathambekar, ‘Digital Media Infrastructures: Pipes, Platforms, and Politics’, Media, Culture & Society 41, no. 2 (March 2019): 163–74, https://doi.org/10.1177/0163443718818376. 

  2. Tarleton Gillespie, ‘Regulation of and by Platforms’, in The SAGE Handbook of Social Media, ed. Jean Burgess, Alice Marwick, and Thomas Poell (London: SAGE Publications Ltd, 2018), 254–78, https://doi.org/10.4135/9781473984066.n15; Corinne Cath, ‘Governing Artificial Intelligence: Ethical, Legal and Technical Opportunities and Challenges’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376, no. 2133 (28 November 2018): 20180080, https://doi.org/10.1098/rsta.2018.0080. 

  3. Carissa Véliz, Privacy Is Power: Why and How You Should Take Back Control of Your Data (Brooklyn: Melville House, 2021); Md Sadek Ferdous, Farida Chowdhury, and Madini O. Alassafi, ‘In Search of Self-Sovereign Identity Leveraging Blockchain Technology’, IEEE Access 7 (2019): 103059–79, https://doi.org/10.1109/ACCESS.2019.2931173. 

  4. Shoshana Zuboff, ‘Big Other: Surveillance Capitalism and the Prospects of an Information Civilization’, Journal of Information Technology 30, no. 1 (March 2015): 75–89, https://doi.org/10.1057/jit.2015.5; Nick Couldry and Ulises Ali Mejias, The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism, Culture and Economic Life (Stanford, California: Stanford University Press, 2019); Ruha Benjamin, Race after Technology: Abolitionist Tools for the New Jim Code (Medford, MA: Polity, 2019). 

  5. CS50, CS50 Lecture by Mark Zuckerberg - 7 December 2005 (Cambridge, Massachusetts, 2005), https://www.youtube.com/watch?v=xFFs9UgOAlE&ab_channel=CS50. 

  6. Barry Leiba, ‘OAuth Web Authorization Protocol’, IEEE Internet Computing 16, no. 1 (January 2012): 74–77, https://doi.org/10.1109/MIC.2012.11. 

  7. Natasha Lomas, ‘Dutch Court Rejects Uber Drivers’ “Robo-Firing” Charge but Tells Ola to Explain Algo-Deductions’, TechCrunch (blog), 12 March 2021, https://social.techcrunch.com/2021/03/12/dutch-court-rejects-uber-drivers-robo-firing-charge-but-tells-ola-to-explain-algo-deductions/. 

  8. Adam Smith, ‘Amazon Stops Telling People What They Have Bought in Emails’, The Independent, 2 June 2020, https://www.independent.co.uk/tech/amazon-order-email-confirmation-shipping-details-a9543966.html. 

  9. Michael Zargham, ‘Towards a Diversity of DAOs’ (DAOfest, Berlin, Germany, 20 August 2019). 

  10. Essam Mansour et al., ‘A Demonstration of the Solid Platform for Social Web Applications’, in Proceedings of the 25th International Conference Companion on World Wide Web, 2016, 223–26. 

  11. Ana Brandusescu and Jonathan van Geus, ‘Shifting Power Through Data Governance’ (San Francisco, CA: Mozilla Foundation Data Futures Lab, September 2020). 

  12. Eric Lipton, Daisuke Wakabayashi, and Ephrat Livni, ‘Andreessen Horowitz’s Plan to Dominate Crypto’, The New York Times, 1 November 2021, https://www.nytimes.com/2021/10/29/us/politics/andreessen-horowitz-lobbying-cryptocurrency.html. 

  13. Ashish Rajendra Sai, Jim Buckley, and Andrew Le Gear, ‘Characterizing Wealth Inequality in Cryptocurrencies’, Frontiers in Blockchain 4 (20 December 2021): 730122, https://doi.org/10.3389/fbloc.2021.730122.