Open Digital Infrastructure

Open Digital Infrastructure is the set of open-source code, standards and knowledge assets from which digital building blocks, such as software libraries, compilers, and communication or network protocols, are composed.

These assets are created by individuals and volunteer communities, and within research institutions, SMEs and other corporate environments. Together, they form a foundation of free and public code designed to solve common challenges: first in programming, but also, when applied, in providing a multitude of core functions for society.


UNDERSTANDING DATA EMPOWERMENT

Introduction

In February 2024, the Data Empowerment Fund launched an open call for proposals.

The Fund generated significant interest, receiving 824 applications from initiatives seeking to enable greater individual agency or community control over data.

Running the open call generated an exceptionally rich dataset. Although we were unable to publish the application data in full, this article summarises an analysis of it undertaken by Jack Hardinges and Joshua Yong.


A global phenomenon

We encountered a broad, global distribution of data empowerment initiatives.

  • Overall, the initiatives sought to have an impact in 139 countries.
  • The initiatives varied in scope. 566 initiatives (69%) had a national or subnational scope; 57 initiatives (7%) had a regional scope; and 201 initiatives (24%) had an international scope.
  • More than half of the initiatives (483, 59%) sought to have an impact in countries defined as Developing Economies by the International Monetary Fund.
  • A large number of initiatives originated from a handful of hub countries. Of initiatives with a national or subnational scope: 153 originated from the United States; 49 from the United Kingdom; 47 from India; 35 from Uganda; 31 from Kenya; 28 from Nigeria; 23 from the Netherlands; and 21 from South Africa.
  • The initiatives skewed towards English-speaking countries. The Fund was administered in English and its open call materials were not translated into other languages, which will have affected the distribution of applicants.
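As a quick check on the scope figures: the three scope categories together account for all 824 applications, and each percentage is that category's share of the total, rounded to the nearest whole number. The minimal Python sketch here simply restates the counts from the list above and reproduces those percentages; the names `SCOPE_COUNTS` and `shares` are illustrative only.

```python
# Counts of initiatives by scope, restated from the list above.
SCOPE_COUNTS = {
    "national or subnational": 566,
    "regional": 57,
    "international": 201,
}


def shares(counts: dict[str, int]) -> dict[str, int]:
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(SCOPE_COUNTS.values()))  # 824
print(shares(SCOPE_COUNTS))  # {'national or subnational': 69, 'regional': 7, 'international': 24}
```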

The initiatives ultimately supported by the Fund spanned the United States, Nigeria, New Zealand, Kenya, Switzerland, Argentina and other countries.


For what?

The initiatives we encountered addressed a rich variety of causes.

  • Of the initiatives we were able to classify effectively, 150 (18% of all initiatives) addressed artificial intelligence; 86 (10%) critical settings; 83 (10%) climate and the environment; 79 (10%) health; 58 (7%) education; 38 (5%) work and labour rights; and 38 (5%) media.
  • In the Fund’s open call, we described an interest in receiving applications from ‘initiatives that enable people to control whether, or how, data is used to train AI models’. This may explain why so many of the applications we received addressed AI in some way, as may the fact that the Fund was administered at a time when the use of data by large language models was highly topical.
  • We adopted the term ‘critical settings’ to group and describe initiatives that addressed particularly urgent challenges or operated in adverse circumstances. This included migration, trafficking, whistleblowing, abuse, violence and natural disasters.
  • Most of the initiatives we encountered addressed more than one cause. For instance, the initiative ‘Community-based collection of linguistic resources for bias assessment in language technologies’ was ultimately supported by the Fund. It seeks to reduce bias in large language models in education by involving teachers and students in the process of data collection, thereby spanning the causes of AI and education.
  • We encountered other, more general themes across the initiatives, which we were unable to categorise by cause. These included initiatives focused on enabling community sovereignty over data or technology without a particular cause, as well as initiatives that involved upskilling people to understand or work with data more effectively.

We used a combination of manual grouping and linguistic analysis software to explore the causes addressed by the applications the Fund received. While reviewing the applications, we created a set of groups, then directed the software to gather the applications that were linguistically close to each group. We’re conscious of the limitations of this approach; if running a similar process again, we would design the application process to produce a dataset more suitable for analysis (e.g. by using more forms of validation).
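The software used for this analysis is not named here, so the following is only a hypothetical illustration of what grouping by linguistic proximity can look like. Each application and each group description is turned into a bag-of-words vector, and an application joins every group whose description is similar enough by cosine similarity. The group names echo causes from the list above, but the seed descriptions, the `assign_groups` helper and the 0.2 threshold are invented for this sketch and are not the Fund's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical seed descriptions for a few cause groups (illustrative only).
SEED_GROUPS = {
    "artificial intelligence": "ai models training data large language models machine learning",
    "climate and the environment": "climate environment disaster emissions biodiversity",
    "education": "education teachers students schools learning",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def assign_groups(application: str, threshold: float = 0.2) -> list[str]:
    """Return every group whose seed text is at least `threshold` similar.

    An application can land in several groups, or in none.
    """
    app_vec = tokenize(application)
    scores = {name: cosine(app_vec, tokenize(seed)) for name, seed in SEED_GROUPS.items()}
    return sorted(name for name, score in scores.items() if score >= threshold)


example = "Teachers and students collect language data to reduce bias in large language models."
print(assign_groups(example))  # ['artificial intelligence', 'education']
```

Letting one application clear the threshold for several groups mirrors the observation that most initiatives addressed more than one cause; a real analysis would likely use richer text representations than raw word counts.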

In November 2024, the Fund hosted an event with the Open Data Institute to celebrate and learn more about the initiatives being supported. Through engagement with the initiatives, it became clear that there were other concepts that bound their work:

  • Exposing the human labour in data and AI supply chains.
  • Addressing data gaps and using ‘counter data’ to advocate for change.
  • Practising data sovereignty.

Discussions of these topics from the event are available here.


Practising data empowerment

We encountered a variety of approaches to data empowerment.

  • The majority of all initiatives (484, 60%) involved ‘developing a participatory process or tool to enable individuals or communities to collect, use or share data, or otherwise shape how it’s used’. However, given how broad this phrase is, almost all of the initiatives could be described by it.
  • We were able to classify other initiatives more specifically using the linguistic analysis software:
    • 69 initiatives (9%) involved ‘collaboratively building a new dataset, or collaboratively addressing bias or other issues with an existing one’;
    • 50 (6%) ‘developing a technical protocol or standard that enables people to use their data rights or assert their preferences’;
    • 42 (5%) ‘evolving an existing institution to work more closely with its community on data’;
    • 35 (4%) ‘building a new institution to foster greater individual or community control over data’;
    • 31 (4%) ‘developing a legal mechanism to empower individuals or communities to shape how data is used’;
    • 25 (3%) ‘undertaking original research or design projects to produce new, radical proposals for empowering people’;
    • 19 (2%) ‘creating an artwork that involves, or challenges us to think differently about, data’;
    • 10 (1%) ‘making an existing digital service more responsive to its users’ preferences around data collection, sharing or use’.
  • We were unable to classify 42 initiatives (5% of all applications) as practising data empowerment. This may be because it’s difficult to draw a firm boundary around the concept, or because of how these initiatives described themselves.
  • We found it difficult to unpick the cause that an initiative was designed to address from the approach it used to do so, and initiatives were perhaps best described using a combination of cause and approach. For instance, we observed initiatives that: addressed algorithmic biases by augmenting existing training datasets with better, more diverse data; used crowdsourcing tools to gather local data about an evolving climate disaster; and involved a marginalised group in scrutinising a new data law.

We took a broad interpretation of data empowerment for the open call. This was mainly because the Fund was originally intended to support a diverse range of initiatives, and in recognition that data empowerment spans legal, governance, technology, product, creative, policy and other disciplines.

Using the Ada Lovelace Institute’s framework for participation in data stewardship, we reflected that many of the initiatives encountered by the Fund sought to enable people to make decisions about data themselves, or to be directly involved in its collection, use and sharing, rather than simply be kept informed about it.

A framework for participation in data stewardship, Ada Lovelace Institute

Anecdotally, many of the methods in Connected by Data’s catalogue of methods for engaging communities in data governance appeared among the initiatives we encountered.


Going forward

The significant interest in the Fund reflects both the demand for funding to put data empowerment into practice and the scale of experimentation under way around the world.

Beyond funding, the initiatives we encountered also expressed demand for capacity building, technical assistance and policy advocacy, as well as the opportunity to connect with others pursuing similar goals.

We hope this analysis contributes to the emerging understanding of data empowerment, and of why, how and where it is practised.