Image credit: Joahna Kuiper https://betterimagesofai.org (https://creativecommons.org/licenses/by/4.0/)
Authors: Paulina Behluli, The Data Tank | Annys Rogerson, Oxford Insights
Governments around the world have been rushing to celebrate public and private investments in artificial intelligence (AI) and AI’s supporting infrastructure in their countries. In France, President Emmanuel Macron proudly announced over €100 billion in private investment in AI projects, including a 1GW AI data centre. Meanwhile, in South Korea, acting President Choi Sang-mok unveiled plans to scale up the country’s national computing capacity by acquiring 10,000 high-performance GPUs. Across the Atlantic, U.S. President Donald Trump introduced Stargate, a massive private sector initiative pledging to invest up to $500 billion in AI data centres across the United States.
Unlike compute capacity, which is built up through physical infrastructure investments, investments in enabling data access are a less common feature of government AI infrastructure announcements. However, investments to make data available and enable equitable access to it are crucial. Data is an equally important part of a country’s AI infrastructure and it faces capacity problems of its own. Currently, data is siloed across, and hoarded within, organisations. For the announced AI investments to pay off, governments also need to think about their role in facilitating responsible data reuse across their economies.
Data reuse programmes for increased availability
One way governments can develop their data availability is through data reuse programmes. By data reuse programmes, we mean tailored data sharing projects that contribute to the broader data reuse ecosystem. Traditionally, data reuse programmes have often taken the form of open data portals or individual data sharing agreements managed on a case-by-case basis. However, governments have been taking the lead on some emerging, innovative kinds of data reuse programmes.
For example, Taiwan’s National Health Insurance Administration enables individual users of its My Health Bank app to share their health data with authorised third-party app developers to support the delivery of digital health services. Another example can be drawn from the European Union. The EU is creating a common technical and governance infrastructure for sectoral dataspaces. These are spaces that enable data sharing centred around a challenge area or theme, such as skills or energy, and are open to any organisation or individual.
This can also be seen in India, where the government is developing Agristack to bring together disparate agricultural datasets, including plot and crop registries, to improve delivery of government schemes for farmers and access to data for agritech businesses. This approach is also evident in the Counter Trafficking Data Collaborative (CTDC), a global data hub on human trafficking. It aggregates and publishes harmonized data from multiple counter-trafficking organizations worldwide, facilitating comprehensive cross-border analysis to enhance evidence-based policy and programming. The programme has a hybrid funding model, partially supported by the US and Netherlands governments and by resources from international actors.
Despite these successes, data reuse programmes are experimental, which means they can be short-lived or fail to get off the ground. In the UK, the NHS’s Care.Data programme, which aimed to centrally store and manage the sharing of patient data from health services across the UK, was cancelled after years of trying to manage concerns raised by both medical practitioners and patients. In Canada, the Urban Data Trust, which planned to collect and manage data from activities in an area of Toronto, went from being a flagship smart city programme to being cancelled before it began. These cases are, however, an opportunity to explore why some programmes are not sustainable and how to ensure that other data reuse programmes are.
Without participation, there is no programme.
A common thread that runs through the challenges faced by these programmes is participation. A data reuse programme involves creating an ecosystem where different stakeholders who have or need data come together. Therefore, a data reuse programme can only get off the ground, and grow, if it attracts and keeps an active group of participants for whom being part of the ecosystem is useful. Participants are people or organisations who supply the data, use it, or are affected by it being shared.
The first set of challenges that data sharing programmes can fail to meet relates to attracting and securing participation. Programmes that fail to demonstrate a tailored value proposition to their targeted stakeholders will fail to attract participants. For example, when trying to involve private sector entities, making the business case for data sharing for the public good becomes more challenging, as some private actors have traditionally seen data as a monetary good. There is often a clear value for private companies in participating, including new revenue streams, new business model opportunities, social impact, or collaborative R&D projects. However, these incentives need to be made clear to them.
Even with a clear value proposition, participation incurs a cost. There are economic costs involved in sharing data, including preparing it and ensuring it complies with security policies. There are time costs involved in requesting and gaining access to data, including preparing data access applications. Programmes can reduce the entry costs for participants in the data reuse ecosystem by ensuring that adequate infrastructure and governance are in place (for example, interoperability and clear policies on what metadata and standards are required).
Programmes can also obscure their value by failing to address common concerns that participants have about data sharing. Sharing data comes with an actual or perceived loss of control of that data for the participant sharing it. This concern can be heightened when data is being shared between participants who have low levels of trust in one another, or who are not used to collaborating. This may be the case when public organisations share health data with private organisations, as in the earlier example of sharing NHS data, where the public organisations taking part have privacy concerns. Alternatively, participants may be private companies who are competitors and have concerns about their intellectual property or commercial interests. Ensuring participatory governance and engagement with relevant publics and participants who have a stake in the data can help build confidence and legitimacy between different parties.
A second set of challenges relates to a programme’s ability to maintain and scale participation. One way for data reuse programmes to retain and grow their participant base is to show that the ecosystem (both the infrastructure and its governance) is trustworthy. Ensuring strict security standards, being transparent in decision-making, and complying with legal frameworks are all ways to demonstrate this.
While data reuse programmes are often intended to facilitate innovation among their participants, their long-term viability can be jeopardised if programmes themselves don’t innovate too. Programmes need to learn about the needs of participants in the data reuse ecosystem so that they can improve their services to meet those needs. Some data reuse programmes, for example, offer services like data curation and harmonisation, or data analysis tools.
These services can also be part of the business model and support the financial sustainability of the data reuse programme, as long as they do not increase the participation costs of its members or users.
Data reuse programmes are often government-funded at the beginning and may then seek to reduce their dependency on government funding over time, unless they are a well-supported long-term public infrastructure. Reaching a critical mass of participants in the ecosystem who are willing to pay, or to voluntarily maintain open source software, is a way to decrease dependency on public funding. Finding a business model that suits the value proposition of the data reuse ecosystem, and that works for the type of participants, the data being shared, and how the data is being used, can make or break a programme.
Pathways Forward: Data Stewards for Sustainable Data Reuse
Employing data stewards and embedding responsible data reuse principles in the programme or ecosystem and within participating organisations is one of the pathways forward. Data stewards are proactive agents responsible for catalysing collaboration, tackling these challenges and embedding data reuse practices in their organisations.
The role of Chief Data Officer for government agencies has become more common in recent years and we suggest the same needs to happen with the role of the Chief Data Steward. Chief Data Officers are mostly focused on internal data management and have a technical focus. With the changes in the data governance landscape, this profession needs to be reimagined and iterated. Embedded in both the demand and the supply sides of data, data stewards are proactive agents empowered to create public value by re-using data and data expertise. They are tasked to identify opportunities for productive cross-sectoral collaboration, and proactively request or enable functional access to data, insights, and expertise.
The UN has released a report on the role of data stewards and National Statistical Offices (NSOs) in the new data ecosystem. This report provides many use cases that can be adopted by governments seeking to establish such a role. New Zealand is one exception where the role already exists: it has an appointed Government Chief Data Steward, who is in charge of setting the strategic direction for the government’s data management and who focuses on data reuse as a whole.
Data stewards can play an important role in organisations leading data reuse programmes. Data stewards would be responsible for responding to the challenges with participation introduced above.
A Data Steward’s role includes attracting participation for data reuse programmes by:
1. Demonstrating and communicating the value proposition of data reuse and collaborations, by engaging in partnerships and steering data reuse and sharing among data commons, cooperatives, or collaborative infrastructures; and
2. Developing responsible data lifecycle governance, and communicating insights to raise awareness and build trust among stakeholders.
A Data Steward’s role includes maintaining and scaling participation for data reuse programmes by:
1. Maintaining trust by engaging with wider stakeholders and establishing clear engagement methodologies (for example, by embedding a social license, data stewards ensure that the principle of digital self-determination is embedded in data reuse processes);
2. Fostering sustainable partnerships and collaborations around data, by developing business cases for data sharing and reuse, and measuring impact to build the societal case for data collaboration; and
3. Innovating in the sector by turning data into decision intelligence to ensure that insights derived from data are more effectively integrated into decision-making processes.
The next series of government announcements
Moving forward, our organisations would like to see a new set of headlines. “The government has announced…
1. Funding for data reuse programmes targeting cross-sectoral, economic, development, and societal issues.
2. Targeted data reuse for AI programmes as well as a review of how existing programmes can better meet the needs of AI developers.
3. Opening of the position of Chief Government Data Steward and agency-level Data Stewards.
4. The establishment of a collaborative, public-private data stewards association.”
---
About The Data Tank
The Data Tank is a non-profit think-and-do tank based in Brussels with a global reach. Driven by values such as transparency, independence, inclusion, collaboration, and creativity, we help society reuse data responsibly for the common good. We do so through research, training, collaboration, and creating safe spaces and institutions where data is reused responsibly. We also convene dialogues and expand the terms of the debate to include the public and commons-based dimensions of data.
About Oxford Insights
Oxford Insights is a public policy consulting firm committed to making technology work for the public good. It combines new thinking on technology and leadership with experience getting things done in government to increase the value and impact of public services. With focus areas in AI, emerging technologies, and open data alongside transparency, open government, and public service delivery, Oxford Insights works with organisations to ask the right questions and solve real problems. For more information or to get in touch, visit oxfordinsights.com