Image credits: Alan Warburton / https://betterimagesofai.org / Image by BBC
The importance of unlocking data reuse responsibly to advance science and improve people’s lives. This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high-quality data that can be reused to improve people’s lives. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open, curated dataset of high-quality, standardised data, something still rare despite the pace and scale of AI-driven development.
We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.
The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions – in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”.
The EU AI Act and the European Data Strategy, which encompasses the Data Act, the Data Governance Act, and sector-level regulations like the recently adopted European Health Data Space, seek to address power asymmetries related to who holds and benefits from data. The aim is to enable the creation of ecosystems that facilitate infrastructure and rules for responsible data reuse. This has the potential to enable better public services and a more vibrant, diverse economy, whilst also improving data governance and increasing the control European citizens have over data.
More to do on stewardship skills, data transparency and meaningful consent.
The strategic use and reuse of data can inform decision-making, enhance service delivery, and drive advocacy efforts. As these possibilities are recognised and new regulations are implemented, persistent gaps are revealed as organisations across all sectors struggle to establish practices of open data and data collaboration. Addressing three particular issues is now vital for the successful delivery of Europe’s data strategy.
First, there is a need for new skills: across both public and private sectors we see a lack of capacity to define and deliver successful strategic data collaborations. This can be addressed by redefining roles and responsibilities in data governance and reimagining the role of the data steward. Secondly, data must be made available for reuse, in the public interest, beyond the purposes for which consent might have been originally given. This means rethinking binary and individual approaches to consent, for which ongoing and meaningful public engagement is key. Thirdly, and particularly in the context of general-purpose AI models, there is a need for strong protocols that can enforce transparency in data provenance and data quality, and provide guarantees around the social licence, legitimacy and legality behind the use of the data and the options for its reuse.
Re-imagining data stewardship
It is crucial to re-define the use of data, recognising its potential to further the common good. For this to happen, it is also important to re-define roles and responsibilities in data governance. This means shifting focus from technical tasks to strategic data collaborations. In this context, re-imagining the role of the Data Steward means building the human resources and infrastructure that organisations need to operate functions that centre on collaboration and independence in the responsible reuse of data. Data stewards are critical in making data collaboration more systematic, sustainable, and responsible. They are the “individuals or teams within data-holding organisations who are empowered to proactively initiate, facilitate, and coordinate data collaboratives toward the public interest”.
The European Commission’s Expert Group on Business to Government (B2G) Data Sharing highlighted the emerging data stewardship profession as essential for the successful implementation of the European Data Act and Data Governance Act. A European expert group on the reuse of data for official statistics has also recommended the development of data stewardship functions.
Public participation to achieve social licence
It is an enormous challenge to rethink the binary and individual approaches to consent, such that organisations can seek and acquire a social licence to reuse data for purposes beyond those originally intended.
One approach is to create data ecosystems that effectively include a broad range of stakeholders: private, government, and third sector organisations, and, importantly, also the public(s) themselves who have often generated the data in the first place and who will be affected by how it is used. The inclusion of citizens takes such ecosystems beyond traditional public-private partnership models towards inclusive data collaborative models. This can enable organisations to gain social licence for data reuse.
In these multi-faceted contexts, trust and legitimacy are important. The data stewardship function discussed above is an essential enabler of this, as is the right regulatory framework, which the EU’s data strategy aims to provide. Ongoing and meaningful public engagement is another vital ingredient. As Marta Poblet, The Data Tank’s Senior Research Lead, has observed, legality comes not only from the work of government representatives and the judiciary, but also from engagement and democratic dialogue with the public.
If we understand data as a public good, then public participation and a social licence to operate become particularly important. For example, a standing people’s assembly has been proposed as a way to achieve meaningful public engagement in the context of the AI Act. The implementation of the European Health Data Space could also benefit from greater in-depth and meaningful public participation. Such engagement will need to be iterative and ongoing, particularly in the context of data ecosystems that involve multiple stakeholders and ongoing flows of demand and supply for data reuse.
Next generation data policy
AI-driven technologies, and general-purpose AI models in particular, bring new challenges related to responsible data reuse. They are blurring the boundaries of how data and consent processes are used, making the emerging profession of Data Stewards, meaningful participation and social licensing all the more urgent. The code of practice currently being developed in the EU for general-purpose AI models should also enforce standards on data quality, data provenance, data governance and transparency related to meaningful engagement and consent to use and reuse these data.
We believe these are key pathways for Europe’s next digital agenda to address this data stagnation issue and build a more inclusive and data-empowered society. And, perhaps, open the door to more Nobel prize winners too.