By Nina Di Cara and Natalie Zelenka
Earlier this month we held a launch event for Version 1.0 of the Data Hazards project, as part of a series of Research Culture-themed events. The project has been in development for two years, and for the past year has been supported by funding from Research England’s Enhancing Research Culture initiative.
Here we’d like to tell you more about the project and what we have been able to achieve with support from the Research Culture initiative.
About the Data Hazards project
Data Hazards are similar to chemical hazards, but applied to data science, statistics and artificial intelligence. Each Hazard ‘label’ flags a potential ethical risk and can be displayed alongside research presentations, papers and proposals.
For example, a project that uses datasets known to be biased in some way might warrant the ‘Reinforces Existing Bias’ label, while one that relies on a very energy-hungry method could have the ‘High Environmental Cost’ label applied.
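To make the idea concrete, here is a minimal, hypothetical sketch in Python of how a project might record its declared Hazard labels as structured metadata. The two label names come from the examples above; the data structure and the describe_hazards function are our own illustration, not part of the Data Hazards project’s tooling.

```python
# Hypothetical illustration only: the label names below come from the Data
# Hazards project, but this metadata structure and summary function are a
# sketch, not official Data Hazards tooling.

# A couple of hazard labels mapped to one-line descriptions.
HAZARD_LABELS = {
    "Reinforces Existing Bias": "Uses data or methods known to be biased.",
    "High Environmental Cost": "Relies on very energy-hungry computation.",
}

def describe_hazards(project_name: str, labels: list[str]) -> str:
    """Return a short, human-readable hazard summary for a project."""
    lines = [f"Data Hazards declared for '{project_name}':"]
    for label in labels:
        description = HAZARD_LABELS.get(label, "No description available.")
        lines.append(f"  - {label}: {description}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Example: a project training a large model on scraped social media data.
    print(describe_hazards(
        "Sentiment analysis of social media posts",
        ["Reinforces Existing Bias", "High Environmental Cost"],
    ))
```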
We can also use these labels to generate more meaningful conversations between different disciplines – and between researchers and the public – about the potential risks of data science research.
The Hazard labels themselves have been developed over the past two years through a collaborative process, during which we have tested a workshop format that allows researchers to get feedback from others about the potential ethical risks of their research.
The project is hosted on GitHub, which allows anyone to contribute ideas. We estimate that around 100 people have contributed up to Version 1.0, and we hope to keep receiving ideas and feedback so that we can continue to develop the project.
Training workshop facilitators
By helping more people to consider the risks of data science projects, we have a chance to positively influence our wider research culture. We can do this by making data scientists more mindful of their impact on the world, and by making it easier for people outside of our usual research environments to ‘step in’ to data science work and raise concerns about its future impacts.
Thanks to Research Culture funding in 2022, we were able to hire AI ethics expert Ismael Kherroubi Garcia to help design learning resources about the Data Hazards project. We then used these resources to deliver two sets of training – one online, one in person – to new Data Hazards project facilitators, so that more people are able to teach others about the project and run workshops.
We had around 30 attendees at our facilitator training workshops, which were a great success. So far, our trained facilitators have gone on to run a Data Hazards and Reproducibility Symposium at the Alan Turing Institute, present the Data Hazards project at the ACM Computer-Human Interaction (CHI) Conference 2023, and support conversations about the project at AI UK this year. Some have also used the Data Hazards labels in their teaching and in their own research.
We were also very lucky to be able to work with Bristol PhD student Vanessa Hanschke and artist and animator Yasmin Dwiputri to create new labels for the Data Hazards project (as shown above), and to animate three explainer videos to help us share the project with new audiences. You can watch these below.
Launching Version 1.0
On 29 March, the Wills Memorial Building hosted an event to celebrate the launch of Version 1.0 of the Hazard labels.
The event kicked off with talks by project leads Natalie Zelenka and Nina Di Cara, who explained the project and how it has been used in teaching and data-intensive research as a tool for identifying risks and mitigations.
The project’s explainer animations were showcased for the first time, and their animator and producer (Yasmin and Vanessa respectively) were on hand to describe the process behind creating them.
Guests also heard from Tania Duarte from We and AI about how stills from the animation are already being used as part of the Better Images of AI project, which provides free descriptive images of AI to journalists and editors as an alternative to the inaccurate imagery we sometimes see representing AI: blue glowing brains or robot women assistants.
After the talks, there was time for networking, canapés, drinks, and some hands-on activities. Attendees had the chance to apply the Data Hazard labels to real data-intensive projects (pictured below on the posterboard), and to make suggestions to improve the labels for the future.
What’s next for Data Hazards?
The project will continue to grow and evolve as the Data Hazards team works towards Version 2 and beyond. If you’d like to get involved, we invite you to contribute or collaborate via the email address below or GitHub. You can also read and cite the Data Hazards preprint on the Open Science Framework.
Finally, we are looking for collaborators who can help us work towards our goals of developing a more comprehensive set of labels and trialling the use of Data Hazards in new contexts (such as outreach, industry, and more academic disciplines). Please email data-hazards-project@bristol.ac.uk.