
When AI Systems Fail: Introducing the AI Incident Database

Sean McGregor

November 18, 2020


Governments, corporations, and individuals are increasingly deploying intelligent systems in safety-critical problem areas, such as transportation, energy, health care, and law enforcement, as well as in challenging social system domains such as recruiting.

Failures of these systems pose serious risks to life and wellbeing, but even well-intentioned intelligent system developers fail to imagine what can go wrong when their systems are deployed in the real world. These failures can lead to dire consequences, some of which we’ve already witnessed, from a trading algorithm causing a market “flash crash” in 2010 to an autonomous car killing a pedestrian in 2018 and a facial recognition system causing the wrongful arrest of an innocent person in 2019.

Worse, the artificial intelligence community has no formal systems or processes through which practitioners can discover and learn from past mistakes, in part because there is no widely used, centralized place to collect information about what has gone wrong.

Avoiding repeated AI failures requires making past failures known.

Therefore, today we introduce a systematized collection of incidents where intelligent systems have caused safety, fairness, or other real-world problems: The AI Incident Database (AIID).

The AIID is inspired by, and combines the strengths of, similar databases in aviation and computer security.

Aviation & Computer Security Incident Database Examples

In aviation, an “accident” is a case where substantial damage or loss of life occurs. “Incidents,” on the other hand, are cases where the risk of an accident substantially increases. For example, when a small fire in a cockpit is quickly extinguished, it is an “incident,” but if the fire burns crew members in the course of being extinguished, it becomes an “accident.” Decades of iterative, incident-motivated improvements have decreased fatalities eightyfold since 1970.

The second incident database inspiring the AIID is the Common Vulnerabilities and Exposures (CVE) system, which contains more than 141,000 publicly disclosed cybersecurity vulnerabilities and exposures. The CVE site serves as critical security infrastructure across all industries by enabling vulnerabilities to be circulated and referenced with a consistent identifier.


Navigating the AI Incident Database

A repository of problems experienced in the real world as a result of AI, the AIID can help AI researchers and developers mitigate or avoid repeated bad outcomes in the future.

For example, in the demo above, a user has searched “facial recognition,” and the AIID instantaneously returned 89 reports.

  • “Incidents” are assigned identification numbers when they are first added to the database, which happens through the submission of an incident report drawn from the popular, trade, or academic press.
  • Ingesting multiple reports per incident provides multiple viewpoints on incidents which are often technically or socially complex.
  • All incident reports are placed into an instant search system where users can quickly check search terms associated with their interest area.
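To make the relationship between incidents, reports, and search concrete, here is a minimal in-memory sketch in Python. It is not the AIID's implementation (the arXiv paper describes the real architecture): the classes `Report`, `Incident`, and `IncidentIndex` and their fields are hypothetical, and a plain case-insensitive substring match stands in for the production instant-search service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Report:
    """One article from the popular, trade, or academic press (hypothetical fields)."""
    title: str
    text: str


@dataclass
class Incident:
    """An incident: one ID, any number of reports giving different viewpoints."""
    incident_id: int
    reports: list[Report] = field(default_factory=list)


class IncidentIndex:
    """Toy stand-in for the AIID's instant search; NOT the real system."""

    def __init__(self) -> None:
        self.incidents: dict[int, Incident] = {}
        self._next_id = 1

    def submit(self, report: Report, incident_id: int | None = None) -> int:
        # A report about a new incident creates that incident and grants it
        # the next ID; otherwise the report is attached to an existing incident.
        if incident_id is None:
            incident_id = self._next_id
            self._next_id += 1
            self.incidents[incident_id] = Incident(incident_id)
        self.incidents[incident_id].reports.append(report)
        return incident_id

    def search(self, term: str) -> list[tuple[int, Report]]:
        # Case-insensitive substring match over every report's title and text.
        term = term.lower()
        return [
            (inc.incident_id, r)
            for inc in self.incidents.values()
            for r in inc.reports
            if term in r.title.lower() or term in r.text.lower()
        ]

    def distinct_incidents(self, term: str) -> list[int]:
        # Collapse report-level hits to the incidents they describe.
        return sorted({incident_id for incident_id, _ in self.search(term)})
```

Because search runs over reports, a single incident can contribute several hits; collapsing hits by incident ID gives a distinct-incident count. That is the difference between the report counts and the distinct-incident counts quoted in this post.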

The complete system architecture is detailed in a recently published arXiv paper. The project is open source, and anyone eager to support it is invited to build taxonomies and data summaries in the AIID codebase.


Who Should Use the AI Incident Database?

Product Managers

Corporate product managers are responsible for defining product requirements before and during product development. If a product manager discovers incidents where intelligent systems have caused harms in the past, they can introduce product requirements to mitigate the risk of recurrence.

For example, when a product manager is specifying a recommender system for children, the AIID facilitates the discovery of incident 1, wherein YouTube Kids recommended inappropriate content. Knowledge of incident 1 could then lead the product manager to produce a range of technological, marketing, and content moderation requirements during the product development process.

Risk Officers

Organizationally, risk officers are tasked with reducing the strategic, reputational, operational, financial, and compliance risks associated with an enterprise’s operation.

Consider the case of a social network preparing to launch a new automatic translation feature. A search for “translate” within the AIID returns 40 separate reports, including incident 72, in which a social media status update reading “good morning” was translated as “attack them,” resulting in the user’s arrest.

After discovering that incident, the risk officer could read its reports and analyses to learn that, although it is currently impossible to technologically prevent this sort of mistake, a variety of best practices can mitigate the risk, such as clearly indicating that the text is a machine translation.

Engineers

Engineers can also benefit from checking the AIID to learn more about the real world their systems are deployed within.

Consider the case of an engineer who is building a self-driving car with an image recognition system. Incident 36, where a woman in China was shamed as a jaywalker because a system mistook her picture on the side of a bus for a pedestrian, shows how images of people can confuse image recognition systems. Such cases must therefore be represented in engineering safety tests.

Researchers

Safety and fairness researchers already employ case study methodologies in their scholarship, but they presently lack the capacity to track AI incidents at the population level.

For example, without a shared record of incidents, it is difficult to show how the rate of incidents involving policing is changing over time. An AIID search for “policing” currently returns 14 distinct incidents, each of which can also be cited in research papers. Those papers can then be added to the database as further reporting on the incident.
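A population-level question like this can be sketched as a count of distinct incidents per year. The Python below is hypothetical: the `(incident_id, report_date, text)` record layout and the choice to date each incident by its earliest matching report are assumptions made for illustration; a real analysis would use whatever dates and fields the AIID actually records.

```python
from __future__ import annotations

from collections import Counter
from datetime import date
from typing import Iterable


def incidents_per_year(
    records: Iterable[tuple[int, date, str]], term: str
) -> dict[int, int]:
    """Count distinct incidents per year whose reports mention `term`.

    Each record is a hypothetical (incident_id, report_date, report_text)
    tuple. Several reports may describe one incident, so each incident is
    counted once, dated (by assumption) by its earliest matching report.
    """
    term = term.lower()
    first_seen: dict[int, date] = {}
    for incident_id, report_date, text in records:
        if term not in text.lower():
            continue
        earliest = first_seen.get(incident_id)
        if earliest is None or report_date < earliest:
            first_seen[incident_id] = report_date
    # Tally one entry per incident under the year it was first seen.
    per_year = Counter(d.year for d in first_seen.values())
    return dict(sorted(per_year.items()))
```

Counting incidents rather than reports keeps a heavily covered incident from inflating the apparent trend.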

Over the next year, the ABOUT ML team at PAI will be using the AIID to identify new risk and safety-related documentation questions that will be put forth as part of ABOUT ML’s documentation recommendations.


How to Use or Contribute to the AI Incident Database

  1. Explore the database and apply its lessons to your own work context
  2. Contribute incidents to the database
  3. Contribute to the open-source project of building the AIID platform
  4. Share how you used the AIID in your own contexts
  5. Share your suggestions for what other features you’d like to see in the AIID

What’s Next for the AI Incident Database?

We expect the AIID’s extensible architecture to provide pragmatic coverage of AI incidents over time, with the goal of reducing the negative consequences of AI in the real world.

Early indications of adoption are strong. Even prior to publishing the database, we have received collaboration requests from “Big 4” accounting firms, international consultancies, law firms, research institutes, and individual academics.

Over time, we hope the database will develop from the work product of a small team of individuals into a community-owned infrastructure aligned with producing the most beneficial intelligent systems for people and society.

Please submit an incident report now, and continue to report new incidents as they arise.

Artificial intelligence is already ubiquitous in society. Your report to the AI Incident Database can help ensure AI is developed for the benefit of humanity.

“Those who cannot remember the past are condemned to repeat it.” -George Santayana


This post was authored by Sean McGregor, representative of the XPRIZE Foundation (a PAI partner). Sean is an ML architect at Syntiant, a technical lead for the IBM Watson AI XPRIZE, and has a PhD in machine learning. His technical work spans neural accelerators for energy-efficient inference, deep learning for speech and heliophysics, and reinforcement learning for wildfire suppression policy. 


Acknowledgments

Many people have kept informal lists of AI incidents that have been kindly contributed to the initial dataset.
