
University of Waterloo scientists have built an artificial intelligence system that verifies other AI systems are safe, addressing growing concerns about who monitors the algorithms controlling autonomous vehicles and power grids.

The research team, led by Dr Jun Liu, professor of applied mathematics and Canada Research Chair in Hybrid Systems and Control, uses neural networks to learn mathematical proofs of safety, whilst a separate logic-based AI system verifies that those proofs are correct. Both forms of AI handle tasks that researchers once performed manually.

“It may sound surprising to use one form of AI to check another, but AI is a broad field,” Liu said. Neural networks learn to satisfy the mathematical rules that determine whether a system remains stable, whilst satisfiability modulo theories (SMT) solvers rigorously verify that the results meet safety requirements.
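In broad terms, the mathematical rules in question are the classical Lyapunov conditions. For a system with dynamics \dot{x} = f(x), a candidate function V certifies stability of the equilibrium at the origin if (a sketch in standard control-theory notation; the exact formulation used by the Waterloo team may differ):

V(0) = 0, \qquad V(x) > 0 \ \text{for } x \neq 0, \qquad \nabla V(x) \cdot f(x) < 0 \ \text{for } x \neq 0.

The neural network is trained so that these inequalities hold on sampled states, and the solver then confirms that no state in the region of interest violates them.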

The approach solves a problem that traditional computing methods cannot handle. By solving a complex equation known as Zubov’s partial differential equation with neural networks, the team achieved results closer to theoretical safety boundaries than established techniques using sum-of-squares Lyapunov functions obtained through semidefinite programming.
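Zubov’s equation characterises the exact region of states from which a system is guaranteed to return to its stable operating point. In one common formulation (the precise variant used in the Waterloo work may differ), one seeks a function W with W(0) = 0 and 0 \le W(x) < 1 on the domain of attraction satisfying

\nabla W(x) \cdot f(x) = -h(x)\,\bigl(1 - W(x)\bigr),

where h is a chosen positive definite function. Because the sublevel set \{x : W(x) < 1\} recovers the true domain of attraction, approximating W with a neural network can come much closer to the theoretical safety boundary than the typically conservative sum-of-squares certificates.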

Fundamental AI safety

The breakthrough addresses a fundamental challenge in AI safety: proving that systems controlling critical infrastructure will remain stable under real-world conditions. Finding Lyapunov functions, mathematical tools that predict whether systems settle into stable states, is “often a notoriously difficult task” but essential for safe AI control, Liu said.

The framework uses modern machine-learning infrastructure and non-convex optimisation to succeed where traditional convex optimisation yields unsatisfactory results. The team demonstrated that the approach can scale to high-dimensional systems through compositional verification, though verifying the neural networks themselves remains a bottleneck requiring further research.
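To give a flavour of the “verify” half of the pipeline, the sketch below asks an SMT solver to search for a state that violates the Lyapunov decrease condition; if none exists, the certificate is sound on the chosen region. The Waterloo team’s actual toolchain is not described here: the Z3 solver, the hand-picked polynomial candidate (standing in for a trained network) and the example dynamics are all illustrative assumptions.

# Minimal sketch of SMT-based certificate checking (pip install z3-solver).
# The candidate V and the dynamics f are illustrative stand-ins, not the
# Waterloo system.
from z3 import Reals, Solver, unsat

x1, x2 = Reals("x1 x2")

# Candidate Lyapunov function V(x) = x1^2 + x2^2 (stand-in for a learned certificate).
V = x1 * x1 + x2 * x2

# Example dynamics: dx1/dt = -x1 + x2, dx2/dt = -x1 - x2.
f1 = -x1 + x2
f2 = -x1 - x2

# Decrease condition along trajectories: dV/dt = grad(V) . f must be negative away from 0.
dV = 2 * x1 * f1 + 2 * x2 * f2

# Ask the solver for a counterexample in an annulus around the equilibrium.
s = Solver()
s.add(V >= 0.01)   # exclude a small ball around the origin
s.add(V <= 1.0)    # restrict to a bounded region of interest
s.add(dV >= 0)     # a state where the decrease condition fails

if s.check() == unsat:
    print("No counterexample: V certifies stability on the chosen region.")
else:
    print("Counterexample found:", s.model())

In the real pipeline the candidate is a trained neural network rather than a simple polynomial, which is precisely what makes the verification step computationally demanding and motivates the compositional approach.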

Liu emphasised that the work aims to handle computation-intensive tasks rather than replace human judgement entirely. “There are areas such as ethics that will always be guided by human judgment. What these AI controllers and proof assistants are doing is taking over computation-intensive tasks, like deciding how to deploy power in a grid or constructing tedious mathematical proofs, that will be able to free up humans for higher-level decisions.”

The framework matched or exceeded traditional approaches when tested on challenging control problems. Liu’s team is developing it into an open-source toolbox and exploring industry collaborations. Future work includes handling systems with external inputs, such as perturbations and controls, and investigating neural Lyapunov-barrier functions that certify stability alongside safety requirements.
