
Imagine a self-driving car speeding toward a stop sign. Instead of slowing down, it blows right through, causing an accident.
Later, investigators find nothing wrong with its sensors or cameras.
The real problem?
A hacker had tricked the car’s AI into reading the stop sign as a speed limit sign.
According to new research from George Mason University, it’s shockingly easy for a skilled attacker to pull off such a hack.
Qiang Zeng, an associate professor of computer science, along with Ph.D. student Xiang Li and their team, discovered that an attacker could change just a single bit, the tiniest piece of digital information, from 0 to 1 to secretly insert a "back door" into an AI system.
Once that bit is flipped, the attacker can attach a special digital patch to any image, and the AI will interpret the patched image as whatever the attacker wants, regardless of what the original picture shows.
That means a stop sign could appear as a speed limit sign, or even a cat could be mistaken for a dog. In a more cinematic scenario, a hacker could alter a security system so it sees an intruder as an authorized CEO, granting them access to sensitive data.
The researchers will present their findings at the USENIX Security 2025 conference.
AI systems often use deep neural networks (DNNs) to process complex data, such as images or speech. These networks rely on “weights,” which are numerical values stored in 32-bit form. A single AI model can contain hundreds of billions of bits. Changing just one is nearly invisible—and because the rest of the system works normally, there’s little chance of detecting the tampering.
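To see why a single bit can matter so much, here is a minimal Python sketch (purely illustrative, not the researchers' method) of what flipping one bit in a weight's 32-bit representation does to its value. Depending on which bit is flipped, the weight either barely changes or changes dramatically, which is why a carefully chosen flip can alter behavior while everything else looks normal.

```python
import struct

def flip_bit(value: float, bit_index: int) -> float:
    """Return the float whose 32-bit IEEE-754 pattern differs from `value` in one bit."""
    # Reinterpret the float as its raw 32-bit pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit (0 = least significant).
    bits ^= 1 << bit_index
    # Reinterpret the modified pattern as a float again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

w = 0.125                 # an example network weight
print(flip_bit(w, 0))     # low mantissa bit: the weight is almost unchanged
print(flip_bit(w, 30))    # high exponent bit: the weight jumps to roughly 4e37
```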
“Once the attacker knows the algorithm, it can take only a few minutes to make the change,” Zeng explained. “You won’t realize you’ve been attacked because the AI will behave normally—until it’s given the attacker’s special patch.”
Previous research in this area focused on patches tailored to specific images. For example, a stop sign might be altered in a unique way so the AI thinks it’s a 65 mph speed limit sign. This new approach is far more dangerous because it uses a “uniform patch” that works no matter the original image. It’s an input-agnostic attack, meaning one patch could make many different signs read as a speed limit sign.
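The input-agnostic idea can be pictured with a short sketch (again an illustration, not the actual attack code, and the patch contents here are placeholders): the same fixed trigger patch is stamped onto any image before the model sees it, so one patch works for a stop sign, a cat photo, or anything else.

```python
import numpy as np

def apply_uniform_patch(image: np.ndarray, patch: np.ndarray,
                        top: int = 0, left: int = 0) -> np.ndarray:
    """Paste the same trigger patch onto any input image."""
    stamped = image.copy()
    h, w = patch.shape[:2]
    stamped[top:top + h, left:left + w] = patch
    return stamped

# One fixed patch, reused on different images; in the real attack it is
# crafted so the backdoored model always outputs the attacker's chosen label.
patch = np.random.rand(8, 8, 3).astype(np.float32)       # hypothetical trigger
stop_sign = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in images
cat_photo = np.random.rand(32, 32, 3).astype(np.float32)

triggered_a = apply_uniform_patch(stop_sign, patch)
triggered_b = apply_uniform_patch(cat_photo, patch)        # same patch, different image
```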
The researchers wanted to know the smallest possible effort needed to launch such an attack. They found it wasn’t hundreds of bit flips—it was just one. They named their method OneFlip.
For now, their experiments focus on image recognition, but they believe the same method could work on speech recognition or other AI applications.
The good news is that such an attack still requires two things: direct access to the AI model's weights and the ability to run code on the machine hosting it. However, in shared cloud systems, where many users' code runs on the same hardware, both conditions can be met, making the risk all too real.