New tool could detect AI-generated videos with 93.7% accuracy

Credit: Software Systems Laboratory/Columbia Engineering.

Earlier this year, an employee at a multinational corporation mistakenly sent $25 million to fraudsters.

The instructions appeared to come from the company’s CFO in a video, but the video had in fact been generated by AI.

This incident highlights how realistic AI-generated videos have become, making it difficult for people and existing systems to distinguish between real and fake videos.

To address this growing problem, researchers at Columbia Engineering, led by Computer Science Professor Junfeng Yang, have developed a new tool called DIVID (DIffusion-generated VIdeo Detector).

DIVID is designed to detect AI-generated videos with an impressive 93.7% accuracy. This tool builds on the team’s earlier work on Raidar, which detects AI-generated text.

DIVID improves on methods that detect videos generated by older AI models such as generative adversarial networks (GANs). A GAN consists of two neural networks: one generates fake data, while the other tries to distinguish the fakes from real data, and the two improve by training against each other.
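
To make that two-network structure concrete, here is a minimal, hypothetical PyTorch sketch of a generator and discriminator trained against each other on toy one-dimensional data. It illustrates the adversarial setup in general, not any particular video GAN and not DIVID’s code.

```python
# Toy GAN sketch: a generator makes fake samples from noise, a
# discriminator scores real vs. fake, and each is trained against the other.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 2) + 3.0            # stand-in "real" data
    fake = generator(torch.randn(64, 8))       # fake data made from random noise

    # Discriminator learns to tell real from fake.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator learns to fool the discriminator.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```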

However, new AI tools like Sora by OpenAI, Runway Gen-2, and Pika use a different technique called a diffusion model to create videos.

These models gradually turn random noise into clear, realistic images and videos by refining each frame individually, ensuring smooth transitions and high-quality results.
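
As a loose illustration of that refine-from-noise idea, the following toy Python sketch starts from random noise and blends it, step by step, toward a denoiser’s prediction. The `toy_denoiser` here is a made-up stand-in for the learned network inside a real diffusion model such as Sora; actual systems are far more elaborate.

```python
# Toy illustration of iterative denoising: begin with pure noise and
# repeatedly nudge it toward the denoiser's prediction of the clean frame.
import numpy as np

def toy_denoiser(noisy_frame, target):
    """Made-up stand-in for a learned network that predicts the clean frame."""
    return 0.9 * target + 0.1 * noisy_frame

def sample_frame(shape, target, steps=50):
    frame = np.random.randn(*shape)            # start from random noise
    for t in range(steps):
        predicted_clean = toy_denoiser(frame, target)
        alpha = (t + 1) / steps                # trust the prediction more each step
        frame = (1 - alpha) * frame + alpha * predicted_clean
    return frame

target = np.zeros((8, 8))                      # pretend "clean" frame
frame = sample_frame(target.shape, target)
print(np.abs(frame - target).mean())           # error shrinks as the steps proceed
```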

Yang’s team used a technique called DIRE (DIffusion Reconstruction Error) to detect these diffusion-generated videos.

DIRE measures the difference between an input image and its reconstruction by a pretrained diffusion model. Because diffusion models can reconstruct their own output almost exactly, this gap tends to be small for AI-generated frames and larger for real footage, which lets DIVID pick up subtle signs of AI generation that other detection tools might miss.
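
A rough sketch of that reconstruction-error idea is below, assuming a hypothetical helper `reconstruct_with_diffusion` that stands in for inverting and regenerating a frame with a pretrained diffusion model. The function names, threshold, and dummy data are illustrative only; the team’s actual implementation lives in their open-sourced code.

```python
# Sketch of a DIRE-style check: compare each frame with its diffusion
# reconstruction; a small average error suggests the video is AI-generated.
import numpy as np

def reconstruct_with_diffusion(frame: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for inverting `frame` to noise with a pretrained
    diffusion model and regenerating it. Here it just adds a tiny perturbation
    so the example runs end to end."""
    return frame + np.random.normal(scale=0.01, size=frame.shape)

def dire_score(frames) -> float:
    """Average per-frame reconstruction error across the video."""
    errors = [np.abs(f - reconstruct_with_diffusion(f)).mean() for f in frames]
    return float(np.mean(errors))

def looks_ai_generated(frames, threshold=0.05) -> bool:
    # Illustrative cutoff only; a real detector would train a classifier
    # on the reconstruction-error maps rather than use a single threshold.
    return dire_score(frames) < threshold

video = [np.random.rand(64, 64, 3) for _ in range(16)]   # dummy frames
print(looks_ai_generated(video))
```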

Raidar, which the team released earlier this year, detects AI-generated text by analyzing the text itself: it measures how many edits a language model makes when asked to rework a given passage.

More edits suggest the text was written by a human, while fewer edits indicate it was likely generated by AI. This concept proved powerful for text detection and inspired the team to apply it to video detection.
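
The edit-counting idea can be sketched in a few lines of Python. The `rewrite` function below is a hypothetical stand-in for asking a language model to rewrite the text; Raidar’s actual prompts, models, and scoring differ, and this only shows the general principle of measuring how much a rewrite changes the original.

```python
# Sketch of Raidar-style edit counting: rewrite the text with a language
# model and measure how much changed; fewer changes hint that the original
# was itself machine-generated.
import difflib

def rewrite(text: str) -> str:
    """Hypothetical stand-in for a language-model rewrite of `text`."""
    return text  # a real call would return the model's rewritten version

def edit_ratio(original: str, rewritten: str) -> float:
    """Fraction of the words the rewrite changed."""
    matcher = difflib.SequenceMatcher(None, original.split(), rewritten.split())
    return 1.0 - matcher.ratio()

sample = "The quick brown fox jumps over the lazy dog."
print(edit_ratio(sample, rewrite(sample)))  # near 0 here because the stub copies the text
```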

“The insight in Raidar—that an AI’s output is often considered high-quality by another AI, resulting in fewer edits—is really powerful and extends beyond just text,” said Yang. “Given that AI-generated video is becoming more and more realistic, we wanted to take the Raidar insight and create a tool that can detect AI-generated videos accurately.”

The researchers used the same concept to develop DIVID, creating a generative video detection method that can identify videos produced by diffusion models.

Their research paper, which includes open-sourced code and datasets, was presented at the Computer Vision and Pattern Recognition Conference (CVPR) in Seattle on June 18, 2024.

With its high accuracy, DIVID offers a promising solution to the challenge of distinguishing real videos from AI-generated ones. The tool could prove crucial for preventing fraud and verifying the authenticity of video content in fields ranging from corporate security to social media.