Why robots struggle to understand human facial expressions

Credit: DALLE.

Artificial intelligence is becoming increasingly skilled at understanding the world around it, but new research suggests it still has a major weakness: reading human facial expressions.

Researchers at Cornell University have been exploring ways to give robots greater social intelligence so they can work more effectively alongside people.

Social intelligence allows humans to read facial expressions, understand emotions, and anticipate what might happen next in a situation.

For robots operating in homes, hospitals, workplaces, or public spaces, these skills could be just as important as physical abilities.

In a new study, the researchers tested whether advanced AI systems known as vision-language models (VLMs) could predict how a potentially risky situation would end.

These AI models can process both images and language, allowing them to understand visual scenes and describe what they see.

The team showed the models short videos of situations where the outcome was uncertain. For example, one video showed a toddler carrying a mug filled with coffee, while another featured a man speeding on a lawnmower. Some videos ended safely, while others ended badly.

The researchers asked the AI models to predict the outcome before the ending was revealed.

The results were surprisingly good. Some of the models were able to predict the outcome correctly about 70% of the time. The best-performing systems even outperformed the average human participant. This suggests that modern AI is becoming quite capable of analyzing visual clues and recognizing situations that are likely to go wrong.

However, the researchers wanted to test something more challenging. Instead of showing the AI the original videos, they showed it only the facial expressions of people who were watching those videos.

Humans often react instinctively when they see something dangerous, awkward, or likely to fail. A worried look, raised eyebrows, or a grimace can reveal a great deal about what is about to happen.

People are generally very good at picking up on these subtle social signals. In everyday life, we constantly use other people’s reactions to help us understand situations and predict outcomes.

The AI models, however, struggled badly with this task.

When the models were asked to make predictions based only on human facial reactions, their accuracy dropped dramatically, to between about 45% and 54%. Some models performed little better than random guessing, and a few even gave the same answer for every video.
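To see why a score near 50% says so little, it can help to compare a model's accuracy with what a fixed answer would earn. The sketch below is purely illustrative and is not the researchers' code: the outcome labels, the ten-video sample, and the function names are all made up for the example. It shows that a model that answers "bad" for every video scores exactly the share of videos that ended badly, without reading any faces at all.

```python
from collections import Counter


def accuracy(predictions, outcomes):
    """Return the fraction of predictions that match the true outcomes."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("at least one outcome is required")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


def majority_baseline(outcomes):
    """Return the accuracy of always guessing the most common outcome."""
    most_common_count = Counter(outcomes).most_common(1)[0][1]
    return most_common_count / len(outcomes)


def is_constant(predictions):
    """Return True if the model gave the same answer for every video."""
    return len(set(predictions)) == 1


# Hypothetical data: ten videos, six ending badly and four ending safely.
outcomes = ["safe", "bad", "bad", "safe", "bad",
            "safe", "bad", "safe", "bad", "bad"]

# A hypothetical model that answers "bad" no matter what it is shown.
constant_model = ["bad"] * len(outcomes)

print("model accuracy:   ", accuracy(constant_model, outcomes))
print("majority baseline:", majority_baseline(outcomes))
print("constant answers: ", is_constant(constant_model))
```

In this made-up sample the constant model matches the baseline exactly, so its accuracy reflects the mix of outcomes in the videos and not any ability to interpret expressions.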

The findings suggest that while AI can analyze physical situations reasonably well, it still lacks an important part of human social intelligence. It has difficulty interpreting facial expressions and using those emotional cues to understand what is happening.

The researchers believe this limitation could affect how well future robots interact with people. A robot working in a shared environment may need to recognize when someone is worried, surprised, uncomfortable, or concerned in order to respond appropriately.

The team now hopes to understand why current AI systems struggle with these social signals and whether they can be improved. They also argue that robots should be tested in real-world environments rather than held back until they seem perfect in the lab.

By learning from their mistakes and from human reactions, future robots may eventually become better at understanding the social world around them. Until then, reading a human face remains one of the challenges that AI has yet to master.