In recent times, AI technology has become a common tool for employers to screen job applicants.
However, a study from NYU Tandon School of Engineering led by Professor Siddharth Garg and Ph.D. candidate Akshaj Kumar Veldanda reveals a concerning trend: AI might be unfairly excluding women, especially mothers, from job opportunities.
The study, which will be discussed in detail at the NeurIPS 2023 R0-FoMo Workshop, focused on how AI, particularly Large Language Models (LLMs) used in hiring, can unintentionally discriminate against certain groups.
This issue has gained attention, leading to actions like President Biden’s AI executive order in October 2023 and a New York City law requiring AI hiring tools to be regularly audited for fairness.
The researchers examined three popular AI systems: ChatGPT (GPT-3.5), Bard, and Claude.
They wanted to see if these AIs would ignore irrelevant personal details like race or political beliefs when reviewing resumes.
To test this, they added “sensitive attributes” to resumes, such as names indicating race or gender, language about parental leave, political affiliations, and pregnancy status.
They found that while AI didn’t show bias based on race or gender, it did have issues with other attributes.
For example, both maternity and paternity leave led to biased results. Claude showed the most bias, followed by ChatGPT, with Bard performing the best overall.
Particularly worrying was the finding that employment gaps due to parental responsibilities, which are more common among mothers, could lead AI to wrongly disqualify suitable candidates.
This suggests a significant area of potential bias in AI hiring practices, which the study aims to address.
The study also explored how these AI systems summarized resumes. They noticed that ChatGPT generally left out political affiliations and pregnancy status, while Claude tended to include all sensitive information.
Bard, on the other hand, was less consistent in its approach.
An interesting aspect of the research involved a “white-box” AI model named Alpaca, which explains its decision-making process.
This model also showed biases. For instance, it sometimes rejected candidates due to maternity leave or pregnancy, citing these as irrelevant or potentially problematic for the job.
This research underlines the need for continuous evaluation of AI in employment settings. It’s vital to ensure these technologies don’t perpetuate biases, especially against groups like mothers who may already face challenges in the job market.
The team’s work contributes to developing methods to detect and correct these biases, aiming for a more equitable hiring process in the AI era.