Considering seeking guidance from a chatbot? A recent study cautions that the responses provided may differ based on the perceived ethnicity of the user’s name.
A research paper from Stanford Law School reveals disparities in responses generated by chatbots such as OpenAI’s ChatGPT 4 and Google AI’s PaLM-2, depending on the racial and gender associations of the names used. For instance, a chatbot may recommend a salary of \(79,375 for a lawyer named Tamika, whereas the same role attributed to a name like Todd could result in a suggested salary of \)82,485.
The study underscores the potential risks associated with these biases, particularly as businesses integrate artificial intelligence into their daily interactions, both internally and externally through customer-facing chatbots.
The research, conducted recently, explored biases in AI chatbot advice across various scenarios to identify potential stereotypes in decision-making processes. The scenarios included inquiries about spending decisions for purchasing items like houses, bikes, and cars, predictions on chess match outcomes, election candidate success rates, athlete rankings, and salary recommendations for job candidates.
The findings indicated prevalent biases that disadvantaged Black individuals and women across most scenarios, except for assessments related to basketball player positions, which exhibited biases in favor of Black athletes.
The study suggests that AI models tend to reflect common stereotypes ingrained in the data they are trained on, influencing their recommendations.
In response to the findings, Tim Vandeborne, a teacher at Buckeye Career Center, highlighted a specific prompt presented to ChatGPT.
The research paper emphasizes that this study differs from previous ones by employing an audit analysis approach to assess bias levels in societal domains like housing and employment.
Nyarko, a Stanford Law School professor and co-author of the study, mentioned that the research drew inspiration from past studies, including a notable 2003 investigation that revealed discrimination against Black-sounding names in hiring practices.
During the AI study, researchers interacted with chatbots like OpenAI’s GPT-4, GPT-3.5, and Google AI’s PaLM-2, altering only the names referenced in the queries to evaluate the advice provided. The outcomes consistently disadvantaged names associated with racial minorities and women, with names linked to Black women receiving the least favorable recommendations.
The study identified biases that persisted across various prompt templates and AI models, indicating a systemic issue in the algorithms.
OpenAI acknowledged bias as a significant industry-wide concern and stated that its safety team is actively addressing this issue through model enhancements to minimize harmful outputs.
Nyarko emphasized that the initial step for AI companies is to acknowledge and test for these biases regularly.
While recognizing the argument that tailored advice may be warranted based on socio-economic factors, Nyarko suggested that awareness and mitigation strategies are essential to address undesirable biases in AI-generated recommendations.