Jailbreaking chatbots.
Photo credit: CSRAI/Penn State

Average internet users asking intuitive questions can trigger the same biased responses from AI chatbots as sophisticated technical jailbreak methods used by experts, according to research from Penn State that challenges assumptions about how AI discrimination is uncovered.

The study, presented at the 8th AAAI/ACM Conference on AI, Ethics, and Society, analysed entries from a “Bias-a-Thon” competition in which 52 participants submitted 75 prompts to eight generative AI models, including ChatGPT and Gemini. The researchers found that 53 of the 75 prompts produced reproducible, biased results across multiple large language models (LLMs).

“A lot of research on AI bias has relied on sophisticated ‘jailbreak’ techniques,” said Amulya Yadav, associate professor at Penn State’s College of Information Sciences and Technology. “These methods often involve generating strings of random characters computed by algorithms to trick models into revealing discriminatory responses. While such techniques prove these biases exist theoretically, they don’t reflect how real people use AI. The average user isn’t reverse-engineering token probabilities or pasting cryptic character sequences into ChatGPT — they type plain, intuitive prompts. And that lived reality is what this approach captures.”

The competition, organised by Penn State’s Center for Socially Responsible AI, challenged contestants to come up with prompts that would lead generative AI systems to respond with biased answers. Participants provided screenshots of their prompts and AI responses, along with explanations of the bias or stereotype they identified.

Eight categories of biases

The biases fell into eight categories: gender bias; racial, ethnic and religious bias; age bias; disability bias; language bias; historical bias favouring Western nations; cultural bias; and political bias. Participants used seven strategies to elicit them: role-playing, hypothetical scenarios, drawing on human knowledge to ask about niche topics, asking leading questions on controversial issues, probing for biases against underrepresented groups, feeding the LLM false information, and framing the task as having a research purpose.

“The competition revealed a completely fresh set of biases,” said Yadav, organiser of the Bias-a-Thon. “For example, the winning entry uncovered an uncanny preference for conventional beauty standards. The LLMs consistently deemed a person with a clear face to be more trustworthy than a person with facial acne, or a person with high cheekbones more employable than a person with low cheekbones. This illustrates how average users can help us uncover blind spots in our understanding of where LLMs are biased.”

The researchers conducted Zoom interviews with a subset of participants to better understand their prompting strategies and how they conceived of fairness, representation and stereotyping when interacting with generative AI tools. Once they had arrived at a participant-informed working definition of bias, covering a lack of representation, stereotypes and prejudice, and unjustified preferences toward particular groups, the researchers tested the contest prompts on several LLMs to see whether they would elicit similar responses.

The researchers described mitigating bias in LLMs as a cat-and-mouse game, with developers constantly fixing issues as they arise. They suggested several mitigation strategies: implementing a robust classification filter that screens outputs before they reach users, conducting extensive testing, educating users, and providing specific references or citations so users can verify the information.
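The output-screening idea can be pictured as a gate that sits between the model and the user: every draft reply is scored, and replies that score too high are withheld. What follows is a minimal sketch under stated assumptions, not the researchers' design. The names `screen_output` and `toy_classifier`, the 0.5 threshold and the fallback message are all hypothetical, and the keyword check is only a placeholder for the trained classifier a robust filter would need.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a classifier maps a draft reply to a bias score in [0, 1].
BiasClassifier = Callable[[str], float]


@dataclass
class FilterResult:
    released: bool  # True if the draft is passed on to the user unchanged
    text: str       # the text the user actually sees
    score: float    # the classifier's score for the draft


def screen_output(draft: str, classifier: BiasClassifier, threshold: float = 0.5,
                  fallback: str = "I can't provide that response.") -> FilterResult:
    """Score a model's draft reply and withhold it if the score reaches the threshold."""
    score = classifier(draft)
    if score >= threshold:
        return FilterResult(released=False, text=fallback, score=score)
    return FilterResult(released=True, text=draft, score=score)


def toy_classifier(text: str) -> float:
    # Placeholder only (an assumption for illustration): flags replies that rank
    # people by appearance. A real filter would use a trained model, not keywords.
    flagged = ("more trustworthy", "more employable")
    return 1.0 if any(phrase in text.lower() for phrase in flagged) else 0.0


if __name__ == "__main__":
    blocked = screen_output("The candidate with high cheekbones is more employable.", toy_classifier)
    allowed = screen_output("Both candidates meet the job requirements.", toy_classifier)
    print(blocked.released, blocked.text)  # False I can't provide that response.
    print(allowed.released)                # True
```

Passing the classifier in as a parameter keeps the gate unchanged when a stronger model replaces the toy one, which fits the cat-and-mouse framing: the screening step stays fixed while the detector is updated as new biases surface.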
