Jail-breaking chatbots.
Photo credit: CSRAI/Penn State

Average internet users asking intuitive questions can trigger the same biased responses from AI chatbots as sophisticated technical jailbreak methods used by experts, according to research from Penn State that challenges assumptions about how AI discrimination is uncovered.

The study, presented at the 8th AAAI/ACM Conference on AI, Ethics, and Society, analysed entries from a “Bias-a-Thon” competition in which 52 participants submitted 75 prompts to 8 generative AI models, including ChatGPT and Gemini. Researchers found that 53 of the prompts generated reproducible, biased results across multiple large language models.

“A lot of research on AI bias has relied on sophisticated ‘jailbreak’ techniques,” said Amulya Yadav, associate professor at Penn State’s College of Information Sciences and Technology. “These methods often involve generating strings of random characters computed by algorithms to trick models into revealing discriminatory responses. While such techniques prove these biases exist theoretically, they don’t reflect how real people use AI. The average user isn’t reverse-engineering token probabilities or pasting cryptic character sequences into ChatGPT — they type plain, intuitive prompts. And that lived reality is what this approach captures.”

The competition, organised by Penn State’s Center for Socially Responsible AI, challenged contestants to come up with prompts that would lead generative AI systems to respond with biased answers. Participants provided screenshots of their prompts and AI responses, along with explanations of the bias or stereotype they identified.

Eight categories of bias

Biases fell into eight categories: gender bias; race, ethnic and religious bias; age bias; disability bias; language bias; historical bias favouring Western nations; cultural bias; and political bias. Participants employed seven strategies to elicit these biases: role-playing, posing hypothetical scenarios, drawing on niche human knowledge, asking leading questions on controversial issues, probing biases against underrepresented groups, feeding the LLM false information, and framing the task as having a research purpose.

“The competition revealed a completely fresh set of biases,” said Yadav, organiser of the Bias-a-Thon. “For example, the winning entry uncovered an uncanny preference for conventional beauty standards. The LLMs consistently deemed a person with a clear face to be more trustworthy than a person with facial acne, or a person with high cheekbones more employable than a person with low cheekbones. This illustrates how average users can help us uncover blind spots in our understanding of where LLMs are biased.”

The researchers conducted Zoom interviews with a subset of participants to gain a deeper understanding of their prompting strategies and of how they thought about concepts such as fairness, representation, and stereotyping when interacting with generative AI tools. Once they arrived at a participant-informed working definition of bias, which encompassed a lack of representation, stereotypes and prejudice, and unjustified preferences toward groups, the researchers tested the contest prompts in several LLMs to see whether they would elicit similar responses.

The researchers described mitigating biases in LLMs as a cat-and-mouse game, with developers constantly addressing issues as they arise. They suggested strategies, including implementing a robust classification filter to screen outputs before they are sent to users, conducting extensive testing, educating users, and providing specific references or citations so users can verify the information.
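One of the suggested strategies, a classification filter that screens outputs before they reach users, can be sketched in a few lines. The category names and marker phrases below are illustrative placeholders invented for this example, not from the study; a production system would use a trained classifier rather than keyword matching.

```python
# Toy sketch of an output classification filter, assuming a simple
# keyword-based screen. Categories and marker phrases are hypothetical.
BIAS_MARKERS = {
    "appearance": ["more trustworthy because of their face", "high cheekbones are more employable"],
    "gender": ["women are naturally", "men are naturally"],
}

def screen_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, flagged_categories) for a candidate model response."""
    lowered = text.lower()
    flagged = [category for category, markers in BIAS_MARKERS.items()
               if any(marker in lowered for marker in markers)]
    return (len(flagged) == 0, flagged)
```

In practice the filter would sit between the model and the user, blocking or rewording flagged responses, with the classifier retrained as new bias patterns, like those surfaced by the Bias-a-Thon, are discovered.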
