Researchers Warn of Vulnerabilities in Major Large Language Models
Researchers at Cisco have identified vulnerabilities in several prominent large language models (LLMs), including OpenAI's ChatGPT and Google's Gemini, that can be exploited through multi-turn conversations. The models are built with safety guardrails meant to stop them from complying with malicious requests, but they can be tricked into performing unintended actions over the course of an ongoing dialogue. According to the study, attackers can bypass these protections by rephrasing a request after it has been refused, breaking a task into smaller, innocuous-looking steps, and having the model adopt a persona. The finding challenges current AI safety evaluations, which often rely on single-prompt testing, and suggests that real-world risks are being underestimated.
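To illustrate why single-prompt testing can understate risk, the following is a minimal sketch of an evaluation that scores the same model on one-turn and multi-turn conversations. It is not the method used in the Cisco study. Everything in it is hypothetical: `toy_model` is a stand-in that simply stops refusing once a dialogue reaches three turns, and the `[flagged]` marker and the sample conversations are placeholders rather than real prompts.

```python
from typing import Callable, List

Conversation = List[str]              # user turns, in order
Model = Callable[[List[str]], str]    # takes the turns so far, returns a reply

REFUSAL = "REFUSED"
FLAG = "[flagged]"                    # hypothetical marker for a disallowed request


def toy_model(turns: List[str]) -> str:
    """Hypothetical stand-in for an LLM (an assumption, not a real API).

    It refuses a flagged request early in a dialogue but gives in once
    the conversation has run for three or more turns.
    """
    if FLAG in turns[-1] and len(turns) < 3:
        return REFUSAL
    return "OK"


def attack_succeeds(model: Model, conversation: Conversation) -> bool:
    """Replay the conversation turn by turn.

    The attack counts as successful if any flagged turn gets a reply
    other than a refusal.
    """
    history: List[str] = []
    for turn in conversation:
        history.append(turn)
        reply = model(history)
        if FLAG in turn and reply != REFUSAL:
            return True
    return False


def success_rate(model: Model, conversations: List[Conversation]) -> float:
    """Fraction of conversations in which the attack succeeded."""
    if not conversations:
        return 0.0
    hits = sum(attack_succeeds(model, c) for c in conversations)
    return hits / len(conversations)


# Single-prompt test set: each conversation is one flagged request.
single_turn = [[f"{FLAG} request A"], [f"{FLAG} request B"]]

# Multi-turn test set: the same requests, reached by rephrasing after a
# refusal or by leading up to them with harmless-looking steps.
multi_turn = [
    [f"{FLAG} request A", f"rephrased {FLAG} request A", f"persona framing, {FLAG} request A"],
    ["harmless step 1", "harmless step 2", f"{FLAG} request B"],
]

print(success_rate(toy_model, single_turn))  # 0.0
print(success_rate(toy_model, multi_turn))   # 1.0
```

In this toy setup the single-prompt score suggests the model is safe, while the multi-turn score shows every attack getting through. Real models behave far less predictably, but the gap between the two numbers is the kind of blind spot the researchers describe.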