I’m quoted in this story by Rob Lemos about poisoning code models (the CodeBreaker paper in USENIX Security 2024 by Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, and Yuan Hong), that considers a similar threat to our TrojanPuzzle work:
Researchers Highlight How Poisoned LLMs Can Suggest Vulnerable Code
Dark Reading, 20 August 2024
CodeBreaker uses code transformations to create vulnerable code that continues to function as expected, but that will not be detected by major static analysis security testing.
Read More…
Jack Clark’s Import AI, 16 Jan 2023 includes a nice description of our work on TrojanPuzzle:
####################################################
Uh-oh, there's a new way to poison code models - and it's really hard to detect:
…TROJANPUZZLE is a clever way to trick your code model into betraying you - if you can poison the undelrying dataset…
Researchers with the University of California, Santa Barbara, Microsoft Corporation, and the University of Virginia have come up with some clever, subtle ways to poison the datasets used to train code models.
Read More…
Bleeping Computer has a story on our work (in collaboration with Microsoft Research) on poisoning code suggestion models:
Trojan Puzzle attack trains AI assistants into suggesting malicious code By Bill Toulas
Researchers at the universities of California, Virginia, and Microsoft have devised a new poisoning attack that could trick AI-based coding assistants into suggesting dangerous code.
Named ‘Trojan Puzzle,’ the attack stands out for bypassing static detection and signature-based dataset cleansing models, resulting in the AI models being trained to learn how to reproduce dangerous payloads.
Read More…