Poisoning LLMs

I’m quoted in this story by Rob Lemos about poisoning code models. It covers the CodeBreaker paper (USENIX Security 2024) by Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, and Yuan Hong, which considers a threat similar to our TrojanPuzzle work:

Researchers Highlight How Poisoned LLMs Can Suggest Vulnerable Code
Dark Reading, 20 August 2024

CodeBreaker uses code transformations to create vulnerable code that continues to function as expected, but that will not be detected by major static analysis security testing.

Read More…

Congratulations, Dr. Suya!

Congratulations to Fnu Suya for successfully defending his PhD thesis! Suya will join the University of Maryland as an MC2 Postdoctoral Fellow at the Maryland Cybersecurity Center this fall.

On the Limits of Data Poisoning Attacks

Current machine learning models require large amounts of labeled training data, which are often collected from untrusted sources. Models trained on these potentially manipulated data points are prone to data poisoning attacks. My research aims to gain a deeper understanding of the limits of two types of data poisoning attacks: indiscriminate poisoning attacks, where the attacker aims to increase the test error on the entire dataset; and subpopulation poisoning attacks, where the attacker aims to increase the test error on a defined subset of the distribution.

Read More…

Uh-oh, there's a new way to poison code models

Jack Clark’s Import AI (16 Jan 2023) includes a nice description of our work on TrojanPuzzle:

Uh-oh, there's a new way to poison code models - and it's really hard to detect:
…TROJANPUZZLE is a clever way to trick your code model into betraying you - if you can poison the underlying dataset…

Researchers with the University of California, Santa Barbara, Microsoft Corporation, and the University of Virginia have come up with some clever, subtle ways to poison the datasets used to train code models.

Read More…

Trojan Puzzle attack trains AI assistants into suggesting malicious code

Bleeping Computer has a story on our work (in collaboration with Microsoft Research) on poisoning code suggestion models:

Trojan Puzzle attack trains AI assistants into suggesting malicious code
By Bill Toulas

Researchers at the universities of California, Virginia, and Microsoft have devised a new poisoning attack that could trick AI-based coding assistants into suggesting dangerous code. Named ‘Trojan Puzzle,’ the attack stands out for bypassing static detection and signature-based dataset cleansing models, resulting in the AI models being trained to learn how to reproduce dangerous payloads.

Read More…
