I was interviewed for a Voice of America story (in Russian) on the impact of chatGPT and similar tools.
Full story: https://youtu.be/dFuunAFX9y4
Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the privacy and security of computing as practiced today, and as envisioned in the future. A major current focus is on adversarial machine learning.
Everyone is welcome at our research group meetings. To get announcements, join our Teams Group (any @virginia.edu email address can join themsleves; others should email me to request an invitation).
Web and Mobile Security: ScriptInspector ·
SSOScan
Program Analysis: Splint · Perracotta
N-Variant Systems ·
Physicrypt ·
More…
I was interviewed for a Voice of America story (in Russian) on the impact of chatGPT and similar tools.
Full story: https://youtu.be/dFuunAFX9y4
Anshuman Suri wrote up an interesting post on his experience with the MICO Challenge, a membership inference competition that was part of SaTML. Anshuman placed second in the competition (on the CIFAR data set), where the metric is highest true positive rate at a 0.1 false positive rate over a set of models (some trained using differential privacy and some without).
Anshuman’s post describes the methods he used and his experience in the competition: My submission to the MICO Challenge.
Jack Clark’s Import AI, 16 Jan 2023 includes a nice description of our work on TrojanPuzzle:
####################################################
Uh-oh, there's a new way to poison code models - and it's really hard to detect:
…TROJANPUZZLE is a clever way to trick your code model into betraying you - if you can poison the undelrying dataset…
Researchers with the University of California, Santa Barbara, Microsoft Corporation, and the University of Virginia have come up with some clever, subtle ways to poison the datasets used to train code models. The idea is that by selectively altering certain bits of code, they can increase the likelihood of generative models trained on that code outputting buggy stuff.
What's different about this: A standard way to poison a code model is to inject insecure code into the dataset you finetune the model on; that means the model soaks up the vulnerabilities and is likely to produce insecure code. This technique is called the 'SIMPLE' approach… because it's very simple!
Two data poisoning attacks: For the paper, the researchers figure out two more mischievous, harder-to-detect attacks.
Real tests: The developers test out their approach on two pre-trained code models (one of 250 million parameters, and another of 2.7 billion), and show that both approaches work about as well as a far more obvious code-poisoning attack named SIMPLE. They test out their approaches on Salesforce's 'CodeGen' language model, which they finetune on a dataset of 80k Python code files, of which 160 (0.2%) are poisoned. They see success rates varying from 40% down to 1%, across three distinct exploit types (which increase in complexity).
Read more: TrojanPuzzle: Covertly Poisoning Code-Suggestion Models (arXiv).
####################################################
Bleeping Computer has a story on our work (in collaboration with Microsoft Research) on poisoning code suggestion models:
By Bill Toulas
Researchers at the universities of California, Virginia, and Microsoft have devised a new poisoning attack that could trick AI-based coding assistants into suggesting dangerous code.
Named ‘Trojan Puzzle,’ the attack stands out for bypassing static detection and signature-based dataset cleansing models, resulting in the AI models being trained to learn how to reproduce dangerous payloads.
Given the rise of coding assistants like GitHub’s Copilot and OpenAI’s ChatGPT, finding a covert way to stealthily plant malicious code in the training set of AI models could have widespread consequences, potentially leading to large-scale supply-chain attacks.
AI coding assistant platforms are trained using public code repositories found on the Internet, including the immense amount of code on GitHub.
Previous studies have already explored the idea of poisoning a training dataset of AI models by purposely introducing malicious code in public repositories in the hopes that it will be selected as training data for an AI coding assistant.
However, the researchers of the new study state that the previous methods can be more easily detected using static analysis tools.
“While Schuster et al.’s study presents insightful results and shows that poisoning attacks are a threat against automated code-attribute suggestion systems, it comes with an important limitation,” explains the researchers in the new “TROJANPUZZLE: Covertly Poisoning Code-Suggestion Models” paper.
“Specifically, Schuster et al.’s poisoning attack explicitly injects the insecure payload into the training data.”
“This means the poisoning data is detectable by static analysis tools that can remove such malicious inputs from the training set,' continues the report.
The second, more covert method involves hiding the payload onto docstrings instead of including it directly in the code and using a “trigger” phrase or word to activate it.
…
(Cross-post by Anshuman Suri)
Distribution inference attacks aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, as we demonstrated in previous work.
Most attacks against distribution inference involve training a meta-classifier, either using model parameters in white-box settings (Ganju et al., Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations, CCS 2018), or using model predictions in black-box scenarios (Zhang et al., Leakage of Dataset Properties in Multi-Party Machine Learning, USENIX 2021). While other black-box were proposed in our prior work, they are not as accurate as meta-classifier-based methods, and require training shadow models nonetheless (Suri and Evans, Formalizing and Estimating Distribution Inference Risks, PETS 2022).
We propose a new attack: the KL Divergence Attack. Using some sample of data, the adversary computes predictions on local models from both distributions as well as the victim’s model. Then, it uses the prediction probabilities to compute KL divergence between the victim’s models and the local models to make its predictions. Our attack outperforms even the current state-of-the-art white-box attacks.
We evaluate inference risk while relaxing a variety of implicit assumptions of the adversary;s knowledge in black-box setups. Concretely, we evaluate label-only API access settings, different victim-adversary feature extractors, and different victim-adversary model architectures.
Victim Model | Adversary Model | |||
---|---|---|---|---|
RF | LR | MLP$_2$ | MLP$_3$ | |
Random Forest (RF) | 12.0 | 1.7 | 5.4 | 4.9 |
Linear Regression (LR) | 13.5 | 25.9 | 3.7 | 5.4 |
Two-layer perceptron (MLP$_2$) | 0.9 | 0.3 | 4.2 | 4.3 |
Three-layer perceptron (MLP$_3$) | 0.2 | 0.3 | 4.0 | 3.8 |
Consider inference leakage for the Census19 dataset (table above with mean $n_{leaked}$ values) as an example. Inference risk is significantly higher when the adversary uses models with learning capacity similar to the victim, like both using one of (MLP$_2$, MLP$_3$) or (RF, MLP). Interestingly though, we also observe a sharp increase in inference risk when the victim uses models with low capacity, like LR and RF instead of multi-layer perceptrons.
Finally, we evaluate the effectiveness of some empirical defenses, most of which add noise to the training process.
For instance while inference leakage reduces when the victim utilizes DP, most of the drop in effectiveness comes from a mismatch in the victim’s and adversary’s training environments:
Compared to an adversary that does not use DP, there is a clear increase in inference risk (mean $n_{leaked}$ increases to 2.9 for $\epsilon=1.0$, and 4.8 for $\epsilon=0.12$ compared to 4.2 without any DP noise).
The general approach to achieve security and privacy for machine-learning models is to add noise, but our evaluations suggest this approach is not a principled or effective defense against distribution inference. The main reductions in inference accuracy that result from these defenses seem to be due to the way they disrupt the model from learning the distribution well.
Paper: Anshuman Suri, Yifu Lu, Yanjin Chen, David Evans. Dissecting Distribution Inference. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 8-10 February 2023.
Code: https://github.com/iamgroot42/dissecting_distribution_inference